14 and 19 September 2021. Continued from the previous.
The standard C library functions
bsearch and qsort are similar in that they operate
on an array. The caller specifies the start of
the array, the number of elements, and the size of
each element.
In my case of a simple word indexer, I do not have an array in core (‘core’ is an old-fashioned term for internal working memory, or RAM), but disk files instead. I don’t want to read those files into memory; I prefer to rely on automatic buffering and disk caching by the operating system.
I soon found that even then I can use the standard
bsearch. As a base pointer I pass a character
pointer that I did declare, but that doesn’t point to an array.
nmemb is the number of lines in each of my index
files. One file has a fixed number of bytes in each line
(namely the length of a hexadecimal offset, plus a newline),
so I can efficiently calculate the number of lines by asking
fstat for the number of bytes in the file.
As the size, I specify
sizeof(char), so effectively 1.
Now isn’t that dangerous? Passing a pointer to a non-existent
array? The bsearch function will repeatedly
calculate a pointer to an array element based on that, and pass
it to the specified function for comparison with the key: is the
key greater than the element, less than the element, or equal to
it? See function
siworin_compar in the source file.
Yes, that is dangerous and illegal if you ever dereference such a pointer. That would almost certainly cause a segmentation fault or a similar catastrophic error, because the pointer points to an array element that doesn’t exist, an element for which no storage has ever been allocated.
But I never do that. Instead of dereferencing the pointer,
i.e. accessing the element, I use the pointer only to calculate
the difference with the base pointer. In other words, although
it is a pointer that is passed to the comparison function, I
use it as an index number. With that index number, I can access
my disk file (first
siworin.words), and make the comparison between the
key and the word found on disk.
Strange though it may seem, I think this is safe, and portable, i.e. it works correctly regardless of processor architecture.
Because I declare the base pointer
as a global variable, not an automatic
stack-based variable, it is automatically initialised to all zero
bits, so the pointer is NULL. Is there any architecture in which
you cannot add an integer value of up to a few hundred thousand,
or even a few million, to a NULL pointer to obtain something that could
be a valid pointer? It doesn’t point to validly allocated memory,
but that is a different matter.
bsearch adds an offset to
base, and in the comparison function, I subtract
base from that result, the reverse operation. How
could that go wrong? I don’t see it.
So yes, the standard
bsearch can be used to search not only
an array in memory, but also a disk file.
On second thought (15 August), a pointer being NULL may be seen as a special status, meaning the pointer is invalid for dereferencing. Then doing pointer arithmetic with NULL is rather strange, and potentially dangerous, even if in practice it usually isn’t.
To improve this, instead of a pointer I now declare a character
array, initialised with an empty string, which of course then
consists of one character, a null byte. The address of that array
is then passed to
bsearch, and used to calculate
the index in the comparison function.
This pointer points to legitimately allocated storage, even if it is just one byte in size. So adding an integer to it seems more sensible and more responsible than adding it to NULL.
Test: it works flawlessly. But that doesn’t prove that it always will, in any environment. Yet, it is my opinion that this is safe and portable.