Simple word indexer (3)

Continued from the previous episode.

Hyper Estraier (2)

Compiler warnings

This is still about the 2nd of July 2021, when I was compiling Mikio Hirabayashi’s QDBM version 1.8.78 and Hyper Estraier version 1.4.13, under Ubuntu Server 20.04.2 LTS (GNU/Linux 5.4.0-77-generic x86_64), using as the C compiler gcc version 9.3.0-17. Compiler options, as set by the makefile, were -Wall -pedantic -fPIC -fsigned-char -O3 -fomit-frame-pointer -fforce-addr -minline-all-stringops.

Ignoring return value

There are several compiler warnings of this type, about functions like nice, system, and, more suspiciously, write and fread (why mix man 2 and man 3 functions, I wonder?). I admit there are situations in which the chances of failure for a file read or write are slim (file open was checked, plenty of disk space), but of course testing the return value, and responding appropriately to an unexpected failure, is always better.
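To make "testing the return value" concrete, here is a minimal sketch of what that could look like for write and fread. The helper names write_all and read_exact are hypothetical and do not occur in QDBM or Hyper Estraier; the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd, continuing after short
   writes and retrying after EINTR. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
  const char *p = buf;
  assert(buf != NULL);
  while(len > 0){
    ssize_t n = write(fd, p, len);
    if(n < 0){
      if(errno == EINTR) continue;  /* interrupted, try again */
      return -1;                    /* real failure, errno says why */
    }
    p += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Hypothetical helper: read exactly len bytes from fp.
   Returns 0 on success, -1 on a short read (end of file or error). */
int read_exact(FILE *fp, void *buf, size_t len)
{
  assert(fp != NULL && buf != NULL);
  return fread(buf, 1, len, fp) == len ? 0 : -1;
}
```

The caller can then decide what an appropriate response is (report, retry, abort), instead of carrying on with a half-written file or a partly filled buffer.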

Possible buffer overflow

There were several warnings similar to the following:

cabin.c: In function ‘cbdatestrwww’:
cabin.c:3066:47: warning: ‘%s’ directive writing up to 63 bytes into a region of size between 0 and 45 [-Wformat-overflow=]
 3066 |   sprintf(date, "%04d-%02d-%02dT%02d:%02d:%02d%s", year, mon, day, hour, min, sec, tzone);

I analysed some of them, and found that the compiler is right (of course!) and overflow can occur in theory. But in practice, it won’t. In the example given, overflow would only occur if the date and time fields were corrupted, and contained much higher integer values than is normal, considering their meaning.

So the compiler is pedantic here, as requested, and this probably isn’t the cause of the segmentation fault that kept me from using this software under Ubuntu Server. However, it is of course better to write code that even a pedantic compiler has no comments about, whenever possible.
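One common way to keep the compiler quiet about this kind of call is snprintf, which is told how large the destination is and reports how long the full result would have been. The sketch below reuses the format string from the warning; the function name format_date and its signature are my own invention, not the actual code of cbdatestrwww.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the formatting step in cbdatestrwww.
   Returns 0 if the whole string fitted into out, -1 if it was truncated
   or snprintf reported an error. out is always NUL-terminated. */
int format_date(char *out, size_t outsize, int year, int mon, int day,
                int hour, int min, int sec, const char *tzone)
{
  int n;
  assert(out != NULL && outsize > 0 && tzone != NULL);
  n = snprintf(out, outsize, "%04d-%02d-%02dT%02d:%02d:%02d%s",
               year, mon, day, hour, min, sec, tzone);
  /* n is the length the complete result needs, excluding the final NUL */
  return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

With corrupted field values the result is then a truncated string and an error return, rather than a write past the end of the buffer.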

Exceeds maximum object size

Then there’s this:

In file included from villa.h:26,
                 from villa.c:19:
In function ‘vlleafaddrec’,
    inlined from ‘vlput’ at villa.c:299:7:
./cabin.h:1308:16: warning: argument 2 range [18446744071562067968, 18446744073709551615] exceeds maximum object size 9223372036854775807 [-Wallo>
 1308 |   (((CB_ptr) = realloc((CB_ptr), (CB_size))) ? (CB_ptr) : cbmyfatal("out of memory"))
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
./cabin.h:1514:7: note: in expansion of macro ‘CB_REALLOC’
 1514 |       CB_REALLOC((CB_list)->array, (CB_list)->anum * sizeof((CB_list)->array[0])); \
      |       ^~~~~~~~~~
villa.c:2241:11: note: in expansion of macro ‘CB_LISTPUSHBUF’
 2241 |           CB_LISTPUSHBUF(recp->rest, tbuf, tsiz);

I translated the long decimal numbers into hexadecimal, which makes them clearer:

18446744071562067968 = FFFFFFFF80000000
18446744073709551615 = FFFFFFFFFFFFFFFF (64 bits)
 9223372036854775807 = 7FFFFFFFFFFFFFFF

Is this an issue of signed and unsigned values? Or 32 versus 64 bits? Or both? The FreeBSD 11.2 and 12.2 systems on which Hyper Estraier did and still does work without dumping core are also 64-bit. But perhaps the compilation and executable are not? I checked using the file command, and no, all of /usr/local/bin/est* are listed as “ELF 64-bit LSB executable, x86-64”. So why is there a segmentation fault under Ubuntu and not under FreeBSD? Well, implementation details, as I mentioned before.
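One observation that fits the numbers: the two bounds of the range in the warning are exactly what the most negative and the least negative 32-bit int (INT_MIN and -1) turn into when converted to a 64-bit size_t. The sketch below shows that conversion, plus a guard of the kind that would rule it out. It assumes a platform with 32-bit int and 64-bit size_t, such as x86_64 Linux; the names as_size and checked_bytes are hypothetical, and I have not verified that anum can actually become negative in QDBM.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* What an int turns into when it is used where a size_t is expected.
   Negative values wrap around modulo 2^64 on this kind of platform. */
size_t as_size(int n)
{
  return (size_t)n;
}

/* Hypothetical guard: turn an element count into a byte count, refusing
   negative counts and products that would overflow size_t.
   Returns 0 on success and stores the byte count in *out, else -1. */
int checked_bytes(int count, size_t elem_size, size_t *out)
{
  assert(out != NULL);
  if(count < 0) return -1;
  if(elem_size != 0 && (size_t)count > SIZE_MAX / elem_size) return -1;
  *out = (size_t)count * elem_size;
  return 0;
}
```

Here as_size(INT_MIN) gives FFFFFFFF80000000 and as_size(-1) gives FFFFFFFFFFFFFFFF, the two ends of the reported range. So the compiler appears to be saying: if this int were ever negative, the size passed to realloc would be absurdly large.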

Due to all the macros and typedef’d structures I find it hard, even though I have been programming in C since 1985, professionally between 1990 and 2004, to see what is really going on here, whether this is really dangerous, and whether it could explain the segmentation fault. Well, somebody else’s code is always harder to understand than one’s own.

During the compilation, there are several occurrences of this type of warning. Not reassuring.

Comparison with string literal

The source file estraier.c produces several warnings like this:

estraier.c: In function ‘est_aidx_attr_narrow’:
estraier.c:7574:10: warning: comparison with string literal results in unspecified behavior [-Waddress]
 7574 |   if(cop == ESTOPSTROREQ && sign && !sval){

where the variable ‘cop’ is of type const char * and in estraier.h it says #define ESTOPSTROREQ "STROREQ". Where and how the compiler stores string literals, and whether duplicates are combined or stored separately, per source file or for the whole program, all of that is implementation dependent. So I agree with the compiler that comparisons of this type are unwise. Yet, I suspect in practice this doesn’t cause any real problems here.
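The usual way to avoid depending on where the compiler stores its literals is to compare the contents with strcmp instead of comparing the addresses. A minimal sketch, using the #define quoted above; the function name is_stroreq is hypothetical, not something found in estraier.c.

```c
#include <assert.h>
#include <string.h>

#define ESTOPSTROREQ "STROREQ"  /* as quoted from estraier.h */

/* Returns 1 if cop names the STROREQ operator, 0 otherwise.
   Compares the characters, so it does not matter whether cop points
   at the very same literal or at a copy elsewhere in memory. */
int is_stroreq(const char *cop)
{
  assert(cop != NULL);
  return strcmp(cop, ESTOPSTROREQ) == 0;
}
```

The price is a few character comparisons per call, where the pointer comparison needs only one. That may have been the reason for the original choice, although that is a guess on my part.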

Segmentation fault

I would have swallowed the compiler warnings, if only Hyper Estraier had worked. But it doesn’t. Already in the database checking phase of the installation, and also later in Hyper Estraier itself, it runs into a segmentation fault:

rm -rf casket*
LD_LIBRARY_PATH=.:/lib:/usr/lib:/usr/local/lib:/home/rudhar/lib:/usr/local/lib \
   ./odtest write casket 500 50 5000
<Writing Test>
  name=casket  dnum=500  wnum=50  pnum=5000  ibnum=-1  idnum=-1  cbnum=-1  csiz=-1
......make[1]: *** [Makefile:311: check] Segmentation fault

I used gdb (GNU’s debugger) to find out where it happens, and the result was:

Program received signal SIGSEGV, Segmentation fault.
0x000000000040d795 in cblistpush
   (list=0x53adb0, ptr=ptr@entry=0x7fffffffe0a0 "00000192", size=8, size@entry=-1) at cabin.c:780
780	  CB_MALLOC(list->array[index].dptr, (size < CB_DATUMUNIT ? CB_DATUMUNIT : size) + 1);

Via the macro definition, this is indeed a malloc. It isn’t easy to see how a malloc could cause a segmentation fault. A realloc could, if an invalid pointer was passed. Therefore I suspect that the actual problem occurred already before the malloc, and that some of the memory areas not intended for use by application code, but for internal bookkeeping purposes, have been overwritten and corrupted.

The cause, and so the remedy, of bugs of this type can be hard to find. I am not the person who is going to do it.

I’m not the only one experiencing this segmentation fault. I found this Japanese site, which quotes the same segmentation fault from ./odtest write casket 500 50 5000 that I had. From the translation by Google Translate, I learnt that it says: “When I searched for various information, there was a similar report. Apparently it happens with gcc 7 and not with gcc 6.” It quotes some other site (now non-existent) that said: “estcmd built with gcc-7.2.0 caused segfault. I sent mails to the author, but I couldn't get a reply.”

So now what next?

That night of the 2nd of July 2021, I was so fed up with the whole situation, of local search engines that don’t properly install, don’t properly work, or work a few years, but not on all platforms, and that are not properly maintained, that I took the brave step: I decided to write one myself. Couldn’t be so hard, if you keep it simple.

More on that in the next episodes.