Simple word indexer (15)

Continued from the previous.

Positioning to a character (3)

I made a much simplified variant of my testing program, in order to learn more about the nature of the bug I described in my previous article. The test file now contains only one character. It is written, then read, in only a single stream.

When the source file is compiled with the command:
cc siworin15.c -o x
and run as simply:
./x
no infinite loop occurs! The output is:
Line 35, i = 1
Line 38, i = 1
Line 41, error 84 Invalid or incomplete multibyte or wide character

And that is correct. If however the program is run as:
./x loop
the output is:
Line 35, i = 0
Line 38, i = 0
Line 35, i = 1

and the program never terminates, unless a signal is sent by pressing ctrl-c.

The difference is that, without a command line argument, the program starts reading at the second byte (byte number 1) of the file. The file contains the byte sequence c3-a1 (in hex), which is the UTF-8 encoding of Unicode character U+00E1, á. That second byte, hex a1, is invalid at that position because it starts with the bits 10, so it cannot be the start of a UTF-8 encoding, only a continuation byte.
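The bit test described above can be written out in a few lines of C. This is only an illustrative sketch and not part of the test program; the function names is_continuation_byte and decode_two_byte are made up for this example.

```c
/* Returns 1 if the byte has the form 10xxxxxx, i.e. it can only be
   a continuation byte in UTF-8, never the start of a character. */
int is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Decodes a two-byte sequence 110xxxxx 10yyyyyy.
   Returns the code point, or -1 if the two bytes do not form a
   valid two-byte encoding. (Hypothetical helper for illustration.) */
long decode_two_byte(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || !is_continuation_byte(b1))
        return -1;
    long cp = ((long)(b0 & 0x1F) << 6) | (long)(b1 & 0x3F);
    return cp >= 0x80 ? cp : -1;   /* reject overlong encodings */
}
```

With the bytes from the test file, decode_two_byte(0xC3, 0xA1) gives 0xE1, and is_continuation_byte(0xA1) is true, which is why a read starting at byte 1 cannot succeed.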

The fseek succeeds, and fgetwc (get a wide character from a multibyte stream) sets the error code to:
84 Invalid or incomplete multibyte or wide character.
Correctly handled.
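A minimal sketch of this non-looping case could look as follows. It is not the actual siworin15.c: the function name read_from_second_byte is hypothetical, the file is written and read through two separate streams instead of one, and the locale names are assumptions about what is installed on the system.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes the bytes c3 a1 to `path`, reopens the
   file, seeks to byte 1 and calls fgetwc once.
   Returns the errno left by a failing fgetwc (0 if it succeeded or
   hit plain end-of-file), or -1 if the setup itself failed. */
int read_from_second_byte(const char *path)
{
    /* Assumption: a UTF-8 locale is available under one of these names. */
    if (!setlocale(LC_CTYPE, "C.UTF-8") &&
        !setlocale(LC_CTYPE, "en_US.UTF-8"))
        return -1;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    fputc(0xC3, fp);
    fputc(0xA1, fp);
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 1L, SEEK_SET) != 0) {   /* position inside the character */
        fclose(fp);
        return -1;
    }

    errno = 0;
    wint_t wc = fgetwc(fp);               /* first wide read on this stream */
    int err = (wc == WEOF) ? errno : 0;   /* save errno before fclose */
    fclose(fp);
    return err;
}
```

On Linux the expected return value is EILSEQ, which has the number 84 and the message quoted above.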

In the other test case, calling the program with a command line argument (any will do, "loop" is just an example), an fseek is first done to the first byte (byte 0) of the file (without any effect, as the file pointer was already at the start), and fgetwc reads a correct 2-byte character from it. THEN the fseek to the incorrect position (second byte, byte number 1) is done, and THAT gets the GNU C library, glibc (2.31, 2.33), into an endless loop.

This means the bug is less severe than I first thought. In the case in which I might have needed to fseek to a possibly incorrect character position in a file, for providing context for a found search word, I would NOT first have read the full character, because I don’t know where it starts. So I probably would not have encountered the bug. I only encountered it in a test program that does things that do not make sense in a real-life application.
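One way to avoid landing in the middle of a character at all, when seeking to an arbitrary offset for context, is to skip forward over continuation bytes first. The sketch below is an assumption about how that could be done, not what the indexer actually does. The name seek_to_char_start is hypothetical, and the function reads bytes with fgetc, so it needs a byte-oriented stream rather than one already used with fgetwc.

```c
#include <stdio.h>

/* Hypothetical helper: seeks to `pos`, then moves forward past at most
   three UTF-8 continuation bytes (10xxxxxx), so that the stream ends
   up at the start of a character.
   Returns the resulting offset, or -1 on an I/O error or when more
   than three continuation bytes follow (not valid UTF-8). */
long seek_to_char_start(FILE *fp, long pos)
{
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;

    for (int i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return ferror(fp) ? -1 : pos;   /* end of file reached */
        if ((c & 0xC0) != 0x80)             /* not a continuation byte */
            return fseek(fp, pos, SEEK_SET) == 0 ? pos : -1;
        pos++;                              /* skip the continuation byte */
    }
    return -1;
}
```

For a file holding the bytes 'a', c3, a1, 'b', a call with offset 2 moves on to offset 3, while a call with offset 1 stays where it is, because c3 is a valid first byte.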

Yet, I insist that a library function must never get into an infinite loop, so the bug should be repaired. But it is less urgent than I thought.


Update 23 August 2021

The day before yesterday I wrote I hadn’t tested with glibc version 2.34, which is the latest stable version. Today however I did, with a library, loader and locale freshly compiled and locally installed, from sources in glibc-2.34.tar.xz, downloaded from GNU itself. Result: glibc 2.34 also contains the bug, as do 2.28 (under Debian), 2.31 (Mint and Ubuntu) and 2.33 (Ubuntu).

Update 2: Bug reported


To the next article (GPLv3), and see also this one on the same sub-subject.