Continued from the previous.
I made a much simplified variant of my testing program, in order to learn more about the nature of the bug I described in my previous article. The test file now contains only one character. It is written, then read, in only a single stream.
When the source file is compiled with the command:
cc siworin15.c -o x
and run as simply:
./x
no infinite loop occurs! The output is:
Line 35, i = 1
Line 38, i = 1
Line 41, error 84 Invalid or incomplete multibyte or wide character
And that is correct. If however the program is run as:
./x loop
the output is:
Line 35, i = 0
Line 38, i = 0
Line 35, i = 1
and the program never terminates unless it is interrupted by a signal, for example by pressing Ctrl-C.
The difference is this: if there is no command line argument, the program starts at the second byte (byte number 1) of the file. The file contains the byte sequence c3 a1 (in hex), which is the UTF-8 encoding of Unicode character U+00E1, á. That second byte, hex a1, is invalid at that position: it starts with the bits 10, so it cannot be the first byte of a UTF-8 sequence, only a continuation byte.
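The bit patterns mentioned above can be checked with a few lines of C. This is only an illustrative sketch, not code from the test program; the function names `is_utf8_continuation`, `utf8_lead_length` and `check_a_acute` are made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if b has the bit pattern 10xxxxxx,
   i.e. it can only be a continuation byte of a UTF-8 sequence. */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: returns the sequence length announced by a
   lead byte (1 to 4), or 0 if b cannot start a UTF-8 sequence. */
int utf8_lead_length(unsigned char b)
{
    if (b < 0x80)
        return 1;                   /* 0xxxxxxx: plain ASCII */
    if (b >= 0xC2 && b <= 0xDF)
        return 2;                   /* 110xxxxx (c0, c1 would be overlong) */
    if (b >= 0xE0 && b <= 0xEF)
        return 3;                   /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4)
        return 4;                   /* 11110xxx, up to U+10FFFF */
    return 0;                       /* continuation byte or invalid value */
}

/* Self-check on the two bytes in the test file: c3 a1. */
void check_a_acute(void)
{
    const unsigned char file_bytes[2] = { 0xC3, 0xA1 };

    assert(utf8_lead_length(file_bytes[0]) == 2);   /* c3 starts a 2-byte sequence */
    assert(is_utf8_continuation(file_bytes[1]));    /* a1 is a follow-up byte */
    assert(utf8_lead_length(file_bytes[1]) == 0);   /* so a1 cannot start a character */
}
```

Calling `check_a_acute()` passes silently: c3 announces a 2-byte sequence, while a1 on its own is not a valid starting point.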
The fseek succeeds, and fgetwc (get a wide character from a multibyte stream) sets the error code to 84, "Invalid or incomplete multibyte or wide character". That is handled correctly.
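The non-looping sequence can be sketched roughly as follows. This is not the source of siworin15.c, only a minimal reconstruction under stated assumptions: the function name `read_from_second_byte` is hypothetical, the locale names "C.UTF-8" and "en_US.UTF-8" are guesses at what may be installed, and a temporary file stands in for the test file. The sketch deliberately leaves out the earlier fseek to byte 0 followed by fgetwc, because that is the sequence reported to hang.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical reconstruction of the no-argument case.
   Returns the errno left by fgetwc after seeking to byte 1,
   0 if fgetwc did not report an error,
   or -1 if the setup (locale, temporary file) was not possible. */
int read_from_second_byte(void)
{
    FILE *fp;
    wint_t wc;
    int err;

    /* A UTF-8 locale is needed; these names are assumptions. */
    if (setlocale(LC_ALL, "C.UTF-8") == NULL &&
        setlocale(LC_ALL, "en_US.UTF-8") == NULL)
        return -1;

    fp = tmpfile();
    if (fp == NULL)
        return -1;

    /* Write U+00E1; in a UTF-8 locale this becomes the bytes c3 a1. */
    if (fputwc((wchar_t)0x00E1, fp) == WEOF || fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Seek into the middle of the character: byte number 1. */
    if (fseek(fp, 1L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    /* Reading now starts at the continuation byte a1. */
    errno = 0;
    wc = fgetwc(fp);
    err = (wc == WEOF) ? errno : 0;

    fclose(fp);
    return err;
}
```

On a system that behaves as described above, the return value should equal EILSEQ, which is error number 84 on Linux.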
In the other test case, when the program is called with a command line argument (any will do, "loop" is just an example), an fseek is first done to the first byte (byte 0) of the file (without any effect, as the file pointer was already at the start), and fgetwc reads a correct 2-byte character from it. THEN the fseek to the incorrect position (second byte, byte number 1) is done, and THAT gets the GNU glibc library (2.31, 2.33) into an endless loop.
This means the bug is less severe than I first thought. In the case in which I might have needed to fseek to a possibly incorrect character position in a file, to provide context for a found search word, I would NOT first have read the full character, because I don't know where it starts. So I probably would not have encountered the bug. I only encountered it in a test program that does things that make no sense in a real-life application.
Yet, I insist that a library function must never get into an infinite loop, so the bug should be repaired. But it is less urgent than I thought.
The day before yesterday I wrote that I hadn't tested with glibc version 2.34, which is the latest stable version. Today however I did, with a library, loader and locale freshly compiled and locally installed, from sources in glibc-2.34.tar.xz, downloaded from GNU itself.
Result: glibc 2.34 also contains the bug, as do 2.28 (under Debian),
2.31 (Mint and Ubuntu) and 2.34 (Ubuntu).
Update 2: Bug reported
To the next article (GPLv3), and see also this one on the same sub-subject.
Copyright © 2021 by R. Harmsen, all rights reserved.