7 February 2020
In my C program I wanted to be able to handle international characters
in UTF-8. So I used standard library functions like
mbtowc, which I had discovered in the
talk notes by Ingo Schwarze:
“Why and how you ought to Keep multibyte
character support simple, EuroBSDCon, Beograd, September 25,
2016”. (Nice Canadian mountain, campground and rivulet
photos, by the way.)
But whatever I tried, they didn't work. None of the multibyte characters I put in a
UTF-8 test string (UTF-8 being the default encoding of Linux Mint) were ever recognised.
When I ran
locale in the born again shell
I got this:
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
I expected my little test program to inherit that, and be aware of UTF-8.
Because I remembered from past experience with the Bourne shell
that environment variables are not always inherited by
subprocesses by default, I even exported them. Still to no avail.
Notable fact: the standard library (stdlib.h) macro
MB_CUR_MAX stubbornly kept evaluating to 1. Never more.
Always read man pages, of course. I had, and did again. It took me a long time
to finally find the solution. If you run:
man locale
it defaults to
man 1 locale.
As usual, the first page found is the one shown. But that page isn’t very informative. What you should actually read is:
man 7 locale.
There it says:
“The functions it declares are
setlocale(3) to set the current locale, and localeconv(3) to get information about number formatting.”
And in
man 3 setlocale:
“If locale is an empty string, "", each part of the locale that should be modified is set according to the environment variables.
On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:
setlocale(LC_ALL, "");”
That helped. Now it works. I thought I should share.