In my C program I wanted to be able to handle international characters
in UTF-8. So I used standard library functions like mblen and mbtowc,
which I had discovered in talk notes by Ingo Schwarze:
“Why and how you ought to Keep multibyte
character support simple, EuroBSDCon, Beograd, September 25,
2016”. (Nice Canadian mountain, campground and rivulet
photos, by the way.)
But whatever I tried, they didn’t work. None of the multibyte characters I put
in a UTF-8 test string (UTF-8 being the default encoding of Linux Mint) was ever
recognised.
When I ran locale in the Bourne-again shell bash, I got this:
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
I expected my little test program to inherit that, and be aware of UTF-8.
Because I remembered from past experience with the Bourne shell sh that
environment variables are not always inherited by subprocesses by default,
I even exported them. Still to no avail.
Notable fact: the standard library (stdlib.h) macro MB_CUR_MAX
stubbornly kept evaluating to 1. Never more.
Always read man pages, of course. I had, and did again. It took me a long time
to finally find the solution. If you run:
man locale
it defaults to man 1 locale.
As usual: first one found is shown. But that page isn’t very informative.
What you should actually read is man 7 locale.
There it says:
“The header <locale.h> declares data types,
functions and macros which are useful in this task.
The functions it declares are setlocale(3) to set the current
locale,” [...]
And man 3 setlocale:
“If locale is an empty string, "", each part of the
locale that should be modified is set according to the environment variables.
[...]
On startup of the main program, the portable "C" locale is selected as default.
A program may be made portable to all locales by calling:
setlocale(LC_ALL, "");
”
That helped. Now it works. I thought I should share.
Copyright © 2020 by R. Harmsen, all rights reserved.