Setlocale

7 February 2020

In my C program I wanted to be able to handle international characters in UTF-8. So I used standard library functions like mblen and mbtowc, which I had discovered in talk notes by Ingo Schwarze: “Why and how you ought to Keep multibyte character support simple, EuroBSDCon, Beograd, September 25, 2016”. (Nice Canadian mountain, campground and rivulet photos, by the way.)

But whatever I tried, they didn’t work. No multibyte character in my UTF-8 test string (UTF-8 being the default encoding of Linux Mint) was ever recognised. When I ran locale in bash, the Bourne-again shell, I got this:

LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

I expected my little test program to inherit that, and be aware of UTF-8. From past experience with the Bourne shell sh, I remembered that shell variables are not always inherited by subprocesses by default, so I even exported them. Still to no avail.

Notable fact: the standard library (stdlib.h) macro MB_CUR_MAX stubbornly kept evaluating to 1. Never more.

Always read man pages, of course. I had, and did again. It took me a long time to finally find the solution. If you run:
man locale
it defaults to
man 1 locale.
As usual, the first one found is shown. But that page isn’t very informative. What you actually should read is:
man 7 locale.

There it says:
“The header <locale.h> declares data types, functions and macros which are useful in this task.
The functions it declares are setlocale(3) to set the current locale,” [...]

man 3 setlocale:
“If locale is an empty string, "", each part of the locale that should be modified is set according to the environment variables.
[...]
On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:
setlocale(LC_ALL, "");”

That helped. Now it works. I thought I should share.