Conversion of mixed UTF8 and ISO-8859-1/CP1252

Update of the Win32 version and publication of the sources: 6 May 2016

Download

Free exe download (Win32 format); an old 16-bit MS-DOS version is also available.

Usage

This is a simple, specialised UTF8 to Latin1 converter.

Unlike general-purpose converters such as uniconv, it can reliably convert text files in languages like German, French, Spanish, etc., in which part of the text is in ISO-8859-1 (Latin1) and part in UTF8. It does this by detecting what is probably UTF8 and what is certainly not.
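One way such detection can work is sketched below in Python. This is not the tool's own source code; the function name decode_mixed is hypothetical, and the rule it applies is an assumption: a byte run is read as UTF-8 only if it forms one well-formed multi-byte UTF-8 character, and every other byte is read as ISO-8859-1. A lone Latin-1 byte such as 0xE9 (é) or 0xFC (ü) is almost never followed by valid UTF-8 continuation bytes, so it is "certainly not" UTF-8.

```python
def decode_mixed(data: bytes) -> str:
    """Decode bytes that mix UTF-8 and ISO-8859-1 (heuristic sketch).

    A sequence is taken as UTF-8 only if it is one well-formed
    multi-byte UTF-8 character; any other byte is read as Latin-1.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII is identical in both encodings.
            out.append(chr(b))
            i += 1
            continue
        # Expected sequence length from the lead byte (0 = not a lead byte).
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0
        if length and i + length <= n:
            try:
                # The strict UTF-8 decoder rejects bad continuation bytes,
                # overlong forms, surrogates and code points above U+10FFFF.
                out.append(data[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass
        # Not valid UTF-8 at this position: treat the byte as Latin-1.
        out.append(chr(b))
        i += 1
    return "".join(out)
```

For example, decode_mixed("Grüße".encode("utf-8") + " und Grüße".encode("latin-1")) gives "Grüße und Grüße". The heuristic is only "probably" right: the Latin-1 pair "Ã©" (bytes C3 A9) is indistinguishable from UTF-8 "é" and would be decoded as the latter.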

With the option -w, it supports Windows code page 1252 instead of plain ISO-8859-1, so characters such as curly quotes, en dashes (–), the French oe ligatures (œ, Œ), and Slavic s and z with háček (Š, š, Ž, ž) are also handled. See also CP1252, on Roman Czyborra’s site.
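The difference between the two single-byte interpretations lies in bytes 0x80 to 0x9F: ISO-8859-1 assigns them to control characters, while CP1252 puts printable characters there. Since those bytes can only be continuation bytes in UTF-8, a lone one is never UTF-8. The Python sketch below, with a hypothetical helper single_byte_char, shows what a -w style switch might change; falling back to Latin-1 for the five positions CP1252 leaves undefined is an assumption, not documented behaviour of the tool.

```python
def single_byte_char(b: int, windows: bool) -> str:
    """Map one non-UTF-8 byte to a character (hypothetical helper).

    With windows=True (comparable to -w), bytes 0x80-0x9F are read as
    Windows-1252; all other bytes are identical in both code pages.
    """
    if windows and 0x80 <= b <= 0x9F:
        try:
            return bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in CP1252;
            # this sketch falls back to the Latin-1 control character.
            pass
    return bytes([b]).decode("latin-1")
```

For instance, byte 0x93 becomes the curly quote “ with windows=True, but the invisible control character U+0093 without it, and the bytes 8A 9A 8E 9E 8C 9C become "ŠšŽžŒœ" under CP1252.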

Usage: utf8mixd [-w] infile outfile

Update 28 September 2019

Originally, the output was always ISO-8859-1 or Windows-1252. Now there is an option -u, which makes the output entirely UTF-8. This is more useful nowadays, because many operating systems, including Linux Mint, which I now work on, use that encoding as their default.
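One practical advantage of UTF-8 output is that nothing can be lost: the UTF-8 parts of a mixed input may contain characters that ISO-8859-1 (or even CP1252) cannot represent. The Python sketch below illustrates this; the function encode_output and its parameters are hypothetical, and replacing unrepresentable characters with "?" is an assumption made for illustration, not necessarily what utf8mixd does.

```python
def encode_output(text: str, utf8: bool, windows: bool) -> bytes:
    """Encode already-decoded text for output (illustrative sketch).

    utf8=True corresponds to the -u idea: every character fits.
    Otherwise the target is Windows-1252 (windows=True) or ISO-8859-1,
    and characters the target lacks are replaced by '?'.
    """
    if utf8:
        return text.encode("utf-8")
    target = "cp1252" if windows else "latin-1"
    return text.encode(target, errors="replace")
```

With the text "Œuvre – 5 €", Latin-1 output yields b"?uvre ? 5 ?", CP1252 output keeps all three special characters (as bytes 0x8C, 0x96 and 0x80), and UTF-8 output keeps everything.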

The sources and executables were updated accordingly.

Sources

The source files are here.



Link

For a similar but more versatile tool, see this Perl script by Helmut Richter.