Conversion of mixed UTF8 and ISO-8859-1/CP1252
Update of the Win32 version and publication of sources: 6 May 2016
Download
Free exe download (Win32 format); old 16-bit MS-DOS version.
Usage
This is a simple, specialised UTF8 to Latin1 converter.
Unlike general-purpose converters such as uniconv, it can reliably convert text files in languages like German, French, Spanish etc. in which part of the text is in ISO-8859-1 (Latin1) and part in UTF8. It does this by detecting what is probably UTF8 and what is certainly not.
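The detection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of utf8mixd itself: the function name mixed_to_latin1 is hypothetical, and the rule used here (a lead byte 0xC2 or 0xC3 followed by a continuation byte is "probably UTF8", everything else is passed through) is one simple way to make the distinction.

```python
def mixed_to_latin1(data: bytes) -> bytes:
    """Convert bytes that mix UTF-8 and ISO-8859-1 into pure ISO-8859-1.

    Hypothetical sketch; the real tool may use different heuristics.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        # Lead bytes 0xC2 and 0xC3 followed by a continuation byte
        # (0x80-0xBF) encode exactly U+0080-U+00FF, the upper half
        # of Latin-1. Such a pair is "probably UTF-8".
        if b in (0xC2, 0xC3) and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            code_point = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F)
            out.append(code_point)  # always <= 0xFF for these lead bytes
            i += 2
        else:
            # ASCII, or a high byte that is not part of such a pair:
            # assume it is already Latin-1 and copy it unchanged.
            out.append(b)
            i += 1
    return bytes(out)


# "é" once as UTF-8 (C3 A9) and once as Latin-1 (E9) in the same input.
sample = "é".encode("utf-8") + b" " + "é".encode("iso-8859-1")
result = mixed_to_latin1(sample)  # b"\xe9 \xe9"
```

The word "probably" matters: a genuine Latin-1 sequence such as "Ã©" (bytes C3 A9) looks identical to UTF-8 "é" and would be converted too. In ordinary German, French or Spanish text such sequences are rare, which is what makes the approach workable.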
Optionally (-w), it targets Windows code page 1252 instead of plain ISO-8859-1, so that curly quotes, en dashes (–), French oe-ligatures (œ, Œ) and Slavic s or z with háček (Š, š, Ž, ž) are also supported. See also CP1252, on Roman Czyborra’s site.
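What the choice of target code page means for these characters can be shown with Python's built-in codecs. This is a sketch only: the helper name to_single_byte is hypothetical and says nothing about how utf8mixd handles the -w option internally.

```python
from typing import Optional


def to_single_byte(ch: str, use_cp1252: bool) -> Optional[bytes]:
    """Return the one-byte encoding of ch, or None if the code page lacks it."""
    codec = "cp1252" if use_cp1252 else "iso-8859-1"
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


# Characters that exist in Windows-1252 but not in plain ISO-8859-1.
for ch in "œŒŠšŽž–’":
    print(ch, to_single_byte(ch, use_cp1252=True), to_single_byte(ch, use_cp1252=False))
```

Each of these characters gets a byte in the range 0x80-0x9F under Windows-1252 (for example œ becomes 0x9C), while plain ISO-8859-1 has no position for it, so the helper returns None there.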
Usage: utf8mixd [-w] infile outfile
Update 28 September 2019
Originally, the output was always ISO-8859-1 or Windows-1252. Now there is also an option -u, which causes the output to be all in UTF-8. That is more useful in these modern times, in which many operating systems, at least the Linux Mint I now work on, have that encoding as their overall default.
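The same kind of all-UTF-8 normalisation can be sketched in Python with a custom decoding error handler. This is an assumption-laden illustration, not the tool's own code: the handler name "latin1_fallback" and the function mixed_to_utf8 are hypothetical, and bytes that are not valid UTF-8 are simply taken to be ISO-8859-1.

```python
import codecs


def _latin1_fallback(err):
    """Decode bytes that are not valid UTF-8 as ISO-8859-1 instead."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position at which to resume.
    return bad.decode("iso-8859-1"), err.end


# Register the handler under a (hypothetical) name of our own choosing.
codecs.register_error("latin1_fallback", _latin1_fallback)


def mixed_to_utf8(data: bytes) -> bytes:
    """Normalise mixed UTF-8 / ISO-8859-1 input to pure UTF-8."""
    text = data.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")


# "Grüße" once as UTF-8 and once as Latin-1 in the same input.
mixed = "Grüße ".encode("utf-8") + "Grüße".encode("iso-8859-1")
normalised = mixed_to_utf8(mixed)
```

Valid UTF-8 sequences pass through unchanged, and every stray high byte is re-encoded as its two-byte UTF-8 form. To treat stray bytes as Windows-1252 instead, the fallback codec would have to change, with the caveat that a few byte values are undefined in that code page.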
The sources and executables were updated accordingly.
Sources
The source files are here.
© 2008, 2016, 2019 R. Harmsen. But anyone may use this as they see fit.
Link
For a similar, but more versatile tool, see this Perl script by Helmut Richter.