Voynich revisited


Renewed interest

On 13 June 2023 I took a look at the programme of the freshly announced international Interlingua conference in Warsaw, noticed the name Greg Kondrak, and thought “Sounds so familiar, how do I know him?” A quick search made that clear: the Voynich Manuscript (henceforth: VMS). Exactly 5 years ago today – what a strange coincidence! – I sent an open e-mail to him and Bradley Hauer, about their idea that the underlying language of the Voynich Manuscript might be Hebrew.

In fact, what I criticised – and I still think: with good reason – was the only part of their paper (“Decoding Anagrammed Texts Written in an Unknown Language and Script”) that I had properly read and understood. It is at the very end, just before the conclusion, and I quote:

«Nevertheless, it is interesting to take a closer look at specific examples of the system output. The first line of the VMS (VAS92 9FAE AR APAM ZOE ZOR9 QOR92 9 FOR ZOE89) is deciphered into Hebrew as
המצות ועשה לה הכהן איש אליו לביחו ו עלי אנשיו.‏
According to a native speaker of the language, this is not quite a coherent sentence. However, after making a couple of spelling corrections, Google Translate is able to convert it into passable English: “She made recommendations to the priest, man of the house and me and people.”»

I pointed out that I had obtained similar ‘interesting’ English phrases by letting Google Translate (GT) handle Interlingua in Greek script. Later on, Google fixed that error – because that’s what I think it was, an error.

Interestingly, when now in 2023 I feed Hauer & Kondrak’s Hebrew sentence to GT (without the spelling corrections they did not further specify), I get the following ‘English translation’:
“The unleavened bread, and the priest made her a man to whom they ran, and his men over me.”
An early case of documented gender dysphoria? Or an ancient pre-Sumerian creation story in which the woman was created first? Cf. Genesis 2:21-23.

No, just nonsense of course, from a language model getting out of control.

(Perhaps the spelling correction was needed because in the PDF, the final nun of הכהן as I copied it, wasn’t Hebrew, but a Latin N, perhaps disguised by some special font? I got an N in the middle of the Hebrew when I copied the phrase.)


This time, on 13 June 2023, I did read the whole paper by Hauer & Kondrak, and I tried to understand everything. See also this announcement on the site of the University of Alberta.

I thought the case for the language being Hebrew seemed quite strong, and the idea that anagrams might have been used struck me as smart. If true, I wondered, perhaps a separate encoding of the final letters of the Hebrew alphabet could give clues for ungarbling the anagrams?

René Zandbergen wrote a review of the paper: Hauer and Kondrak (2016), and he didn’t think they were on the right track. I quote from his conclusion:

The paper tentatively suggests that the Voynich MS text was generated by anagramming a Hebrew text on a word by word basis, and applying a mono-alphabetic substitution cipher (MASC). If this were correct, one should be able to take an existing, and ideally old, Hebrew text, apply this procedure, and arrive at a text that shares all important statistics with the Voynich MS text (Currier language B and Currier transcription alphabet). Unfortunately, this is not attempted, and I would like to argue that this will not be possible. The Voynich MS has a number of features that are not addressed in the paper.
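The procedure Zandbergen describes here (anagram every word, then apply a MASC) can be sketched in a few lines. This is only a minimal sketch under my own assumptions, not Hauer & Kondrak's code: sorting the letters is just one convenient way to anagram a word, the key is random, a lowercase Latin sample stands in for an old Hebrew text, and all function names are made up for illustration.

```python
import random
import string


def alphagram(word: str) -> str:
    """Anagram a word by sorting its letters (one arbitrary choice of anagram)."""
    return "".join(sorted(word))


def make_masc_key(alphabet: str, seed: int = 0) -> dict[str, str]:
    """Build a random one-to-one letter substitution over the given alphabet."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def encipher(text: str, key: dict[str, str]) -> str:
    """Anagram each word, then substitute letter by letter."""
    return " ".join(
        "".join(key[c] for c in alphagram(word)) for word in text.split()
    )


# Hypothetical usage: a Latin-alphabet stand-in for the Hebrew plain text.
key = make_masc_key(string.ascii_lowercase, seed=1)
print(encipher("in the beginning", key))
```

Word boundaries and word lengths survive this procedure unchanged, and so do letter frequencies (up to renaming). That is why the test Zandbergen proposes is meaningful: the statistics of the output can be compared directly with those of the VMS text.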

René Zandbergen is the compiler and maintainer of the site voynich.nu, which is quite comprehensive, and also contains links to important material by other investigators of the VMS. During the days that followed, intermittently because of other activities, I read practically all of it.


Inspired by the anagrams and the mention of Hebrew, I had this wild idea that the underlying language of the VMS might be Arabic, or Mozarabic or Ladino in Arabic script, and that some of the diacritical dots might have been left out, as I think was the case in older documents in Arabic. That would have turned the letters b, t, th, n and y, in contexts where they are connected on both sides and only small, indistinguishable letter forms remain, into something similar to a small i without the dot. We see those in the VMS, in various repeating sequences.

I assumed the full Arabic letters (that is, isolated or connected to the right) ب ت ث ن ي would have been encoded as they are, but the connected بتثنيب would in Voynichese look like iiiiin (written here in Eva, Landini & Zandbergen’s Extensible Voynich Alphabet). So then part of the decoding effort would be to disambiguate those undotted letters, by looking for existing and fitting Arabic roots, or plausible words in one of the other two suggested languages.
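To get a feeling for how much disambiguation that would take, here is a rough sketch of the idea. It rests on several assumptions of mine: words are written as tuples of transliterated letter names rather than Arabic characters, every one of the five letters is reduced to a bare "tooth" (written "i") unless it is the last letter of the word, and the lexicon is a tiny hypothetical stand-in for a real list of Arabic roots or words. The function names are invented for this illustration.

```python
from itertools import product

# The five letters that share one undotted "tooth" shape (transliterated).
TOOTH = ("b", "t", "th", "n", "y")


def undot(word):
    """Replace every non-final tooth letter by 'i'; keep all other letters."""
    last = len(word) - 1
    return ["i" if c in TOOTH and k < last else c for k, c in enumerate(word)]


def candidates(skeleton):
    """Generate every possible dotted reading of an undotted skeleton."""
    options = [TOOTH if c == "i" else (c,) for c in skeleton]
    return product(*options)


def disambiguate(skeleton, lexicon):
    """Keep only the readings that occur in the lexicon."""
    return [w for w in candidates(skeleton) if w in lexicon]


# Hypothetical one-entry lexicon: the root k-t-b.
lexicon = {("k", "t", "b")}
print(undot(("k", "t", "b")))                         # skeleton with one tooth
print(disambiguate(undot(("k", "t", "b")), lexicon))  # readings found in lexicon
```

Each undotted tooth has five possible readings, so the sequence بتثنيب, with five teeth, already allows 5 × 5 × 5 × 5 × 5 = 3125 readings. Without a good lexicon to prune them, the ambiguity grows quickly.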


This idea I gave up very soon, as I read more about the statistical peculiarities of the text in the Voynich Manuscript. It simply cannot be the solution, just like so many other earlier proposals can’t.

As René Zandbergen stated in his critique of Hauer & Kondrak:

“[...] it is known from earlier studies that the Voynich MS text does not have the characteristics of a simple cipher (4), and specifically is not a MASC of a well-known European language (old or new) (5).” (MASC = Mono-Alphabetic Substitution Cipher.)

OK, no well-known European language, but maybe East-Asian languages are still an option? Perhaps that should be investigated further. Not by me, though.

One of the most important sections of Zandbergen’s site (see here for the full table of contents) is where he describes efforts to analyse the structure of VMS ‘words’. I write ‘words’, not words, because we don’t know whether what looks like text in an unknown script actually is a language, or encodes one. We don’t even know if that writing has any meaning at all.

We can read about word structure observations by John H. Tiltman, Peter Long, Mike Roe, Robert Firth and, most importantly, Jorge Stolfi. Jorge Stolfi is a Brazilian professor, of Venetian descent, who has his own site, with everything to do with the VMS linked here.

Jorge Stolfi first distinguished ‘soft’ and ‘hard’ characters, then arrived at an ‘OKOKO’ paradigm. Eventually he managed to set up a word grammar. The formal definition is here, and this is a text version without the HTML formatting. A very large number of VMS ‘words’ adhere to this grammar. The others might be errors or special cases. Conversely, using the grammar it is possible to generate text that looks like Voynichese, although then the statistics aren’t quite as expected, because some interdependencies of choices are not covered by the grammar.

René Zandbergen wrote, in the summary of his section 3, and I agree with him:

The results presented in this page are critically important for anyone interested in translating the text of the Voynich MS. The fact that structures like the ones introduced in this page exist, tells us that the MS text is not one that was encrypted from an Indo-European plain text using the type of encryption available in the early 15th Century. Any tentative solution working along these lines will necessarily fail.

Brilliant idea

Then on 14 June 2023, after all this reading, just before dinner in a bout of extreme fatigue, at about 18:30 CEST I suddenly had this brilliant idea.

Well, brilliant? I can’t prove it, and I don’t even see any way to make it seem plausible that there is any truth in it. As long as that doesn’t change, my idea is virtually worthless.

But I’m going to describe it anyway.