Old or recent?

Idea 9 July, text 11–

A forgery?

Is the Voynich manuscript (VMS) genuinely mediaeval, or a relatively recent forgery? Richard SantaColoma, known for voynich.net, the VMS Forum and his blog, thinks it is a forgery by Wilfrid Voynich himself. He argues that claim in The Modern Forgery Hypothesis. That text is also here, on a site with an expired, self-signed TLS certificate. Browsers object to that, with good reason, although I think that in this particular case it is harmless to let the browser add an exception and go on to the site anyway.

See also SantaColoma’s New Atlantis theory.


I do however lean strongly towards the idea that the VMS is genuine, that it was really created soon after the parchment was obtained, which according to radio-carbon dating was between 1404 and 1438, with 95% probability.

There are just too many good reasons to believe it’s authentic, and too many experts who think so too, as René Zandbergen described.

Whoever may have been the author, or authors, and whenever it was created, in 1438 or 1910 or any other year, the fact remains that the ‘words’ of the VMS clearly betray a structure. With few exceptions, they obey a certain word grammar, as Jorge Stolfi found out and documented. Any conjectured explanation of how the VMS came about must present an explanation for this, or risk not being taken seriously.

An autistic monk

My current idea is that the VMS was indeed created in the early 15th century, by a monk (or it could also have been a nun) who had a condition which we would nowadays consider to be on the autism spectrum.

The other monastics liked his plant drawings, and although they saw no practical use or religious value in the other drawings, nor in the incomprehensible text, they just let him continue, because it did no harm and that is how he felt contented, and how he could work on calmly and without tension or stress.

The monk had first set up a schedule, a diagram, which contained essentially the same info as what in our days is in Stolfi’s word grammar. But it was represented in a different way, easier to use for humans. Computers were of no concern, because they didn’t exist yet in the 15th century. Duh!

Using that schedule, the monk could generate text consisting of plausible looking words of an unknown language. But they had no meaning. The monk just enjoyed the beauty and regularity of the words, and the ‘phrases’ he could form with them. His fellow monks didn’t care, or might have thought he was some sort of prophet, and that God might some day reveal what the monk’s cryptic texts mean.

I don’t believe that. I think the text is just meaningless, and any quest to find its meaning is futile.

All of this of course is still only my conjecture, without any evidence or proof.


The monk didn’t use the rather Chomskyan rewrite rules or production rules, which are easy for computers to use, but rather difficult for humans to handle. Compilers for programming languages like C used such grammars, and maybe still do. Programs such as yacc (“Yet Another Compiler Compiler”), in combination with the lexical analyser lex, were used to generate grammar checkers like lint, and compilers like cc.

Modern compiler projects may use different tools, but the basics probably aren’t much different.
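
To make the idea of rewrite rules a bit more concrete, here is a minimal sketch in Python. The rules and letters in it are invented purely for illustration: they are not Stolfi’s word grammar and are not real Voynich glyphs, and the names RULES, expand and all_words are my own assumptions. It only shows how a handful of production rules can be expanded mechanically into word-like strings.

```python
import random

# Hypothetical toy rewrite rules, invented for illustration only.
# They are NOT Stolfi's word grammar and use no real Voynich glyphs.
RULES = {
    "Word": [
        ["Prefix", "Core", "Suffix"],
        ["Prefix", "Core"],
        ["Core", "Suffix"],
        ["Core"],
    ],
    "Prefix": [["q"], ["s"]],
    "Core": [["o", "k"], ["a", "r"], ["e", "d"]],
    "Suffix": [["y"], ["i", "n"]],
}


def expand(symbol, rng):
    """Rewrite a symbol at random until only terminal letters remain."""
    if symbol not in RULES:  # a terminal letter: emit it unchanged
        return symbol
    production = rng.choice(RULES[symbol])
    return "".join(expand(part, rng) for part in production)


def all_words(symbol="Word"):
    """Enumerate every string this (finite) toy grammar can produce."""
    if symbol not in RULES:
        return {symbol}
    words = set()
    for production in RULES[symbol]:
        partial = {""}
        for part in production:
            tails = all_words(part)
            partial = {head + tail for head in partial for tail in tails}
        words |= partial
    return words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, so repeated runs give the same words
    print(" ".join(expand("Word", rng) for _ in range(8)))
    print(len(all_words()), "different words are possible")
```

A computer follows such rules without effort, but a human has to keep track of which symbol is being rewritten at every step, which is exactly why I think the monk used a different representation.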


There are other ways to define and represent a grammar. From an early stage of my career in Information Technology, in 1982 or 1983, I remember something I thought was called the Codasyl Report on Databases, or something similar. I now find that Codasyl is much older: it started in 1959, when I was only four years old and had no idea. Codasyl also defined Cobol, before they worked on databases.

I hardly ever programmed in Cobol, but I do remember seeing grammar definitions for it, in the same style as for the database interface. It used a lot of brackets [ ] and braces { } to indicate choices. I have difficulty finding it now. It wasn’t Backus-Naur notation, nor was it Bachman. I expected to see what I remember here, in the “Network (CODASYL) Data Model (Course Library)”, but that’s not like what I see before my mental eyes.

Found it! NIST, the US National Institute of Standards and Technology, still has this legacy document: “CODASYL data description language: journal of development, June 1973”. (Beware: the 160-page PDF is 7.5 megabytes, and may stress your computer. I use Linux Mint, Firefox for browsing, and xreader for opening the PDF file after I downloaded it with wget. It managed to get the fan of my laptop running at full speed.)

On page 3.4 (60 of 160 of the PDF) the three kinds of brackets are explained, and I quote:

The meaning of enclosing a portion of a general format in special symbols as follows is:
[ a b c ]    at least no occurrences    at most one occurrence
{ a b c }    at least one occurrence    at most one occurrence
a b c        at least one occurrence    at most one occurrence of each

In the pages that follow in the PDF there are examples of commands for defining and manipulating data in a database, described in the above format, with some more details, like underlined words that are required, versus non-underlined optional extra words for clarity. Also, an ellipsis-like symbol … indicates repetition.
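
The occurrence rules quoted above can be tried out with a small Python sketch. It is only my own illustration under stated assumptions: the kind names "opt", "one" and "some", the functions alternatives and expand_format, and the toy WORD_FORMAT with its letters are all invented here, and none of it comes from the CODASYL document or from Stolfi. For the third kind I assume that the chosen items keep the order in which they are listed.

```python
from itertools import combinations, product

# A format is a list of elements. Each element is either a literal string
# or a (kind, choices) pair. The kind names are my own labels:
#   "opt"  ~ [ a b c ] : at least no occurrences, at most one
#   "one"  ~ { a b c } : at least one occurrence, at most one
#   "some" ~ third kind: at least one occurrence, at most one of each
#            (assumption: chosen items stay in the listed order)


def alternatives(element):
    """Return every string a single format element may contribute."""
    if isinstance(element, str):
        return [element]
    kind, choices = element
    if kind == "opt":
        return [""] + list(choices)
    if kind == "one":
        return list(choices)
    if kind == "some":
        return [
            "".join(combo)
            for size in range(1, len(choices) + 1)
            for combo in combinations(choices, size)
        ]
    raise ValueError(f"unknown kind: {kind!r}")


def expand_format(fmt):
    """List every word allowed by a format, reading it left to right."""
    per_element = [alternatives(element) for element in fmt]
    return ["".join(parts) for parts in product(*per_element)]


# Hypothetical word format with invented letters, not real Voynich glyphs:
# an optional prefix, a required core, an optional suffix.
WORD_FORMAT = [
    ("opt", ["q", "s"]),
    ("one", ["ok", "ar", "ed"]),
    ("opt", ["y", "in"]),
]

if __name__ == "__main__":
    words = expand_format(WORD_FORMAT)
    print(len(words), "words, for example:", " ".join(words[:6]))
```

The point of the sketch is that the whole format fits on three lines and can be read from left to right, choosing at most one item from each bracket and exactly one from each brace.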

I believe it would be possible to rewrite Jorge Stolfi’s word grammar (or parts of it) in this format. Not because Stolfi did it wrong. He did it right: what he has is modern, and best suited for use by algorithms, by computer programs. Humans can also read and interpret that, but that is sometimes a little harder. The old syntax definition, in the Cobol style of the 1970s or earlier, is easier to read for humans – or that is my feeling, anyway: you see at a glance what is allowed and what isn’t, what is optional and what is required.

In my next article I’ll make an attempt. I think a VMS word grammar in that style would be within the intellectual reach of a 15th century monk to set up, and it would be feasible to write large volumes of pseudo text, using that for creating the words.