Encoding details

28 June 2023

Caveat

Although detailed, this is still conjectural, vague and inexact. To make it meaningful or useful in any way, it needs to be elaborated, tried, and confirmed or falsified, and I don’t yet know how to do that. But to make that possible at all, a description is needed. A long journey begins with the first steps. Perhaps there will never be a journey. Time will tell.

Verses

Book, chapter, and verse. That’s what we need. Be concrete, be specific, be precise. If the ‘words’ of the VMS are really encodings of Bible locations, which is what I think they are, then we need more than Book, chapter and verse: within a verse there must be some sort of a count, to arrive at the exact word that is to be encoded.

Moreover, were Bible verses already identified at the time the VMS was probably created, in the early 1400s? No. I quote again from Wikipedia, which used public domain text from a 1913 edition of the Catholic Encyclopedia:

“These were indicated by book and chapter (the division into chapters had recently been made by Stephen Langton) but not by verses, which Robert Estienne would first introduce in 1545. In lieu of verses, Hugo divided each chapter into seven almost equal parts, indicated by the letters of the alphabet, a, b, c, etc.”

I’ve seen those letters in Hugo de Saint-Cher’s concordance (more on that later), but I haven’t yet been able to find a Vulgate Bible old enough to show the exact lettered chapter divisions. More modern Vulgates don’t have them. And exactitude is needed if we ever want to make anything verifiable. A practical difficulty.

The Hebrew Bible did already contain verse divisions. I’ve seen the concordance made by Isaac Nathan ben Kalonymus, but I haven’t quite worked out how it functions exactly. More on that later too.

Grammar

Latin is a heavily inflected language. So it is conceivable that a word to be encoded does occur in the Bible, but only in a different case, tense, etc. Therefore I conjecture that optional encodings existed for case and number of Latin nouns; case, number and gender of Latin adjectives; and person, number, tense and mood of Latin verb forms.

Hebrew has conjugations too, and there is some noun morphology, although much less complicated than in the case of Latin.

Most encodings, grammatical and locational, would be optional and have default values.

Defaults

In the absence of a Testament indication, the Old Testament is assumed.
If no Bible Book is mentioned, it is Genesis (for the Old Testament) or Matthew (for the New Testament).
A missing chapter means the first chapter of the Book.
A missing letter defaults to ‘a’, a missing verse to 1.
Grammatical details are not needed if the word form found in the Bible is exactly the form required for the underlying word that is to be encoded in the VMS text. Where context unambiguously clarifies certain morphological details (for those who know the language well, of course), those may be omitted as well.
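The default rules above can be sketched in a few lines of Python. This is only an illustration of the conjecture, not a decoder: the `Location` record, its field names and the `resolve` function are hypothetical, and reading ‘Genesis or Matthew’ as ‘the first Book of whichever Testament applies’ is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Hypothetical record for one encoded Bible location; None means 'omitted'."""
    testament: Optional[str] = None  # "OT" or "NT"
    book: Optional[str] = None
    chapter: Optional[int] = None
    letter: Optional[str] = None     # Hugo's chapter part: 'a', 'b', 'c', ...
    verse: Optional[int] = None      # verse number, where verses exist


def resolve(loc: Location) -> Location:
    """Fill every omitted element with the default listed above."""
    testament = loc.testament or "OT"
    # Assumption: Genesis is the default for the OT, Matthew for the NT.
    book = loc.book or ("Genesis" if testament == "OT" else "Matthew")
    chapter = loc.chapter if loc.chapter is not None else 1
    letter = loc.letter or "a"
    verse = loc.verse if loc.verse is not None else 1
    return Location(testament, book, chapter, letter, verse)


if __name__ == "__main__":
    print(resolve(Location()))                # everything omitted
    print(resolve(Location(testament="NT")))  # only the Testament given
    print(resolve(Location(book="Exodus", chapter=3)))
```

Note that this sketch does not check whether a given Book actually belongs to the resolved Testament; a real scheme would have to settle that, along with the word count within a verse, for which no default is proposed here.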

Jorge Stolfi found that almost all VMS ‘words’ are structured as a Core, surrounded by zero or more Mantle prefixes and suffixes, in turn enclosed in optional Crust prefixes and suffixes. This structure definition allows for short words, but also very long words, as well as anything in between. This might well correspond to my assumptions about optional encoding elements, where default values apply when elements are missing.

The restrictions on character classes that can occur in various elements, which Stolfi described in his word grammar, make it possible to parse ‘words’ unambiguously, even in the presence of omissions. (What? An omission means something is not there. So how can it have a presence? Well, never mind.)

Line structure

In “The line as functional entity” Currier noted that certain kinds of word are rarely seen at the start of a line, others more often, and yet others are frequent at the end.

My attempt at an explanation is that, in principle, every word of the underlying text can be encoded from a different Bible Book and chapter, and is then explicitly marked as such. But it is also possible to use the same Book, maybe even the same chapter, for all of the words in a line. Then this information is encoded once per line, as the default value for all the words in that line.

If that is true, lines of this type should on average have shorter words than other lines. This could and should be verified, but I haven’t done it yet.
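A minimal sketch of how such a check might start, in Python. Everything here is an assumption: that a transcription is available as one string per line with words separated by whitespace, and above all that there is some way to tell which lines carry a line-level default. That test is passed in as the hypothetical function `has_line_default`, because I don’t know yet what it would look like.

```python
from statistics import mean


def mean_word_length(line: str) -> float:
    """Average number of characters per word in one transcribed line."""
    words = line.split()
    return mean(len(w) for w in words) if words else 0.0


def compare_groups(lines, has_line_default):
    """Return (mean for lines with a line default, mean for the other lines).

    `has_line_default` is a hypothetical predicate supplied by the caller.
    A group with no lines yields None.
    """
    with_default, without = [], []
    for line in lines:
        if not line.split():
            continue  # skip empty lines
        target = with_default if has_line_default(line) else without
        target.append(mean_word_length(line))
    return (
        mean(with_default) if with_default else None,
        mean(without) if without else None,
    )


if __name__ == "__main__":
    # Placeholder tokens only, NOT real VMS transcription data.
    sample = ["HDR ab cd", "abcdef ghijkl"]
    print(compare_groups(sample, lambda line: line.startswith("HDR")))
```

A difference between the two averages would only be suggestive; whether it is larger than chance would need a proper statistical test on the full transcription.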

Spelling alphabet

There is probably also a way to spell out, letter by letter, words of the underlying language that happen not to occur in the codebook at all. Perhaps each letter of such a word is then spelled out using one special kind of VMS ‘word’.

Any sense in this?