Simple word indexer (1)

Idea for the whole series: 2 July 2021, text written 8–. Some editing on the 13th.

Altavista and Google

I still remember the days when Altavista was hot. It ran on an Alpha processor with 6 gigabytes of RAM, which was exceptional back then, because a 386 could only address 4 GB, or at least any one segment of memory could be no larger than 4 GB.

Today you are almost laughed at if your personal laptop has only 4 GB of internal memory. 8 is usual, 16 is better, they say.

For some reason that I never understood, Altavista was overtaken by Google. I liked it, only you never knew when they were going to include your new web page, or an important change to an old one, in their indexes. It could take a long time, and there was no way to influence that.

PicoSearch and Swish

That is why I wanted a local search engine, in addition to the public ones. I found it in PicoSearch. Its indexes were on PicoSearch’s servers, not your own web hosting servers. But you could decide yourself when to refresh the index, or completely build it anew. That was the control I wanted. Unfortunately, PicoSearch stopped its services on 1 July 2014. They probably couldn’t turn the search engine into a business case.

I find that already in 2002 I had Swish running. Swish, later Swish-e, let you build an index on your own server, and update it whenever and as often as you liked, ready for use in a search screen. And it could be used as an HTML syntax checker as well.

In October 2019, on my web server running FreeBSD 11.2, there was a discrepancy between perl versions 5.28 and 5.30, and to resolve it I decided to remove both temporarily, with the intent to reinstall just one version. Strangely, removing perl 5.30 also removed apache24, spamassassin, milter-greylist, a lot of p5-thingies, opendkim, mod_php72, milter-greylist-4.6.2_4, and swish-e. Well, no problem, I thought, all of those can easily be reinstalled using FreeBSD’s package manager pkg.

But swish-e was no longer supported by pkg! I found this installation description. The site swish-e.org was no longer up, which is suspicious. The Wayback Machine (archive.org) still had copies of versions 2.4.0 (the URL was http://swish-e.org/Download/swish-e-2.4.0.tar.gz), 2.4.3 and 2.4.4. However, the installation of all of them failed, with two compiler warnings and an error. I took a brief look, but couldn’t quickly fix it and gave up.

mnoGoSearch and htdig

I needed an alternative local search engine, and found and tried mnoGoSearch. Nice name: it seems to suggest an exhortation, Go Search, or maybe No Go Search, like a no-go area, and the Russian word много (the software was written by Russians) means ‘much’ or ‘many’. Search and find a lot, was that the idea behind the name?

The description looked promising, and FreeBSD officially supported it. But the site mnogosearch.org no longer existed (October 2019; it does again now), so I had to resort to the Wayback Machine. After installing, a message appeared: “Now, to use mnoGoSearch you need to create the appropriate *sql database manually”. I don’t like that. In my opinion, software should itself install and set up everything that is needed, and should work out of the box.

I eventually found documentation on the Huihoo site. There were typos (/usr/local/mnogosearch/etc/ should be /usr/local/etc/mnogosearch), and a website lavtech.com was mentioned, which redirected to welcome.mnogo.ru, but that page is completely empty. I did however progress a little: I ran indexer -Ecreate to create database tables for sqlite3, but got an error message that there is no option -E. The manual page didn’t mention such an option either, and gave no clue as to how to create those database tables. That was the limit. I gave up.

Next I tried htdig (also jokingly called ht://Dig, or htsearch), which works very well. However, it only supports plain ASCII. When I tried a Portuguese name or word, like Camões for example, it was converted to camões. But that was not how I usually encoded Portuguese: I used ISO8859-1 (now UTF-8). As a result, htdig didn’t find the occurrences.

Not surprising, because htdig dates from 1995 and hasn’t been maintained since 2004. Just not modern software.
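To illustrate why such an encoding mismatch makes a search miss, here is a minimal Python sketch. It does not show how htdig works internally. It only shows that the same word is stored as different bytes depending on the encoding, so a byte-for-byte comparison fails unless both the indexed text and the query are normalized first. The normalize function and the entity spelling Cam&otilde;es are my own assumptions for the example.

```python
import html

word = "Camões"

# Three ways the same word may be stored in a web page.
as_latin1 = word.encode("iso-8859-1")      # õ is the single byte 0xF5
as_utf8 = word.encode("utf-8")             # õ is the two bytes 0xC3 0xB5
as_entity = "Cam&otilde;es".encode("ascii")  # õ written as an HTML entity


def normalize(raw: bytes, encoding: str) -> str:
    """Decode the bytes, resolve HTML entities, and fold case for indexing."""
    return html.unescape(raw.decode(encoding)).casefold()


# The raw byte strings all differ, so a naive byte comparison finds nothing.
print(as_latin1 == as_utf8, as_latin1 == as_entity)

# After normalization all three forms yield the same index key.
keys = {
    normalize(as_latin1, "iso-8859-1"),
    normalize(as_utf8, "utf-8"),
    normalize(as_entity, "ascii"),
}
print(keys)
```

An indexer that applies the same normalization to the pages and to the search terms would find the word regardless of how it was encoded in the source.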

Hyper Estraier

Next