31 May – . An addition to the earlier tests.
Normally I don’t pay attention to how long indexing takes, because it happens automatically and unattended. For a task that is performed maybe two or three times a day, if that, and only when necessary, a few seconds more are immaterial.
Yet I happened to notice that building the full Siworin index for my website takes a lot longer, about 4 times as long, on my web server VPS (Virtual Private Server) than on my laptop at home. And I’m often curious about things that don’t matter, wanting to know causes. Hence this investigation.
My laptop runs Bunsenlabs Linux, which is Debian, and the web server runs Alpine Linux. The file systems are also different: ext4 and btrfs. To see if that causes the difference, I took a temporary additional web server under Debian using ext4, with the same cloud provider Virtua, so I could compare it with the existing one.
Since the testing in July 2021, my website has grown. Instead of 1875 HTML files there are now 2360, using 99,825 instead of 78,827 bytes to specify their paths. There are now 119,602 unique words, stored in 20,992,292 bytes, against 186,106 unique words in 11,267,657 bytes in 2021. Those byte counts cover the words and their locations.
I can’t really explain these strange differences: fewer words, more storage. It’s probably because there are now more config options to define what counts as a word. Most notably, #define MAX_OCCURR 250 is now the config option word_max_occur, which I set to 2 million, so in practice all words are indexed, even 70,568 occurrences of ‘de’ (frequent in Dutch as a definite article, and in Interlingua and Portuguese meaning ‘of’), 47,126 times ‘in’ (frequent in Dutch, English and Interlingua), the other Dutch definite article ‘het’, 39,255 times, (I now skip a few), 26,285 occurrences of the English definite article ‘the’, etc.
Even with so many seemingly redundant words, that are included in the index although they could have been skipped, searching remains reasonably fast.
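To put a number on the ‘fewer words, more storage’ observation, the figures above can be turned into an average amount of storage per unique word. This is only a back-of-the-envelope sketch in Python; the names in it are mine and not part of Siworin, and the byte counts cover the words plus their locations, as stated above.

```python
# Figures quoted in the text above: unique words, and bytes used
# to store the words and their locations.
index_2021 = {"unique_words": 186_106, "bytes": 11_267_657}
index_2025 = {"unique_words": 119_602, "bytes": 20_992_292}


def bytes_per_word(index):
    """Average storage per unique word, including its list of locations."""
    return index["bytes"] / index["unique_words"]


for year, index in (("2021", index_2021), ("2025", index_2025)):
    print(f"{year}: {bytes_per_word(index):.1f} bytes per unique word")
```

The average goes from roughly 60 to roughly 175 bytes per unique word. That is at least consistent with much longer location lists per word, now that very frequent words are no longer limited to 250 occurrences.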
So testing in 2025 is not comparable to testing in 2021, but the various 2025 tests are comparable. The hardware, filesystem and OS configurations are as follows:
Conf. | OS | Linux kernel | CPU | Clock | Cores | Working memory | Storage |
---|---|---|---|---|---|---|---|
E | Bunsenlabs Boron (Debian 12) | 6.1.0-37-amd64, Debian 6.1.140-1 | Intel® Core™ i3-10110U | 2100 MHz | 4 cores (2 physical cores, with Intel® Hyper-threading) | 8 GB | SSD, ext4 |
F | Alpine Linux 3.22 | 6.12.31-0-virt | GenuineIntel, Common KVM processor | 2095 MHz | 1 core | 1 GB | SSD, btrfs |
G | Alpine Linux 3.22 | 6.12.31-0-virt | same | 2095 MHz | 4 cores | 4 GB | SSD, btrfs |
H | Alpine Linux 3.22 | 6.12.31-0-virt | same | 2095 MHz | 1 core | 1 GB | SSD, ext4 |
J | Alpine Linux 3.22 | 6.12.31-0-virt | same | 2095 MHz | 4 cores | 4 GB | SSD, ext4 |
K | Debian 12 | 6.1.0-37-amd64, Debian 6.1.140-1 | same | 2593 MHz | 1 core | 1 GB | SSD, ext4 |
L | Debian 12 | 6.12.31-0-virt | same | 2593 MHz | 4 cores | 4 GB | SSD, ext4 |
Below is a table of how long the steps took in those situations. All test steps were performed several times in succession, to take advantage of disk caching. All times were measured by time(1) and are expressed in milliseconds (ms).
Step | Conf. E | Conf. F | Conf. G | Conf. H | Conf. J | Conf. K | Conf. L |
---|---|---|---|---|---|---|---|
1 find and 6 pathoffs | 43 | 87 | 53 | 87 | 13 | 37 | 26 |
2 wordsep | 1848 | 4139 | 4505 | 3865 | 3670 | 3144 | 3675 |
4 sort | 875 | 15,085 | 12,400 | 13,200 | 12,550 | 2300 | 1405 |
5 combine | 884 | 2804 | 2625 | 2605 | 2400 | 1890 | 1740 |
2, 4 & 5 piped | 3649 | 20,640 | 16,010 | 18,998 | 15,260 | 6235 | 5820 |
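As a quick check on the table, the timings can be compared against the laptop (configuration E), and the three separate steps can be added up and set next to the piped run. The following is a small Python sketch of that arithmetic; the numbers are copied from the table, while the helper name slowdown and the list names are my own and purely illustrative.

```python
# Timings in milliseconds, copied from the table above.
configs = ["E", "F", "G", "H", "J", "K", "L"]
wordsep = [1848, 4139, 4505, 3865, 3670, 3144, 3675]
sort_ms = [875, 15085, 12400, 13200, 12550, 2300, 1405]
combine = [884, 2804, 2625, 2605, 2400, 1890, 1740]
piped = [3649, 20640, 16010, 18998, 15260, 6235, 5820]


def slowdown(times, baseline=0):
    """Each timing divided by the timing of the baseline configuration (E)."""
    return [t / times[baseline] for t in times]


# Steps 2, 4 and 5 run one after the other, for comparison with the piped run.
separate = [w + s + c for w, s, c in zip(wordsep, sort_ms, combine)]

rows = zip(configs, separate, piped, slowdown(piped), slowdown(sort_ms))
for name, sep, pip, piped_factor, sort_factor in rows:
    print(f"{name}: separate {sep} ms, piped {pip} ms, "
          f"piped {piped_factor:.1f}x E, sort {sort_factor:.1f}x E")
```

On these figures the piped run on the four Alpine configurations takes roughly 4 to 6 times as long as on the laptop, and the sort step stands out most: about 14 to 17 times as long on Alpine, against about 1.6 to 2.6 times on the Debian servers.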
Yet to be completed!
Next, we’ll have a look at the word extractor.
Copyright © 2025 by R. Harmsen, all rights reserved.