Still quite some remain (SQSR)

14 April 2021

As described, HTML5 is better than HTML4, and compatible with it. Meanwhile, I have changed and simplified the doctype headers of nearly all my HTML pages accordingly. 1779 of them. Not by hand, of course. They also all have time tags now. And I changed the needlessly complicated
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
to simply
<meta charset="UTF-8">. Neat and clean. That’s how I like it.
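A minimal sketch of how such a batch replacement might look in Python. The script actually used for the site is not shown here, so the function name `simplify_charset_meta` and the regular expression are illustrative assumptions. The pattern only handles the attribute order shown above (`http-equiv` first, then `content`).

```python
import re

# Matches the long HTML4-style declaration. It tolerates upper or lower case,
# single or double quotes, and an optional XHTML-style "/" before ">".
# Assumption: http-equiv comes before content, as in the example above.
OLD_META = re.compile(
    r'<meta\s+http-equiv=(["\'])Content-Type\1\s+'
    r'content=(["\'])text/html;\s*charset=([\w-]+)\2\s*/?>',
    re.IGNORECASE,
)


def simplify_charset_meta(html: str) -> str:
    """Replace the long Content-Type meta tag with the short HTML5 form.

    The charset named in the original tag is kept as-is.
    """
    return OLD_META.sub(lambda m: f'<meta charset="{m.group(3)}">', html)
```

Applied to each file's text in a loop, this leaves pages without the old tag untouched, so it can safely be run more than once.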

Many of the obsolete <a name=…>…</a> anchors have been replaced by an id=… attribute on the preceding tag, or on a new <span>…</span> pair where that was easier. Still quite some remain though. Automated, fault-free conversion isn’t easy. Sometimes checking the result of an automated change takes longer than just typing or copy-pasting what is needed.
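To illustrate why only part of this can be automated, here is a cautious sketch in Python. It is not the conversion actually used for the site, and the names `convert_named_anchors`, `SIMPLE_NAMED_ANCHOR` and `ANY_NAMED_ANCHOR` are made up for the example. It converts only the easy case (an anchor with nothing but a name attribute and plain text inside) to a <span> with an id, and counts whatever is left for manual attention.

```python
import re

# The easy case only: <a name="x">plain text</a>, with no other attributes
# and no nested tags inside the anchor.
SIMPLE_NAMED_ANCHOR = re.compile(
    r'<a\s+name=(["\'])([^"\'<>\s]+)\1\s*>([^<]*)</a\s*>',
    re.IGNORECASE,
)

# Any <a ...> tag that still carries a name attribute after conversion.
ANY_NAMED_ANCHOR = re.compile(r'<a\s(?:[^>]*\s)?name\s*=', re.IGNORECASE)


def convert_named_anchors(html: str) -> tuple[str, int]:
    """Turn simple named anchors into <span id=...> elements.

    Returns the converted text and the number of named anchors that were
    too complicated for the simple pattern and need a manual look.
    """
    converted = SIMPLE_NAMED_ANCHOR.sub(
        lambda m: f'<span id="{m.group(2)}">{m.group(3)}</span>', html
    )
    remaining = len(ANY_NAMED_ANCHOR.findall(converted))
    return converted, remaining
```

Anchors that also carry an href, or that wrap other tags, are deliberately left alone: guessing wrong there is exactly the kind of fault that costs more time to find than to avoid.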

Of course I left my earliest experiments with mutual href and name anchors unchanged, for historical reasons.

There were many changes, so the validator came in handy. Only occasionally though: checking pages one at a time just isn’t feasible, given the sheer number of web pages I now have on my site. As said, almost two thousand. So for a few days, I checked a few random pages several times a day. Obviously that hardly makes a dent.

Then one morning, while taking a shower, often a creative and fruitful moment of the day, I wondered: can’t I call the validator from a script, maybe pausing one minute between each URL, so as not to overload the site? Then after a day or two I’d have a complete list of syntax errors to fix. I googled that, but found something else: the program behind the validator is also available as stand-alone, open-source, free software!
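The shower idea, which in the end wasn’t needed, could be sketched roughly as follows in Python. Everything here is hypothetical: `check` stands for whatever function would submit one URL to the online validator and return its messages, and `sleep` is injectable so the loop can be tried out without really waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


def estimated_hours(page_count: int, pause_seconds: float = 60.0) -> float:
    """Rough duration of a full run, counting only the pauses between pages."""
    return max(page_count - 1, 0) * pause_seconds / 3600.0


def check_politely(
    urls: Iterable[str],
    check: Callable[[str], List[str]],
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Call check(url) for every URL, pausing between consecutive calls.

    check is a placeholder for the real validator request; it is not
    defined here because the idea was never implemented.
    """
    results: Dict[str, List[str]] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # be gentle with the validator site
        results[url] = check(url)
    return results
```

With 1779 pages and a one-minute pause, `estimated_hours(1779)` comes to about 29.6 hours, which fits the "day or two" guess.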

It is written in Java, and works on a variety of platforms. I use the stand-alone executable under Linux Mint. Less than a minute of CPU time (half that in elapsed time, because the Intel i3-10110U processor has two physical cores) was enough to check everything. Then followed a number of days spent fixing lots of real errors (computers are dumb and fast, people are slow), ignoring some errors or warnings that I just don’t agree with (e.g. invalid characters in URLs; I say: just copy what is between " and ", and that’s what browsers do), and postponing others, like more of those obsolete <a name=…>…</a>s.
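With thousands of messages, it helps to know which files are worst. Below is a rough Python sketch, not the procedure actually followed here. It assumes the checker is started as `vnu` (the real path will differ per installation), that it is asked for JSON output with `--format json`, and that the report is an object with a "messages" list whose entries have "type" and "url" fields.

```python
import json
import subprocess
from collections import Counter
from typing import List, Tuple


def run_checker(root: str, command: str = "vnu") -> str:
    """Run the stand-alone checker on a directory and return its JSON report.

    Assumptions: the executable is reachable as `command`, and the report
    is written to stderr (the checker's default unless told otherwise).
    """
    completed = subprocess.run(
        [command, "--format", "json", "--skip-non-html", root],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means that errors were found
    )
    return completed.stderr


def count_errors_per_file(report_json: str) -> List[Tuple[str, int]]:
    """Count messages of type "error" per file, worst file first."""
    report = json.loads(report_json)
    counts: Counter = Counter()
    for message in report.get("messages", []):
        if message.get("type") == "error":
            counts[message.get("url", "(unknown)")] += 1
    return counts.most_common()
```

Something like `count_errors_per_file(run_checker("/path/to/site"))` would then give a to-do list sorted by severity; the path is of course a placeholder.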

The messages were the accumulated result of over 20 years of hand-coding, during which I did sometimes do syntax checks, but not consistently. And some of what used to be cool and right is now old-fashioned, wrong or stupid – like frames, all but gone on my site.

OK, so I didn’t fix all errors, but many. Still quite some remain!