Two days ago I finally made the improvement I had wanted to make for years. Until then, my visits statistics page showed my most visited pages with the number of visits over an unspecified period. Not very informative, as only comparisons between pages were meaningful, not the visit counts themselves. My new algorithm is:
The implementation details are rather complicated. The standard Unix
utility date is quite powerful for doing date and time calculations,
although sadly, it does not spontaneously understand the default date
format of nginx’s logging, which is for example
20/Feb/2023:09:41:24 +0100. So I specified that format, using the -D
option.
This led to the following code for Bourne shell compatible shells:
FRSTDATE=`zcat /var/log/nginx/access.log.2.gz | head -n1 | sed -E 's@.+\[(.+) .+\].+@\1@'`
LASTDATE=`cat /var/log/nginx/access.log | tail -n1 | sed -E 's@.+\[(.+) .+\].+@\1@'`
# Set today's date in case log is empty, just after a rotate
if test -z $LASTDATE
then
   LASTDATE=`date "+%d/%b/%Y:%H:%M:%S"`
fi
FRSTSEC=`date -d $FRSTDATE -D "%d/%b/%Y:%H:%M:%S" "+%s"`
LASTSEC=`date -d $LASTDATE -D "%d/%b/%Y:%H:%M:%S" "+%s"`
SECSBETWEEN=`expr $LASTSEC - $FRSTSEC`
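To illustrate what the sed expression in that code extracts, here is a small sketch. The log line is made up (the address, request and user agent are assumptions, following nginx’s usual access log layout), and the variable names LINE and STAMP are only for this example.

```shell
# Hypothetical log line, invented for illustration.
LINE='203.0.113.7 - - [20/Feb/2023:09:41:24 +0100] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"'

# Same sed expression as in the script: keep what stands between "["
# and the space before the time zone offset.
STAMP=`printf '%s\n' "$LINE" | sed -E 's@.+\[(.+) .+\].+@\1@'`

echo "$STAMP"
# prints: 20/Feb/2023:09:41:24
```

Note that the time zone offset is dropped, so the resulting string matches the format "%d/%b/%Y:%H:%M:%S" used in the date calls.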
The variable SECSBETWEEN, containing the number of seconds, I then use in:
awk -v SECS=$SECSBETWEEN '{printf("%.0f %s\n", $1*7*24*3600/SECS, $2);}'
This assumes a format, transformed from the log entries, which contains
the number of visits, and the URL between [ ], separated by white
space. The idea is that awk recalculates that number of visits to what
it would be if the logs considered covered exactly one week.
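As a worked example of that recalculation, here is a sketch with invented numbers: I assume the logs span exactly ten days, and the two input lines (visit counts and URLs) are hypothetical. The variable WEEKLY exists only for this example.

```shell
# Assumption for illustration: the logs cover exactly 10 days,
# i.e. 10 * 24 * 3600 = 864000 seconds.
SECSBETWEEN=864000

# Hypothetical input lines: number of visits, then the URL between [ ].
WEEKLY=`printf '%s\n' '300 [/index.html]' '47 [/about.html]' |
    awk -v SECS=$SECSBETWEEN '{printf("%.0f %s\n", $1*7*24*3600/SECS, $2);}'`

echo "$WEEKLY"
# prints:
# 210 [/index.html]
# 33 [/about.html]
```

So 300 visits in ten days become 300 × 7/10 = 210 visits per week, and 47 visits become 32.9, which the "%.0f" format rounds to 33.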
While writing this article, I found that my solution relies on specifics
of Alpine Linux’s implementation of the date utility.
The GNU implementation used by Linux Mint (and probably by Debian
and Ubuntu too) doesn’t have that -D option.
I want everything on my website, i.e. any installation code, and
anything that it installs, to be compatible with Debian and
derivatives, and with Alpine Linux.
Under the POSIX standard, date can display the system’s date in various
formats, and it can set it, given appropriate administrator rights.
But it cannot convert a given date between formats.
This makes date unusable for me. I don’t want to rely upon
non-standard extensions that differ between Linux distributions.
Solution: I wrote it myself, in C, using strptime and mktime. In my
shell script, I replaced the lines:
FRSTSEC=`date -d $FRSTDATE -D "%d/%b/%Y:%H:%M:%S" "+%s"`
LASTSEC=`date -d $LASTDATE -D "%d/%b/%Y:%H:%M:%S" "+%s"`
by:
FRSTSEC=`./fmtd2sec.cgi $FRSTDATE "%d/%b/%Y:%H:%M:%S"`
LASTSEC=`./fmtd2sec.cgi $LASTDATE "%d/%b/%Y:%H:%M:%S"`
The C program can be downloaded for perusal from this link.
Copyright © 2023 by R. Harmsen, all rights reserved.