Grep’s result colouring, and RTL scripts

2–

How I discovered the problem

I was using grep to find out more about encodings of Yiddish (in Hebrew script, of course, written right to left), and I got unreadable results. At first I thought it was just me, because my Yiddish reading skills are very limited: I spell out words like a six-year-old. But on closer inspection, I found that parts of sentences really were in the wrong order, and that letters within words had been swapped.

To get a better understanding of what happened, I created a very simple example.

Isolating the problem

I made a text file with a single line that contains the single Yiddish word azoy, in Hebrew script: אַזױ, spelled alef, patah, zayin, ligature of vav and yod. Then I grepped for occurrences of the oy character, ױ.

By default, grep (I run it under Lubuntu 23.10) colours its results. When I disabled that, everything worked fine:
grep --color=never ױ filewithazoy
correctly found and displayed:
אַזױ
But when I left result colouring enabled, the oy character was correctly shown in red, but the letters appeared in the wrong order:
ױאַז

Escape sequences

I suppose this is caused by the escape sequences that grep emits for rendering the colours (or colors, if you will, in American English). The ones ending in m are SGR, Select Graphic Rendition; the ones ending in K are EL, Erase in Line. This is what grep outputs:

‭אַז<esc>[01;31m<esc>[Kױ<esc>[m<esc>[K

(For the occasion I forced the Hebrew characters into left-to-right order by prepending the Unicode character U+202D, left-to-right override. By <esc> I mean the ASCII escape character, hex 1B or octal 033.)
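To make the sequences visible yourself, something like the following should do. It is only a sketch: it assumes GNU grep with its default colours (no GREP_COLORS set) and a POSIX shell. It recreates the test file from the octal UTF-8 bytes of alef, patah, zayin and the vav-yod ligature, assuming the spelling described above (U+05D0 U+05B7 U+05D6 U+05F1). The variable names oy and visible are made up for the illustration, and the ESC byte is shown as ^ rather than <esc>.

```shell
# Recreate the one-line test file; the octal escapes are the UTF-8 bytes of
# alef, patah, zayin and the vav-yod ligature (assumed code points:
# U+05D0 U+05B7 U+05D6 U+05F1).
printf '\327\220\326\267\327\226\327\261\n' > filewithazoy

# The search pattern: the vav-yod ligature on its own.
oy=$(printf '\327\261')

# --color=always forces colouring even though the output goes into a pipe;
# tr then replaces every ESC byte (octal 033) with a visible '^'.
visible=$(grep --color=always "$oy" filewithazoy | tr '\033' '^')
printf '%s\n' "$visible"
```

With the default colours this should print the same sequence as shown above, with ^ in place of <esc>.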

Apparently in the terminal or in bash, Unicode’s bidirectional algorithm is applied before the escape sequences are interpreted, so the Latin letters and digits inside those sequences mess up the order of the Hebrew characters. I think it should be the other way round: render the colours from the escape sequences, and only then apply the bidirectional algorithm to the Hebrew-only result. But that’s probably easier said than done. However, browsers do handle HTML in that manner.
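One way to check that grep itself keeps the characters in the right logical order is to strip the escape sequences from the coloured output and compare what is left with the uncoloured output. The following is a sketch, assuming GNU grep and a POSIX shell with sed. The file is recreated from the octal UTF-8 bytes of the word (assumed code points U+05D0 U+05B7 U+05D6 U+05F1), and the variable names are made up for the illustration.

```shell
# Recreate the one-line test file (UTF-8 bytes of alef, patah, zayin, vav-yod).
printf '\327\220\326\267\327\226\327\261\n' > filewithazoy

oy=$(printf '\327\261')    # the search pattern, the vav-yod ligature
esc=$(printf '\033')       # a literal ESC byte, for use inside the sed pattern

# Output without any colouring.
plain=$(grep --color=never "$oy" filewithazoy)

# Coloured output, with every ESC [ ... m (SGR) and ESC [ ... K (EL)
# sequence removed again.
stripped=$(grep --color=always "$oy" filewithazoy | sed "s/${esc}\[[0-9;]*[mK]//g")

if [ "$plain" = "$stripped" ]; then
    echo "same characters in the same order"
else
    echo "outputs differ"
fi
```

If the two strings are equal, grep only inserts the escape sequences and leaves the text itself alone, which suggests that the reordering happens when the result is displayed.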

Failed attempts

What I tried, without success:

Nothing worked for me.

I wonder how people in Israel do this? Or those working with Yiddish in New York etc.?

Forums

I also posted the question in the Facebook group Linux Commands, here, and on the forum Superuser.

No suggestions or solutions so far, 3 June 2024 at 14:15B.


Gnome!

Someone on Superuser suggested trying gnome-terminal. Even though Lubuntu doesn’t use GNOME, gnome-terminal is available as an installable program. So I tested version 3.49.92, which uses VTE 0.74.0, instead of qterminal 1.3.0, and then grep’s highlighting is shown correctly!!! Problem solved, thanks!

Addition 4 June: KDE’s konsole, apart from strange font handling and cursor placement, also gets the order right in coloured right-to-left grep hits. So the problem seems to be specific to LXQt’s qterminal.

Update 5 June 2024: retested with qterminal 1.4.0 under Lubuntu 24.04: the bug persists.

Addition 10 June 2024: lxterminal 0.4.0, as included in Bunsenlabs Linux version Boron, does not have the bug.


Arabic?

It is interesting to see how this works out for that other famous language written right to left, Arabic. I tested with a file that contained the name of Cairo in Arabic, القاهرة al-qaahira(t), then grepped for hr, هر. Same result: the highlighting is wrong in qterminal, and right in gnome-terminal.

Not surprising, but good to know.