This post was discussed on Lobsters.

Once upon a time, a frontend ticket landed on my queue that was not properly mine, but the only other Arabic reader on the team was on leave. It went roughly as follows: a block of mixed-content Arabic prose on the customer-facing dashboard was rendering with a ragged left edge (the rag falls on the left in Arabic, since the lines set out from the right margin; the ticket said "ragged right") when the design team had explicitly specified justified text. Attached were three screenshots from three browsers and a polite note from the product manager observing that the Latin-script version of the same block looked, I quote, "fine."

In the same six months I had closed three other tickets against the same product, each of which had presented to its filer as the only bug. A customer's name had appeared with its letters unjoined on a printed agreement, the way a sign-painter would have laid them out in 1962, because the PDF library on the receipt server pre-dated the existence of a shaping engine in its language runtime. A search index had been returning empty for accounts the customer service team could see in the database because a 2017 import had encoded twelve thousand names using fossil Unicode codepoints from 1991 instead of regular ones from 1995, and the index, very reasonably, treated the two encodings as different strings. So that ragged-left ticket was the smallest of the four; however, it sat on top of the same iceberg and pointed at the same thing.

Here is the disagreement, reproduced live. I used random text; the original had more spacing, and I'm too lazy to pick words to maximize the ragging and spacing.

PRODUCTION, ANY BROWSER

الخط هندسة روحانية ظهرت بآلة جسمانية، وهو لسان اليد ورسول العقل، وسفير الضمير ووحي الفكر، وسلاح المعرفة وأنس الإخوان عند الفرقة.
Checkbox: apply the fix the ticket asks for

THE MOCKUP, AS DESIGN APPROVED IT

الخـــــط هندســـــة روحانيـــــة ظهــــرت
بآلــــة جسمانيــــة، وهــــو لســـان اليــــد
ورســـــول العقـــــل، وسفيــــر الضميــــر
ووحـــــي الفكـــــر، وســـــلاح المعــــرفة
وأنـــــس الإخـــــوان عنــــد الفــــرقــــة.

On the right, the agreed design: both margins flush, every line filled by elongating the strokes inside the words, never the spaces between them. It renders in your browser only because I placed every elongation by hand, a confession I will expand on below. On the left, what production ships. Tick the box to apply the one tool CSS offers, text-align: justify.

(For these demonstrations this site ships its first webfont ever: Amiri, self-hosted, a hundred and fifty kilobytes of one man's unpaid evenings, redistributed under the OFL. That this is what it takes to show you something your operating system cannot do on its own is, I want to be clear, part of the argument. I think it is a delightful hundred and fifty kilobytes.)

The Latin version did look fine. I spent about half an hour with the Arabic one: I walked the rendered DOM, I set text-align: justify in every combination of font-family and direction declarations I could think of, and at the end of the exercise I wrote a reply explaining, more or less honestly, that the problem was not a bug in our stylesheet but the state of Arabic typography on the web.

The reply and the closure of the ticket took half an hour or so. The reasons behind it took five hundred years to pile up, and they involve a twice-mutilated vizier, a Qurʾān that vanished for four centuries, a Beirut newspaperman with a deadline, and an Egyptian physician who taught himself font engineering for fun (or that is what I imagine about him).
Walking through these ended up being the most enjoyable couple of weeks in that job, and I want to go through it here too.

What the scribes solved

The history deserves recording because most people outside the small world of Arabic font engineering don't know it, and it is wonderful. Classical Arabic typography, by which I mean the manuscript tradition that the early printers of Istanbul and Bulaq spent their careers chasing, justifies a line of text without stretching the spaces between words at all. Stretched spaces are the Latin convention, and in Arabic they produce an effect the scribes would have found simply ugly. Instead the scribe extends the letterforms themselves along the baseline, using what is called taṭwīl or, in the modern technical vocabulary, kashida: the connecting strokes between certain pairs of letters can be lengthened, sometimes lavishly, to carry a line out to the margin. A well-set page of Naskh from the seventeenth century has every line flush at both margins, and the result is the dense, regular weave that anyone who has spent time with a good manuscript Qurʾān will recognise on sight.

Fig. 1. A Qurʾān folio, fourteenth century, now in the Metropolitan Museum of Art. Run your eye down the left edge: every line lands flush, and not one word-space was stretched to get it there. The justification lives inside the words. (Public domain, via Wikimedia Commons.)

And this was not improvisation but a system, with a paper trail. The system was written down by Ibn Muqla, Abbasid vizier and chief calligrapher, who served three caliphs in succession and was imprisoned by two of them; the third had his right hand amputated on a charge of treasonous correspondence. Ibn Muqla then kept writing for the next several months by lashing a reed pen to the stump of his wrist, was rewarded for what he wrote by having his tongue cut out, and died in prison around the year 940.
His body was buried three times in three different places, his daughter moving it after each interment to keep the grave out of police hands. The system he wrote down outlasted everybody who hurt him by a thousand years. It is called al-khaṭṭ al-mansūb, the proportional script: every letterform measured in rhombic dots of the reed nib, every curve a defined arc of a defined circle, the alif a fixed number of dots high and everything else derived from the alif. Within that system the elongation is a drawn stroke with its own rules: which letter pairs accept it, how the curve swells and tapers, how many elongations a line may carry, where they may sit. The scribes also justified by choosing different shapes, because most letters have alternate forms of different widths, and a skilled hand selects among them as the margin approaches. Justification, in this tradition, is not a spacing problem but a shaping problem.

The tradition Ibn Muqla started did not stay with him; it was refined, in writing, by named human beings over the following six hundred years. Ibn al-Bawwāb in Baghdad, around the year 1001, smoothed out the proportions and produced the manuscript that defined Naskh for the rest of the millennium; a single Qurʾān in his hand survives in the Chester Beatty Library in Dublin, and you can date the Persian, Ottoman, and Mamluk traditions by how closely they follow it. Yāqūt al-Mustaʿṣimī, who survived the Mongol sack of Baghdad in 1258 by climbing a minaret and continuing to write, codified what later scholars called the Six Pens, the canonical hands of Naskh, Thuluth, Muḥaqqaq, Rayḥān, Tawqīʿ, and Riqāʿ, each with its own metrics, each with its own justification grammar. Then the Persian scribes invented Nastaʿlīq in the fourteenth century, a hanging script that justifies by sloping the baseline downward at the end of each phrase, which is to ordinary justification roughly what a vertical garden is to a lawn.
The Ottomans developed Dīwānī for the chancery and a tightly knotted Dīwānī Jalī for the sultanic seal, both of which fill space by interleaving letters at heights ordinary baselines never visit. All of these are the same alphabet of twenty-eight letters; all of them have their own rules about which letters accept the kashida, which never do, and how the line breathes.

Latin typesetting never needed any of this, because Latin letters do not hold hands. Arabic letters do, and the web, in 2026, looks at them holding hands and stretches the air between the words anyway. So now you know what the mockup card at the top of the page was doing: it was faking a page of this manuscript tradition in HTML, every line carried to the measure by the strokes and not the spaces. The fakery, since I promised a confession, is U+0640 TATWEEL characters that I placed and sized by hand.

Four shapes for every letter

To understand why every machine since Gutenberg has wrestled this script and mostly lost, you need one structural fact: Arabic is always cursive. There is no print-versus-handwriting distinction, no block letters. The letters connect in stone inscriptions, in manuscripts, in metal, on screens. Each letter therefore changes shape depending on its neighbours (an isolated form, an initial, a medial, a final), and six letters refuse to connect forward at all, which breaks words into joined clusters and gives the script its rhythm. The shapes are not costumes over some underlying "real" letter. The positional variation is the letter.

And the alphabet is bigger than Arabic the language. Persian extends it with four letters Arabic does not have (پ pe, چ che, ژ zhe, گ gaf) and uses two of the existing letters in subtly different forms (ی for the final yāʾ, ک for kaf).
Urdu adds an aspirated do-chashmī he (ھ), a retroflex set (ٹ ڈ ڑ), and a hanging ye barree (ے), and writes most of its everyday text in Nastaʿlīq, which a Naskh-shaped font will produce as a phonetically correct but visually unrecognisable approximation. Sindhi has more again. Pashto, Kurdish, Uyghur, Kashmiri, and Punjabi each take the alphabet, add what their phonology requires, and ship. Any font that calls itself "Arabic" without consulting the Persian and Urdu communities will produce, for hundreds of millions of readers in Iran and South Asia, text that is technically rendered but functionally wrong: the kaf has the wrong terminal, the heh fuses where it shouldn't, the digits are from the wrong belt. The Noto family ships separate fonts to cover these (NotoNaskhArabic, NotoNastaliqUrdu, NotoSansArabicUI), and OS font fallback chains usually get it right. Usually.

| stored codepoint | isolated | initial | medial | final |
|---|---|---|---|---|
| U+0639 ʿAYN | ع | عـ | ـعـ | ـع |
| U+0647 HEH | ه | هـ | ـهـ | ـه |

One codepoint, four shapes, chosen at render time by the shaping engine. The medial heh and the isolated heh are, to an untrained eye, different letters; I have watched students of Arabic meet the medial heh in week three and file a complaint with management. A Latin font that ships 26 lowercase shapes needs no opinion about any of this. An Arabic font is wrong unless it has opinions about all of it.

The arrangement we eventually settled on, after decades of wrong answers, is this: the encoding stores the abstract letter, and the font supplies the shapes. Unicode gives you one codepoint for ʿayn; the font carries the four positional glyphs; a shaping engine applies the OpenType features (isol, init, medi, fina, plus rlig for the ligatures the script requires, plus mark and mkmk for stacking the vowel signs) at render time. An Arabic font is a small program. The text you store is its input, not its output.
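To make the "letters in, shapes out" split concrete, here is a minimal Python sketch of the one decision a shaping engine makes for every letter: which positional feature (isol, init, medi, fina) applies. It is not how HarfBuzz or any real engine is implemented. The function name `positional_forms` is hypothetical, and the sketch assumes a plain, unvowelled word made only of base letters, with the six letters that never connect forward hard-coded; marks, hamza carriers, ligatures, and joiner control characters are all ignored.

```python
# The six letters that never connect forward: alif, dal, dhal, ra, zay, waw.
# Assumption of this sketch: every other character joins on both sides.
NEVER_JOIN_FORWARD = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def positional_forms(word: str) -> list[str]:
    """Return the OpenType positional feature tag each stored letter needs."""
    forms = []
    for i, letter in enumerate(word):
        # Joins backward: a previous letter exists and is willing to connect forward.
        joins_prev = i > 0 and word[i - 1] not in NEVER_JOIN_FORWARD
        # Joins forward: a next letter exists and this letter is willing to connect.
        joins_next = i < len(word) - 1 and letter not in NEVER_JOIN_FORWARD
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_next:
            forms.append("init")
        elif joins_prev:
            forms.append("fina")
        else:
            forms.append("isol")
    return forms


# The name Muhammad as stored: meem, hah, meem, dal.
word = "\u0645\u062D\u0645\u062F"
print([f"U+{ord(c):04X}" for c in word])  # what storage holds
print(positional_forms(word))             # what the renderer has to decide
```

For this word the first line prints the four stored codepoints (U+0645, U+062D, U+0645, U+062F) and the second prints init, medi, medi, fina. The string never changes; only the decision about how to draw it does.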
The word is performed fresh every time you look at it, like music from a score.

The cleanest way to feel this is to assemble a word one letter at a time and watch every prior letter renegotiate its shape as the next one arrives.

Click letters to add them to the word, in the order they appear in the codepoint stream. (Two panels: STORED CODEPOINTS, and RENDERED BY THE SHAPING ENGINE.)

Try م, then ح, then م, then د: build the name Muḥammad. The first م drops into its initial form the moment you add the ح, the ح goes medial when the next م arrives, and so on through to the د, one of the six letters that never join forward, which ends the stroke and takes a final form of its own. Four codepoints in storage, one continuous stroke on screen. None of it happens without a shaping engine; a PDF generator that lacks one will render the same four codepoints as four disconnected isolated forms.

The wrong answers are still in the standard, fossilised, and they make excellent souvenirs. Before shaping engines existed, the 8-bit code pages of the DOS and early Windows era encoded the shapes themselves: a separate character for initial ʿayn, medial ʿayn, and so on. Unicode, which promised round-trip compatibility with everything that came before it, had to swallow those sets whole, and they live on at U+FB50–U+FDFF and U+FE70–U+FEFF as Arabic Presentation Forms-A and -B: several hundred codepoints that no new document should ever contain and that PDF text extractors merrily emit to this day, which is one of the reasons searching an Arabic PDF so often fails in silence. The haystack is encoded as shapes and your needle is encoded as letters. My favourite resident of those blocks, and one of my favourite characters in all of Unicode, is U+FDFD, ﷽: a four-word invocation, bismillāh ar-raḥmān ar-raḥīm, as a single codepoint.
A monument from the era when rendering was baked into the encoding because nobody trusted the renderer to do anything, preserved forever, like a fly in amber that recites.

This bites because the two encodings render identically and compare differently. The customer search bug I mentioned at the top of this article was, specifically, this: a small customer database. Some names were entered through a modern Arabic keyboard. Others were migrated in 2017 from a legacy system that stored them in Arabic Presentation Forms. Look identical, don't they?

| NAME (as rendered) | ENCODING IN STORAGE | ACCOUNT |
|---|---|---|
| محمد علي | modern Unicode | EGP-9341-0021 |
| ﻣﺤﻤﺪ ﻋﻠﻲ | presentation forms | EGP-2014-7732 |
| سارة أحمد | modern Unicode | EGP-9341-0044 |
| ﺳﺎﺭﺓ ﺃﺣﻤﺪ | presentation forms | EGP-2014-8810 |

Search by name. Checkbox: apply NFKC normalisation

Type a name into the search. Or click the seed button to paste a query that exactly matches one of the records, the way a customer-service agent might copy the name from an SMS. Without normalisation, you get the records that share your query's encoding and miss the others. Tick the box and the search runs after NFKC, which collapses the presentation forms back to their abstract letters. The fix is one Unicode call. It took a quarter to find because the bug presents as "customer not in system", and customer-service tickets do not arrive with codepoint dumps attached.

And if you want to know what the world looks like when software skips all of this, the shaping engine, the bidi algorithm, the whole apparatus, you do not have to imagine it, because an enormous amount of software still skips all of it.

Checkbox: render it the way software outside the text stack does

مرحبا بالعالم، هذا نص عربي

The text says "hello, world, this is Arabic text." Tick the box for the version every Arabic reader has met in the wild: on a shop sign, a boarding pass, a watermark, an old film title. Every letter drops into its isolated form and the line is laid out left to right, backwards.
This is what a program produces when it draws characters one at a time and never consults a shaping engine: old Photoshop did it, matplotlib still does it out of the box, many of the PDF generators on npm do it, receipt printers do it with conviction. The standard Python workaround, arabic-reshaper plus python-bidi, fixes it by pre-baking the shaped forms into the string using that fossil block from the paragraph above.

Three sets of digits, one continuous belt

The numerals deserve their own room. Every Arabic-rendering project I have worked on has tripped on them, and most of those projects invented a private vocabulary for what went wrong instead of asking why. Most readers of this article have only ever met one set of digits and are about to meet three.

The glyphs the world calls "Arabic numerals", 0 through 9, are not in fact what most Arabic readers use day to day. Egypt, Sudan, the Levant, Iraq, and the Gulf use what Unicode files under ARABIC-INDIC DIGITS (٠١٢٣٤٥٦٧٨٩, U+0660–U+0669), which look nothing like the Latin glyphs and ship in any serious Arabic font as a separate set. The Maghreb (Morocco, Algeria, Tunisia, often Libya) uses the Latin glyphs and has done so since the colonial period; an Arabic newspaper in Casablanca and an Arabic newspaper in Cairo will print today's date in two visually different scripts and consider it unremarkable. Iran, Afghanistan, and Pakistan use a third set, the EXTENDED ARABIC-INDIC DIGITS (۰۱۲۳۴۵۶۷۸۹, U+06F0–U+06F9), four of whose glyphs (4, 5, 6, 7) differ visibly from the Arabic-Indic set despite encoding the same numbers. Any banking platform that operates from Rabat to Karachi will, at some point, render the same balance three ways.

Render the digits as: Western (0–9) / Arabic-Indic (٠–٩) / Extended (۰–۹)

رصيد الحساب 12,345.67 جنيه، تاريخ آخر معاملة 2026-06-08، رقم الحساب 4071-9923.

"Account balance 12,345.67 pounds, last transaction date 2026-06-08, account number 4071-9923."
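The three digit sets are three separate codepoint ranges, so switching between them is a character-for-character mapping. A minimal Python sketch of that mapping, using only the standard library, follows. The helper name `convert_digits` is hypothetical and not from any library, and the sketch deliberately leaves separators and everything else in the string untouched.

```python
WESTERN = "0123456789"
# ARABIC-INDIC DIGITS, U+0660 to U+0669
ARABIC_INDIC = "".join(chr(0x0660 + n) for n in range(10))
# EXTENDED ARABIC-INDIC DIGITS, U+06F0 to U+06F9
EXTENDED = "".join(chr(0x06F0 + n) for n in range(10))


def convert_digits(text: str, target: str) -> str:
    """Map every digit from any of the three sets onto the target set."""
    table = {}
    for source in (WESTERN, ARABIC_INDIC, EXTENDED):
        for value, digit in enumerate(source):
            table[ord(digit)] = target[value]
    # Characters that are not digits (hyphens, letters, spaces) pass through.
    return text.translate(table)


account = "4071-9923"
print(convert_digits(account, ARABIC_INDIC))
print(convert_digits(account, EXTENDED))

# Python's int() accepts any Unicode decimal digit, so all three sets parse.
print(int(convert_digits("12345", EXTENDED)) == 12345)
```

One plausible design, which is my assumption and not something the ticket history confirms, is to store a single digit set and convert only at display time, so that comparisons downstream never see two spellings of the same number.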
Note the trailing period: a left-to-right run of numbers sits inside the RTL paragraph, and the punctuation slides around it depending on its neighbours, which the bidi section below is about.

The rendering choice is the easy half. The bidirectional behaviour is where the platform starts to creak, because digits are not strong characters in the algorithm. They are weak, neither strongly left-to-right like a Latin letter nor strongly right-to-left like an Arabic one, and what they do depends on whoever stood next to them most recently. The relevant rule, W2 of UAX #9, reclassifies a digit as an ARABIC NUMBER if the nearest preceding strong character is an Arabic letter, and leaves it a EUROPEAN NUMBER otherwise. Both render their internal digits left-to-right, which is correct: numbers everywhere on Earth are read most-significant-first. But the punctuation between digits behaves differently across the two classes. A hyphen between European numbers stays glued. A hyphen between Arabic numbers floats neutral and gets reclassified again by the rules for neutrals, which look at the surrounding context, which is right-to-left, and the number runs swap places around the hyphens. That is how a phone number stored as "010-1234-5678" arrives on screen as "5678-1234-010", per spec, in every browser, identically wrong.

Checkbox: add an Arabic word before the number

+20-100-1234-5678

A phone number, stored exactly as written. Tick the box to prepend the Arabic word رقم ("number"). Nothing about the digits changes in storage; W2 simply re-decides what kind of digits they are now that an Arabic letter is the nearest strong character before them, and the hyphens, deprived of their European-number context, hand the runs back to the surrounding right-to-left frame. The only fix the platform offers is to wrap the number in U+200E LEFT-TO-RIGHT MARK characters or in explicit directional markup.

The third twist, and the one that most directly costs money, is that the decimal mark and thousands separator have local conventions too.
The Arabic world uses U+066B ARABIC DECIMAL SEPARATOR and U+066C ARABIC THOUSANDS SEPARATOR (`٫` and `٬`), which look like a comma and an apostrophe but are different codepoints with different bidi properties. A price formatted by `Intl.NumberFormat('ar-EG')` in modern Node will use them. A price formatted by an older library, or by a backend in a language whose locale support stops at French, will use ASCII `.` and `,`. Both render. Both look almost the same. Only one of them sorts and parses correctly in the next system downstream, and you find out which one when reconciliation breaks on a Sunday morning.

Five centuries of workarounds

Print and the Arabic script met badly, and that meeting set the pattern for almost everything since: when the machine cannot do the script, simplify the script, ship it, and call it progress.

The first book printed in movable Arabic type was a book of hours, Kitāb Ṣalāt al-Sawāʿī, produced in 1514 in Fano, in the Papal States, by the Venetian printer Gregorio de' Gregori. It was set by craftsmen who could not read a word of what they were setting, and you can tell: letters detach at the joints, dots drift away from their letters, final forms turn up in the middle of words.

Fig. 2. A page of the Kitāb Ṣalāt al-Sawāʿī, Fano, 1514: the first book ever printed in movable Arabic type. (Public domain, via Wikimedia Commons.)

Two decades later the Paganini press in Venice printed the first Qurʾān in movable type, a commercial venture aimed at the Ottoman market, and it failed so completely (typographic errors compounded by textual ones, in the one book whose point is textual fidelity) that every copy vanished and scholars spent four centuries politely doubting the edition had ever existed.
Then in 1987 a single copy surfaced in the library of a Venetian friary, where it had been sitting the whole time.

The Ottoman side of the story is usually told in one sentence, "the sultans banned printing," and the sentence is doing suspicious amounts of work. The standard account has Bayezid II prohibiting printing in Arabic characters in 1485 and Selim I renewing the ban in 1515 on pain of death. The inconvenient detail, which the historian Kathryn Schwartz laid out in 2017, is that no text of either edict survives, and the story traces back to the reports of European travellers. Which does not mean it is false; it means the favourite explanation for why the Islamic world "missed" printing rests on evidence that would not survive a code review. What is actually documented is that the first Ottoman Muslim press, İbrahim Müteferrika's in Istanbul, opened in 1727, and that the deeper resistance was professional and aesthetic rather than theological: an empire employing tens of thousands of calligraphers in a refined, thousand-year-old craft looked at Fano-quality output and saw, quite reasonably, a downgrade. They were the only people in this story with working quality assurance.

The press that finally did the script justice was the Bulaq Press, founded in Cairo by Muhammad Ali in 1820 and later renamed al-Maṭbaʿa al-Amīriyya, the state press. Doing it justice was gloriously expensive. Where a Latin fount needs somewhere around a hundred sorts, a serious naskh fount needed many hundreds: positional forms, ligatures, vowel marks, every one a separately cut piece of metal, and a compositor who could navigate that case fast enough to keep his job. The summit of the tradition is the 1924 Cairo Qurʾān, set at the Amiria Press, which standardised the text for the twentieth century and proved that metal could, with enough sorts and enough patience, walk right up to the manuscript page and look it in the eye.

Fig. 3.
A page of the 1924 Cairo Qurʾān, set in metal at the Amiria Press, whose typeface the Amiri font revives. Look at the elongated strokes carrying the justification, four hundred and ten years after Fano. Metal got there in the end; it just needed a government behind it. (Public domain, via Wikimedia Commons.)

Then the newspapers arrived, and the economics ran the other way. A Linotype machine's magazine had ninety channels; the script, honestly counted, needed several times that. So in the late 1950s Kamel Mrowa, publisher of the Beirut daily al-Hayat, worked with Linotype on the obvious surgery: merge the initial form into the medial, the final into the isolated, drop the ligatures, and the whole script collapses to two shapes per letter and fits the machine. They called it Simplified Arabic, and it conquered the world's Arabic newsrooms inside a generation, because it was cheap and fast and the alternative was not being a daily newspaper. The typewriter performed the same surgery for the same reasons.

Laid end to end, the eleven centuries that follow Ibn Muqla read like nothing so much as a slow, badly-maintained changelog:

940: Ibn Muqla finishes writing down al-khaṭṭ al-mansūb. (See above for what happened to him while writing it.) The system runs without a major patch for the next thousand years.

1001: Ibn al-Bawwāb, in Baghdad, finishes the Qurʾān manuscript that becomes the reference implementation for Naskh. It sits today in the Chester Beatty Library in Dublin, where you can book an appointment to see it.

1258: Mongols sack Baghdad. The libraries burn. Yāqūt al-Mustaʿṣimī, the chief court calligrapher, survives by hiding in a minaret with his pens. He lives another forty years and refines the canonical hands his students eventually formalise as the Six Pens.

1485: Bayezid II is reported to ban printing in Arabic on pain of death. Reported; see above for how thin the sourcing is.
The bug report has no repro and the issue has been open for 541 years.

1514: First book printed in movable Arabic type: Kitāb Ṣalāt al-Sawāʿī, a book of hours, set in Fano by men who had never learned the alphabet they were casting. Every joint a small accident. Shipped.

1537: Paganini Qurʾān, Venice. Riddled with errors of every kind. Recalled by reality, lost for four and a half centuries, rediscovered in 1987 behind other books in a Venetian friary.

1727: İbrahim Müteferrika, a Hungarian convert to Islam, opens the first Ottoman Muslim press in Istanbul. Two centuries behind the rest of Europe. He prints seventeen books in his lifetime, all carefully secular (atlases, dictionaries, histories) because he is not permitted to print religious texts. The empire of tens of thousands of calligraphers tolerates him at arm's length.

1820: Muhammad Ali of Egypt founds the Bulaq Press in Cairo. State-funded, no expense spared. Hundreds of sorts per fount: positional forms, ligatures, vowel marks, every one a separately cut piece of metal. For the first time in human history, a government takes Arabic typography seriously.

1924: The Cairo Qurʾān comes off the metal type at Bulaq's successor, the Amiria Press, after twelve years in production under a committee of Al-Azhar scholars. It standardises the text and the typography for the twentieth century. Eighty-seven years later, an Egyptian doctor named Khaled Hosny revives its typeface as a free font and calls it Amiri after the press.

1958: Kamel Mrowa, publisher of al-Hayat in Beirut, has a deadline. He asks Linotype: can we make Arabic fit a ninety-channel magazine? Linotype says yes if you cut the script in half. He says yes. The result is "Simplified Arabic": initial fused into medial, final into isolated, ligatures dropped. It conquers the Arab newsroom in a generation.
Mrowa is assassinated at his desk eight years later, by an unrelated faction, in an unrelated dispute.

1984: Sakhr (a Kuwait-and-Saudi computer venture) ships its first Arabic MSX home computer, Arabic in ROM. A generation of Arab children types on it; more on this below.

1985: Thomas Milo and Mirjam Somers found DecoType in Amsterdam and start designing shaping engines that respect the script's actual grammar. They are about twenty years ahead of everyone. They self-fund the whole research programme.

1991: Unicode 1.0. Letters in, shapes out. The right architecture finally has a standards document. The same release ships UAX #9, the bidirectional algorithm.

1994: InPage 1.0 ships out of Karachi, written by a small team at Concept Software. Before that afternoon, Urdu newspapers were lettered by hand: a guild of calligraphers called katibs wrote each daily edition with a reed pen, the pages were photographed, the photographs were printed. Daily Jang switches over to InPage and stops paying the katibs. A profession ends in about six months.

1995: Unicode adds the Arabic Presentation Forms blocks for round-trip compatibility with 8-bit code pages. Nobody should produce these codepoints in new text. Therefore every PDF text-extractor produces them by default.

1997: OpenType 1.0. init, medi, fina, rlig, mark, mkmk, jstf: the right hooks for an Arabic font finally exist in a font format that ships everywhere. Twenty-nine years later the jstf table sits unread by engines and unshipped by foundries.

2000: Internet Explorer 5.5 implements text-justify: kashida. For one brief, weird browser-quarter Microsoft is the only software vendor on earth that can justify Arabic correctly on a screen. Nobody copies them, and the value is eventually deleted from the spec for lack of implementations.

2006: DecoType ships its Tasmeem engine inside InDesign Middle East Edition. It is the existence proof: the problem is solved, retail, on a laptop.
Twenty years later no browser engineer has wandered over to look.

2011: Khaled Hosny releases the Amiri font under the OFL. It is, in 2026, still the best free Arabic font there is, IMHO.

2012: HarfBuzz lands in Chrome and Android. (The name: ḥarf, "letter", plus a transliteration joke.) Correct shaping becomes the default experience of reading Arabic on a screen for the first time anywhere. Most readers do not notice the upgrade because they have nothing to compare it to.

2012: Unicode 6.1 adds the Arabic Mathematical Alphabetic Symbols block (U+1EE00–U+1EEFF): 143 codepoints for mathematical Arabic, championed for years by the Moroccan mathematician Azzeddine Lazrek. Almost nobody knows it exists. Approximately two fonts on earth render it.

2015: W3C charters the Arabic Layout Requirements task force. The CSSWG opens its issue on Arabic justification. Both are still open in 2026. The documents the task force produced are world-class. Their effect on shipped browsers is zero.

2022: Amiri 1.0.

2026: Won't Fix.

The kashida the web cannot draw

Back to the ticket, then. The early drafts of the CSS Text Module Level 3 specification did list kashida as a value of text-justify, and Internet Explorer, of all things, implemented it: version 5.5, in the year 2000, complete with a text-kashida-space property for tuning the ratio of elongation to word-spacing. The results were surprisingly decent for the era. Then the value was quietly dropped from the specification on the reasonable and perfectly circular grounds that only one browser had ever implemented it. No modern browser implements it. Chrome's text-align: justify on Arabic falls back to inter-word spacing, with the rivers you opened yourself at the top of this page. Firefox does the same. Safari does the same. The CSS Working Group has had an open issue on Arabic justification since at least 2015, the same year the W3C chartered a task force to document Arabic layout requirements. The documents that effort produced are very good.
Browsers mostly shrugged.

The reason no browser ships it is structural. Good Latin justification already does more than pour slack into the gaps: it hyphenates, and a Knuth-Plass-style layout engine chooses break points across the whole paragraph at once, weighing each candidate line against how far its spaces would have to stretch. But in Latin the places you can break and the places you can stretch are the same places, the inter-word spaces, so the engine choosing where to end a line is choosing from the very set of positions that will later absorb the slack. Arabic pulls those two sets apart. You still break between words, but you stretch inside them, at the kashida joins, and a line's capacity to stretch depends on the letters of the words that happened to land on it rather than on how many words it holds. A line can offer many break points and almost no stretch, or the reverse.

That capacity is also bounded: a line dense with joinable pairs can stretch a long way and stay beautiful, while a line of mostly non-joining letters can barely stretch at all. So you cannot simply break the paragraph and fill each line afterward and trust them all to reach the margin gracefully; some lines will not be able to, and the only fix is to reconsider the breaks so the elongation each line needs stays within what that particular line can supply. The break points and the elongation have to be chosen together, against a cost function that depends on the actual glyphs, which is the part Latin never had to do. OpenType has actually had a mechanism for the font's side of this negotiation since the nineties, the jstf table, in which a font may declare its own justification priorities, and the state of that mechanism after thirty years is a perfect little standoff: virtually no shaping engine reads it, so virtually no foundry ships it, so no engine acquires a reason to start.
Nobody has prioritised breaking the standoff, because the users affected by it do not, as a population, contain any advertisers.

None of this is exotic outside the browser. Microsoft Word has shipped a kashida justification setting since the late nineties, crude and straight-barred but present. InDesign's Middle East edition does the job properly. The awkward bit is that the renderers that cannot stretch a letter are the ones everyone now reads everything in.

So people hack it, and the standard hack is the one the mockup card at the top of this page runs on: insert U+0640 TATWEEL characters into the text itself. As a publishing technique this sits somewhere between kludge and vandalism, and I say that with affection, having just done it. The tatweel is content. It changes the string, so search stops matching; copy-paste carries the padding along; screen readers do what screen readers do with garbage input; reflow the column and every elongation is now in the wrong place. The stroke it draws is also the typewriter bar, placed wherever the author guessed rather than where the script's rules allow. The maddening part is that proper Arabic justification is not an unsolved research problem. Thomas Milo's DecoType in Amsterdam built an engine around the script's own grammar decades ago; it shipped to the public as Tasmeem, inside the Middle East edition of InDesign, and it sets Naskh that calligraphers will sign off on.

The ligature swamp

The ligature situation is its own swamp, and a lively one. OpenType sorts ligatures into castes: rlig for the required ones, without which the script is simply broken (the lām-alif is the famous case; writing those two letters unfused is not ugly, it is illiterate), then liga for the standard ones, on by default, and dlig for the discretionary ones, off by default.
Naskh done properly wants a great many of the middle and upper castes: the allāh ligature that fuses four letters and a shadda into a single glyph, the stacked lām-mīm combinations, dozens of vertical pairings that pull the line's weave tight. Most free Arabic fonts ship the required set and nothing else, which produces text that is correct in the same sense that a phonebook is literature.

required ligatures (rlig/liga)
disconnect the letters (insert U+200C)

لمحمد ولإبراهيم، لا والله لا أكتمكم حديثاً

Set in Amiri. Uncheck the first box and you turn off the ligatures the script requires; no sane stylesheet would, but it is a rare chance to watch the shaping engine's quiet labour become visible: the lām-alif falls apart and the allāh ligature loses its stacked marks. The second box inserts U+200C ZERO WIDTH NON-JOINER between every pair of letters, forcing each into its isolated form. The stored letters are unchanged, and what renders is every letter set side by side like sorts in a compositor's case: Fano, 1514, faithfully reproduced with one invisible Unicode character.

(If you are in Safari, the first box will do nothing: WebKit refuses to let CSS switch off required ligatures. Hell!)

The one great exception is Amiri, the Naskh face that Khaled Hosny, an Egyptian doctor by training who taught himself OpenType tooling over the course of about a decade, built, released under the SIL Open Font License in 2011, and has polished continuously since. The name is the lineage: Amiri revives the typeface of al-Maṭbaʿa al-Amīriyya, the Bulaq Press face that set the 1924 Cairo Qurʾān, which means the best free Arabic font of the digital era is a one-man reconstruction of the best government-funded font of the metal era, and I never get tired of saying that sentence. And it is engineered, not merely drawn. The required ligatures are done with care; the 1.0 rewrite, in 2022, reimplemented the allāh ligature to be more cautious about when it fires.
The mark stacking holds up under fully vowelled text. And since that rewrite the font carries a curvilinear kashida: feed it elongations and it substitutes graded, swelling curved strokes, in four sizes, the way the pen would. Scroll back to the mockup card at the top of the page; those curves are Amiri's own work, performed live in your browser. If you are reading an Arabic text rendered well on the open web in 2026, there is a respectable chance you are reading Amiri. The rest of the ecosystem (Scheherazade New from SIL International, Reem Kufi also by Hosny, the various Noto Arabic faces Google commissioned) fills in around it.

Since I bragged about the mark stacking a moment ago, here it is earning its keep, along with the cheapest way to ruin it that the web platform offers:

apply the card component (line-height: 1; overflow: hidden)

نَصٌّ حَكِيمٌ لَهُ سِرٌّ قَاطِعٌ وَذُو شَأْنٍ عَظِيمٍ

A fully vowelled line, set in Amiri with the mark-stacking machinery at work; the text is the first half of the classic Arabic pangram, "a wise text with a decisive secret and great significance." Tick the box and a perfectly ordinary card component, the kind every design system ships by the dozen, beheads it: the line box tightens, the overflow clips, and the vowels fall off the top. Nobody files a ticket when this happens, because the text stays legible the way "th qck brwn fx" stays legible. Readers cope, the platform never hears about it, and I have fixed this bug at three companies so far.

Bidi, or the cursor that lies

Then comes bidi, my favourite swamp of all. Mixed-content Arabic (a paragraph of Arabic prose containing a version number, an English identifier, a URL, an inline switch into French) invokes the Unicode Bidirectional Algorithm, defined in UAX #9, in the standard since 1991 and one of the most complicated specifications Unicode publishes. The core idea is that characters carry directional personalities. Arabic letters are strongly right-to-left.
Latin letters are strongly left-to-right. Digits are weak: they travel with their context. Spaces and punctuation are neutral and take their direction from whoever is standing next to them, like guests at a wedding who know nobody. The algorithm resolves all of this into runs, reorders the runs for display, and the text on screen ends up in a different order from the text in memory.

in memory (logical order, first byte on the left):
النسخة v2.4 جاهزة

on screen (visual order, line read from the right):
جاهزة v2.4 النسخة

"The version v2.4 is ready." Three runs. The outer two flip into the right-to-left frame, the Latin run stays internally left-to-right, and the first word in memory renders at the opposite end of the line from the last. Hover a box and watch its crossing. Every caret movement, mouse click, and selection in this little sentence has to be translated between the two orders, and the translation has edge cases at every run boundary.

The algorithm is not magic, and a reader who walks through it once is permanently inoculated against the surprise of its output. It runs in stages. First it assigns every character a bidirectional class drawn from a fixed list of about twenty (L, R, AL, EN, AN, ES, ET, ON, BN, and so on). Then it resolves the weak classes, the digits and the punctuation that travels with them, in seven sub-stages numbered W1 through W7. Then it resolves the neutrals, the spaces and the common punctuation that has no opinion of its own, in two more (N1 and N2). Then it assigns each character an embedding level, an integer where even means left-to-right and odd means right-to-left. Then it reverses the runs of equal level so the line displays in the order a human eye expects.
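The first stage, and one rule of the second, are small enough to sketch with Python's standard library. unicodedata.bidirectional returns the class the Unicode database assigns to a character. The apply_w2 function below is a simplified reading of rule W2 that assumes one flat run with no embeddings or isolates, so treat it as an illustration rather than a conforming implementation; all three function names are invented for this sketch.

```python
import unicodedata
from itertools import groupby


def bidi_classes(text):
    """Stage one: look up the bidirectional class of every character."""
    return [unicodedata.bidirectional(ch) for ch in text]


def class_runs(text):
    """Group neighbouring characters that share a class, in logical order."""
    return [
        (cls, "".join(chars))
        for cls, chars in groupby(text, key=unicodedata.bidirectional)
    ]


def apply_w2(classes):
    """Simplified W2: a European number (EN) becomes an Arabic number (AN)
    when the nearest strong class before it is AL.
    Assumption: one flat run, no embeddings or isolates."""
    resolved = []
    last_strong = None
    for cls in classes:
        if cls in ("L", "R", "AL"):
            last_strong = cls
        elif cls == "EN" and last_strong == "AL":
            cls = "AN"
        resolved.append(cls)
    return resolved


# The sample sentence, split into runs of equal class (stored order).
sentence = "النسخة v2.4 جاهزة"
for cls, chunk in class_runs(sentence):
    print(cls, repr(chunk))

# Digits after a Latin letter keep their class; digits after Arabic do not.
print(apply_w2(bidi_classes("v2.4")))
print(apply_w2(bidi_classes("\u0646\u0633\u062E\u0629 2.4")))
```

In the sample sentence the digits sit behind the Latin v, so they stay EN. Put the same digits directly after an Arabic word and W2 reclassifies them as AN, which is what "digits travel with their context" means in practice.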
The whole apparatus is forty pages of specification doing work no Latin paragraph has ever needed done, and you can step through it on a sample sentence here:

algorithm stage:

Every text-input control in every major framework implements that translation, in detail, slightly differently, and this is where the fun begins. The visible failure mode is the cursor. At a run boundary there are two legitimate places for the caret, the logical position and the visual one, and the input handler has to pick. Chrome picks differently from Firefox, which picks differently from Qt, which picks differently from whatever Outlook is this year. Old Mac OS used to draw two carets at boundary positions, one for each direction, which was perfectly honest and which everyone abandoned because users found the honesty alarming. I think about that often: the one interface that told the truth was retired for telling it. You can feel the seams yourself, right here:

النسخة v2.4 من المحرك جاهزة للاختبار منذ 3 أيام

Editable; go on, click in. Place the caret at the start of the line, hold the Left arrow, and enjoy the moment it reaches the version number, where it will jump, double back, or sail through depending on your browser. The readout below is the caret's logical index into the stored string: arrow keys that move steadily across the screen produce indices that leap around like a scratched record, because screen order and storage order are different orders.

Caret logical index: –

I have watched senior engineers, fluent in both Arabic and English, give up on writing a long email in Outlook on a Wednesday afternoon because the cursor would not behave, and switch to Arabic-only or English-only because the cognitive cost of fighting the editor exceeded the cost of monolingual phrasing.
I remember suffering this very well the first time I ever used Facebook, and I could not register: I was such a slow typist that when the cursor did this weird thing, I would just stare at it and never progress.

This is the ordinary experience of writing mixed Arabic-English text in 2026, in every major editor, email client, and chat application I know of. The pettier cousins are everywhere too, and I collect them: a range like 10–20 silently reading as twenty-to-ten, because digits are weak and the dash is neutral; a trailing exclamation mark teleporting to the far end of the line; a password, toggled visible, displaying in an order that does not match what was typed. None of these are anyone's bug, exactly.

The range flip deserves to be seen live, because unlike the cursor it is not even an inconsistency; it is what the spec orders, everywhere:

Stored, in this exact order: the word الصفحات ("the pages"), a space, then 1 0 - 2 0. Rendered by your browser, live:

الصفحات 10-20

Pages twenty to ten, by the mechanism from the numerals section: W2 strips the digits of their European context, the hyphen goes neutral, the runs swap. The cure is a single invisible character, U+200E LEFT-TO-RIGHT MARK, slipped in before the first digit.

الصفحات 10-20

Pages ten to twenty. Multilingual typesetters carry a small pocketful of these invisible characters everywhere they go, like exorcists.

Some thanks, and some overdue acknowledgments

Khaled.

Khaled Hosny, whom I hope to meet someday (he was recently nearby in Cairo, and I somehow missed the talk), wrote Amiri. He also wrote hb-shape, the HarfBuzz command-line tool. He co-maintains HarfBuzz, maintains several other free Arabic fonts, works on the Arabic Unicode block, has filed dozens of CLDR bugs, and is one of the reasons your operating system formats Arabic dates correctly.
I respect him a lot and I wish to meet him soon.

Behdad Esfahbod, who wrote much of HarfBuzz before Hosny, is Iranian-Canadian. In 2017 he was detained for ten hours at the US border on suspicion of being Iranian, which he was. He was working at Google at the time. The shaping engine running in your browser at this moment, which paints every Arabic letter you see correctly, was for years carried by an engineer the US government considered a security risk.

Behdad.

Brill, the four-hundred-year-old Dutch academic publisher, spent the better part of a decade and a reported $750,000 commissioning the Brill typeface from John Hudson, because no existing font covered all the transliteration characters their Semitic-studies catalogue required. They released it free for non-commercial use in 2011, the same year as Amiri. Independently, on the same continent, in the same calendar year, two of the best digital faces for Arabic-adjacent typography were released to the public for free, because no commercial path could possibly justify either of them.

The first computer ever to display Arabic in ROM, the Sakhr AX-170 of around 1984 (a Saudi-Kuwaiti MSX), shipped with a children's typing tutor and a built-in BASIC dialect that accepted Arabic identifiers, written right-to-left, in source code. You could write متغير = 5 and the interpreter parsed it. The language called itself Sakhr BASIC. Some of the children using it became, twenty years later, the engineers fighting bidi bugs in everyone else's software.

Everything in this story that actually works was paid for by almost nobody. HarfBuzz, the shaping engine that finally made correct Arabic rendering routine in free software (it is applying init and medi and rlig in your browser right now, as you read this), was carried for years substantially by Behdad Esfahbod, and its co-maintainer today is the same Khaled Hosny who built Amiri, which gives you some sense of how few shoulders this particular sky rests on.
Amiri itself: evenings and weekends, a physician teaching himself font engineering to rebuild what the Bulaq Press had and the digital era misplaced. Scheherazade and friends: SIL International, missionary linguists who needed the fonts for translation work and gave them to everyone. The GNU Unifont coverage of the Presentation Forms blocks that nobody should use but everybody must render: volunteers. The Noto Arabic faces: a Google i18n budget that has always been, by that company's standards, a rounding error. The W3C Arabic layout documents: volunteers again. The whole stack stands on a small number of people who decided, against the grain of every incentive their professional environment offered them, that the script a few hundred million people read deserved infrastructure, and then sat down and built it for free.

No commercial actor funded the unglamorous parts, because no quarterly report has a line item for "Arabic users can now justify a paragraph." The browser vendors took HarfBuzz when it was free and finished, and have contributed approximately nothing toward the justification work that would let the scribes' system finally run on a screen; the standoff around the JSTF table enters its fourth decade, and the CSSWG issue enters its second, both in excellent health. The user-facing result, the ragged left margin on the customer dashboard I closed this morning as Won't Fix, is what the absence of that funding looks like when you multiply it across every browser and editor and renderer in the world. The scribes solved this in the tenth century and the volunteers have, between them, already rebuilt the letters, the typefaces, the shaping, and the specifications, on weekends, for love. The remaining gap is one well-understood algorithm in a handful of layout engines. Somebody will close it, probably unpaid, possibly reading this (or writing it? who knows).

Credits

Thanks to jay for helpful QA work: pointing out issues in the presentation of the timeline history, a grammatical mistake, an ambiguous paragraph, and a bug in one of the demos.

Thanks to ralfj for pointing out that the claims regarding Latin-script typesetting are not very precise:

However, what the article says about Latin script typesetting is not entirely correct. At least historically, justified Latin script text does not just involve adding spaces. In the era of the Gutenberg press, typesetters had multiple variants of letters like "m" that had a different width, which allowed for justifying text without introducing awkwardly large amounts of space. The Luther bible is a beautiful example of what that can look like. Sadly, that technique got lost pretty much entirely in the digital era. The "microtype" LaTeX package has an approximation of that with its character protrusion; I am not sure if any other system even tries to replicate it. It's not nearly as visible as what happens in Arab scripts, so losing it is not nearly as impactful, but it still feels like a small version of the same issue.

Thanks to Dr. Nawal Hadeed for reviewing the post and contributing the section on kashida.

Thanks to fanf for pointing out an ambiguity and an error in what I wrote about the details of the justification algorithms.

Further reading

A short reading list for whoever wants to follow any of these threads further. Where a single best source exists for a claim I made above, it is on this list. I have not linked everything because half the canonical sources are paywalled academic journals or library-only manuscripts; the search terms are reliable.

The software:
Amiri, the font, by Khaled Hosny. The home page is amirifont.org, the source is on GitHub, the licence is OFL.
HarfBuzz. harfbuzz.github.io for docs; the source is in the Linux Foundation's repos. The hb-shape command-line tool is the cheapest way to see what your text is doing under the hood.
Tasmeem and DecoType, by Thomas Milo and Mirjam Somers. decotype.com. Their published papers, especially Milo's "Towards Arabic Historical Script Grammar", are the deepest theoretical writing on the script anywhere.

The specifications:
UAX #9, the Unicode Bidirectional Algorithm. unicode.org/reports/tr9.
W3C Arabic Layout Requirements (alreq). w3.org/TR/alreq. The most thorough English-language description of what Arabic typography needs from software. Excellent, almost unimplemented.
The OpenType specification, on Microsoft Typography's docs. The script-tags section names every feature this article touches.

The history:
Kathryn A. Schwartz, "Did Ottoman Sultans Ban Print?", Book History vol. 20 (2017). The paper that demolished the printing-ban story most textbooks still repeat.
Huda Smitshuijzen AbiFarès, Arabic Typography: A Comprehensive Sourcebook (2001). The closest thing to a textbook the field has. Out of print; libraries have it.
Abdelkebir Khatibi and Mohammed Sijelmassi, The Splendour of Islamic Calligraphy (1976). The book that taught a generation of Western designers what they were looking at.
Yasin Hamid Safadi, Islamic Calligraphy (Thames & Hudson, 1978). Brief, classical, well illustrated.
Geoffrey Roper, "Arabic Printing and Publishing", the standard short history of pre-twentieth-century Arabic print.
For the Mrowa years at al-Hayat: any decent history of twentieth-century Arab journalism; Ami Ayalon, The Press in the Arab Middle East (1995), is the standard.

The manuscripts:
The Ibn al-Bawwāb Qurʾān (1001 CE) is in the Chester Beatty Library, Dublin. Most folios are also viewable online through the library's catalogue.
The 1924 Cairo Qurʾān is reproduced in many editions; high-resolution scans of the original Amiria Press impression are intermittently available through the Internet Archive and the Bibliothèque Orientale at the Université Saint-Joseph in Beirut.
The Fano Kitāb Ṣalāt al-Sawāʿī (1514) survives in a small number of copies; the British Library and the Vatican Library both have one.
The Paganini Qurʾān (1537/8) survives in the single copy rediscovered in 1987 at the Franciscan convent of San Michele in Isola, Venice.

#Programming #history #20:10 Khaled Hosny
Introduction to the experience of rendering Arabic typography and its technical debt



