#2593: How to Type in Paleo-Hebrew: Unicode, Keyboards & Ancient Scripts

What it takes to build a custom keyboard for an ancient biblical script, from Unicode politics to font design.

Episode Details
Episode ID
MWP-2752
Published
Duration
36:13
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Can You Type in Paleo-Hebrew? The Full Stack of Ancient Script Computing

Custom keyboards and ancient scripts don't usually overlap, but for anyone curious about building a keyboard that types in Paleo-Hebrew — the script used in the biblical era before square Aramaic letters took over — the real infrastructure question runs much deeper. It touches Unicode encoding, UTF-8, custom firmware, font design, and the politics of how the world standardizes writing systems.

Paleo-Hebrew Is Already in Unicode

The first surprise: Paleo-Hebrew has been in Unicode since 2006, version 5.0. It lives in the Phoenician block, code points U+10900 through U+1091F — 22 letters, same as the Phoenician alphabet, because historically they're the same script. But this unification was controversial. Hebrew scholars argued Paleo-Hebrew deserved its own block; Phoenician scholars argued the letterforms were identical. The Unicode Consortium sided with unification, meaning a Phoenician aleph and a Paleo-Hebrew aleph share the same code point. The font you use determines which visual style you get.
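For the curious, the unification is easy to verify from a terminal. A quick check in Python, using the standard library's `unicodedata` module:

```python
import unicodedata

# The first letter of the unified Phoenician/Paleo-Hebrew block
aleph = chr(0x10900)

# Unicode names it after the Phoenician tradition, not the Hebrew one
print(unicodedata.name(aleph))   # PHOENICIAN LETTER ALF
print(f"U+{ord(aleph):04X}")     # U+10900
```

Note the official name: Unicode calls U+10900 ALF, after the Phoenician tradition, no matter which script's letterforms your font happens to draw.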

This parallel to Han unification — where Chinese, Japanese, Korean, and Vietnamese characters were merged into one block — reveals a recurring tension in Unicode's design philosophy: if the abstract character is the same, unify it and let fonts handle visual differences. Database engineers love this; humanities scholars often don't.

The Keyboard Firmware Problem

With the characters already in Unicode, building a Paleo-Hebrew keyboard becomes a mapping problem. Using QMK firmware (standard for custom wired keyboards), you can program any key to send any Unicode code point via QMK's Unicode map feature. But keyboards don't natively speak Unicode — they send scan codes that the operating system interprets. To send an arbitrary Unicode character, you need a Unicode input method: on Windows, a registry hack enabling hex input (Alt+plus+hex digits); on macOS, the Unicode hex input layout; on Linux, Ctrl+Shift+U followed by hex digits. QMK automates this sequence, so pressing one key triggers the entire Alt+plus+hex-digits dance without the user thinking about it. The result is slower than normal typing — fine for ancient scripts, but a reminder that modern computing's scaffolding was never designed for this use case.

Unicode vs. UTF-8: What's the Difference?

Unicode is the map: a giant table assigning a unique number (code point) to every character in every writing system. As of version 16, there are over 154,000 characters mapped, including emoji, mathematical symbols, and musical notation. UTF-8 is the encoding scheme that turns those code points into bytes for storage and transmission. Its brilliance is backward compatibility with ASCII — every ASCII character is one byte, identical to ASCII encoding. Characters beyond ASCII use up to four bytes, with a self-synchronizing bit pattern that lets you find character boundaries even mid-stream. Over 98% of web pages use UTF-8.
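The byte-level behavior is easy to see in Python, where `str.encode` applies UTF-8 directly:

```python
# A Phoenician/Paleo-Hebrew aleph sits far beyond ASCII
aleph = chr(0x10900)

encoded = aleph.encode("utf-8")
print(encoded)                    # b'\xf0\x90\xa4\x80': four bytes
print(len("A".encode("utf-8")))   # 1: ASCII characters stay one byte
```

The same string, saved as a UTF-8 text file, produces exactly those bytes on disk, which is why the file round-trips between systems without loss.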

The Font and Right-to-Left Challenges

Having a code point isn't enough — you need a font that renders it. Google's Noto fonts include Noto Sans Phoenician, and ALPHABETUM Unicode covers ancient scripts. But these render characters in a Phoenician style, which may not match a specific Paleo-Hebrew look (say, from the Siloam inscription or the Lachish letters). Creating a custom font for a particular archaeological period is a specialized project requiring font editing tools like FontForge or Glyphs.

Then there's right-to-left text handling. The keyboard doesn't handle this — it's the job of the Unicode Bidirectional Algorithm (Bidi), defined in Unicode Standard Annex #9. Every character has a directional property; the Phoenician block is designated right-to-left, so typed characters display with the first character on the right. Mixing Paleo-Hebrew with English annotations creates complex Bidi scenarios that the algorithm must resolve, often with surprising results.
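The directional properties are queryable from Python's `unicodedata` module, a handy way to see exactly what the Bidi algorithm is working with:

```python
import unicodedata

# Strong right-to-left: the Phoenician block
print(unicodedata.bidirectional(chr(0x10900)))  # 'R'
# Strong left-to-right: Latin
print(unicodedata.bidirectional("A"))           # 'L'
# Weak: digits take their placement from context
print(unicodedata.bidirectional("1"))           # 'EN' (European Number)
```

Those single-letter classes ('R', 'L', 'EN', and about twenty others) are the raw inputs the Bidi algorithm resolves into a final visual order.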

The Takeaway

Building a Paleo-Hebrew keyboard is technically possible today, but it requires navigating four layers of infrastructure: Unicode code points (already assigned), keyboard firmware (QMK with Unicode input methods), font design (custom glyphs for the exact historical style desired), and text rendering (the Bidi algorithm for right-to-left display). Each layer has its own quirks and politics, but together they show how modern computing — built for Latin characters and left-to-right text — can be adapted to serve any writing system humans have ever used.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3
Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#2593: How to Type in Paleo-Hebrew: Unicode, Keyboards & Ancient Scripts

Corn
Daniel sent us this one — he's been thinking about something that connects two threads we've kicked around before. Custom keyboards and Paleo-Hebrew. The ancient script Hebrew used back in the biblical era, before the square Aramaic letters took over. He's asking: what if you wanted to build a keyboard that types in Paleo-Hebrew? And he wants us to dig into the real infrastructure question underneath that. Unicode encoding, UTF-8, the whole bedrock of how computers actually handle characters. What happens when the glyphs you want aren't in the standard? How do you make custom ones? And if they are in Unicode, how do you map keys to these obscure code points while preserving right-to-left formatting? It's a niche within a niche, but the encoding question is genuinely fundamental to how everything we type works.
Herman
Oh, this is fantastic. And before we dive in — quick note, today's episode is being scripted by DeepSeek V four Pro. So credit where it's due for whatever comes out of my mouth.
Corn
Alright, so where do we even start? Paleo-Hebrew, custom keyboards, Unicode politics — there's a lot here.
Herman
Let me start with the thing that actually surprised me when I looked into this. Paleo-Hebrew is in Unicode. It's been there since two thousand six, Unicode version five point zero. It lives in the Phoenician block, code points U plus one zero nine zero zero through U plus one zero nine one F. Twenty-two letters, same as the Phoenician alphabet, because historically they're the same script. The Phoenicians spread it around the Mediterranean, the Israelites used it, the Samaritans still use a descendant of it today.
Corn
Unicode just lumped them together? Phoenician and Paleo-Hebrew, same block?
Herman
That's where the politics starts. There was a huge debate when this was being standardized. Scholars who study Hebrew said, look, this is a distinct script with its own history, it should have its own block. Scholars who study Phoenician said, no, the letterforms are identical, the encoding should be unified. The Unicode Consortium went with unification. So technically, if you type a Phoenician aleph, it's the same code point as a Paleo-Hebrew aleph. The font you use determines which visual style you get.
Corn
Which feels like one of those decisions that makes sense if you're a database engineer and drives humanities people up the wall. Like saying Chinese and Japanese kanji are the same thing because they look similar.
Herman
That's actually a great parallel, because Han unification was itself a massive controversy in the early days of Unicode. The Chinese, Japanese, Korean, and Vietnamese characters all got unified into one block, and the argument was exactly the same — are these the same character with regional variants, or different characters that happen to share a shape? In both cases, Unicode said: if the abstract character is the same, unify it and let fonts handle the visual differences.
Corn
If I'm Daniel, and I want to build my Paleo-Hebrew keyboard, step one is already done for me. The characters exist. Twenty-two letters, sitting in the Phoenician block. I just need to map keys to those code points.
Herman
And this is where the custom keyboard firmware we talked about in previous episodes becomes relevant. If you're using QMK firmware, the standard for wired custom keyboards, you can program any key to send any Unicode code point. QMK has a feature called Unicode map, where you define a table of code points and then assign them to keycodes. So you could have a key that sends U plus one zero nine zero zero, the Phoenician letter aleph.
Corn
Sending a Unicode code point from a keyboard isn't as straightforward as sending an ASCII character. The operating system has to understand what you're doing.
Herman
Keyboards don't natively speak Unicode. They send scan codes, which the operating system interprets. To send an arbitrary Unicode character, you need what's called a Unicode input method. On Windows, QMK typically uses a registry hack that enables hex input — you hold Alt, type the plus key, then the hex code point. On macOS, you use the Unicode hex input keyboard layout. On Linux, Control Shift U followed by the hex digits. QMK can be programmed to automate that sequence, so pressing one key triggers the whole Alt-plus-hex-digits dance without you thinking about it.
Corn
Every time I type a single Paleo-Hebrew letter, behind the scenes my keyboard is doing this little Alt-plus ritual, spitting out four or five hex digits, and the OS goes, ah yes, Phoenician aleph.
Herman
It's slower than normal typing, which introduces latency. For a niche project like a Paleo-Hebrew keyboard, that's fine. Nobody's typing a hundred words per minute in ancient scripts. But it's one of those things where you realize how much of modern computing is built on scaffolding that was never designed for what we're doing with it.
Corn
Let's back up, because I think Daniel's real question is bigger than just the keyboard mapping. He's asking about the whole encoding stack. UTF-8, Unicode, what all of this actually is. A lot of people have heard these terms but don't really know what's happening under the hood. So walk me through it. What is Unicode, and what is UTF-8, and why are they different things?
Herman
This is one of those things I love explaining, because the design is elegant once you see the problem it's solving. Unicode is the map. It's a giant table that assigns a unique number — a code point — to every character in every writing system humans use. Latin A is U plus zero zero four one. The Chinese character for "dragon" is U plus nine F eight D. The Phoenician aleph is U plus one zero nine zero zero. As of Unicode version sixteen, there are over one hundred fifty-four thousand characters mapped. The goal is to represent every character ever used in human writing, including emoji, mathematical symbols, musical notation, everything.
Corn
Unicode is just a big lookup table. Number to meaning. But that's not how the numbers get stored on disk or sent over a network.
Herman
Right, that's where UTF-8 comes in. UTF-8 is an encoding scheme — the most common way — of turning those Unicode code points into actual bytes that computers can store and transmit. And the brilliant thing about UTF-8 is that it's backward-compatible with ASCII. Every ASCII character is encoded in UTF-8 as exactly one byte, identical to how ASCII encoded it. So any system built for ASCII can handle UTF-8 text as long as it sticks to basic Latin characters. It won't break.
Corn
When you need characters beyond that, it uses more bytes.
Herman
Up to four bytes per character. The way it works is clever. If a byte starts with a zero bit, it's a single-byte ASCII character. If it starts with one one zero, it's the start of a two-byte sequence. One one one zero means three bytes. One one one one zero means four bytes. And bytes that start with one zero are continuation bytes. The encoding is self-synchronizing — if you jump into the middle of a UTF-8 stream, you can figure out where character boundaries are by looking at the bit patterns.
Corn
UTF-8 isn't the only game in town. There's UTF-16, UTF-32.
Herman
UTF-32 is the straightforward one — every character gets exactly four bytes. Simple but wasteful for text that's mostly Latin. UTF-16 uses two bytes for most common characters, and a special mechanism called surrogate pairs for characters beyond the basic multilingual plane, like those Phoenician code points or emoji. Windows uses UTF-16 internally, which is why you sometimes see those weird character rendering bugs when something goes wrong with surrogate pairs. Mac and Linux use UTF-8 natively.
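The surrogate-pair mechanism can be observed directly in Python by encoding a Phoenician code point as big-endian UTF-16:

```python
cp = 0x10900  # beyond the Basic Multilingual Plane

# UTF-16 must split it into a surrogate pair of 16-bit units
be = chr(cp).encode("utf-16-be")
units = [int.from_bytes(be[i:i + 2], "big") for i in (0, 2)]
print([hex(u) for u in units])   # ['0xd802', '0xdd00']
```

The high surrogate (0xD800 range) and low surrogate (0xDC00 range) are exactly the units that get mismatched when a program slices UTF-16 text carelessly, producing the rendering bugs Herman mentions.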
Corn
The web runs on UTF-8.
Herman
Over ninety-eight percent of all web pages are UTF-8 encoded. When Daniel types his Paleo-Hebrew characters into a text file and saves it as UTF-8, that file can be opened on any modern system anywhere in the world and it'll render correctly, assuming the system has a font that covers the Phoenician block.
Corn
Which brings us to the font problem. Having a code point is one thing. Having something to actually display is another.
Herman
This is where it gets interesting for Daniel's project. If he's just using the existing Phoenician Unicode block, there are fonts that cover it. Not many, but they exist. Google's Noto fonts, which aim for complete Unicode coverage, include Noto Sans Phoenician. There's also ALPHABETUM Unicode, designed for ancient scripts. So if you install one of those, you can type and see the characters on screen. But here's the thing — those fonts will render the characters in a Phoenician style, which may or may not match what Daniel wants for Paleo-Hebrew specifically.
Corn
Because as you said, the letterforms are technically unified. The font designer chooses how aleph looks.
Herman
Unicode encodes abstract characters, not glyphs. The glyph — the actual visual shape — is the font's job. For Latin letters, we're used to this — Times New Roman and Helvetica both render the letter A, but they look different. For ancient scripts, the differences can be more significant, because there's no single standardized typographic tradition.
Corn
If Daniel wants his Paleo-Hebrew to look a specific way — maybe he wants the letterforms from a particular archaeological period, like the Siloam inscription from the eighth century B.C.E. or the Lachish letters from the sixth century B.C.E. — he might need a custom font.
Herman
That's a whole project in itself. Font design for ancient scripts is a specialized field. You need to understand the historical letterforms, the stroke order, the variations across time periods and regions. Then you need to actually design the glyphs in a font editor like FontForge or Glyphs and map those glyphs to the correct Unicode code points. It's doable — people do it — but it's not a weekend project.
Corn
Let's say you do all of that. You've got your custom keyboard with QMK firmware sending Unicode code points. You've got a custom font that renders those code points in the exact Paleo-Hebrew style you want. You type something. Does it read right-to-left?
Herman
Ah, there's the question. Right-to-left text handling is not something the keyboard does. It's not even something the font does, really. It's handled by the text rendering engine in the operating system or application, using something called the Unicode Bidirectional Algorithm, or Bidi algorithm for short.
Corn
Which is a whole can of worms, I'm guessing.
Herman
The Bidi algorithm is this wonderfully intricate piece of engineering that most people never think about. It's defined in the Unicode Standard Annex Number Nine, and it determines how characters with different directional properties are ordered for display. Every Unicode character has a directional property. Hebrew and Arabic characters are marked as right-to-left. Latin characters are left-to-right. Punctuation and numbers are typically neutral and take their direction from surrounding characters.
Corn
If I type a Paleo-Hebrew aleph, the system knows it's a right-to-left character because it's in the Phoenician block, which has the right-to-left property.
Herman
But here's where it gets subtle. The Phoenician block is officially designated as right-to-left. However, there was a debate about this — some scholars argued that Phoenician was sometimes written left-to-right, or even boustrophedon, alternating direction line by line. But Unicode settled on right-to-left as the default. So if Daniel types a sequence of Paleo-Hebrew characters, they'll display right-to-left, with the first character he typed appearing on the right.
Corn
What if he's mixing Paleo-Hebrew with English annotations? That's where Bidi gets complicated.
Herman
That's where it gets very complicated. If you type a Paleo-Hebrew word, then a space, then an English word, the Bidi algorithm has to figure out the visual order. The Paleo-Hebrew part flows right-to-left, the English part flows left-to-right, and the algorithm determines where each segment goes on the line. Most of the time it works. When it doesn't, you get things like punctuation appearing on the wrong side of a word, or numbers getting reversed. There are special Unicode control characters you can insert — like the right-to-left mark and left-to-right mark — to force the direction when the algorithm gets it wrong.
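Those control characters are ordinary code points, U+200F and U+200E, so any script or macro can insert them like normal text. A small Python illustration:

```python
import unicodedata

RLM = "\u200F"  # RIGHT-TO-LEFT MARK: invisible, but strongly RTL
LRM = "\u200E"  # LEFT-TO-RIGHT MARK: invisible, but strongly LTR

# The marks carry the same strong directional classes as real letters
print(unicodedata.bidirectional(RLM))  # 'R'
print(unicodedata.bidirectional(LRM))  # 'L'

# Wrapping a Paleo-Hebrew run in RLM ... LRM pins its direction
aleph = chr(0x10900)
annotated = RLM + aleph * 3 + LRM + " (three alephs)"
```

Because the marks are invisible, they act purely as hints to the Bidi algorithm, nudging neutral characters like the parentheses above to resolve the way the author intended.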
Corn
Daniel's keyboard could include keys for those control characters. A "force RTL" key and a "force LTR" key.
Herman
You could even program a macro that inserts the RTL mark before you start typing in Paleo-Hebrew, and an LTR mark when you switch back to English. QMK can do that.
Corn
Let's go back to the scenario Daniel raised where the characters aren't in Unicode at all. He said he couldn't remember if Paleo-Hebrew was supported — we've established it is — but what if you're working with something unencoded? A script that never made it into the standard. How do you even approach that?
Herman
This is a real problem that scholars and communities face. There are scripts — historical writing systems, or minority languages that never got digitized — that simply aren't in Unicode. And if your script isn't in Unicode, you're in a difficult position. You can't just type it. You can't put it on the web in a way that others can read without special software. You're effectively locked out of the global information infrastructure.
Corn
Which is a pretty serious form of digital exclusion.
Herman
And the process for getting a script into Unicode is not trivial. You have to submit a formal proposal to the Unicode Consortium, with evidence of the script's historical or current use, a complete character repertoire, sample glyphs, and a justification for why it needs to be encoded separately from existing scripts. The proposal goes through committee review, and if accepted, the script gets a block of code points in a future version of the standard. The whole process can take years.
Corn
In the meantime, what do people do? Private use area?
Herman
That's exactly the workaround. Unicode reserves a range of code points for private use — U plus E zero zero zero through U plus F eight F F, plus a couple of supplementary ranges. These code points are guaranteed to never be assigned to standard characters. You can use them for whatever you want, but the catch is that they only mean something within your own system. If I define U plus E zero zero one as a custom glyph in my font, and I send you a document using that code point, you won't see the glyph unless you also have my font installed. And if someone else used U plus E zero zero one for a completely different glyph in their system, our two uses would conflict.
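Python makes the "only meaningful within your own system" point concrete: private-use code points carry the general category `Co` and have no character name at all. A short sketch:

```python
import unicodedata

pua = chr(0xE001)  # inside the BMP private use area, U+E000..U+F8FF

print(unicodedata.category(pua))   # 'Co': "Other, private use"

# Private-use code points have no standard name; asking for one fails
try:
    unicodedata.name(pua)
except ValueError:
    print("no standard name assigned")
```

The standard guarantees the code point, the category, and nothing else; the glyph and the meaning live entirely in your font and your documentation.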
Corn
It works for personal projects, or for a closed community that shares the same font. But it doesn't scale.
Herman
There's an even thornier issue. If you use the private use area for a script that later gets officially encoded in Unicode, you have a migration problem. All your existing documents use the wrong code points. You need to convert them. This actually happened with several scripts that were eventually standardized — early adopters had to re-encode their data.
Corn
Which is why the Unicode Consortium is careful and slow. Getting it wrong means a lot of pain later.
Herman
They've learned from mistakes. The original Unicode standard from the early nineties made some decisions that later had to be fixed with complicated workarounds. The Han unification thing we mentioned earlier — that's still controversial decades later. Every time they add new emoji, there's a debate about whether it should be a new character or a variation of an existing one.
Corn
Let's talk about emoji for a second, because that's actually a great window into how Unicode works in practice. Most people interact with Unicode primarily through emoji, even if they don't realize it.
Herman
Emoji are fascinating from an encoding perspective. Each emoji is a Unicode code point, or sometimes a sequence of code points combined with a special zero-width joiner character. The "family" emoji, for example, is built by combining individual person emoji with the zero-width joiner. So a family of four isn't one code point — it's seven: man, zero-width joiner, woman, zero-width joiner, girl, zero-width joiner, boy. The rendering engine sees that sequence and displays a single family graphic if the font supports it.
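The seven-code-point structure of that family emoji is easy to confirm in Python:

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# man + ZWJ + woman + ZWJ + girl + ZWJ + boy
family = ("\U0001F468" + ZWJ + "\U0001F469" + ZWJ
          + "\U0001F467" + ZWJ + "\U0001F466")

print(len(family))  # 7 code points, rendered as one glyph (font permitting)
```

If the font lacks the combined glyph, the rendering engine falls back to showing the four individual people, which is why the same message can look like one family on one phone and four separate emoji on another.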
Corn
Which explains why emoji look different on different platforms. The code points are the same, but the fonts are different.
Herman
Apple's emoji font renders the "grinning face" one way, Google's renders it another way, Samsung's renders it a third way. They're all interpreting the same abstract character. And this causes real communication problems — there are emoji that look positive on one platform and sarcastic on another. The "folded hands" emoji is interpreted as praying on some platforms and as a high-five on others.
Corn
When Daniel's building his Paleo-Hebrew keyboard, he's working with the exact same infrastructure that handles emoji. The same encoding, the same rendering pipeline, the same Bidi algorithm. It's just a different block of code points.
Herman
That's what I find beautiful about Unicode. It's this massive, sprawling, sometimes messy project to encode all of human written expression into a single coherent system. Whether you're typing a smiley face, a mathematical integral, a Phoenician aleph, or the letter A, it's all the same system underneath. The same UTF-8 bytes flowing through the same pipes.
Corn
Alright, let's get practical. Daniel wants to build this thing. What does the actual implementation look like? Walk me through the steps.
Herman
Step one is the keyboard hardware. He needs something running QMK or ZMK firmware if he wants wireless. We talked about this in previous episodes, so I won't rehash the basics. But the key point is that the firmware needs to support Unicode input.
Corn
QMK does that natively.
Herman
QMK does it through several methods. The simplest is the UCIS system — Unicode Character Input System. You define a table mapping mnemonic strings to Unicode code points. So you could type "aleph" and it would output U plus one zero nine zero zero. But for a dedicated keyboard, you'd want direct key mapping. You'd define a custom keymap where each physical key corresponds to a Unicode code point. The keymap file would look something like X underscore U one zero nine zero zero for the aleph key, X underscore U one zero nine zero one for beth, and so on.
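Conceptually, the keymap Herman describes is just a table from physical keys to code points. A Python sketch of the idea (the key names here are hypothetical, and this is not QMK's actual C syntax):

```python
# Hypothetical key -> code point table for the 22-letter abjad
PALEO_KEYMAP = {
    "KEY_A": 0x10900,  # aleph
    "KEY_B": 0x10901,  # beth
    "KEY_G": 0x10902,  # gimel
    # ... the remaining 19 letters follow in code-point order
}

def on_keypress(key: str) -> str:
    """Resolve a physical key to the character the firmware should emit."""
    return chr(PALEO_KEYMAP[key])

print(f"U+{ord(on_keypress('KEY_B')):04X}")  # U+10901
```

In real QMK firmware the same table lives in a `unicode_map` array in C, and the firmware's job is to turn each lookup into the OS-specific hex-input sequence discussed earlier.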
Corn
Those X underscore U macros trigger the Alt-plus-hex input method we talked about.
Herman
But there's a gotcha. The Alt-plus-hex method works differently on different operating systems. On Windows, you need to enable a registry setting called EnableHexNumpad. On macOS, you need to switch to the Unicode Hex Input keyboard layout. On Linux, Control Shift U works out of the box. So if Daniel uses multiple operating systems, he might need different firmware configurations or a more portable approach.
Corn
What's the more portable approach?
Herman
One option is to have the keyboard present itself as a composite device — part keyboard, part something else that can send Unicode directly. Some custom keyboard builders use the Raw HID protocol to send arbitrary data to a companion application running on the computer, which then injects the Unicode characters. It's more complex to set up, but it avoids the Alt-plus-hex dance entirely and works consistently across platforms.
Corn
That's getting into serious hobbyist territory.
Herman
It is, but custom keyboard people are serious hobbyists. The community has solved these problems. There are QMK users who type in Japanese, Chinese, Arabic, all kinds of scripts, using these same techniques. A Paleo-Hebrew keyboard is unusual in its choice of script, but not in its technical requirements.
Corn
Step two is the keycaps. What do you put on the physical keys?
Herman
That's a fun challenge. You can't exactly buy a Paleo-Hebrew keycap set off the shelf. You could use blank keycaps and label them yourself — there are services that do custom dye-sublimation printing on keycaps. You could use relegendable keycaps, which have a clear plastic cover that lets you insert a paper label. Or you could go fully custom and commission a small batch from a keycap manufacturer.
Corn
The legends would need to show the Paleo-Hebrew letterforms, ideally in a style that matches whatever font you're using on screen.
Herman
Which is a nice design consistency touch. But practically, you'd probably just want clear, recognizable letterforms that you can read at a glance. The Paleo-Hebrew alphabet has twenty-two letters, and they're all consonants — it's an abjad, like modern Hebrew and Arabic. So you could fit the whole alphabet on a small macro pad, or you could use a standard keyboard and just replace the legends on the keys you're remapping.
Corn
What about vowels? Paleo-Hebrew didn't originally have vowel markings, right?
Herman
The vowel pointing system — the dots and dashes around the letters, called niqqud — was developed much later, by the Masoretes in the early Middle Ages. Paleo-Hebrew inscriptions are purely consonantal. So Daniel's keyboard would have twenty-two letter keys, and maybe some punctuation. That's it. A very compact layout.
Corn
Which actually makes the keyboard design cleaner. Fewer keys to map, fewer keys to label.
Herman
If he wanted to get fancy, he could add a layer for modern Hebrew niqqud, or for the Samaritan variant of the script, which has its own Unicode block — U plus zero eight zero zero through U plus zero eight three F. The Samaritan script is a direct descendant of Paleo-Hebrew and it's still used liturgically by the Samaritan community. So there's a living tradition connected to this ancient writing system.
Corn
That's actually remarkable. The Samaritans never adopted the square Aramaic script that became standard in Judaism. They kept the old letters.
Herman
And their version of the script evolved over time, so it looks different from the ancient inscriptions, but the lineage is direct. There are only about eight hundred Samaritans left today, mostly split between Mount Gerizim in the West Bank and Holon in Israel. Their Torah is written in the Samaritan script. It's one of those threads of history that's still alive if you know where to look.
Corn
Alright, let's go deeper into the file format question Daniel raised. If I create a document in Paleo-Hebrew using a custom font mapped to the Phoenician Unicode block, and I save it as a plain text file in UTF-8, what happens when I send it to someone?
Herman
It depends on what they have installed. If they have a font that covers the Phoenician block — like Noto Sans Phoenician — they'll see the characters. They might not see them in the exact style Daniel intended, but they'll see recognizable Paleo-Hebrew letterforms. If they don't have any Phoenician font, they'll see those little boxes — the infamous "tofu" that indicates a missing glyph. The text is still there, the code points are intact, but the rendering fails because there's no font to provide the visual shapes.
Corn
The file is readable in the data sense, but not necessarily in the visual sense.
Herman
And this is the fundamental distinction between plain text and rich text. A plain text file stores characters, not fonts. A rich text format — like a Word document, a PDF, or an HTML page — can embed fonts or reference web fonts, so the visual presentation is preserved across systems. If Daniel wants his Paleo-Hebrew documents to be reliably viewable by others, he should use a format that supports font embedding, or publish them as PDFs with the font included.
Corn
Or put them on a website with the font served via CSS.
Herman
The web font mechanism is actually the most practical solution for sharing niche scripts. You host the font file on your server, and anyone who visits your page sees the correct glyphs, even if they don't have the font installed locally. This is how Wikipedia handles ancient scripts — they use web fonts to ensure everyone can read the content regardless of what's on their device.
Corn
Let's talk about the right-to-left thing in practice. If Daniel's typing away in his custom keyboard, producing a stream of Phoenician Unicode characters, and he saves them in a text editor — does the text editor need to support Bidi rendering?
Herman
Yes, and not all text editors do. Basic editors like Notepad on Windows have supported Bidi for years now, but it wasn't always the case. More advanced editors like Visual Studio Code or Sublime Text handle it fine. But if you're using some minimalist terminal-based editor, you might get left-to-right rendering even for right-to-left characters, which makes editing extremely confusing.
Corn
Because you're editing in logical order — the order the characters were typed — but they're displayed in visual order, which is reversed.
Herman
That mismatch between logical and visual order is the source of endless confusion. When you're editing right-to-left text, you need the cursor to move in visual order, not logical order. You need text selection to work visually. You need line wrapping to happen at the correct visual boundaries. All of this requires the editor to implement the Bidi algorithm correctly, and many editors have bugs in their Bidi implementation, especially when mixing right-to-left and left-to-right text on the same line.
Corn
Daniel might find himself fighting his text editor as much as anything else.
Herman
That's before we get to the really fun edge cases. What happens when you have a Paleo-Hebrew word, then a number, then an English word? Numbers in Bidi are a special case — they're displayed left-to-right even within a right-to-left context. So the number three hundred would display with the three on the left and the zeros on the right, even though the surrounding text flows right-to-left. This is actually correct for Hebrew and Arabic, where numbers are read left-to-right even in a right-to-left sentence, but it can be visually jarring if you're not used to it.
Corn
I never thought about that, but you're right. In Hebrew, you read the sentence right-to-left, hit a number, read it left-to-right, then continue right-to-left.
Herman
The Bidi algorithm handles all of that automatically, based on the directional properties assigned to each character. It's one of those pieces of infrastructure that's invisible when it works and maddening when it doesn't.
Corn
Let's zoom out for a second. Daniel's project is niche — a keyboard for an ancient script that only scholars and enthusiasts care about. But the infrastructure questions it raises are universal. Every writing system on earth goes through this same encoding pipeline. The decisions made by the Unicode Consortium affect billions of people.
Herman
They affect them in ways most people never notice. Take the case of the Indian rupee sign. When India introduced a new currency symbol in two thousand ten, it had to go through the Unicode approval process to get its own code point. Until that happened, people had to use workarounds — typing "Rs" or using the generic currency sign. The proposal was approved, and now the rupee sign is at U plus two zero B nine. That process, from national policy decision to universal digital availability, is a story about how encoding infrastructure shapes what we can express.
Corn
There are scripts that are still waiting. There are languages in Africa and Asia that don't have full Unicode support, or whose support is buggy or incomplete. The digital divide isn't just about internet access — it's about whether your writing system is a first-class citizen in the global information infrastructure.
Herman
There's a deeper philosophical question here. Unicode is, in a sense, the central registry for human written expression. A relatively small group of people — the Unicode Technical Committee — decides what gets encoded and what doesn't. They decide whether Phoenician and Paleo-Hebrew are the same script. They decide which emoji get added. They decide how the Bidi algorithm works. It's an enormous amount of power, and it's mostly exercised through technical standards that the general public never thinks about.
Corn
Is there an alternative? Some kind of decentralized encoding system?
Herman
Not really, and that's the thing. The value of Unicode is precisely that it's universal. If everyone had their own encoding system, we'd be back to the chaos of the eighties and early nineties, where documents in one encoding were unreadable on systems using a different encoding. There were dozens of competing encodings for non-Latin scripts — Shift-JIS for Japanese, Big5 for Traditional Chinese, KOI8-R for Russian Cyrillic. Sharing text across systems was a nightmare. Unicode solved that by being the one standard everyone agreed on.
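The pre-Unicode chaos Herman describes can be demonstrated in a few lines: the same bytes mean entirely different things under different legacy encodings. A minimal sketch using two of the encodings he names:

```python
# Japanese text encoded under Shift-JIS, one of the legacy encodings.
data = "テスト".encode("shift_jis")  # "test" in katakana

# A system assuming KOI8-R (Russian Cyrillic) reads the same bytes
# as meaningless Cyrillic and box-drawing characters -- classic mojibake.
print(data.decode("koi8_r"))

# Only a system that knows the original encoding recovers the text.
print(data.decode("shift_jis"))
```

Unicode removes the guesswork by making the interpretation part of the standard rather than out-of-band knowledge about each file.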
Corn
It's a natural monopoly. The network effects are so strong that fragmentation is worse than imperfect centralization.
Herman
And to their credit, the Unicode Consortium takes that responsibility seriously. The encoding process is meticulous. Proposals require extensive documentation. There's a formal review period. The goal is to get it right the first time, because fixing it later is exponentially harder.
Corn
Which brings us back to Daniel's project. He's not just building a quirky keyboard. He's participating in this entire infrastructure, whether he realizes it or not. Every time he maps a key to a Phoenician code point, he's relying on decisions made by a committee years ago.
Herman
If he creates a custom font, he's adding his own layer to that infrastructure. He's making a choice about how these ancient letters should look on screen, in the twenty-first century. That's not a small thing. Typography shapes how we perceive text. The visual form of letters affects readability, aesthetic response, even credibility.
Corn
If Daniel goes through with this, what's the hardest part? What's the thing that's most likely to trip him up?
Herman
Honestly, I think the hardest part isn't the technical stuff. The QMK configuration, the Unicode mapping, the keycap design — those are all solvable problems with existing tools and community knowledge. The hard part is the font design. Creating a good typeface for an ancient script requires deep knowledge of paleography. You need to understand the historical letterforms, the variations across time periods, the stroke order, the proportions. And then you need the technical skill to translate that into Bézier curves in a font editor. It's a rare combination of skills.

Corn
Maybe the practical advice is: start with an existing Phoenician font, get the keyboard working, and then decide if the custom font is worth the effort.
Herman
That's exactly what I'd recommend. Use Noto Sans Phoenician or another existing font as a starting point. Get the whole pipeline working — keyboard, firmware, Unicode input, text editor, file format. Once that's all solid, then think about whether you want to invest the time in designing custom glyphs.
Corn
The keyboard itself? What would a Paleo-Hebrew layout even look like?
Herman
The traditional order of the Hebrew alphabet is aleph, beth, gimel, daleth, he, waw, zayin, heth, teth, yodh, kaph, lamedh, mem, nun, samekh, ayin, pe, tsade, qoph, resh, shin, taw. If you laid that out on a standard QWERTY keyboard, you could map aleph to A, beth to B, gimel to G, and so on, using phonetic correspondence. Or you could lay them out in a grid that matches the traditional order, which might be more intuitive for someone who knows the Hebrew alphabet.
Corn
The phonetic mapping seems more practical for someone coming from an English typing background.
Herman
It does, but there's a problem. The Phoenician block in Unicode doesn't have a one-to-one correspondence with modern Hebrew. Some letters that are distinct in modern Hebrew — like sin and shin, which are the same letter with different dot placement — are a single character in Paleo-Hebrew. And some Paleo-Hebrew letters don't have a simple phonetic equivalent in English. Ayin is a guttural sound that doesn't exist in English. Heth is a guttural H that's different from the regular H. So phonetic mapping only gets you so far.
Corn
Maybe a dedicated macro pad with the letters arranged in traditional order, clearly labeled with the Paleo-Hebrew glyphs, would be the cleanest solution.
Herman
That's what I'd do. A three-by-eight grid of keys, or a four-by-six, with the twenty-two letters plus maybe a few extras for punctuation and Bidi control characters. Label each key with the Paleo-Hebrew letterform. Program QMK to send the corresponding Unicode code point. Install a font that covers the Phoenician block. And you're typing in Paleo-Hebrew.
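The mapping Herman describes can be sketched in Python. The letter names and the input-sequence helper are illustrative assumptions (QMK would express this as a C `unicode_map` table, and the exact Linux sequence depends on the input method); the code points are the real Phoenician block, U+10900 through U+10915.

```python
# The 22 Phoenician-block letters, in traditional order, mapped to
# their Unicode code points starting at U+10900 (names are the
# Unicode character-name suffixes, e.g. PHOENICIAN LETTER ALF).
PALEO = {name: 0x10900 + i for i, name in enumerate([
    "alf", "bet", "gaml", "delt", "he", "wau", "zai", "het",
    "tet", "yod", "kaf", "lamd", "mem", "nun", "semk", "ain",
    "pe", "sade", "qof", "rosh", "shin", "tau",
])}

def linux_input_sequence(key: str) -> str:
    """Sketch of the keystroke dance QMK replays on Linux (IBus-style):
    Ctrl+Shift+U, the hex digits, then a terminator."""
    return f"Ctrl+Shift+U {PALEO[key]:x} Enter"

print(linux_input_sequence("alf"))  # Ctrl+Shift+U 10900 Enter
print(chr(PALEO["tau"]))            # the last letter, U+10915
```

In QMK itself this becomes a `unicode_map` array in the keymap's C source, with one entry per key; the firmware then emits the OS-specific input sequence automatically, as described earlier in the episode.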
Corn
The whole thing would cost what, maybe a hundred dollars in parts? Plus the time to design and assemble it.
Herman
Less, if you use an off-the-shelf macro pad and just reconfigure the firmware and keycaps. You can get a programmable macro pad for thirty or forty dollars. Blank keycaps are cheap. The font is free if you use Noto. The firmware is open source. The main investment is time and curiosity.
Corn
Which Daniel has in abundance. Alright, I think we've covered the ground. Paleo-Hebrew is in Unicode, in the Phoenician block. Custom keyboards can send those code points via QMK. Fonts exist but custom ones are possible. Bidi handles the right-to-left formatting, mostly. UTF-8 is the encoding that makes the files portable. And the whole thing is a window into this massive invisible infrastructure that handles every writing system on earth.
Herman
The infrastructure is worth understanding even if you never build a Paleo-Hebrew keyboard. Every time you type an emoji, every time you read a web page in a different language, every time you see right-to-left text rendered correctly, you're seeing Unicode at work. It's one of the great achievements of digital infrastructure, and it's almost completely invisible.
Corn
There's something nice about that, actually. The most successful technologies are the ones you don't have to think about. They just work, and you take them for granted, until someone asks you how to build a keyboard for a three-thousand-year-old script and you realize how much is happening under the hood.
Herman
Now: Hilbert's daily fun fact.

Hilbert: The average cumulus cloud weighs approximately one point one million pounds — roughly the same as one hundred elephants — yet it floats because the weight is spread across millions of tiny water droplets over a vast volume of air.
Corn
A hundred elephants, just drifting overhead.
Herman
I'm going to think about that every time I look up now.
Corn
Here's the forward-looking thought. Daniel's project is niche, but these same tools and techniques apply to any writing system that's underrepresented digitally. If you speak a minority language, or you're trying to preserve a historical script, or you want to type in a script that never got commercial keyboard support — the path exists. It takes some work, but the infrastructure is there, and it's more accessible than it's ever been. That's worth remembering.
Herman
A big thank you to our producer, Hilbert Flumingtop, for keeping this whole operation running.
Corn
This has been My Weird Prompts. If you want to dig deeper into any of this, find us at myweirdprompts. We'll be back with whatever Daniel sends us next.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.