Engineering Explainer

When machines stop writing for us

If the reader is another AI, why write in tidy English at all? A new method compresses text into dense, alien symbols that humans can't read but models still understand.

When you type a message to an AI chatbot, you write in plain English, and it answers in plain English. That feels natural, because you're a human and humans read language. But increasingly, the thing reading the AI's output isn't a person at all: it's another model, or the same model later, picking up notes it left for itself. So the paper behind this explainer asks a genuinely strange question: if the reader is a machine, does the message still have to be written in tidy, human-readable language?

A habit we inherited

Everything you give a language model (your question, the documents, the conversation so far) gets fed in as context. That context is the model's short-term working memory, and it has a size limit, usually called the context window. Models don't read letters or whole words; they chop text into tokens, roughly word-pieces. Cost, speed, and the context limit are all measured in tokens. So when we talk about compressing text, what we really mean is carrying the same meaning in fewer tokens.
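To make "fewer tokens" concrete, here is a minimal sketch that counts tokens before and after a rewrite. The regex splitter is only a crude stand-in for a real model tokenizer, which works on subword pieces and would give different counts; the example sentences and function names are invented for illustration.

```python
import re

def rough_tokens(text: str) -> list[str]:
    """Crude stand-in for a model tokenizer: runs of word characters,
    plus each non-space symbol on its own. Real tokenizers differ."""
    return re.findall(r"\w+|[^\w\s]", text)

def compression_ratio(original: str, compressed: str) -> float:
    """Compressed token count divided by original token count."""
    return len(rough_tokens(compressed)) / len(rough_tokens(original))

# Invented example: the same message in prose and in telegram style.
original = "The meeting has been moved to Friday at three in the afternoon."
compressed = "mtg -> Fri 3pm"

print(len(rough_tokens(original)))    # 13
print(len(rough_tokens(compressed)))  # 5
print(round(compression_ratio(original, compressed), 2))
```

Under this toy splitter the short version uses 5 tokens where the sentence used 13. With a real tokenizer the exact figures would change, but the measurement is the same: count tokens, not letters.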

That context window is expensive in two senses. Literally — most AI services charge by the token, so longer inputs cost more and take longer. And practically — models get less reliable when you stuff the window full, losing details buried in the middle of a long input. Agents that work through long tasks and pass notes between steps are constantly bumping against this limit. So there's a real prize for anyone who can pack the same meaning into fewer tokens. The field already has a name: prompt compression.
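The billing side of this is simple arithmetic. The sketch below assumes a made-up price per million input tokens and a made-up document size; neither figure comes from the paper or from any real provider.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical numbers, chosen only to show the shape of the saving.
PRICE_PER_MILLION = 3.0    # assumed dollars per million input tokens
FULL_TOKENS = 100_000      # an uncompressed long document
SQUEEZED_TOKENS = 30_000   # the same document at 30% of its token count

full = input_cost(FULL_TOKENS, PRICE_PER_MILLION)
squeezed = input_cost(SQUEEZED_TOKENS, PRICE_PER_MILLION)
print(f"full: ${full:.2f}, squeezed: ${squeezed:.2f}")
```

At a flat rate the bill shrinks in proportion to the token count, so the saving repeats on every call that reuses the same context.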

Sacrificing readability for density isn't a bizarre new idea. Humans have done it for centuries whenever the channel was expensive — telegrams charged by the word, so people dropped every article and pronoun; mathematics replaced sprawling sentences with compact notation. Fluent, flowing prose is only one possible way to write something down, and it happens to be the one optimised for a human reader. The authors simply ask: when the reader is a model, what surface form would it prefer?

BabelTele

Most existing compression methods play by a hidden rule: whatever you do, keep the result looking like normal language. You might delete redundant words or rewrite a passage as a summary, but the output stays recognisable text. This paper challenges exactly that rule. Human language is full of redundancy — grammar, connecting words, repetition, narrative flow — all of which helps a human follow along but, from a pure information standpoint, is not very dense.

So the authors propose what they call BabelTele — a nod to the Tower of Babel and to the telegraph. The idea is to let the model encode meaning into a compact, non-standard form that deliberately gives up human readability: text mixing abbreviations, mathematical and logical symbols, emoji, punctuation, arrows, and fragments of different languages, all jammed together into something that looks, frankly, like nonsense to you and me. The crucial bet is that an advanced model can still decode it.

Imagine you start with a few paragraphs describing Brussels — its grand historic buildings, its high prices except for cheap wine and flowers, the country split between French in the south and Dutch in the north. The BabelTele version might collapse into a stream of symbols: a greater-than sign, a little church emoji, an up-arrow, parentheses with "French" and "Dutch". To you it reads like a ransom note assembled from a keyboard and an emoji picker. To a capable model, the meaning is still in there.

How do you get a model to produce this? The authors don't retrain it or touch its internals. They simply prompt it with ordinary instructions built around three principles: borrow the most compact word or symbol from any language or script; replace long-winded grammar with terse symbols, emoji, and operators; and pack tightly while keeping enough structure that a capable model can reconstruct the meaning without a secret codebook. Because it's all plain prompting, it works on black-box systems. They also test a whole family of prompts, showing the effect is a general phenomenon, not a one-off trick.
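As a rough illustration of what "plain prompting" around those three principles could look like, here is a small prompt builder. The wording, the `PRINCIPLES` list, and the `build_compression_prompt` name are all hypothetical; this is not the authors' actual prompt, only a sketch of the general shape.

```python
# Hypothetical paraphrase of the three principles described above.
PRINCIPLES = [
    "Borrow the most compact word or symbol from any language or script.",
    "Replace long-winded grammar with terse symbols, emoji, and operators.",
    "Pack tightly, but keep enough structure that a capable model can "
    "reconstruct the meaning without a separate codebook.",
]

def build_compression_prompt(text: str) -> str:
    """Assemble an instruction prompt asking a model to compress `text`."""
    rules = "\n".join(f"{i}. {p}" for i, p in enumerate(PRINCIPLES, start=1))
    return (
        "Compress the text below for a machine reader. "
        "Human readability is not required.\n"
        f"{rules}\n\n"
        f"TEXT:\n{text}"
    )

print(build_compression_prompt("Brussels has grand historic buildings."))
```

The resulting string would be sent to whatever chat model you use. Because nothing here touches model weights, the same approach applies to systems that are only reachable through an API.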

The numbers

The headline: BabelTele preserves about 99.5% of the meaning, measured by how well a model can still answer questions about the content, while shrinking the text to under 30% of its original length. You throw away more than seventy percent of the length and keep almost all of the usable meaning.
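One plausible way to read those two headline figures is as a pair of ratios. The sketch below assumes that "meaning preserved" is question-answering accuracy on the compressed text divided by accuracy on the original; the article does not spell out the exact formula, and the input numbers are invented to land near the reported values.

```python
def retention(acc_compressed: float, acc_original: float) -> float:
    """Share of question-answering accuracy that survives compression."""
    return acc_compressed / acc_original

def length_ratio(len_compressed: int, len_original: int) -> float:
    """Compressed length as a fraction of the original length."""
    return len_compressed / len_original

# Invented inputs, for illustration only.
acc_original = 0.800    # accuracy when the model reads the full text
acc_compressed = 0.796  # accuracy when it reads the compressed text
kept = retention(acc_compressed, acc_original)
size = length_ratio(290, 1000)

print(f"retention: {kept:.1%}, length: {size:.0%}")
```

Keeping the two ratios separate matters: a method can look good on one while failing the other, and the interesting claim is that both hold at once.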

They checked this isn't just a summary in disguise. They measured perplexity, which is roughly how surprised a language model is by a piece of text. Ordinary English scores low, while BabelTele scores far higher, often by a factor of ten or more. A readability score also rates it as extremely difficult, with most of its words flagged as hard. This compressed text sits far outside the distribution of normal human language. It really is a different kind of object.
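For readers who want the definition behind "surprise", perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below uses the standard formula; the per-token probabilities are made up to show how a tenfold gap can arise, and are not measurements from the paper.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up probabilities: a model finds each fluent token fairly likely
# and each compressed token ten times less likely.
fluent = [math.log(0.5)] * 4
alien = [math.log(0.05)] * 4

print(round(perplexity(fluent), 2))  # 2.0
print(round(perplexity(alien), 2))   # 20.0
```

In practice the log-probabilities would come from scoring the text with a real language model. Higher perplexity means the text looks less like what the model expects ordinary language to be.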

Readability for humans, typicality as normal language, and recoverability by a model turn out to be three separate properties.

And here is the heart of the paper. They gave the same BabelTele text to both human readers, recruited through paid questionnaires, and AI models, then tested them on multiple-choice questions. Human accuracy dropped clearly — to us it's gibberish — but a strong model stayed remarkably accurate. You can sacrifice the first two properties while keeping the third.

Across established benchmarks — long documents with comprehension questions, and meeting transcripts — they compared BabelTele against strong existing methods, including careful summaries and a well-known token-pruning system, over a range of compression levels. The pattern held: as you squeeze harder, ordinary methods lose accuracy faster, while BabelTele holds its meaning longer — a more favourable accuracy-retention frontier.

Most thought-provoking is cross-model transfer. They had one model compress the text and a completely different model, from a different company, try to read it — no shared training, no adaptation. Often, it worked. It wasn't perfect or universal: how well it transferred depended a lot on which model compressed and which read. But the fact that it transfers at all suggests these models share enough underlying structure that a compact code invented by one is partly legible to another.

The honest caveats

The authors are fair about limits. There's a space-time trade-off: when you compress extremely hard, the reading model sometimes has to do more internal reasoning to reconstruct the answer, using up tokens on the output side, so input savings get partly eaten by extra work. This wasn't unique to BabelTele — it showed up across methods whenever you remove too much. The lesson is a sweet spot of moderate compression rather than squeezing as hard as physically possible. And the deeper caveat: this is an empirical probe, an exploration of what's possible, not a finished protocol to deploy tomorrow.
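The trade-off can be written as a simple balance between tokens saved on input and extra tokens spent on output. This is a back-of-the-envelope sketch with invented numbers; it treats input and output tokens as equally costly, which real pricing often does not.

```python
def net_token_savings(
    input_original: int,
    input_compressed: int,
    output_original: int,
    output_compressed: int,
) -> int:
    """Input tokens saved minus extra output tokens spent decoding."""
    saved_on_input = input_original - input_compressed
    extra_on_output = output_compressed - output_original
    return saved_on_input - extra_on_output

# Invented example: heavy compression saves 7,000 input tokens, but the
# reader spends 700 more output tokens reasoning its way to the answer.
net = net_token_savings(10_000, 3_000, 200, 900)
print(net)  # 6300
```

If the extra output ever grows larger than the input saving, the net figure goes negative, which is one way to see why moderate compression can beat the most aggressive setting.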

Why it matters

Practically, if AI systems can talk to each other — or to their own memory — in a denser code, that directly attacks the cost and the bottleneck of long context. Agent memory: notes stored compactly let an agent remember more within the same window. Document question-answering: feed a dense encoding instead of a giant document. Multi-agent communication: the messages AIs pass back and forth are pure overhead, and compressing them makes the whole system cheaper and faster.

But there's a more unsettling implication. We've always treated natural language as the universal interface — knowledge in, instructions in, answers out, all in human language. This work suggests that's a choice, not a necessity. Right now, when an agent writes a note to itself, or one model hands work to another, we can read those messages, log them, and catch a system going off the rails. If that traffic migrates into a dense, model-native code, we lose that window almost by accident — not because anyone decided oversight didn't matter, but because the efficient thing to do happened to be the opaque thing to do. Readable language is how we talk to machines. It may not be how machines would choose to talk among themselves.