Three levers a serious engineering org pulls to get better

Correctness, infrastructure, culture. Jane Street, Slack and The Guardian each pulled a different lever this month — and together they show where lasting advantage hides when demos get cheap.

Most engineering stories are about a single feature or system. These three are about something broader: the different levers a serious organisation pulls to raise its game. Not one shiny launch, but three very different ways of getting better: proving your code correct, rebuilding your infrastructure strategy, and deliberately investing in culture.

Jane Street bets on provable correctness

The first story is a genuine change of heart, and those are always interesting. Jane Street is a trading firm famous in software circles for betting heavily on the OCaml programming language and for caring deeply about code quality. One of their senior people opens a post with a striking confession: for twenty-five years, he says, he told everyone Jane Street was simply not interested in formal methods. Now they're building a team to work on them.

Formal methods means mathematically proving that a piece of code does exactly what it's specified to do — not testing it on a bunch of examples and hoping, but proving, with the rigour of a theorem, that it can't misbehave. The catch has always been cost. The post offers a vivid statistic: a famous project to formally verify a small operating-system kernel took about twenty-five person-years of effort to verify eight thousand seven hundred lines of code — roughly twenty-three lines of proof for every line of actual code. For most software, that math has never been worth it.
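
Working the post's figures through makes the cost concrete. The totals below are our own rounding of its numbers, not figures quoted in the post:

```latex
\begin{aligned}
\text{lines of proof} &\approx 8{,}700 \times 23 \approx 200{,}000 \\
\text{verified code per person-year} &\approx \frac{8{,}700}{25} \approx 350 \text{ lines}
\end{aligned}
```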

So why the reversal? Two forces meeting in the middle. On one side, the cost of formal methods is dropping, because AI tools are making these notoriously difficult techniques usable by more people — though he is careful to note the AI still needs plenty of human guidance for hard proofs. On the other, the value is rising sharply: AI agents now write code fast, but that code tends toward what he bluntly calls "slop" — overcomplicated, buggy, ignoring the codebase's rules. Someone has to verify all that machine-generated code, and verification has become the bottleneck. Formal methods attack exactly that bottleneck, and they give the agents something they thrive on — precise feedback about whether their code is actually correct.

He offers one clean illustration. A type system, the everyday feature that stops you adding a number to a piece of text, is itself a lightweight formal method. Its power is what he calls universal guarantees: ban a category of bug and you eliminate every instance of it, everywhere, forever. Testing can never do that, because tests only check the cases you thought to try. Testing is spot-checking a few boxes coming off the assembly line; formal verification is proving the machine itself can never produce a bad box. When a famously pragmatic, sceptical firm publicly decides this is now worth it, that is a real signal about where AI is pushing serious engineering.
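
Here is a toy illustration of the gap between spot-checking and a universal guarantee. It is our own example, not one from the post. It uses a midpoint function on simulated 8-bit unsigned integers, the kind of calculation binary search relies on. A few hand-picked tests pass, but checking every valid input finds the overflow bug. Exhaustive checking is only feasible here because there are at most 256 × 256 input pairs. Real formal methods use proofs rather than enumeration, so the guarantee also covers unbounded inputs.

```python
from typing import Callable, Optional, Tuple

MASK = 0xFF  # simulate 8-bit unsigned arithmetic: results wrap modulo 256


def add_u8(a: int, b: int) -> int:
    """Add two 8-bit values, wrapping on overflow like a uint8 would."""
    return (a + b) & MASK


def sub_u8(a: int, b: int) -> int:
    """Subtract two 8-bit values, wrapping on underflow."""
    return (a - b) & MASK


def buggy_mid(lo: int, hi: int) -> int:
    # Classic bug: lo + hi can exceed 255 and wrap before the division.
    return add_u8(lo, hi) // 2


def safe_mid(lo: int, hi: int) -> int:
    # With lo <= hi, hi - lo cannot underflow and lo + (hi - lo) // 2 <= hi,
    # so nothing wraps.
    return add_u8(lo, sub_u8(hi, lo) // 2)


def spot_check(mid: Callable[[int, int], int]) -> bool:
    """Testing: a handful of cases someone thought to try."""
    cases = [(0, 10), (2, 4), (10, 20), (7, 7)]
    return all(mid(lo, hi) == (lo + hi) // 2 for lo, hi in cases)


def exhaustive_check(mid: Callable[[int, int], int]) -> Optional[Tuple[int, int]]:
    """Check every valid (lo <= hi) pair; return the first counterexample, or None."""
    for lo in range(256):
        for hi in range(lo, 256):
            if mid(lo, hi) != (lo + hi) // 2:
                return (lo, hi)
    return None


if __name__ == "__main__":
    print("buggy passes spot checks:", spot_check(buggy_mid))
    print("buggy counterexample:", exhaustive_check(buggy_mid))
    print("safe counterexample:", exhaustive_check(safe_mid))
```

The buggy version passes every spot check. The exhaustive check still finds (1, 255), where the sum wraps to 0, and it finds no counterexample for safe_mid.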

Slack bets on infrastructure strategy

The second story is the unglamorous reality behind the AI features you actually use. Slack's post traces about three years of evolution in the infrastructure behind Slack AI — the summaries, the search, the recaps — across four phases, ending in a multi-cloud setup. The honest theme: the hard part of AI features is almost never the model. It's serving the model reliably, affordably, and without leaking anyone's data.

The journey goes like this.

Phase one: self-managed model serving on one cloud, with a clever privacy arrangement where Slack's data stayed private and the vendor's model stayed hidden from Slack. But it scaled slowly, fought constant GPU scarcity, and lagged on getting the newest models.

Phase two: a managed model service for faster access and less plumbing, with what they describe as a zero-incident migration using load tests, quality-parity checks, feature flags and instant rollback. But now they were paying for reserved capacity sitting idle at off-peak hours, and locked into commitments that slowed upgrades.

Phase three: a hybrid approach. Latency-sensitive features stay on dedicated capacity, bursty overnight work goes onto on-demand capacity, and a spillover pattern lets traffic surges overflow gracefully instead of failing.

Phase four, by early 2026: full multi-cloud, adding a second cloud provider alongside the first, with an intelligent routing layer that picks the right model for each job and an automated circuit breaker.
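
The phase-three spillover pattern can be sketched in a few lines of Python. This is a hypothetical illustration, not Slack's code. The post does not describe its implementation, and the class name, the slot-counting model of capacity and the pool labels are our assumptions. Latency-sensitive requests use reserved capacity first and overflow to on-demand capacity when the reserved slots are full, so a surge is absorbed rather than rejected.

```python
from dataclasses import dataclass


@dataclass
class SpilloverRouter:
    """Hypothetical capacity router: dedicated first, then spill to on-demand."""

    dedicated_slots: int  # concurrent requests the reserved capacity can serve
    on_demand_slots: int  # extra burst capacity, paid for only when used
    dedicated_in_use: int = 0
    on_demand_in_use: int = 0

    def acquire(self, batch: bool = False) -> str:
        """Choose a pool for a new request; fail only when every usable pool is full.

        Interactive (latency-sensitive) requests try dedicated capacity first and
        spill over to on-demand when it is full. Batch work goes straight to
        on-demand and never takes a dedicated slot.
        """
        if not batch and self.dedicated_in_use < self.dedicated_slots:
            self.dedicated_in_use += 1
            return "dedicated"
        if self.on_demand_in_use < self.on_demand_slots:
            self.on_demand_in_use += 1
            return "on_demand"
        raise RuntimeError("all capacity exhausted")

    def release(self, pool: str) -> None:
        """Free the slot a finished request was using."""
        if pool == "dedicated":
            self.dedicated_in_use -= 1
        elif pool == "on_demand":
            self.on_demand_in_use -= 1
        else:
            raise ValueError(f"unknown pool: {pool}")


if __name__ == "__main__":
    router = SpilloverRouter(dedicated_slots=2, on_demand_slots=1)
    print([router.acquire() for _ in range(3)])
```

With two dedicated slots and one on-demand slot, the third concurrent request spills over, so the script prints ['dedicated', 'dedicated', 'on_demand']. A fourth request would raise until a slot is released.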

That circuit breaker reframes what reliability even means for AI. It watches for signs a model is in distress — responses lagging, errors creeping up, latency crossing a threshold — and automatically reroutes traffic away, then carefully eases it back once things recover. As the post puts it, an AI service that is technically "up" but painfully slow is, for the user, effectively broken.
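
The sketch below shows the behaviour the post describes as a minimal latency-aware circuit breaker in Python. It is not Slack's implementation. The thresholds, window size, cooldown and linear ramp are all assumptions for illustration. For simplicity it uses average latency over a recent window, where a percentile would be a common alternative. The breaker trips when either the error rate or the latency crosses its limit. It sends no traffic while open, and after a cooldown it lets a growing share of traffic back through. The clock is passed in explicitly so the behaviour is easy to test.

```python
import random
from collections import deque
from typing import Deque, Optional, Tuple


class LatencyCircuitBreaker:
    """Hypothetical breaker for one model endpoint: trips on errors OR slowness."""

    def __init__(self, window: int = 20, max_error_rate: float = 0.2,
                 max_avg_latency_s: float = 2.0, cooldown_s: float = 30.0,
                 ramp_s: float = 60.0) -> None:
        # Most recent outcomes as (succeeded, latency in seconds).
        self.samples: Deque[Tuple[bool, float]] = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.max_avg_latency_s = max_avg_latency_s
        self.cooldown_s = cooldown_s
        self.ramp_s = ramp_s
        self.opened_at: Optional[float] = None  # None means closed (healthy)

    def record(self, ok: bool, latency_s: float, now: float) -> None:
        """Record one call's outcome; trip the breaker if the window looks unhealthy."""
        self.samples.append((ok, latency_s))
        if len(self.samples) < self.samples.maxlen:
            return  # not enough data yet to judge
        errors = sum(1 for succeeded, _ in self.samples if not succeeded)
        avg_latency = sum(lat for _, lat in self.samples) / len(self.samples)
        if (errors / len(self.samples) > self.max_error_rate
                or avg_latency > self.max_avg_latency_s):
            self.opened_at = now
            self.samples.clear()  # judge recovery on fresh data only

    def traffic_fraction(self, now: float) -> float:
        """Share of requests to send to this model: 0 while open, then ramping to 1."""
        if self.opened_at is None:
            return 1.0
        elapsed = now - self.opened_at
        if elapsed < self.cooldown_s:
            return 0.0
        fraction = min(1.0, (elapsed - self.cooldown_s) / self.ramp_s)
        if fraction >= 1.0:
            self.opened_at = None  # fully recovered: close the breaker
        return fraction

    def allow(self, now: float, rng: random.Random) -> bool:
        """Admit a request with probability equal to the current traffic fraction."""
        return rng.random() < self.traffic_fraction(now)
```

Suppose the breaker receives four calls that all succeed but take 3 seconds each, against a 1-second limit. It trips even though no request failed, which is exactly the "up but slow" case. Requests it refuses would be rerouted to another model by the surrounding routing layer.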

The payoff: by matching the right model to each feature, they report something like a ten per cent quality improvement on complex reasoning tasks and around a sixty-seven per cent latency reduction for the fast, lightweight ones. Think of it like an electricity grid drawing from several power stations at once — if one plant fails or its prices spike, the grid instantly pulls from another, and you never see the lights flicker.
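
The routing idea behind those numbers might look like the sketch below: each kind of task gets an ordered list of preferred models, with a fallback on the other provider. Everything in it is hypothetical. The task classes, provider labels and model tiers are invented for illustration, and the post does not publish its routing rules. A caller-supplied health check decides whether a route is usable; that check could be backed by a circuit breaker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Route:
    provider: str  # hypothetical label, e.g. "cloud_a"
    model: str     # hypothetical model tier, e.g. "large"


# Hypothetical routing table: routes per task class, most preferred first.
ROUTES: Dict[str, List[Route]] = {
    # Complex reasoning: prefer a larger, higher-quality model.
    "complex_reasoning": [Route("cloud_a", "large"), Route("cloud_b", "large")],
    # Fast, lightweight tasks: prefer a small, low-latency model.
    "lightweight": [Route("cloud_b", "small"), Route("cloud_a", "small")],
}


def pick_route(task: str, is_healthy: Callable[[Route], bool]) -> Route:
    """Return the first healthy route for a task class, in preference order."""
    for route in ROUTES[task]:
        if is_healthy(route):
            return route
    raise RuntimeError(f"no healthy route for {task!r}")
```

Ordering the preferences and checking health at request time is what gives the grid-like behaviour. If one provider degrades, traffic for that task moves to the next route without the caller noticing.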

The Guardian bets on culture

The third story is a deliberate change of pace — lighter, more human, and a reminder that engineering orgs don't only improve through technology. The Guardian ran its first global, company-wide hack day, named after the writer and critic John Berger, whose ideas about how we see the world framed the theme: become more global, more digital, more visual. For two days, people from across the organisation and its international offices — editorial, product, data, engineering, commercial — teamed up to build rough prototypes tackling real challenges like accessibility, trust in journalism and reader overload. No prior hack-day or engineering experience required; the stated entry requirements were just curiosity, generosity, and a willingness to try things.

The winning projects are the fun part. One, called "Facts are Sacred," made the sources in podcasts and videos visible with timestamped, linked references — a transparency play. Another, the "most entertaining" winner, turned news articles into interactive snooker tables and curling sheets, which is gloriously silly. But the most telling award went to the "most ambitious failure" — a big, brave attempt that didn't work, recognised precisely so the team could share what they learned openly.

This is, honestly, the lightest of the three: a feel-good culture story, not a deep technical write-up. But it's here on purpose, as the human counterpoint. Giving a prize to a failure says something real about how good teams improve — they carve out time, trust and permission to experiment, and treat a smart failure as something to celebrate and learn from, not bury. It's a grown-up science fair: a fixed window where anyone can build a wild idea and show it off.

The threads

Pull the three together. Jane Street bets on provable correctness — using formal methods to keep quality high in a world where AI writes the first draft. Slack bets on infrastructure strategy — a flexible, multi-cloud, multi-model platform so its AI features stay reliable and affordable. The Guardian bets on culture — a hack day that rewards curiosity and even rewards failure. Correctness, infrastructure and culture: three different levers, the same underlying move. None of these is a flashy new model or a viral demo. They're the deeper, slower investments — in rigour, in resilience, in people — that separate an organisation which ships impressive demos from one that keeps shipping good software, year after year. And in an era where the demos are getting cheap, those deeper investments are exactly where the lasting advantage lives.