Asking versus enforcing: how to actually control an AI agent
An AI agent follows a written rule most of the time — but "most of the time" is exactly where reliability dies. The craft is knowing which rules to prompt and which to enforce in code.
Hand a real task to an AI agent and a quiet question surfaces almost immediately: how do you make it follow your rules — use your build commands, honour your conventions, never touch certain files? There turn out to be seven mechanisms for delivering those instructions, and a way of thinking about which one to reach for when.
Three ideas everything hangs on
Claude Code is an AI coding assistant that runs in your terminal: you give it tasks in plain language, and it reads files, writes code, and runs commands much like a developer would. Three concepts decide whether it behaves the way your team wants.
The first is the context window — the model's working memory, a finite space holding everything it is currently attending to: your instructions, the files in view, the conversation so far. Everything you load competes for room, and the more you stuff in, the less reliably the model attends to any one piece. Instructions are not free; every rule costs space and a little focus.
The second is compaction. In a long session the window fills, so the system summarises and compresses what has happened to make room. Some instructions survive this and some don't — and a rule that quietly vanishes halfway through is worse than useless: you think it is in force, and it isn't.
The third is the system prompt — the foundational instructions that define the assistant's role before you say a word. Instructions here carry the most weight, but meddling with this layer is the riskiest move of the seven.
For any instruction, then, ask three questions: when does it load into context, does it survive compaction, and how much authority does it carry? Every method below is just a different answer.
A catalogue of seven, grouped by job
Start with facts that should always be present. That is the job of the CLAUDE.md file — plain markdown at the root of your project that loads at the start of every session and stays for it: build commands, directory layout, conventions, team norms, the overview a new colleague would want on day one. It survives compaction, re-read when the conversation compresses. The failure mode is human: the file grows like any config nobody owns, every team appending lines, nothing deleted. Hence the advice — keep it under two hundred lines, give it an owner, and review changes to it like code.
Next come rules — markdown files of specific constraints whose key feature is scoping. A rule carries file-path patterns and loads only when Claude works on matching files, so "all API handlers must validate their input" can be scoped to your API directory and cost nothing in a documentation-only session. The sharp point: an unscoped rule is mechanically identical to putting that text in CLAUDE.md — always loaded, always costing tokens.
Skills handle procedures rather than facts. A skill is a folder of instructions, sometimes with scripts and resources, but only its name and a short description load up front; the full step-by-step body loads only when the skill is invoked. So you can keep dozens of detailed playbooks available while paying almost nothing for the ones you aren't using.
Subagents are the most architecturally interesting. A subagent is a separate, specialised assistant that runs in its own fresh context window: its detailed instructions and messy intermediate work never enter your main conversation, and only the final answer comes back. A deep search through hundreds of files would otherwise flood your context with results you will never look at again; hand it to a subagent and the clutter stays in a separate room. These can nest up to five levels deep, and dynamic workflows can orchestrate tens or even hundreds of background agents, the coordination living in program variables rather than the model's context. Reach for a subagent when you don't want to watch the intermediate steps, a skill when you do.
The method that isn't an instruction
Then there are hooks, different in kind from everything above. Every method so far has given the model words to read and, hopefully, follow; a hook is not an instruction. It is a piece of code, a command, or an endpoint that the harness — the surrounding software — runs automatically when a specific event happens, like a file being edited. The model doesn't decide whether to honour a hook; the harness just runs it.
Telling Claude "always run the formatter after editing a file" is a request; configuring a hook that runs it on every edit is a guarantee.
Hooks barely cost context, because they live outside the conversation as code the harness executes, not as text the model holds in mind.
The last two methods operate at the system prompt — the most powerful and most dangerous layer. Output styles inject instructions directly into it, carrying the highest weight and never getting compacted away — but by default a custom style replaces the standard instructions that tell Claude Code it is a software-engineering assistant, and swap those out carelessly and your expert assistant quietly reverts to a generic chatbot. The seventh method, appending to the system prompt, is the safer cousin: an append flag adds to the foundational instructions for a single session rather than replacing them. The honest caveat: the more you pile on, the less strictly each instruction gets followed, especially if any quietly contradict.
The line that generalises
The catalogue is only half of it; the better half is the decision framework. If you ever find yourself writing "every time X, always do Y," or worse, "never do this," stop: an instruction is the wrong tool. Claude will follow a written rule most of the time, but under pressure — a long session, an ambiguous moment, or a malicious instruction slipped into a file it is reading — it can fail. A real guardrail has to be deterministic, and the deterministic methods are hooks and permissions: a hook can block an action before it happens, and managed settings deployed by an administrator can't be overridden by a user's local config — the only way to enforce a rule across an entire organisation.
Why it matters
Be fair: this is a post by a company about its own product, so it is also, gently, marketing, and the seven mechanisms are particular to Claude Code. But strip away the specifics and what remains is genuinely transferable, resting on two ideas. Context is a budget — every instruction costs space and focus, so place it to load only when it earns its keep. And there is a difference between asking and enforcing: a model will usually comply, but "usually" is not "always," and that gap is where reliability lives. Probabilistic instructions for guidance, deterministic guardrails for what must never go wrong — get that right, and the rest is just good housekeeping.