Article · 2026-07-05

Why Adding More Rules to CLAUDE.md and AGENTS.md Makes Your Agent Follow Them Worse, and How to Fix It

Bora Lee

Founder, Modern Web Labs

A frontier model can follow roughly 150 to 200 instructions with reasonable consistency. Once CLAUDE.md and AGENTS.md exceed that budget, compliance quality degrades evenly across every instruction. This article covers how to design rule files within the limit.

If you maintain a CLAUDE.md or AGENTS.md, you have probably lived through this paradox: the more rules you add, the less reliably the agent follows them. The problem is not a lack of rules. As instructions pile up, the model's ability to follow any instruction degrades. This article introduces the 'instruction budget', a concept that quantifies that limit, and covers how to design CLAUDE.md and AGENTS.md within it.

How a rules file turns into a ball of mud

Instruction files bloat the same way on every team. The agent does something you dislike, so you add a rule. Repeat that for a few months and the file becomes a several-hundred-line 'ball of mud'.¹ When multiple developers add contradictory opinions and nobody ever prunes the whole thing, the rules file stops being an asset that helps the agent and turns into a liability that drags performance down.

Many people expect that more rules will make the agent behave more the way they want. It does not work that way. There is a limit to the total number of instructions a model can follow.

How many instructions can a model actually follow

A paper published in July 2025, "How Many Instructions Can LLMs Follow at Once?", quantified this question with a benchmark called IFScale.² The authors built a report-writing task, scaled keyword-inclusion instructions from 10 up to 500, and evaluated 20 models from 7 major vendors. Accuracy fell as the instruction count rose: at 500 instructions, even the best-performing frontier models reached only 68% accuracy.

Kyle at HumanLayer translated this result into practical guidance: a reasoning frontier model can follow roughly 150 to 200 instructions with reasonable consistency.³ Smaller models and non-reasoning models handle fewer.

Two important implications follow.

First, the budget starts partially spent. Claude Code's system prompt alone contains around 50 discrete instructions.³ Tool definitions, MCP server configurations, and plugins each add their own on top. The budget actually left for your CLAUDE.md is closer to 100 to 150.
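The arithmetic behind that estimate can be written out as a small sketch. The function name and constants below are illustrative, and the figures are the approximate ones quoted above, not hard limits.

```python
# Rough budget arithmetic using the estimates quoted in this article.

def remaining_budget(model_budget: tuple[int, int], already_spent: int) -> tuple[int, int]:
    """Return the (low, high) instruction budget left for your own rules file."""
    low, high = model_budget
    # Clamp at zero: a budget cannot go negative, it is simply exhausted.
    return (max(low - already_spent, 0), max(high - already_spent, 0))

FRONTIER_BUDGET = (150, 200)     # instructions a reasoning frontier model follows reliably
SYSTEM_PROMPT_INSTRUCTIONS = 50  # approximate count in Claude Code's system prompt

print(remaining_budget(FRONTIER_BUDGET, SYSTEM_PROMPT_INSTRUCTIONS))  # (100, 150)
```

Tool definitions, MCP servers, and plugins would raise `already_spent` further, so the printed range is an upper bound on what is left.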

Second, exceeding the budget does not mean one specific rule gets ignored. The degradation the paper observed is uniform. As the instruction count grows, compliance quality drops evenly across all instructions. It is not just the newly added 200th rule that gets ignored; the 10th rule that used to work reliably starts to wobble too. This is why adding a rule can introduce mistakes you never saw before. The paper also confirmed a primacy bias, where instructions placed earlier are followed better, so where you put an instruction is a design decision as well.

Designing within the budget

Keep the root file minimal

Every token in CLAUDE.md and AGENTS.md is loaded on every request, relevant or not. So the bar for what stays in the root file is 'does every single task need this'. Fewer items pass that bar than you might think.

- A one-sentence project description
- The package manager (if it is not npm)
- Build and typecheck commands (if they are non-standard)

What to remove is equally clear. Instructions like "write clean code" or "handle edge cases" describe things the model already knows and only waste budget. Detailed code-style rules and instructions that only matter for specific tasks have no reason to live in the root file either.
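A minimal sketch of how such filler could be flagged automatically, assuming a hand-maintained phrase list. The first two patterns come from the examples above; the third is a hypothetical addition, and the function name is illustrative.

```python
import re

# Phrase list is an assumption: extend it with whatever filler your own file accumulates.
VAGUE_PATTERNS = [
    r"\bwrite clean code\b",
    r"\bhandle edge cases\b",
    r"\bfollow best practices\b",  # hypothetical extra, not from the article
]

def flag_vague_rules(rules_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that match a vague pattern."""
    flagged = []
    for number, line in enumerate(rules_text.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in VAGUE_PATTERNS):
            flagged.append((number, line.strip()))
    return flagged

sample = """\
- Use pnpm, not npm
- Write clean code
- Always handle edge cases
"""

print(flag_vague_rules(sample))
# [(2, '- Write clean code'), (3, '- Always handle edge cases')]
```

A phrase match is only a prompt for review. Whether a flagged line is truly redundant remains a human call.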

Document domain concepts, not file paths

The temptation to document your directory structure is strong, but it is risky. Unless the documentation is maintained with real discipline, it falls behind the code quickly, and stale information only pollutes the context.¹ After a path changes, the agent keeps trusting the document and goes looking in the wrong place. It is safer to describe capabilities instead of structure and let the agent discover concrete locations on its own. Domain concepts like 'how organizations relate to workspaces' are far more stable than file paths, which makes them worth documenting. In an organization, communication often breaks down because people interpret the same term differently. Code and agents are no different.
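If a rules file already contains paths, one way to see how stale it has become is to check each path against the repository. The sketch below is a heuristic under stated assumptions: the regular expression only recognizes relative paths with at least one slash and a file extension, and the function name and sample paths are made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Heuristic: a relative path with at least one "/" and a file extension.
# The lookbehind skips matches that start mid-token or inside a URL.
PATH_PATTERN = re.compile(r"(?<![\w/.-])(?:[\w.-]+/)+[\w.-]+\.\w+")

def find_stale_paths(rules_text: str, repo_root: Path) -> list[str]:
    """Return path-like tokens in the rules text that do not exist under repo_root."""
    candidates = sorted(set(PATH_PATTERN.findall(rules_text)))
    return [p for p in candidates if not (repo_root / p).exists()]

# Demo against a throwaway directory with one real file and one stale reference.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.ts").write_text("export {};\n", encoding="utf-8")
    rules = "Entry point is src/app.ts. Helpers live in src/old/util.ts."
    stale = find_stale_paths(rules, root)

print(stale)  # ['src/old/util.ts']
```

Every entry in the result is a candidate for deletion or for replacement with a description of the capability instead of the location.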

Save budget with progressive disclosure

What you remove from the root file is not thrown away. It moves somewhere it will be loaded only when needed. This technique is called progressive disclosure. Instead of keeping 20 TypeScript conventions in the root, move them to docs/TYPESCRIPT.md and leave a single reference line in the root.

For TypeScript conventions, see docs/TYPESCRIPT.md

With an instruction like the one above in CLAUDE.md or AGENTS.md, the conventions load only when the agent works on TypeScript, and every other task spends no instruction budget on them. References can be nested (docs/TYPESCRIPT.md pointing to docs/TESTING.md, for example) and can link out to external documents. Agents are good at navigating a document hierarchy, so leaving a trail is enough for them to find what they need.
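To check that the trail actually holds together, the references can be walked the way an agent might follow them. This is a sketch only: it assumes references are written as plain `docs/….md` paths relative to the repository root, and the function name and sample file contents are made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: references look like docs/NAME.md, relative to the repo root.
DOC_REF = re.compile(r"\bdocs/[\w./-]+\.md\b")

def reachable_docs(start: Path, repo_root: Path) -> list[str]:
    """Follow docs/*.md references breadth-first from start, without revisiting."""
    seen: list[str] = []
    queue = [start]
    while queue:
        current = queue.pop(0)
        if not current.exists():
            continue  # a dangling reference: nothing to read
        for ref in DOC_REF.findall(current.read_text(encoding="utf-8")):
            if ref not in seen:
                seen.append(ref)
                queue.append(repo_root / ref)
    return seen

# Demo: root file -> docs/TYPESCRIPT.md -> docs/TESTING.md
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "AGENTS.md").write_text(
        "For TypeScript conventions, see docs/TYPESCRIPT.md\n", encoding="utf-8")
    (root / "docs" / "TYPESCRIPT.md").write_text(
        "For test layout, see docs/TESTING.md\n", encoding="utf-8")
    (root / "docs" / "TESTING.md").write_text(
        "Co-locate tests with source.\n", encoding="utf-8")
    trail = reachable_docs(root / "AGENTS.md", root)

print(trail)  # ['docs/TYPESCRIPT.md', 'docs/TESTING.md']
```

The root file costs one line of budget here, while the two convention files are read only when a task leads the agent to them.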

Split by scope in a monorepo

AGENTS.md is not a root-only file. You can place one in subdirectories as well, and the coding agent sees the merged content based on where it is working. Put the monorepo's purpose and shared tooling at the root, and each package's purpose, stack, and rules inside that package. Even when instructions are spread out, the agent merges and follows them according to the task at hand. Keep in mind that the merged whole enters the context, so the budget math stays the same: whether instructions live at the root or in a subdirectory, each file has to stay lean.
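The budget math for a given working directory can be approximated by collecting every rules file between the repository root and that directory. The sketch below only illustrates the idea: how a particular agent discovers and merges these files is tool-specific, and the function names, the directory layout, and the use of non-blank lines as a size proxy are all assumptions.

```python
import tempfile
from pathlib import Path

def collect_rule_files(repo_root: Path, working_dir: Path, name: str = "AGENTS.md") -> list[Path]:
    """List rule files from the repo root down to working_dir, root first."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    relative = working_dir.relative_to(repo_root)  # ValueError if outside the repo
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)
    return [d / name for d in directories if (d / name).is_file()]

def merged_line_count(files: list[Path]) -> int:
    """Count non-blank lines across all files as a crude proxy for merged size."""
    return sum(
        1
        for f in files
        for line in f.read_text(encoding="utf-8").splitlines()
        if line.strip()
    )

# Demo: a root file plus one package-level file, working inside packages/api/src.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    work = root / "packages" / "api" / "src"
    work.mkdir(parents=True)
    (root / "AGENTS.md").write_text("Monorepo for the storefront.\nUse pnpm.\n", encoding="utf-8")
    (root / "packages" / "api" / "AGENTS.md").write_text(
        "API package.\nStack: Node.\nRun tests before pushing.\n", encoding="utf-8")
    files = collect_rule_files(root, work)
    names = [f.relative_to(root.resolve()).as_posix() for f in files]
    total = merged_line_count(files)

print(names, total)  # ['AGENTS.md', 'packages/api/AGENTS.md'] 5
```

Running this per package gives a quick view of which working directories carry the heaviest merged rule set.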

Fixing a file that is already bloated

You do not have to clean up a several-hundred-line rules file by hand. Delegate the refactoring itself to the agent. The prompt below comes from the aihero.dev guide.¹

I want you to refactor my AGENTS.md file to follow progressive disclosure principles.

Follow these steps:

1. **Find contradictions**: Identify any instructions that conflict with each other. For each contradiction, ask me which version I want to keep.

2. **Identify the essentials**: Extract only what belongs in the root AGENTS.md:
   - One-sentence project description
   - Package manager (if not npm)
   - Non-standard build/typecheck commands
   - Anything truly relevant to every single task

3. **Group the rest**: Organize remaining instructions into logical categories (e.g., TypeScript conventions, testing patterns, API design, Git workflow). For each group, create a separate markdown file.

4. **Create the file structure**: Output:
   - A minimal root AGENTS.md with markdown links to the separate files
   - Each separate file with its relevant instructions
   - A suggested docs/ folder structure

5. **Flag for deletion**: Identify any instructions that are:
   - Redundant (the agent already knows this)
   - Too vague to be actionable
   - Overly obvious (like "write clean code")

One more note: avoid generating CLAUDE.md or AGENTS.md automatically with an init command (/init) or a script. Auto-generation favors comprehensiveness over restraint, so it produces a file that starts out over budget.

For what this kind of cleanup delivers in a product at scale, the Lovable case is a good illustration. A separate article covers how the team overhauled a bloated system prompt and improved response speed and cost at the same time.

Rules that must hold 100% of the time belong in hooks

Staying within the instruction budget raises compliance. It does not make compliance 100%. The official Claude Code documentation acknowledges this limit directly: CLAUDE.md is context the model consults, not enforced configuration, and "there's no guarantee of strict compliance".⁴ However carefully you polish a rule, whether to follow it remains the model's judgment call.

That is why the official docs recommend moving rules that must always hold out of instructions and into hooks. A hook is a shell command that runs at a fixed point in the lifecycle, such as right before a commit or right after a file edit.⁵ Instructions are something the model chooses to follow; hooks run regardless of what the model decides. Rules like running tests before every commit, formatting after every save, or blocking dangerous commands belong in this category. Moving them into hooks saves instruction budget and pushes their compliance to 100%.
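As an illustration of the "blocking dangerous commands" case, here is a minimal sketch of the decision logic such a hook might run. It rests on assumptions that should be checked against the hooks documentation: that the hook receives the pending tool call as JSON with a `tool_input.command` field, and that exit code 2 blocks the call. The deny-list and function names are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical deny-list: adjust to whatever your team treats as dangerous.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",         # recursive delete of the filesystem root
    r"\bgit\s+push\s+.*--force\b",   # force pushes (also matches --force-with-lease)
]

def blocked_reason(command: str) -> Optional[str]:
    """Return a message if the command matches a blocked pattern, else None."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return f"blocked by hook: matches {pattern!r}"
    return None

def run_hook(raw_event: str) -> tuple[int, str]:
    """Decide on one hook event given as JSON text; return (exit_code, message)."""
    event = json.loads(raw_event)
    # Assumption: the shell command sits under tool_input.command.
    command = event.get("tool_input", {}).get("command", "")
    reason = blocked_reason(command)
    # Assumption: exit code 2 means "block this call", 0 means "let it proceed".
    return (2, reason) if reason else (0, "")

danger = json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}})
safe = json.dumps({"tool_name": "Bash", "tool_input": {"command": "npm test"}})
print(run_hook(danger)[0], run_hook(safe)[0])  # 2 0
```

A thin wrapper would pass `sys.stdin.read()` to `run_hook`, write the message to stderr, and exit with the returned code. The point of the design is that the check runs as ordinary code, so the outcome does not depend on the model choosing to comply.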

So the next step after trimming your rules file is learning hooks. The Hooks chapters of our Claude Code tutorial (currently available in Korean) walk through the concept, how to define and implement a hook, and a collection of hooks that prove useful in practice.

Wrapping up

The instruction budget is not a matter of taste. It is a measured limit: around 150 to 200 instructions for frontier models, and effectively 100 to 150 once the system prompt takes its share. Past that point, compliance quality degrades evenly across every instruction, so trying to fix problems by adding rules ends up creating new problems.

The ideal rules file is small, focused, and points elsewhere. Include just enough for the agent to start working, and leave the details to progressive disclosure. Rules that must never be broken belong in hooks, not instructions. Trimming your rules file is not giving up agent performance. It is how you get it back.

¹ Matt Pocock, A Complete Guide To AGENTS.md, AI Hero.
² Daniel Jaroslawicz et al., How Many Instructions Can LLMs Follow at Once?, arXiv:2507.11538, 2025.
³ Kyle (@0xblacklight), Writing a good CLAUDE.md, HumanLayer Blog, 2025.
⁴ Anthropic, How Claude remembers your project, Claude Code Docs.
⁵ Anthropic, Automate actions with hooks, Claude Code Docs.
