Giving coding agents rails instead of trust

A large share of my learning engine is now written by coding agents. I type far less than I did a year ago, and the platform is better tested than the code I used to write by hand. Both of those things are true, and the second one is not because of the agents. It is because of what the agents are not allowed to do.

This year the discourse around agentic coding swings between two stories. In one, the agent is a tireless senior engineer and you should hand it the repository and get out of the way. In the other, it is a confident intern flooding your codebase with plausible slop. My experience building this platform says both stories describe the same tool. Which one you get depends on the rails you build around it, arguably more than on the model inside it.

This post is about the rails I ended up with, after some months of letting agents build a system whose entire reason to exist is that children can trust what it serves.

Why blind agents were never an option here

The architecture of this engine is a set of walls. Generation stays away from serving, the database schemas isolate each stage of a problem's life, and a learner only ever touches approved content. Those walls are load-bearing. An agent that wanders the codebase freely, edits what it likes, and pokes the database to see what happens is a wrecking ball with good intentions, and I would not know which wall it weakened until something reached the review queue that should never have been generated, or worse, reached a learner.

The naive fix is to write a long instructions file begging the agent to be careful. I tried that. Begging does not scale, and an agent under context pressure forgets politely worded constraints exactly when it matters. What worked was the same thing that works with people. Change the environment so the safe path is the easy path, and make the dangerous paths fail loudly before damage lands.

Let the agent inspect without letting it touch

The first rail is a small read-only service built just for agents. It answers the questions an agent keeps needing to ask. What concepts exist, what are their prerequisites, what does this renderer expect as input, is the curriculum healthy right now. The agent queries that service the way a developer reads a dashboard, and it gets structured answers without running the app, without opening a database connection, without any ability to mutate what it is looking at.
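
To make the shape of that window concrete, here is a minimal sketch of a read-only inspector over a curriculum snapshot. Everything in it is an assumption for illustration: `Concept`, `CurriculumInspector`, the method names, and the sample concepts are hypothetical, not the platform's actual service. The point it shows is that the interface answers questions and simply has no write methods to misuse.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Concept:
    """One curriculum concept; frozen so callers cannot edit it in place."""
    concept_id: str
    title: str
    prerequisites: tuple[str, ...] = ()


class CurriculumInspector:
    """Read-only window over a curriculum snapshot (hypothetical API)."""

    def __init__(self, concepts: list[Concept]) -> None:
        # Wrap the lookup table in a read-only mapping view.
        self._concepts = MappingProxyType({c.concept_id: c for c in concepts})

    def list_concepts(self) -> tuple[str, ...]:
        return tuple(sorted(self._concepts))

    def prerequisites(self, concept_id: str) -> tuple[str, ...]:
        return self._concepts[concept_id].prerequisites

    def health(self) -> dict[str, list[str]]:
        """Report prerequisites that point at concepts that do not exist."""
        missing: dict[str, list[str]] = {}
        for concept in self._concepts.values():
            unknown = [p for p in concept.prerequisites if p not in self._concepts]
            if unknown:
                missing[concept.concept_id] = unknown
        return missing


# Sample data, invented for the example.
inspector = CurriculumInspector([
    Concept("counting", "Counting to 20"),
    Concept("addition", "Single-digit addition", prerequisites=("counting",)),
    Concept("fractions", "Simple fractions", prerequisites=("division",)),
])
```

Calling `inspector.health()` on this sample flags `fractions`, because its prerequisite `division` is not defined anywhere. An agent gets that answer as structured data, and there is nothing on the object it could call to "fix" the curriculum by accident.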

Before this service existed, an agent that needed curriculum context would improvise. It would write a quick script against the database, or spin up parts of the app, and every improvisation was a fresh chance to do something I had not anticipated. Giving it a sanctioned window removed the reason to improvise. The lesson surprised me with how human it is. Most of the dangerous agent behavior I saw was not malice or stupidity. It was a capable system inventing a workaround because the legitimate path did not exist.

Make the agent verify like a person

The second rail goes the other direction. After an agent changes pipeline behavior, I want proof the change works end to end, and the only proof I trust is the same one I would produce myself. So agents here verify through a browser automation layer that drives the real admin tools. Generate a scenario, watch it pass the gates, see it land in the review queue, confirm what it looks like there.

The temptation, and most agents try it on their first attempt, is to verify with a unit test against an internal function and call the job done. That proves a function returns a value. It does not prove the pipeline did the right thing for the person operating it. Forcing verification through the same interface a human uses closes the gap between what was tested and what is true, and it has caught issues a test suite alone would have happily ignored. One example was an admin screen that lied about queue state while the internals were technically correct.
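
The difference between the two kinds of proof can be sketched in a few lines. The check below only trusts what the admin interface reports, and it is written against a hypothetical `AdminDriver` interface. In a real run that driver would be backed by the browser automation layer. Here a `FakeAdmin` stands in so the sketch runs without a browser. All names, gate labels, and scenario IDs are invented for the example.

```python
from typing import Protocol


class AdminDriver(Protocol):
    """What the check needs from the admin UI (hypothetical interface)."""

    def generate_scenario(self, concept_id: str) -> str: ...
    def gate_results(self, scenario_id: str) -> dict[str, bool]: ...
    def review_queue_ids(self) -> list[str]: ...


def verify_end_to_end(driver: AdminDriver, concept_id: str) -> list[str]:
    """Return a list of problems; an empty list means the run verified."""
    problems: list[str] = []
    scenario_id = driver.generate_scenario(concept_id)

    # Every gate shown on the admin screen must report a pass.
    for gate, passed in driver.gate_results(scenario_id).items():
        if not passed:
            problems.append(f"gate '{gate}' failed for {scenario_id}")

    # The scenario must be visible where a human reviewer would look for it.
    if scenario_id not in driver.review_queue_ids():
        problems.append(f"{scenario_id} is not visible in the review queue")
    return problems


class FakeAdmin:
    """In-memory stand-in so the sketch runs without a browser."""

    def __init__(self, show_in_queue: bool) -> None:
        self._show_in_queue = show_in_queue

    def generate_scenario(self, concept_id: str) -> str:
        return f"scenario-{concept_id}-1"

    def gate_results(self, scenario_id: str) -> dict[str, bool]:
        return {"schema": True, "reading_level": True}

    def review_queue_ids(self) -> list[str]:
        return ["scenario-addition-1"] if self._show_in_queue else []
```

With `FakeAdmin(show_in_queue=False)`, every gate passes and the check still reports a problem, because the scenario never appears where a reviewer would see it. That is the class of bug a unit test on an internal function cannot catch.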

Write the procedures down

The third rail is the dullest and might matter most. The recurring jobs in this project, adding a concept, wiring a renderer, extending a validation gate, each have a written procedure the agent is pointed at, with the steps, the checks, and the definition of done. Agents follow a good procedure remarkably well, and a procedure is reusable across sessions in a way a clever one-off prompt never is. When an agent goes off the rails now, my first question is no longer what the model got wrong. It is which procedure was missing or stale, and the fix improves every future run, whichever model is driving.

Let the CI gates be the last word

The final rail assumes everything above fails. Continuous integration runs the full suite on every change, and alongside the ordinary tests sit gates written specifically with agents in mind. One enforces where tests are allowed to reach, so a test that quietly crosses a schema boundary fails the build on the spot, even if its own assertions pass. The walls in the architecture get proven on every commit, because the gate distrusts the change, whoever or whatever authored it.

test reaches across a schema boundary
  -> gate fails the build, even when the test passes
agent edits a renderer contract without updating its consumers
  -> contract check fails before merge
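
A rough sketch of the first gate, under stated assumptions: the schema names (`generation`, `review`, `serving`) are hypothetical, and the check is a plain text scan for schema-qualified table names in a test's source. A production gate would more likely hook into the test runner or the database layer. The sketch only shows the principle that the gate's verdict is independent of whether the test itself passes.

```python
import re

# Hypothetical schema names; the real ones are not given here.
KNOWN_SCHEMAS = {"generation", "review", "serving"}

# Matches schema-qualified references such as "serving.problems".
QUALIFIED_NAME = re.compile(r"\b([a-z_]+)\.[a-z_]+\b")


def boundary_violations(test_source: str, allowed: set[str]) -> list[str]:
    """List known schemas referenced in test_source that are not allowed."""
    referenced = {m.group(1) for m in QUALIFIED_NAME.finditer(test_source)}
    return sorted((referenced & KNOWN_SCHEMAS) - allowed)


def gate(test_source: str, allowed: set[str]) -> int:
    """Return a process exit code; nonzero fails the build."""
    violations = boundary_violations(test_source, allowed)
    for schema in violations:
        print(f"boundary violation: test reaches into '{schema}'")
    return 1 if violations else 0


# A test that would pass its own assertion but crosses into another schema.
sample_test = '''
def test_generated_problem_is_stored(db):
    db.execute("INSERT INTO generation.drafts (body) VALUES ('2 + 2')")
    rows = db.execute("SELECT * FROM serving.problems").fetchall()
    assert rows == []
'''
```

Running `gate(sample_test, allowed={"generation"})` returns a nonzero exit code because the test touches `serving`, even though the assertion inside the test could pass.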

A gate like that is not really an agent control. It is an honesty control, and it disciplines me as much as the machines.

What I think this generalizes to

My current read is that agents amplify whatever engineering culture they land in. A codebase with real boundaries, sanctioned ways to look around, and gates that fail loudly turns agents into a genuine multiplier. A codebase held together by tribal memory and hope gets its weaknesses multiplied at machine speed. In my experience the model has mattered less than the rails around it, though that may say as much about my codebase as about the models.

I hold that view with some humility, because the tools are improving fast and the agents of next year might need fewer rails. So far the opposite has held for me. The better the agents get, the more work I give them, and the more work I give them, the more the rails earn their keep. Speed without verification is just a faster way to be wrong, and this platform exists for users who cannot absorb my mistakes. The rails are not overhead on the real work. For a product like this one, they are the real work.
