In the first post of this series I wrote about watching my kids use learning apps and deciding to build a learning engine myself. This post is about the first real architecture decision, the one every later decision has to live inside. Where does the AI sit?
The default answer in early 2025 is: everywhere. Put a model behind a chat box, hand it the curriculum and the child, and let the conversation be the product. That design is fast to ship and demos beautifully, and I believe it is the wrong shape for an unsupervised young learner. A language model is confidently wrong some percentage of the time, and I have yet to hear that percentage stated with a straight face. Whatever the rate, in a chat design every one of those errors reaches the child directly, live, with no one in between.
I could not accept that, so I drew a line through the middle of the product instead.
Generation on one side, serving on the other
The engine I am building has two halves that never run at the same moment for the same child.
The first half is a content factory. It runs in the background with no learner anywhere near it. Models generate math problems from concept configurations I author, each one a declarative description of what a concept is, what makes a problem easy or hard, and what a valid answer looks like. The factory compiles raw model output into structured rows, runs them through programmatic checks, and queues what survives for review. Output that fails is discarded. Nothing in this half is precious and nothing in it is urgent, because no child is waiting on it.
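To make one pass through the factory concrete, here is a minimal Python sketch: a concept configuration, compilation of raw model output into rows, programmatic checks, and a review queue for whatever survives. Everything in it is an assumption for illustration. The names `ConceptConfig`, `compile_row`, and `run_factory` are hypothetical, the model output is assumed to arrive as JSON, and the two checks shown (operands in range, answer recomputed independently) stand in for whatever a real concept would need.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical concept configuration: a declarative description of a concept
# and the difficulty knobs that bound its problems. Field names are illustrative.
@dataclass(frozen=True)
class ConceptConfig:
    concept_id: str
    operand_min: int  # difficulty knob: smallest operand allowed
    operand_max: int  # difficulty knob: largest operand allowed


@dataclass(frozen=True)
class ProblemRow:
    concept_id: str
    a: int
    b: int
    prompt: str
    answer: int


def _strict_int(value) -> int:
    # Reject floats, strings, and bools that int() would silently coerce.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {value!r}")
    return value


def compile_row(raw: str, cfg: ConceptConfig) -> Optional[ProblemRow]:
    """Compile one raw model output (assumed to be JSON) into a structured row."""
    try:
        data = json.loads(raw)
        return ProblemRow(
            concept_id=cfg.concept_id,
            a=_strict_int(data["a"]),
            b=_strict_int(data["b"]),
            prompt=str(data["prompt"]),
            answer=_strict_int(data["answer"]),
        )
    except (ValueError, KeyError, TypeError):
        return None  # malformed output is discarded, never repaired


def operands_in_range(row: ProblemRow, cfg: ConceptConfig) -> bool:
    return all(cfg.operand_min <= x <= cfg.operand_max for x in (row.a, row.b))


def answer_recomputes(row: ProblemRow, cfg: ConceptConfig) -> bool:
    # Specific to addition in this sketch: the stated answer must match one
    # computed independently of the model.
    return row.answer == row.a + row.b


CHECKS = (operands_in_range, answer_recomputes)


def run_factory(raw_outputs: list[str], cfg: ConceptConfig) -> list[ProblemRow]:
    """Return the rows that survive every check; these go on to human review."""
    review_queue = []
    for raw in raw_outputs:
        row = compile_row(raw, cfg)
        if row is not None and all(check(row, cfg) for check in CHECKS):
            review_queue.append(row)
    return review_queue


ADD_TWO_DIGIT = ConceptConfig("add-2digit", operand_min=10, operand_max=99)
```

Malformed output is dropped rather than patched up, which fits the rule that nothing in this half is precious: a failed row costs only another generation call.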
The second half is the runner, the part a learner actually touches. It is a small, deterministic web client that serves problems from a pool of approved content. It contains no model. It cannot hallucinate, because nothing in it generates anything. Every problem it shows has already been checked by machine and seen by a person. When a learner answers, the runner knows the canonical answer because the factory stored it as data.
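A sketch of the runner's core, under the same caveats: `ApprovedProblem` and `Runner` are hypothetical names, and the seeded shuffle is one assumed way to keep the serving order deterministic. What it illustrates is that serving and grading are both lookups over stored data, with no generation step anywhere in the class.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovedProblem:
    problem_id: str
    prompt: str
    canonical_answer: str  # written by the factory; the runner never computes it


class Runner:
    """Serves problems from an approved pool. There is no model in this class."""

    def __init__(self, pool: list[ApprovedProblem], seed: int) -> None:
        self._queue = list(pool)  # copy so the caller's pool is not reordered
        # A seeded shuffle keeps the order deterministic for a given seed.
        random.Random(seed).shuffle(self._queue)
        self._position = 0

    def next_problem(self) -> Optional[ApprovedProblem]:
        if self._position >= len(self._queue):
            return None  # pool exhausted; nothing is generated on the fly
        problem = self._queue[self._position]
        self._position += 1
        return problem

    @staticmethod
    def grade(problem: ApprovedProblem, response: str) -> bool:
        # Grading compares against stored data, not a model's judgment.
        return response.strip() == problem.canonical_answer
```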
That split carries most of the argument. The model and the learner are never in the same room. Anything the model produces passes through checks and a person before it can exist in the learner's world. The boundary is structural, so the safety claim does not depend on a prompt behaving, a temperature setting, or a model update landing well. The learner is safe from hallucination the way a reader of a printed book is safe from typos that were caught before printing.
What the boundary costs
I want to be honest about the price, because there is one, and the chatbot products are not paying it.
The learner cannot ask my system anything. There is no open conversation, no "explain it again but with dinosaurs," none of the genuinely magical flexibility that makes the general models so appealing. The experience is bounded by what the factory produced ahead of time. If that bound is drawn too tight, the product becomes a worksheet generator with extra steps, and I will have built the rigidity I was complaining about in the first post.
Freshness costs too. A chatbot adapts mid-conversation. My loop adapts between sessions, when the factory restocks the pools against what a learner needs next. I suspect that for structured subjects like early math this delay costs little, because the concepts change slowly even when the child changes quickly. I also accept that I might be wrong about that, and if I am, this whole architecture inherits the problem. It is the central bet of the project.
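One way the between-session restock could be planned, as a hypothetical sketch: the serving side compares how many unseen problems remain in each pool against a target, for the concepts a learner needs next, and emits requests that carry only a concept and a count. `plan_restock`, `RestockRequest`, and the fixed per-concept target are assumptions, not part of the system as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestockRequest:
    concept_id: str  # only a concept and a count; no learner identity crosses over
    count: int


def plan_restock(
    unseen_by_concept: dict[str, int],
    next_concepts: list[str],
    target_per_concept: int,
) -> list[RestockRequest]:
    """Between sessions, ask the factory to top up the pools a learner needs next."""
    requests = []
    for concept in dict.fromkeys(next_concepts):  # de-duplicate, keep order
        shortfall = target_per_concept - unseen_by_concept.get(concept, 0)
        if shortfall > 0:
            requests.append(RestockRequest(concept, shortfall))
    return requests
```

Because the requests are computed ahead of the next session, any delay in the factory shows up as a thinner pool rather than as a wait in front of the child.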
What the boundary buys
Beyond the safety argument, two things fell out of this design that I did not fully appreciate until they were running.
Serving is instant. There is no model in the path when a child taps an answer, so there is nothing to wait on. Anyone who has watched a seven-year-old lose interest during a three-second spinner understands what this is worth. The slowest part of my serving path is the network, and the client caches enough problems locally that even the network is rarely felt.
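A minimal sketch of that kind of local caching, assuming a hypothetical `fetch_batch(n)` call to the serving API. It is written in Python for consistency with the other sketches even though the real client is a web app, and the refill is synchronous for simplicity; a browser client would more likely refill in the background so that no tap ever waits.

```python
from collections import deque
from typing import Any, Callable, Optional


class PrefetchBuffer:
    """Holds problems locally so most answers are served from memory.

    fetch_batch(n) is a hypothetical call returning up to n approved problems.
    Refills happen synchronously here for simplicity.
    """

    def __init__(
        self,
        fetch_batch: Callable[[int], list],
        capacity: int = 20,
        low_water: int = 5,
    ) -> None:
        if not 0 <= low_water < capacity:
            raise ValueError("low_water must be in [0, capacity)")
        self._fetch_batch = fetch_batch
        self._capacity = capacity
        self._low_water = low_water
        self._queue: deque = deque()
        self._refill()

    def _refill(self) -> None:
        missing = self._capacity - len(self._queue)
        if missing > 0:
            self._queue.extend(self._fetch_batch(missing))

    def next_problem(self) -> Optional[Any]:
        # Top up only when the local queue runs low, so most calls touch memory only.
        if len(self._queue) <= self._low_water:
            self._refill()
        return self._queue.popleft() if self._queue else None
```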
The privacy story follows from the same boundary. The factory does not need to know any child exists. When it calls a model it sends an abstract problem template and pedagogical metadata, never a name, never an account, never a history. Models never process or store anything personal, and that is not a policy I enforce with paperwork; it is a property of the structure, which has no path for personal data to take. Account data lives entirely on the serving side. For a product aimed at children I suspect this will matter more every year, and it costs me nothing extra because the boundary was already there for safety reasons.
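The structural version of that guarantee can be sketched as a request type with no place to put personal data. `GenerationRequest` and its fields are hypothetical; the idea is only that if the type the factory sends to a model declares no learner fields, a name or history has nothing to be serialized into, and trying to add one fails at construction time.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationRequest:
    """Everything the factory is able to send to a model.

    There is deliberately no field for a learner: a name, account, or history
    has nowhere to go, so it cannot leak by accident.
    """

    concept_id: str
    template: str    # abstract problem template, e.g. "{a} + {b} = ?"
    difficulty: str  # pedagogical metadata
    count: int


def to_model_payload(request: GenerationRequest) -> str:
    # Serialize only the declared fields; nothing else exists to serialize.
    return json.dumps(asdict(request), sort_keys=True)
```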
A person in the chain
Right now, every problem that survives the programmatic checks is reviewed by me before it enters the approved pool. A model generates, the checks filter, and a human decides. That is the chain, and for the foreseeable future I intend to keep a person in it.
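The chain can be sketched as a small state machine in which a problem reaches a person only after it passes the checks, and only human-approved rows are exported to the runner. The statuses, function names, and JSON export format are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GENERATED = "generated"
    PASSED_CHECKS = "passed_checks"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Candidate:
    problem_id: str
    prompt: str
    answer: str
    status: Status = Status.GENERATED


def record_checks(candidate: Candidate, passed: bool) -> None:
    """Programmatic checks run first; failures never reach a person."""
    if candidate.status is not Status.GENERATED:
        raise ValueError(f"{candidate.problem_id} was already checked")
    candidate.status = Status.PASSED_CHECKS if passed else Status.REJECTED


def record_review(candidate: Candidate, approved: bool) -> None:
    """A human decision is only possible after the checks have passed."""
    if candidate.status is not Status.PASSED_CHECKS:
        raise ValueError(f"{candidate.problem_id} is not awaiting review")
    candidate.status = Status.APPROVED if approved else Status.REJECTED


def export_approved_pool(candidates: list[Candidate]) -> str:
    """The only artifact that crosses to the runner: plain data, approved rows only."""
    return json.dumps(
        [
            {"problem_id": c.problem_id, "prompt": c.prompt, "answer": c.answer}
            for c in candidates
            if c.status is Status.APPROVED
        ]
    )
```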
I can already feel where this will strain. Reviewing everything is fine at the volume one parent authoring one subject produces. It will not survive the volumes a real catalog needs, and at some point I will have to decide what the checks can clear on their own and what always deserves human eyes. I do not have that answer yet, and I would distrust anyone who claimed it on day one. The honest version is that every session in the review queue teaches me more about what machine checks miss. When I draw that line I want to draw it from data, and when I do, I will write it down here.
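As a hedged sketch of where drawing that line from data could start: log each human decision on a problem that already passed the checks, then compute per concept how often a person still rejected it, alongside the sample size. `ReviewRecord` and `check_miss_rates` are hypothetical, and the sketch deliberately stops short of choosing a threshold, since that decision is still open.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    concept_id: str
    approved: bool  # human decision on a problem that had already passed every check


def check_miss_rates(log: list[ReviewRecord]) -> dict[str, tuple[float, int]]:
    """Per concept: (share of check-passed problems a human rejected, sample size).

    Each rejection counted here is something the machine checks missed.
    """
    totals: dict[str, int] = defaultdict(int)
    rejected: dict[str, int] = defaultdict(int)
    for record in log:
        totals[record.concept_id] += 1
        if not record.approved:
            rejected[record.concept_id] += 1
    return {concept: (rejected[concept] / n, n) for concept, n in totals.items()}
```

Keeping the sample size next to the rate matters: a concept with a zero miss rate over three reviews says much less than one with a low rate over several hundred.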
That is the architecture at its simplest: AI generates; checks and people verify; learners only ever see what has already been cleared. Everything else I build, and there is a great deal left to build, sits on one side of that line or the other.