There is a version of this essay where I pretend the pipeline worked perfectly the first time.
That version is a lie.
The actual starting point
The first agent I built was a research assistant that was supposed to summarise five newsletters into a single brief. It worked fine for three weeks and then started hallucinating sources. I didn't notice until two issues of my own newsletter had gone out.
That's the thing nobody tells you about agentic workflows: the failure mode is not a crash. It's a plausible-sounding wrong answer.
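Because nothing crashes, the check has to be written on purpose. Here is a minimal sketch of the kind of guard that would have caught the hallucinated sources, assuming each quote in the brief carries the URL it claims to come from and that the research step keeps the list of URLs it actually fetched. The `Quote` class, the `unverified_quotes` function, and the field names are hypothetical, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    # Hypothetical shape: the quoted text plus the URL the agent claims it came from.
    text: str
    source_url: str


def _normalise(url: str) -> str:
    # Only trims whitespace and a trailing slash; a real check might need more.
    return url.strip().rstrip("/")


def unverified_quotes(quotes: list[Quote], fetched_urls: list[str]) -> list[Quote]:
    """Return every quote whose claimed source was never actually fetched."""
    known = {_normalise(u) for u in fetched_urls}
    return [q for q in quotes if _normalise(q.source_url) not in known]


if __name__ == "__main__":
    fetched = ["https://example.com/post-a/", "https://example.com/post-b"]
    quotes = [
        Quote("A real line.", "https://example.com/post-a"),
        Quote("A plausible line.", "https://example.com/post-z"),
    ]
    for q in unverified_quotes(quotes, fetched):
        print(f"Unverified source: {q.source_url}")
```

This only proves the source exists in what was fetched, not that the quote appears in it. It is still enough to turn a silent wrong answer into a visible one.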
What the pipeline looks like now
Four stages, in order:
- Research agent — given a topic, pulls the last 30 days of relevant material from a set of RSS feeds and my reading list. Outputs a structured brief: five themes, ten quotes, three angles.
- Drafting agent — takes the brief and produces a first draft. Prompt-engineered to write in short paragraphs, avoid listicles, and flag any claim it's unsure about.
- Editing pass — a second model reads the draft and returns a list of cuts, a flag for any factual claims, and a suggested opening line.
- Formatter — strips the markdown, applies the newsletter template, outputs a ready-to-paste block.
My job: read the output once, adjust the opening, rewrite any sentence that sounds like a language model wrote it, and hit send.
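The four stages can be sketched as plain functions chained together. This is an illustration under assumptions, not the actual code: the `Brief` and `EditReport` classes, `run_pipeline`, and the stage signatures are all hypothetical, and each stage is passed in as a callable so the model calls stay out of the sketch. The one thing taken directly from the description is the shape of the brief: five themes, ten quotes, three angles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Brief:
    themes: list[str]
    quotes: list[str]
    angles: list[str]

    def validate(self) -> None:
        # The brief is specified as five themes, ten quotes, three angles.
        expected = {"themes": 5, "quotes": 10, "angles": 3}
        for name, count in expected.items():
            actual = len(getattr(self, name))
            if actual != count:
                raise ValueError(f"brief has {actual} {name}, expected {count}")


@dataclass
class EditReport:
    cuts: list[str]
    flagged_claims: list[str]
    suggested_opening: str


def run_pipeline(
    topic: str,
    research: Callable[[str], Brief],
    draft: Callable[[Brief], str],
    edit: Callable[[str], EditReport],
    fmt: Callable[[str], str],
) -> tuple[str, EditReport]:
    """Run the stages in order and hand back both outputs for the human pass."""
    brief = research(topic)
    brief.validate()  # fail loudly if the research stage returned the wrong shape
    text = draft(brief)
    report = edit(text)
    return fmt(text), report
```

Returning the edit report next to the formatted block, rather than applying it automatically, keeps the final read-through with a person.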
The prompts I wish I'd written first
The most important lesson: output quality is almost entirely determined by the drafting agent's system prompt.
The thing that made the biggest difference was a single constraint: write as if for one specific reader, not a general audience. That one line changed the register of every draft.
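To make that concrete, here is a rough sketch of how such a system prompt could be assembled. The wording is illustrative, not the prompt I actually use. The `build_drafting_prompt` function, the example reader, and the `[UNSURE]` marker are assumptions. The four rules themselves are the ones described above: one specific reader, short paragraphs, no listicles, flag uncertain claims.

```python
def build_drafting_prompt(reader: str) -> str:
    """Assemble a system prompt for the drafting agent (illustrative wording)."""
    rules = [
        # The single constraint that changed the register of every draft.
        f"Write as if for one specific reader: {reader}. Do not write for a general audience.",
        "Use short paragraphs.",
        "Avoid listicles.",
        # The [UNSURE] marker is a hypothetical convention for this sketch.
        "Mark any claim you are unsure about with [UNSURE].",
    ]
    header = "You write the first draft of a newsletter issue from a research brief."
    return header + "\n" + "\n".join(f"- {rule}" for rule in rules)


if __name__ == "__main__":
    # Hypothetical reader description.
    print(build_drafting_prompt("a former colleague who runs a small engineering team"))
```

Keeping the reader as a parameter makes it easy to compare drafts written for different readers from the same brief.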