Teaching AI to Write Like Me
TL;DR: I built a Claude Code skill that writes RFCs matching my voice and structure. It took a week of iteration to get the “container” right (tone, structure, quality bar) and another week to produce a real RFC with it. The payoff: when a new topic came up at a workshop, the skill compressed what used to take over a month into a single day. The skill is open source.
The Trust Problem
In the first article I talked about how writing RFCs was my thinking tool, and how AI let me shift that thinking to brainstorming and editing instead. This article is about the other half of that equation: making sure the output doesn’t undermine itself.
An RFC is the tool for getting what’s in my head into other people’s heads. It should help that process, not add friction. An RFC that reads as obviously AI-generated erodes trust in the content itself, because readers start wondering whether the author actually did the thinking or just deferred it. AI-sounding text isn’t just an aesthetic problem, it’s a credibility problem. That concern shaped everything about how I built the skill.
Building the Skill on My Own Time
I started this project on my own time, burning my own API tokens, because I wanted to explore what was achievable without any constraints or expectations. No Zendesk IP involved, just curiosity about how far I could push Claude Code’s skill system for a specific, repeatable task.
Claude Code skills are folders of markdown instructions that tell Claude how to perform a task in a specific way. You define the rules, the structure, the quality criteria, and Claude follows them when the skill is triggered.
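As a rough sketch of the shape (the file names and instructions here are illustrative, not my skill’s actual contents), a skill’s entry point is a SKILL.md file with frontmatter describing when to use it, plus whatever reference files the instructions point at:

```markdown
---
name: rfc-writer
description: Writes RFCs in my voice and structure. Use when drafting or revising an RFC.
---

# RFC Writer

Follow references/style-guide.md for tone, structure, and formatting rules.
Check every draft against references/anti-patterns.md before returning it.
Use references/example-rfc.md only as a structural template, never as content.
```

The referenced files live in the same folder, so the whole skill travels as one directory.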
The approach I took was to separate two concerns. First, get the “container” right: the structure of the RFC, the tone, the formatting rules, the quality bar. Second, pour real content into it. I could evaluate the container without caring whether the content was accurate, because I was reading for voice and shape, not technical correctness.
I had a real implementation plan from a previous personal project as input, so the generated output was coherent enough to evaluate meaningfully. I just wasn’t reading it to check if the architecture was right. I was reading it to check if it sounded like something I’d want to read.
Ten Iterations to Get the Voice Right
It took ten attempts to get the skill to a point where I’d put my name on the output.
Here’s the kind of thing the early iterations produced. Imagine an RFC section that reads like this:
The Latency Paradox
Streaming is complex. It requires careful orchestration. Every millisecond matters. The buffer must be authoritative. Pub/sub is an optimization. Failure modes are subtle. Recovery must be seamless.
It’s worth noting that this approach fundamentally transforms how we think about message delivery.
You know it when you see it. The punchy staccato rhythm, the dramatic header that could be a TED talk title, the “it’s worth noting” filler. Every sentence sounds confident in isolation, but stack them together and it reads like a language model performing authority instead of explaining a system.
That was the biggest battle across ten iterations: getting the skill to connect ideas with actual reasoning instead of firing declarative sentences one after another. Over-the-top formulations and vague hedging were the other recurring offenders, but the staccato pattern was the one that kept coming back.
Each iteration followed the same loop. Generate an RFC from the implementation plan, read it, identify what felt wrong, feed corrections back into the skill definition, and regenerate. The loop was tight because I was just pointing Claude Code at the skill folder and telling it what to fix. Claude would update the skill’s own markdown files based on my feedback, and I’d regenerate to see if the output improved. The corrections accumulated into a detailed style guide and a set of anti-patterns that the skill enforces on every generation.
One problem I didn’t anticipate was content leak. I’d included a fake example RFC in the skill’s reference material so Claude could see the target structure and tone. The example was fictional, but the type of project it described was too close to the domain of the RFC I was generating. Content from the example kept bleeding into the output: architectural patterns, terminology, even specific technical decisions from the example would show up where they didn’t belong.
The fix was deliberate: I replaced the example with a completely unrelated domain (a fictional transit routing system). If transit planning terminology ever shows up in a messaging infrastructure RFC, it’s immediately obvious something went wrong. The content leak problem disappeared.
What the Skill Actually Does
The skill isn’t a single prompt. It’s a pipeline of four stages covering the RFC lifecycle: brainstorming (structured interview to map the problem space), writing (the core, with a writer and reviewer subagent that iterate until quality passes), feedback incorporation (feeding my editorial comments back into the skill to drive revisions), and finalization (mechanical cleanup before circulation).
The write/review loop is where most of the complexity lives. The writer follows a detailed style guide, and the reviewer checks for structural issues, AI-obvious patterns, undefined references, and content leaks from the example RFC. They loop until no major or moderate issues remain, up to three iterations.
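The control flow is simple enough to sketch. This is an illustrative reconstruction, not the skill’s actual internals: the function names and the issue-severity scheme are my assumptions, but the termination logic (stop when no major or moderate issues remain, cap at three iterations) matches what the skill does.

```python
# Sketch of the writer/reviewer loop. `write` and `review` stand in for
# the two subagents; their signatures are assumptions for illustration.
MAX_ITERATIONS = 3

def write_review_loop(source_material, write, review):
    """Iterate writer and reviewer until no blocking issues remain."""
    draft = write(source_material, feedback=[])
    for _ in range(MAX_ITERATIONS):
        # Reviewer flags structural issues, AI-obvious patterns,
        # undefined references, and content leaks from the example RFC.
        issues = review(draft)
        blocking = [i for i in issues if i["severity"] in ("major", "moderate")]
        if not blocking:
            break
        # Writer revises the draft against the blocking feedback.
        draft = write(source_material, feedback=blocking)
    return draft
```

The cap matters: without it, a reviewer that keeps finding minor nits would loop forever, and in practice three passes is enough to converge or to signal that a human needs to step in.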
The full skill set is available on GitHub, including the style guide, example RFC, reviewer prompt, and agent definitions. You can install it directly from the Claude Code plugin marketplace and use it as a starting point, or fork it and adapt the style guide to your own voice. That’s the part that matters most.
From Container to Content
Once the skill was producing output I’d want to read, I brought it into work to test against a real project. The message streaming RFC for Zendesk’s messaging platform was the test case, and the implementation plan I’d already built with Claude became the source material.
What surprised me about the content phase was how different the failure modes were. The container phase was about catching AI voice patterns, things that were obviously wrong in tone. The content phase surfaced subtler problems: sections where the skill had made a reasonable-sounding architectural claim that didn’t actually hold up, or where it had filled in a gap with plausible but invented rationale instead of flagging it for my input. The skill’s review loop caught structural and style issues reliably, but technical accuracy still required me to read every section as if I were reviewing someone else’s RFC.
The content phase took about a week, and that included building a first version of an inline annotation tool to make the review cycle less painful. (That tool ended up being its own story, which I’ll cover in an upcoming article.) Skill development (the container) took about a week before that.
The Payoff
The skill proved itself during a workshop on the streaming project, which I described in the first article. A new sub-topic needed its own RFC, and with the skill already dialed in, the whole thing took a day instead of the month-plus these documents usually need.
What made that possible wasn’t speed. It was that the skill had already absorbed all the lessons from ten iterations of container work: the style guide, the anti-patterns, the review loop, the content leak fix. I didn’t have to think about any of that. I just had to think about the actual problem.
The other thing I didn’t expect was the “too detailed” feedback from reviewers. AI thoroughness can overshoot what humans expect in a document. When the cost of adding a section drops to nearly zero, you end up including things you would have cut if writing by hand. Learning when to tell the skill to go lighter is its own calibration, and one I’m still working on.
If You Want to Try This
The skill is built for Claude Code but the principles apply to any AI writing workflow. The important parts: write a style guide that captures your anti-patterns (not just what you want, but specifically what you don’t want), include an example document from a completely unrelated domain, and build a review loop that checks the output against your style guide before you ever see it.
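To make the anti-pattern point concrete, here is a hedged sketch of what such a section might look like. Mine is longer and more specific; this shows the shape, using the failure modes described above as examples:

```markdown
## Anti-patterns (reject the draft if any appear)

- **Staccato declarations.** No runs of short declarative sentences
  ("X is complex. Y matters. Z must be seamless."). Connect claims
  with actual reasoning.
- **Filler authority.** Ban "it's worth noting", "fundamentally
  transforms", and similar phrases that perform confidence without
  adding content.
- **TED-talk headers.** Section titles describe the content, not a
  dramatic framing ("The Latency Paradox").
```

Writing the negative space down is what makes the review loop enforceable: the reviewer can check a draft against a list, but it can’t check against taste you never articulated.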
When I built this skill, the iteration loop was entirely manual: generate, read, give feedback, update, repeat. Anthropic has since released a skill-creator that includes an eval framework, benchmark mode, and blind A/B testing between skill versions. It can generate test cases, run them in parallel, and tell you whether a change actually improved things or just felt like it did. My plan is to run the RFC skill through it to generate evals I can use to benchmark future modifications, so that when I tweak the style guide or update the review criteria, I have actual data on whether the output got better or worse instead of relying on my gut.
That’s the direction this is heading: treating skill development the way we treat software development, with tests, baselines, and measurable improvement over time.