Stop Chatting with AI. Start Loops (Ralph Driven Development)

Typing is a high-friction activity that forces you to filter out "secondary" context. That lost context is usually what causes agent hallucinations.

My stack focuses on high-speed context dumping and stateless execution loops.

The Stack

  • handy.computer: Offline, local speech-to-text using Whisper Turbo. I dictate complex architecture, or even loose, conflicting thoughts, at speaking speed. This removes the barrier to providing the "half-useful" details that actually ground the architecture; any contradictions get questioned later, during the agent's planning phase.

  • opencode.ai: The terminal agent runner.

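  • CLI > MCP: I overwhelmingly prefer standard CLI tools over MCP (Model Context Protocol) servers. LLMs are trained on the internet, so common CLIs are already baked into the model's parameters, and for anything less familiar an agent can simply run --help. They don't know your custom MCP schema, and loading it costs tokens. Save the context window.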

    • The Exception: playwriter. It uses standard Playwright syntax and attaches to your real browser to reuse auth/session state.

1. Ralph Driven Development

Credit to Geoffrey Huntley for the methodology.

The Core Philosophy: Erecting Signs

Bad AI results usually come from bad prompts, bad context, or bad data access. If the agent fails, do not just fix the code. Fix the prompt. If "Ralph" falls off the slide, you don't just put him back; you put up a sign that says "DON'T JUMP."

  • Implementation: I use AGENTS.md (which OpenCode reads by default) to store these "signs." If the agent figures out a tricky build step, I instruct it to write that knowledge to AGENTS.md so the next loop doesn't have to rediscover fire.
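Signs can also be added by hand between loops. A minimal sketch, assuming AGENTS.md is a plain Markdown bullet list; add_sign is a hypothetical helper (not part of OpenCode) that appends a lesson only if that exact line isn't already there, so repeated runs don't pile up duplicate signs:

```bash
#!/usr/bin/env bash
# add_sign is a hypothetical helper, not an OpenCode feature.
# Appends "- <sign>" to AGENTS.md (or the file given as $2) unless that exact
# line already exists, so the same lesson is never recorded twice.
add_sign() {
  local sign=$1
  local file=${2:-AGENTS.md}
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "- $sign" "$file"; then
    # Start on a fresh line if the file does not end with a newline
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf -- '- %s\n' "$sign" >> "$file"
  fi
}
```

Usage (a made-up sign): add_sign "Run dotnet build -c Release from the repo root." Running it twice leaves a single entry.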

Plans vs. Specs

Huntley advocates separating requirements into specs/*.md files.

  • My take: Currently, I find a massive, well-structured plan.md sufficient for most tasks, though splitting specs is a valid optimization I am exploring.

I use OpenCode, but this works just as well with Claude Code or any other harness.

2. The Planning Phase

I spend an hour purely on the plan, without touching code. I dictate high-level constraints and nitpick API surfaces until I have a plan.md that often reaches 1,000+ lines. I read every line; if I am not happy, I keep planning.

The Pivot: Once the plan is solid, switch to the Build agent and send this prompt to crystallize the state:

"I love the plan. Please write it to plan.md in chronological order as a backlog with checkboxes. Each task should be small and isolated. Feel free to create a large backlog so it is specific enough for a new engineer to take over implementation immediately."

Most engineers would simply switch to the Build agent and send ‘go’.
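Because the backlog is plain Markdown checkboxes, progress is easy to check from a second terminal while the loop runs. A small sketch, assuming tasks are written as "- [ ]" (open) and "- [x]" (done); plan_progress is a hypothetical helper:

```bash
#!/usr/bin/env bash
# plan_progress is a hypothetical helper. It counts Markdown checkbox tasks in
# plan.md: "- [ ]" is open, "- [x]" or "- [X]" is done. Indented (nested)
# items and "*" or "+" bullets are counted too.
plan_progress() {
  local plan=${1:-plan.md}
  local open_count done_count
  if [ ! -f "$plan" ]; then
    echo "plan_progress: $plan not found" >&2
    return 1
  fi
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the 0
  open_count=$(grep -Ec '^[[:space:]]*[-*+] \[ \]' "$plan" || true)
  done_count=$(grep -Ec '^[[:space:]]*[-*+] \[[xX]\]' "$plan" || true)
  echo "$done_count/$((open_count + done_count)) tasks done, $open_count open"
}
```

Calling plan_progress from the repo root prints a single summary line in the form "done/total tasks done, N open".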

3. The Execution Loop

Once plan.md is frozen, I run a headless loop. Every iteration is a fresh opencode run, so the agent has to re-read the full context (plan.md and AGENTS.md) each time, which eliminates context drift. It also ensures the agent always has a rough understanding of both the past (completed tasks) and the end state.

Bash (macOS/Linux):

while :; do opencode run -m "opencode/claude-opus-4-5" "READ all of plan.md. Pick ONE task. Verify via web/code search. Complete task, verify via CLI/Test output. Commit change. ONLY do one task. Update plan.md. If you learn a critical operational detail (e.g. how to build), update AGENTS.md. If all tasks done, sleep 5s and exit. NEVER GIT PUSH. ONLY COMMIT."; done

PowerShell (Windows):

while ($true) { opencode run -m "opencode/claude-opus-4-5" "READ all of plan.md. Pick ONE task. Verify via web/code search. Complete task, verify via CLI/Test output. Commit change. ONLY do one task. Update plan.md. If you learn a critical operational detail (e.g. how to build), update AGENTS.md. If all tasks done, sleep 5s and exit. NEVER GIT PUSH. ONLY COMMIT." }
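One caveat: the prompt's "exit" only ends that single opencode run, so both loops above keep launching new iterations until you press Ctrl+C. A minimal bash sketch of a loop that stops itself, assuming plan.md marks open tasks with "- [ ]"; the prompt is the one above, unchanged, and MAX_ITERATIONS is a hypothetical safety cap:

```bash
#!/usr/bin/env bash
# Sketch of a self-terminating Ralph loop. Assumptions: run from the repo root,
# opencode is on PATH, and plan.md uses "- [ ]" for open tasks.
MODEL="opencode/claude-opus-4-5"
PROMPT="READ all of plan.md. Pick ONE task. Verify via web/code search. Complete task, verify via CLI/Test output. Commit change. ONLY do one task. Update plan.md. If you learn a critical operational detail (e.g. how to build), update AGENTS.md. If all tasks done, sleep 5s and exit. NEVER GIT PUSH. ONLY COMMIT."
MAX_ITERATIONS=${MAX_ITERATIONS:-50}  # hypothetical cap so a stuck task can't loop forever

# Exit status 0 while the plan still has at least one unchecked box.
has_open_tasks() {
  grep -Eq '^[[:space:]]*[-*+] \[ \]' "${1:-plan.md}"
}

ralph_loop() {
  local i=0
  if [ ! -f plan.md ]; then
    echo "ralph_loop: plan.md not found" >&2
    return 1
  fi
  while has_open_tasks plan.md; do
    if [ "$i" -ge "$MAX_ITERATIONS" ]; then
      echo "ralph_loop: stopping after $i iterations with tasks still open" >&2
      return 1
    fi
    i=$((i + 1))
    # Fresh, stateless run each time, exactly like the one-liner above
    opencode run -m "$MODEL" "$PROMPT"
  done
  echo "ralph_loop: no open tasks left after $i iteration(s)"
}
```

Save it as ralph.sh, then run source ralph.sh && ralph_loop from the repo root. Checking plan.md from the outside, rather than trusting the agent to stop, keeps the stop condition deterministic.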

4. Verification & Backpressure

The compiler is not just a runner; it is a filter. You need strict backpressure to reject hallucinations before they are committed.

  • CLI Verification: Design codebases so the agent can verify behavior from the command line via arguments (e.g., dotnet run -- analyze) and read the result.

  • Strictness: In .NET, I enforce <TreatWarningsAsErrors>true</TreatWarningsAsErrors> and use the latest Roslyn analyzers in a root Directory.Build.props. If it warns, the loop fails, and Ralph tries again.
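A minimal sketch of such a root file, written from bash so it stays in the same language as the loop. The post doesn't show the exact Directory.Build.props; TreatWarningsAsErrors comes from the text above, while AnalysisLevel=latest and EnforceCodeStyleInBuild are assumptions about how to "use the latest Roslyn analyzers" with the analyzers bundled in the .NET SDK. write_build_props is a hypothetical helper:

```bash
#!/usr/bin/env bash
# write_build_props is a hypothetical helper. It writes a Directory.Build.props
# into the given directory (default: current directory, i.e. the repo root).
write_build_props() {
  local dir=${1:-.}
  cat > "$dir/Directory.Build.props" <<'EOF'
<Project>
  <PropertyGroup>
    <!-- Any compiler or analyzer warning fails the build (and the loop iteration). -->
    <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
    <!-- Use the newest analyzer rule set shipped with the installed SDK. -->
    <AnalysisLevel>latest</AnalysisLevel>
    <!-- Also enforce IDE code-style rules during dotnet build. -->
    <EnforceCodeStyleInBuild>true</EnforceCodeStyleInBuild>
  </PropertyGroup>
</Project>
EOF
}
```

MSBuild imports Directory.Build.props automatically for every project beneath it (unless a nearer one takes precedence), so one file at the repo root covers the whole solution. After that, any warning makes dotnet build exit non-zero, which is exactly the signal the loop's "verify via CLI/Test output" step sees.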

5. Brownfield vs. Greenfield

Huntley argues this is for Greenfield only. I disagree.

Because I spend significant time planning and never allow the agent to git push (only commit), I maintain a "Human in the Loop" review process. I manually review, apply taste, and test the final state before raising a PR. This makes the technique viable for Brownfield/Legacy codebases, provided you trust your review process more than the agent.
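The "never push" rule otherwise lives only in the prompt. One way to back it with something mechanical, sketched under the assumption of a regular clone (hooks in .git/hooks): a git pre-push hook that rejects every push unless a human sets ALLOW_PUSH=1. install_no_push_hook and ALLOW_PUSH are hypothetical names:

```bash
#!/usr/bin/env bash
# install_no_push_hook is a hypothetical helper. It writes a pre-push hook that
# refuses every push unless the (hypothetical) ALLOW_PUSH=1 variable is set.
install_no_push_hook() {
  local repo=${1:-.}
  local hook="$repo/.git/hooks/pre-push"
  mkdir -p "$(dirname "$hook")"
  cat > "$hook" <<'EOF'
#!/bin/sh
# Blocks pushes from the Ralph loop; a human opts in with ALLOW_PUSH=1.
if [ "${ALLOW_PUSH:-0}" != "1" ]; then
  echo "pre-push: pushing is disabled here (set ALLOW_PUSH=1 to override)." >&2
  exit 1
fi
exit 0
EOF
  chmod +x "$hook"  # git only runs executable hooks
}
```

Run install_no_push_hook once from the repo root; when the review is done and you're ready to raise the PR, push with ALLOW_PUSH=1 git push. Note that git push --no-verify skips pre-push hooks, so this guards against accidents, not against an agent determined to push.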

Future Roadmap

I am looking into community plugins for OpenCode to integrate a native "Ralph Agent."

  • Parallelism: Currently, the loop is serial. Future harnesses should identify which tasks in plan.md can be parallelized (fan-out) and when they must converge for a blocking build step (fan-in).
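Plain bash job control is enough to illustrate the fan-out/fan-in shape. A minimal sketch, assuming each argument is a shell command for one task that plan.md marks as independent; fan_out_fan_in is a hypothetical helper, not an OpenCode feature:

```bash
#!/usr/bin/env bash
# fan_out_fan_in is a hypothetical helper. Fan-out: start every command in the
# background. Fan-in: wait for all of them; return non-zero if any failed.
fan_out_fan_in() {
  local -a pids=()
  local cmd pid status=0
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # remember the failure but keep waiting for the rest
  done
  return "$status"
}
```

For example, fan_out_fan_in 'opencode run "Complete task 4 of plan.md only."' 'opencode run "Complete task 5 of plan.md only."' && dotnet build would run two tasks concurrently and build only once both have finished (the task numbers are placeholders). Agents sharing one working tree would overwrite each other's edits and race on commits and plan.md, so a real harness would likely give each one its own checkout, for example via git worktree.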

Luke Parker (Hona)

Luke is a Senior Software Engineer, an expert in .NET and Vertical Slice Architecture. With a passion for sharing knowledge, he loves educating the developer community with thought-provoking blogs.