Have the AI produce a plan that spans multiple files (like "01 create frontend.md", "02 create backend.md", "03 test frontend and backend running together.md"), and then create a fresh context for each step if it looks like re-using the same context is leading it to confusion.
Also, commit frequently, and if the AI constantly goes down the wrong path ("I can't create X so I'll stub it out with Y, we'll fix it later"), you can update the original plan with wording to tell it not to take that path ("Do not ever stub out X, we must make X work"), and then start a fresh session with an older and simpler version of the code and see if that fresh context ends up down a better path.
You can also run multiple attempts in parallel if you use tooling that supports that (containers + git worktrees is one way)
Inventivatbly the files become a mess of their own. Changes and learnings from one part of the plan often dont result in adaptation to impacted plans down chain.
In the end you have a mish mash of half implemented plans and now you’ve lost context too. Which leads to blowing tokens on trying to figure out what’s been implemented, what’s half baked, and what was completely ignored.
Any links to anyone who’s built something at scale using this method? It always sounds good on paper.
Don’t rely entirely on CC. Once a milestone has been reached, copy the full patch to clipboard and the technical spec covering this. Provide the original files, the patch and the spec to Gemini and say ~a colleague did the work and does it fulfill the aims to best practices and spec?
Pick among the best feedback to polish the work done by CC—-it will miss things that Gemini will catch.
Then do it again. Sometimes CC just won’t follow feedback well and you gotta make the changes yourself.
If you do this you’ll be more gradual but by nature of the pattern look at the changes more closely.
You’ll be able to realign CC with the spec afterward with a fresh context and the existing commits showing the way.
Fwiw, this kind of technique can be done entirely without CC and can lead to excellent results faster, as Gemini can look at the full picture all at once, vs having to force cc to consume vs hen and peck slices of files.
In my experience it works better if you create one plan at a time. Create a prompt, make claude implement it and then you make sure it is working as expected. Only then you ask for something new.
I've created an agent to help me create the prompts, it goes something like this: "You are an Expert Software Architect specializing in creating comprehensive, well-researched feature implementation prompts. Your sole purpose is to analyze existing codebases and documentation to craft detailed prompts for new features. You always think deeply before giving an answer...."
My workflow is: 1) use this agent to create a prompt for my feature; 2) ask claude to create a plan for the just created prompt; 3) ask claude to implement said plan if it looks good.
My system is to create detailed feature files up to a few hundred lines in size that are immutable, and then have a status.md file (preferably kept to about 50 lines) that links to a current feature that is used as a way to keep track of the progress on that feature.
Additionally I have a Claude Code command with instructions referencing the status.md, how to select the next task, how to compact status.md, etc.
Every time I'm done with a unit of work from that feature - always triggered w/ ultrathink - I'll put up a PR and go through the motions of extra refactors/testing. For more complex PRs that require many extra commits to get prod ready I just let the sessions auto-compact.
After merging I'll clear the context and call the CC command to progress to the next unit of work.
This allows me to put up to around 4-5 meaningful PRs per feature if it's reasonably complex while keeping the context relatively tight. The current project I'm focused on is just over 16k LOC in swift (25k total w/ tests) and it seems to work pretty well - it rarely gets off track, does unnecessary refactors, destroys working features, etc.
When I initially have it built from a feature file, it pulls in the most pertinent high level details from that and creates a supercharged task list that is updated w/ implementation details as the feature progresses.
As it links to the feature file as well, that is pulled into the context, but status.md is there to essentially act as a 'cursor' to where it is in the implementation and provide extended working memory - that Claude itself manages - specific to that feature. With that you can work on bite sized chunks of the feature each with a clean context. When the feature is complete it is trashed.
I've seen others try to achieve similar things by making CLAUDE.md or the feature file mutable but that IME is a bad time. CLAUDE.md should be lean with the details to work on the project, and the feature file can easily be corrupted in an unintended way allowing things to go wayward in scope.
Changing the prompt and rerunning is something where Cursor still has a clear edge over Claude Code. It's such a powerful technique for keeping the context small because it keeps the context clear of back-and-forths and dead ends. I wish it was more universally supported
I use the main Claude code thread (I don’t know what to call it) for planning and then explicitly tell Claude to delegate certain standalone tasks out to subagents. The subagents don’t consume the main threads context window. Even just delegating testing, debugging, and building will save a ton context.