Foundry: Persistent Context Architecture for AI-Assisted Development
Extending GStack’s review and product discovery methodologies with persistent context, spec capture, and implementation roadmapping
The models are getting better fast, but the failure mode I keep hitting has nothing to do with code quality. It's context loss. Agents forget what you told them, specs get too long to retain properly, decisions made in session 3 are invisible by session 12, and reviews run at the wrong time or not at all. With every new project, I found myself rebuilding the same process scaffolding from scratch.
So I built Foundry, an orchestration framework that handles the full lifecycle from product discovery to shipped code. One command, /foundry-start, triggers the whole pipeline. The product discovery and review methodologies are adapted from Garry Tan’s GStack, while the context architecture, spec capture system, and implementation roadmapping are new.
When context gets too bloated, models degrade in both recall and reasoning. So after the structured interview that kicks off every project, Foundry sections the transcript into highly specific files, each covering one topic. Those files link directly to the implementation roadmap. When the agent approaches a task, it has explicit instructions to reference the specific interview section file instead of scanning one massive spec, so it has context on what the task is for, what it connects to, and the intention behind it. This preserves the details in full.
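As a rough sketch of how that sectioning might work (not Foundry's actual implementation — the header-based splitting and the 4-characters-per-token estimate are my assumptions), here is a greedy chunker that packs whole topic blocks into files near the ~2,500-token target without ever cutting a topic in half:

```python
import re

TARGET_TOKENS = 2500  # Foundry's per-section target

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def split_by_headers(markdown: str) -> list[str]:
    """Split an interview transcript into header-delimited topic blocks."""
    blocks = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [b for b in blocks if b.strip()]

def chunk_sections(markdown: str, target: int = TARGET_TOKENS) -> list[str]:
    """Greedily pack whole topic blocks into files of ~target tokens.

    A block is never split mid-topic: a single oversized block becomes
    its own file rather than being cut in half to hit the target.
    """
    files, current, current_tokens = [], [], 0
    for block in split_by_headers(markdown):
        t = estimate_tokens(block)
        if current and current_tokens + t > target:
            files.append("".join(current))
            current, current_tokens = [], 0
        current.append(block)
        current_tokens += t
    if current:
        files.append("".join(current))
    return files
```

The key design choice is that the budget is a target, not a hard cap: coherence of a topic beats strict file size.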
What Foundry does
/foundry-start runs five phases:
- Product discovery: before any spec writing, it challenges your assumptions. Are you solving the right problem? What if you did nothing? It generates 2-3 alternative approaches with effort estimates, you pick one, and that becomes your seed design doc.
- Deep interview: the agent interviews you about every detail of the system, from architecture and edge cases to thresholds, failure modes, and user flows. Domain-expert personas generated from the seed design doc give you recommendations from CEO, engineering, and design perspectives. You confirm, modify, or override. Every decision gets recorded, and the interview becomes the full specification that the rest of the build traces back to.
- Context extraction: the interview splits into focused section files based on header structure, each targeting ~2,500 tokens and covering one topic. Each build phase reads only the sections relevant to its work, and sections are never split mid-topic just to hit the token target. This matters more than most people think: even Claude Opus 4.6, the strongest long-context model available today, scores 76% on Anthropic's 8-needle MRCR v2 benchmark at 1M tokens. That means roughly 1 in 4 buried details gets lost. For systems where nuanced constraints, edge cases, or architectural decisions can't be missed, 76% isn't good enough. Focused, task-scoped sections close that gap.
- Implementation roadmap: each deliverable is wired to specific section files with line ranges, interview references for decision rationale, persona assignments, review routing, interface contracts (what this phase exposes to downstream phases), and a complexity budget. Exceed the budget and the agent stops to reassess instead of sprawling.
- Build loop: the agent executes the roadmap phase by phase. Each phase answers three checkpoint questions from the spec to prove comprehension before writing code, then implements, verifies, runs QA with a test-fix-verify loop that generates regression tests, ships, updates docs, and writes a retrospective. The retro feeds into the next phase so that learning compounds across the build.
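To make the roadmap wiring concrete, here is a hypothetical sketch of what a roadmap entry could look like as data. The type and field names (`SectionRef`, `RoadmapPhase`, and so on) are my illustration, not Foundry's actual schema:

```python
from dataclasses import dataclass

@dataclass
class SectionRef:
    """Pointer into one interview section file."""
    path: str                # hypothetical path, e.g. "sections/04-auth.md"
    lines: tuple[int, int]   # line range holding the relevant decisions

@dataclass
class RoadmapPhase:
    """One deliverable in the implementation roadmap."""
    name: str
    sections: list[SectionRef]   # scoped context, not the full spec
    interview_refs: list[str]    # pointers to decision rationale
    persona: str                 # e.g. "engineering"
    exposes: list[str]           # interface contract for downstream phases
    complexity_budget: int       # e.g. max files this phase may touch

    def over_budget(self, files_touched: int) -> bool:
        # Exceeding the budget means stop and reassess, not sprawl.
        return files_touched > self.complexity_budget
```

The point of the structure is that every field is a traceable link: a phase can always answer "why" by following `interview_refs` back to a recorded decision.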
The whole thing is ~3,800 lines of methodology across 22 files and works with Cursor, Windsurf, Claude Code, or any LLM that can read markdown.
How I got here
I built the orchestration framework first. Over several weeks, across multiple projects, I kept refining the same process: structured interviews, persona generation, section-based context management, review gates, and verification loops. Each iteration got tighter, and the thing I kept learning was that spec capture and context architecture mattered more than code generation. That’s where agents actually failed.
Then I studied GStack. It’s a full sprint-based process with strong individual review skills and the ability to run 10-15 parallel sprints. I took every methodology that applied and integrated them into my framework. The product discovery and review skills are adapted from GStack, while the orchestration, context system, and implementation architecture around them are mine.
Foundry vs GStack
GStack optimizes the sprint cycle: you have a feature, and it runs Think, Plan, Build, Review, Test, Ship in parallel across 10-15 agents using Conductor. That parallel throughput is GStack’s biggest strength and something Foundry doesn’t have yet.
Foundry optimizes the full lifecycle, including everything before the sprint starts. It adds the pre-code pipeline (discovery, interview, context extraction, roadmapping), the persistent context architecture that routes scoped sections to each build phase, interface contracts between phases, complexity budgets, and structured retrospectives that propagate wrong assumptions back through the system so later phases learn from earlier ones. None of that exists in GStack's model.
The tradeoff today is clear: GStack gives you speed through parallelism, Foundry gives you precision through context management. The ecosystem is heading toward both, and I think that convergence is where the most interesting work will happen.
What’s next
I’m exploring multi-agent parallelism. I currently use Antigravity IDE, which requires manually spinning up new agents, but Claude Code can spawn them autonomously, which is closer to what GStack does with Conductor. @antigravity team - let’s go! Give us autonomous multi-agent parallelism and subagents!
In closing, code is a commodity now. AI writes correct code, and that's table stakes. What differentiates the output is taste: the nuanced reasoning you captured during the interview, the specific constraints you defined, the stream-of-thought decisions that reflect how you think about the problem. That feel only shows up in the code if the agent actually references it, reads it, verifies it understood it, and builds from it. Foundry's context architecture is built to make sure that happens. Every coding decision the agent makes ties back to highly specific, verified context, not a best-effort scan of a massive spec. The result is stronger confidence that the code isn't just logically correct, but built to the exact intent behind it.
GitHub: github.com/laloquidity/foundry
Product discovery and review methodologies adapted from GStack by Garry Tan.
Changelog
[0.2.0] - 2026-03-22 — CSO Security Audit Integration
Added a Chief Security Officer audit adapted from GStack: OWASP Top 10, STRIDE threat modeling, attack surface mapping, and zero-noise false positive filtering (17 hard exclusions, 9 precedents, 8/10 confidence gate). Each finding requires a concrete exploit scenario and independent verification.
The CSO is wired into the build loop as Step 3.5 (after QA, before ship). If findings are CRITICAL or HIGH, the agent enters a fix-verify-CSO cycle before shipping. The final roadmap phase triggers a full cumulative audit to catch cross-phase security interactions. Phases introducing new dependencies automatically trigger supply chain scanning.
The build loop is now: implement → verify → QA → CSO audit → ship → docs → retro.
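The gating logic described above can be sketched like this (a hypothetical illustration; the names and the severity model are my assumptions, not Foundry's code):

```python
from enum import Enum

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# CSO audit sits at step 3.5: after QA, before ship.
BUILD_LOOP = ["implement", "verify", "qa", "cso_audit", "ship", "docs", "retro"]

def gate_on_findings(findings: list[Severity]) -> str:
    """Decide the next step after a CSO audit.

    CRITICAL or HIGH findings send the agent back into a
    fix-verify-CSO cycle; anything lower proceeds to ship.
    """
    if any(f in (Severity.HIGH, Severity.CRITICAL) for f in findings):
        return "fix-verify-cso"
    return "ship"
```

The gate is deliberately binary: medium and low findings are logged but never block a ship, which keeps the loop from stalling on noise.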