Graph Density: Smarter Model Selection for AI Agents
Every major consumer AI product now routes queries to different models depending on what you ask. You ask something simple, you get the fast model. You ask something complex, you get the smart one. Task-type routing is solved.
So what’s left?
Here’s what I’ve been thinking about: the routing decision itself is stateless. It looks at what you asked. It doesn’t look at what you’ve discussed before, what’s connected to this topic in your history, or how densely linked this concept is in your personal knowledge graph.
Context is King.
The Problem with Stateless Routing
Imagine you mention “the hiring decision” to an AI assistant. A stateless router sees four words and classifies it as a simple query. Fast model.
But what if “hiring” connects to seventeen other conversations you’ve had over the past month? What if it links to your budget constraints, your team dynamics, your previous bad hire, your advisor’s warning, and your competing priority to ship the product faster?
Suddenly those four words aren’t a simple query. They’re an entry point into the densest cluster of the user’s context graph.
Stateless routing can’t see this. It routes on surface features: query length, detected intent, keyword patterns. It doesn’t know that “hiring” is the most densely-connected node in your context graph.
Semantic Graph Density as a Routing Signal
The density of connections in a user’s memory graph is a better routing signal than query complexity alone.
A query that touches an isolated topic, like something the user mentioned once and never again, can be handled quickly. There’s no context to synthesize.
A query that touches a deeply connected topic, one that links to dozens of prior conversations, decisions, and unresolved tensions, requires the most capable model you have. The synthesis work is invisible in the query but visible in the graph.
Rather than brute-forcing context into every LLM conversation, I’ve been digging into what I’m calling graph-informed routing.
Traditional routing: query → task classification → model
Graph-informed routing: query → graph lookup → connection density score → model
The graph lookup asks: How many edges does this topic have? How recent are those connections? How unresolved are the linked threads?
High density + high recency + unresolved threads = send to the Brain (Thinking models).
Low density + isolated mention = send to the Hands (Fast execution models).
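Those rules can be sketched as a small scoring function. Everything below is an illustrative assumption rather than a tested system: the `Edge` shape, the 30-day recency decay, the 2x weight on unresolved threads, and the escalation threshold of 5.0 are all made up for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """One connection from the queried topic to a prior conversation."""
    topic: str
    days_ago: int   # how long ago the linked conversation happened
    resolved: bool  # whether the linked thread was closed out

def density_score(edges: list[Edge]) -> float:
    """Combine connection count, recency, and unresolved threads into one score."""
    score = 0.0
    for e in edges:
        recency = 1.0 / (1.0 + e.days_ago / 30)   # decays over roughly a month
        weight = 2.0 if not e.resolved else 1.0   # unresolved threads count double
        score += recency * weight
    return score

def route(edges: list[Edge], threshold: float = 5.0) -> str:
    """High density + recency + open threads -> Brain; isolated mention -> Hands."""
    return "thinking-model" if density_score(edges) >= threshold else "fast-model"

# "hiring": seventeen recent, unresolved connections -> escalate.
hiring = [Edge("hiring", days_ago=d, resolved=False) for d in range(17)]
# A one-off topic mentioned once, three months ago -> fast path.
one_off = [Edge("parking", days_ago=90, resolved=True)]

print(route(hiring))   # thinking-model
print(route(one_off))  # fast-model
```

The point of the sketch is that the query text never appears in `route`; the decision is made entirely from graph structure.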
Why This Matters for Personal AI
I use AI for my productivity workflows. It makes me feel superhuman at times. Many founder friends ask about my system, and I describe a brute-force method of context graphs, markdown files, custom workflows, retrieval tasks, and more.
These conversations make me realize where the friction is: the memory in the LLMs they use is flat. A list of facts, not a web of connections. Open ChatGPT’s memory settings and you’ll see what I mean. Individual facts. No edges. No structure.
For systems that build structured memory over time, modeling not just what you’ve said but how your thoughts connect, that graph becomes the most valuable routing input you have.
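The difference between the two memory shapes is concrete. Below, the same hypothetical facts are stored flat and as an adjacency list; only the second form can answer “how connected is this topic?”, which is the routing signal in question. All topic names are invented for illustration.

```python
# Flat memory (hypothetical): a bag of facts, no edges. You can search it,
# but no fact knows about any other fact.
flat_memory = [
    "User is hiring an engineer",
    "User has budget constraints",
    "User's advisor warned about a previous bad hire",
]

# Graph memory (hypothetical): the same material as an adjacency list,
# so every topic carries its connection structure.
graph_memory: dict[str, set[str]] = {
    "hiring": {"budget", "advisor-warning", "team-dynamics"},
    "budget": {"hiring"},
    "advisor-warning": {"hiring"},
    "team-dynamics": {"hiring"},
}

def degree(topic: str) -> int:
    """Connection count: the signal flat memory cannot provide."""
    return len(graph_memory.get(topic, set()))

print(degree("hiring"))   # 3
print(degree("parking"))  # 0
```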
The AI isn’t just responding to what you said. It’s responding to what you’ve been thinking about.
The Research Questions
These are the questions I’m exploring. If you’re building cross-context graph-based AI, DM me.
- What density threshold triggers model escalation? Is it linear? Logarithmic? Does it depend on topic type?
- How do you weigh recency vs. connection count? A topic mentioned thirty times six months ago might matter more than something mentioned three times this week.
- Can you pre-classify topics by their synthesis potential? Some concepts (hiring, fundraising, health) are inherently high-stakes and high-connection. Others (one-off tasks) are inherently isolated.
- What’s the latency cost of graph lookup? If querying the graph adds time before routing, is it worth the benefit? Does this change as the graph grows larger? Can you cache density scores per topic?
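On that last question, one cheap mitigation is a per-topic cache with a time-to-live, so the expensive graph traversal runs only when a topic’s score has gone stale. This is a sketch under assumed names; the TTL and cache shape are arbitrary choices, not a recommendation.

```python
import time
from typing import Callable

# Hypothetical per-topic cache: topic -> (score, timestamp of computation).
_score_cache: dict[str, tuple[float, float]] = {}
TTL_SECONDS = 300  # assumed staleness window; tune for how fast the graph changes

def cached_density(topic: str, compute: Callable[[str], float]) -> float:
    """Return a cached density score, recomputing only after TTL expiry."""
    now = time.monotonic()
    hit = _score_cache.get(topic)
    if hit is not None and now - hit[1] < TTL_SECONDS:
        return hit[0]              # fresh enough: skip the graph traversal
    score = compute(topic)         # the expensive graph lookup
    _score_cache[topic] = (score, now)
    return score
```

The open trade-off is exactly the one the question raises: a cached score can route a query on slightly stale density, in exchange for removing the lookup from the latency-critical path.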
I don’t have all the answers yet. But I think the framing is right: routing should be memory-informed, not just query-informed.
The Takeaway
For systems that build memory over time, the graph is the signal. A simple query about a loaded topic isn’t simple at all. The graph knows. The router should too. Let’s think more deeply about how to make cross-context, graph-informed routing an invisible part of our daily workflows.
Context is King.