Scaling agent context with knowledge graphs on Snowflake

TL;DR: A guide to building stateful agent memory on Snowflake using Cortex features and relational primitives to model a knowledge graph. This provides agents with durable, trust-aware recall without adding a dedicated graph database.  

The fifty-first conversation

Imagine you are a platform team running an internal AI assistant. An engineer asks the agent about the authentication service migration. The agent comes back with useful information. It can reason about OAuth2 flows, suggest migration patterns and outline a testing strategy. But it doesn’t remember that three weeks ago a different engineer mentioned the migration depends on a deprecation timeline for an upstream service, that in conversation 23 there was a decision to use OAuth2 instead of SAML, or that the architect flagged a backwards-compatibility risk in conversation 38. 

The agent is not unintelligent. It is stateless. Every conversation starts from zero. And the obvious fix of stuffing previous conversations into the context window falls apart. A simple Q&A exchange might run a few thousand tokens, but a real working session where an engineer is debugging a service with tool calls, log retrieval, code generation and iterative refinement can easily consume hundreds of thousands of tokens in a single conversation. Fifty working sessions like that and you are looking at tens of millions of tokens of accumulated chat history. 

Even if you could fit all that into a context window, you would not want to. Research on long-context LLM behavior shows that model performance degrades as input length increases, even for simple tasks. More context does not mean better answers. Past a certain volume, it means worse answers at a higher cost. 

So the problem becomes: How do you give an LLM agent accumulated knowledge that grows with every conversation, is queryable at the time of inference and only surfaces what is relevant? 

Why transcript RAG fails

The first instinct is to embed your chat transcripts and use retrieval-augmented generation (RAG). It is a reasonable starting point, but it fails in three predictable ways. 

  1. Signal-to-noise collapse. Raw conversation transcripts are full of pleasantries, corrections, restated questions and tangents. The actual knowledge content (“Service X depends on Service Y,” or “we decided to use approach Z because of constraint W”) is a small fraction of the overall token volume. Vector search over raw transcripts retrieves passages, not facts. You get “the user asked about authentication” when what you need is “the authentication service migration is blocked by the upstream deprecation timeline, decided in sprint 31.” 

  2. No relationship awareness. Embedding-based retrieval finds content that is semantically similar to the query. But the critical context is often structurally related, not semantically similar. The fact that Service X depends on Service Y is relevant when someone asks about Service Y’s deprecation. But the embeddings for “Service X’s authentication flow” and “Service Y’s deprecation timeline” may not be all that close in vector space. The connection between them is a dependency relationship, not a semantic similarity, which is why you need a way to traverse relationships, not just a nearest-neighbor search. 

  3. No contradiction detection. Over 50 conversations, details, or even the direction of the project, will change. An early conversation may say “we are using Kafka for event streaming”; later the story becomes “we switched to Kinesis.” Transcript RAG can surface both, leaving the agent to reconcile contradictory context with no signal about which claim is current. Getting around this requires, at minimum, some form of time-based versioning. 

What we took away from this is that RAG is not useless, but RAG over raw transcripts retrieves text, not knowledge. Retrieval grounding helps combat hallucinations, but it won’t stop the agent from picking the wrong fact when handed two contradictory ones. To solve the accumulated-context problem, you need to extract the knowledge from the transcripts into a structure capable of relationship-aware, temporally versioned, confidence-scored retrieval. 

That structure is the knowledge graph.

Memory as entities, claims and lifecycle

When people hear “knowledge graph,” they tend to associate it with something overly academic: ontologies, RDF triples, SPARQL endpoints, enterprise taxonomy projects that take a year to produce a schema that no one queries. That is not what we are building here. 

The memory model here has two primitives: entities and claims. 

Entities are the nouns. A person, a service, a technology, a decision, a team, a deadline. Each entity has a type, a set of properties and an embedding for semantic search. Entities are globally defined inside the workspace so the same AuthService entity appears whether it was mentioned in conversation 3 or conversation 47. 

Claims are relationships between entities. They are typed, directional and carry metadata. “AuthService DEPENDS_ON UpstreamService” is a claim. “Platform Team OWNS AuthService” is a claim. “OAuth2Decision DECIDED_BY Architect” is a claim whose decision date is captured in the claim’s metadata rather than baked into the relationship. Claims are the edges in the knowledge graph and they are where the real intelligence comes from. 

The most critical design decision is that claims are not static facts. They have a lifecycle: 

  • Proposed: The LLM extracted this from a conversation but it has not yet been validated. 

  • Accepted: Confirmed through repeated mentions, explicit user validation or a high-confidence extraction from an authoritative statement. 

  • Deprecated: Superseded by a new claim with a link to what replaced it. 

Each claim also carries a confidence score (rating the certainty of the extraction), evidence links (which conversation and message the claim originates from) and a time-based validity window (when the claim was true). 

This architecture gives you something that transcript RAG fundamentally cannot: a structured, versioned, trust-scored representation of accumulated knowledge. The agent does not retrieve blocks of conversation and hope for the best; it uses claims with confidence scores and lifecycle states to pull the most relevant information. It also knows which claims are current, which are speculative and which have been superseded. 
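As a mental model, the claim lifecycle can be sketched as a small data structure. This is a hypothetical Python sketch, not the actual storage layer; all names are illustrative:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class ClaimState(Enum):
    PROPOSED = "PROPOSED"      # extracted but not yet validated
    ACCEPTED = "ACCEPTED"      # corroborated or explicitly confirmed
    DEPRECATED = "DEPRECATED"  # superseded by a newer claim

@dataclass
class Claim:
    subject_id: str
    predicate: str
    object_id: str
    confidence: float                       # extraction certainty, 0-1
    state: ClaimState = ClaimState.PROPOSED
    superseded_by: Optional[str] = None     # id of the claim that replaced this one
    evidence: list = field(default_factory=list)  # (conversation_id, message_id) pairs

    def is_retrievable(self) -> bool:
        # Deprecated claims never reach the agent.
        return self.state != ClaimState.DEPRECATED

claim = Claim("auth_service", "DEPENDS_ON", "upstream_service", confidence=0.9)
print(claim.is_retrievable())  # True: a new claim starts PROPOSED and is retrievable
```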

When the user goes to query, the system dynamically assembles context from this graph within a given token budget. It performs an initial semantic search, runs a graph traversal and fuses the results into a ranked context block. 

Relational graph design in Snowflake

The schema 

Three core tables:

  1. entities: The node table. Each entity has an ID, name, type (PERSON, SERVICE, TECHNOLOGY, DECISION, TEAM, etc.) and a properties column (OBJECT) for flexible attributes. The type column gives you consistent typing without a rigid ontology. The OBJECT column lets the extraction agent attach domain-specific metadata without schema migrations.

  2. claims: The edge table. Each claim connects a subject_id to an object_id with a predicate (DEPENDS_ON, OWNS, DECIDED, BLOCKED_BY, IMPLEMENTS, etc.), a confidence score (float 0–1), a state (PROPOSED, ACCEPTED, DEPRECATED), temporal validity (valid_from, valid_until) and a superseded_by self-referential foreign key for deprecation chains. The workspace_id scopes the graph.

  3. evidence: The provenance table. Each row links a claim to the specific conversation, message, extraction model, timestamp and raw text snippet that produced it. This is how you trace any claim back to its source conversation, which is critical for debugging extraction quality and for presenting evidence to users who want to verify a claim.

A note on schema design: It is tempting to denormalize evidence into an array column on the claims table. Resist this urge. A dedicated evidence table avoids the duplication practitioners will immediately notice and, more importantly, lets you query provenance independently. You can ask "Show me all claims extracted by model X" or "Show me all evidence from this thread" without scanning semi-structured arrays.

Snowflake’s VECTOR data type sits natively on both tables: a name_vec column on entities for entity resolution and a claim_vec column on claims (an embedding of claim_text) for claim retrieval, so we can compute similarity with the built-in vector functions:

CREATE TABLE entities (
  id           STRING PRIMARY KEY,
  workspace_id STRING NOT NULL,      -- scopes the graph to a tenant
  type         STRING NOT NULL,
  name         STRING NOT NULL,
  aliases      ARRAY,
  properties   OBJECT,
  name_vec     VECTOR(FLOAT, 768),   -- for semantic entity resolution
  created_at   TIMESTAMP_NTZ,
  updated_at   TIMESTAMP_NTZ
);
 
CREATE TABLE claims (
  id            STRING PRIMARY KEY,
  workspace_id  STRING NOT NULL,
  predicate     STRING NOT NULL,
  subject_id    STRING REFERENCES entities(id),
  object_id     STRING REFERENCES entities(id),
  claim_text    STRING NOT NULL,
  claim_vec     VECTOR(FLOAT, 768),  -- for semantic claim retrieval
  confidence    FLOAT,
  state         STRING,              -- PROPOSED | ACCEPTED | DEPRECATED
  superseded_by STRING REFERENCES claims(id),  -- deprecation chain
  valid_from    TIMESTAMP_NTZ,
  valid_until   TIMESTAMP_NTZ,
  properties    OBJECT,
  created_at    TIMESTAMP_NTZ,
  updated_at    TIMESTAMP_NTZ
);
 
CREATE TABLE evidence (
  id              STRING PRIMARY KEY,
  claim_id        STRING REFERENCES claims(id),
  conversation_id STRING,
  message_id      STRING,
  snippet         STRING,            -- raw text the claim was extracted from
  model_name      STRING,
  model_version   STRING,
  raw_confidence  FLOAT,
  extracted_at    TIMESTAMP_NTZ
);

Vector embeddings are generated with AI_EMBED against snowflake-arctic-embed-m at write time. The VECTOR type has been GA since May 2024 and cosine similarity via VECTOR_COSINE_SIMILARITY is a straightforward SQL function.

Graph traversal as a recursive CTE

With the schema in place (entities as nodes, claims as edges, embeddings as VECTOR columns on both), the two retrieval modes the system needs are:

  1. Graph traversal to follow relationships between entities

  2. Semantic search to find entities and claims by embedding similarity

Both run inside Snowflake. Neither requires a second system. 

Graph traversal is a recursive CTE in plain SQL. A simplified example that expands outward from a seed entity along non-deprecated claims:

WITH RECURSIVE neighborhood AS (
  -- base: seed entity
  SELECT id AS entity_id, 0 AS depth, ARRAY_CONSTRUCT(id) AS path
  FROM entities
  WHERE id = :seed_entity_id
 
  UNION ALL
 
  -- step: follow non-deprecated edges in either direction
  SELECT CASE WHEN c.subject_id = n.entity_id THEN c.object_id ELSE c.subject_id END,
         n.depth + 1,
         ARRAY_APPEND(n.path,
           CASE WHEN c.subject_id = n.entity_id THEN c.object_id ELSE c.subject_id END)
  FROM neighborhood n
  JOIN claims c
    ON (c.subject_id = n.entity_id OR c.object_id = n.entity_id)
   AND c.state != 'DEPRECATED'
  WHERE n.depth < 2
    -- cycle safety: never revisit an entity already on this path
    AND NOT ARRAY_CONTAINS(
      (CASE WHEN c.subject_id = n.entity_id THEN c.object_id ELSE c.subject_id END)::VARIANT,
      n.path)
)
SELECT entity_id, MIN(depth) AS depth
FROM neighborhood
GROUP BY entity_id;

The CASE expression follows edges in both directions, so a seed on UpstreamService surfaces services that depend on it as well as services it depends on. The path array provides cycle safety, preventing infinite recursion in graphs with circular relationships, and the depth bound caps the expansion at two hops.

This is the graph primitive. It is boring SQL. It runs in tens of milliseconds at workspace scale. And it carries all the metadata we need (lifecycle state for trust filtering, confidence scores for ranking, temporal windows for time-aware queries) as ordinary columns rather than as bespoke graph properties. 
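For intuition, the same bidirectional, cycle-safe, depth-bounded expansion looks like this in Python. This is a sketch over hypothetical in-memory claim tuples, not production code:

```python
from collections import deque

def neighborhood(claims, seed, max_depth=2):
    """Bidirectional breadth-first expansion over (subject, predicate, object, state)
    claim tuples. The visited-depth map plays the role of the SQL path array:
    an entity is only enqueued the first (shallowest) time it is reached."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        entity = queue.popleft()
        if depths[entity] >= max_depth:
            continue
        for subj, _pred, obj, state in claims:
            if state == "DEPRECATED":
                continue  # deprecated edges never participate in traversal
            if entity in (subj, obj):
                nxt = obj if subj == entity else subj
                if nxt not in depths:
                    depths[nxt] = depths[entity] + 1
                    queue.append(nxt)
    return depths

claims = [
    ("auth_service", "DEPENDS_ON", "upstream_service", "ACCEPTED"),
    ("upstream_service", "BLOCKED_BY", "deprecation_decision", "ACCEPTED"),
    ("platform_team", "OWNS", "auth_service", "ACCEPTED"),
    ("old_claim", "DEPENDS_ON", "auth_service", "DEPRECATED"),
]
print(neighborhood(claims, "auth_service"))
```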

Why relational is better for the write path

The read path argument gets the most attention, but the write path is where the relational model has an underappreciated advantage: ACID transactions. When the extraction agent processes a conversation segment, it needs to atomically insert new entities, create claims between them, write evidence records linking claims to source messages and potentially deprecate existing claims that are contradicted. In Snowflake, this is a single multi-statement transaction. If the contradiction check fails or the entity resolution produces a conflict, the entire write rolls back cleanly. 

The extraction pipeline

The extraction pipeline turns raw conversation segments into entity and claim writes. In the old world, this meant a custom Python service calling an LLM with a hand-tuned prompt, parsing the JSON response, resolving entities against a database, checking for contradictions and writing results. This was all code data engineers had to maintain, with all the brittleness that implies.

Snowflake’s Cortex AI functions collapse most of that into SQL. AI_EXTRACT takes a natural language instruction and a response format and returns structured JSON directly from text. The function runs inside the warehouse, under the same governance as every other query, with no external service to call and no API keys to rotate.

A conversation segment goes in. A structured extraction comes out:

SELECT AI_EXTRACT(
  text => :conversation_segment,
  responseFormat => {
    'entities': [{
      'name': 'canonical name of the entity',
      'type': 'one of: service, person, project, team, decision, incident'
    }],
    'claims': [{
      'subject': 'entity name',
      'predicate': 'one of: depends_on, blocked_by, decided_in, part_of, owned_by, superseded_by',
      'object': 'entity name',
      'confidence': 'float between 0 and 1',
      'reasoning': 'why this claim was extracted'
    }]
  }
) AS extraction;

The response comes back as a JSON object that we can unnest directly into the claims and entities tables. What used to be a Python service with prompt templates, retry logic and response validation becomes a SQL statement against Snowflake-managed infrastructure.
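A sketch of that unnesting step, under the assumption that the extraction JSON has already been parsed into a Python dict. The `resolve` callable and every name here are hypothetical stand-ins, not the actual pipeline code:

```python
import uuid

def unnest_extraction(extraction, resolve):
    """Flatten one structured extraction (entities + claims, shaped like the
    responseFormat above) into row dicts ready for INSERT.
    `resolve` maps an entity name to a canonical entity id."""
    entity_rows = [{"id": resolve(e["name"]), "name": e["name"], "type": e["type"]}
                   for e in extraction["entities"]]
    claim_rows = [{"id": str(uuid.uuid4()),
                   "subject_id": resolve(c["subject"]),
                   "predicate": c["predicate"].upper(),
                   "object_id": resolve(c["object"]),
                   "confidence": float(c["confidence"]),
                   "state": "PROPOSED"}  # every new claim starts PROPOSED
                  for c in extraction["claims"]]
    return entity_rows, claim_rows

extraction = {
    "entities": [{"name": "AuthService", "type": "service"},
                 {"name": "UpstreamService", "type": "service"}],
    "claims": [{"subject": "AuthService", "predicate": "depends_on",
                "object": "UpstreamService", "confidence": "0.92"}],
}
ents, claim_rows = unnest_extraction(extraction, resolve=lambda name: name.lower())
print(claim_rows[0]["predicate"], claim_rows[0]["state"])  # DEPENDS_ON PROPOSED
```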

Entity resolution happens in the next step of the pipeline. For each extracted entity name, we look for an existing row with a matching canonical name or alias, first by exact string match, then by semantic similarity using VECTOR_COSINE_SIMILARITY against the name_vec column. If a match exists above a confidence threshold, we reuse the existing entity ID. If not, we create a new row. This is ordinary SQL with a vector function, not a separate service.
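A minimal sketch of that two-stage resolution logic, with a hypothetical in-memory entity list standing in for the entities table and a plain cosine function standing in for VECTOR_COSINE_SIMILARITY:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def resolve_entity(name, name_vec, existing, threshold=0.85):
    """existing: list of dicts with id, name, aliases, name_vec.
    Returns an existing entity id, or None meaning: create a new row."""
    # 1. exact match on canonical name or alias
    for row in existing:
        if name == row["name"] or name in row["aliases"]:
            return row["id"]
    # 2. semantic fallback: best cosine similarity above the threshold
    best_id, best_sim = None, threshold
    for row in existing:
        sim = cosine(name_vec, row["name_vec"])
        if sim > best_sim:
            best_id, best_sim = row["id"], sim
    return best_id

existing = [{"id": "e1", "name": "AuthService",
             "aliases": ["auth-svc"], "name_vec": [1.0, 0.0]}]
print(resolve_entity("auth-svc", [0.0, 1.0], existing))       # exact alias match: e1
print(resolve_entity("BillingService", [0.0, 1.0], existing)) # no match: None
```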

Contradiction detection is itself a Cortex AI call. When a new claim lands with the same (subject, predicate, object) tuple as an existing claim, we use AI_CLASSIFY to decide whether the new evidence corroborates, contradicts or refines the existing claim. 

  • Corroboration moves a PROPOSED claim to ACCEPTED. 

  • Contradiction deprecates the older claim and sets valid_until to the new claim’s timestamp. 

  • Refinement creates a new claim linked to the old one. The older claim stays in the table, visible to audit queries, but disappears from retrieval because we filter on state != 'DEPRECATED'.
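The three transitions can be sketched as a small state machine. This is hypothetical Python; `apply_verdict` and the dict shapes are illustrative, not the actual pipeline code:

```python
def apply_verdict(existing, new, verdict):
    """existing/new are claim dicts; verdict is the classification label:
    'corroborates' | 'contradicts' | 'refines'. Returns the rows to upsert."""
    if verdict == "corroborates":
        existing["state"] = "ACCEPTED"          # repeated evidence promotes the claim
        return [existing]
    if verdict == "contradicts":
        existing["state"] = "DEPRECATED"        # old claim leaves retrieval...
        existing["valid_until"] = new["valid_from"]
        existing["superseded_by"] = new["id"]   # ...but stays linked for audit
        return [existing, new]
    if verdict == "refines":
        new["properties"] = {"refines": existing["id"]}
        return [existing, new]                  # both remain visible
    raise ValueError(f"unknown verdict: {verdict}")

old = {"id": "c1", "state": "PROPOSED", "valid_until": None, "superseded_by": None}
new = {"id": "c2", "state": "PROPOSED", "valid_from": "2025-06-01", "properties": {}}
rows = apply_verdict(old, new, "contradicts")
print(rows[0]["state"], rows[0]["superseded_by"])  # DEPRECATED c2
```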

What ties this together into a real pipeline (instead of a SQL statement you run by hand) is Snowflake’s native change-data primitives. A Stream on the messages table tracks every new conversation segment as it lands. A Task fires on that stream, calls AI_EXTRACT on the new rows and writes the results into the entities, claims and evidence tables. For simpler workloads, a dynamic table with a TARGET_LAG of a few minutes can replace the Stream/Task pair entirely. You declare the extraction SQL as a dynamic table and Snowflake figures out the incremental refresh. Either way, the extraction pipeline has no orchestrator outside the warehouse. No Airflow DAG, no Lambda, no queue. The warehouse is the runtime.

The governance implications are worth calling out. Because extraction runs inside Snowflake, the audit trail is the database’s own access log. Every extraction is attributable to a role, a warehouse and a timestamp. The data never leaves the governance perimeter. 

Hybrid retrieval at inference time

Retrieval is where the architecture earns its keep. When the agent is assembling context for a user query, it runs three retrieval paths in parallel and fuses the results.

The semantic path is Cortex Search. We index the claims table into a Cortex Search service that maintains a hybrid vector-plus-keyword index with semantic reranking, managed by Snowflake and refreshed automatically as the underlying data changes. A user query goes in, a ranked list of claims comes out, scored against both semantic similarity and lexical match. Internal benchmarks from Snowflake show hybrid retrieval outperforming pure vector search by roughly 12% on question-and-answer workloads, enough to matter when the LLM’s answer quality is downstream of retrieval quality.

The graph path is the recursive CTE. Given the entities that Cortex Search surfaced as relevant to the query, walk two hops out in the claim graph and collect everything in the neighborhood. This is how we pick up the structurally relevant context that semantic search would miss: the downstream consumer constraint, the decision made in a Slack thread two conversations ago, the incident related to the current entity through a typed relationship rather than a lexical overlap.

The recency path is a time-windowed scan of claims where valid_from falls inside the last N days. This is a cheap query that captures the "what happened recently" context: the commit from last night, the decision from yesterday’s meeting, etc. This recent context might not yet have accumulated enough semantic signal or graph connectivity to surface through the other two paths.

Fusion is reciprocal rank fusion. Each path produces a ranked list; RRF combines them by summing the reciprocal of each result’s rank across lists. It reduces to a short SQL query:

WITH fused AS (
  SELECT claim_id, SUM(1.0 / (60 + rank)) AS score
  FROM (
    SELECT claim_id, rank FROM semantic_results
    UNION ALL
    SELECT claim_id, rank FROM graph_results
    UNION ALL
    SELECT claim_id, rank FROM recency_results
  )
  GROUP BY claim_id
)
SELECT c.*
FROM fused f
JOIN claims c ON c.id = f.claim_id
WHERE c.state != 'DEPRECATED'
ORDER BY f.score DESC
LIMIT :context_budget;

RRF is the right default for heterogeneous rankings because it normalizes away score-scale differences between the paths. A claim that’s in the top 10 of all three lists beats a claim that’s #1 on one list and missing from the other two, which is exactly the behavior we want. Deprecated claims are filtered out at the SQL level so they never reach the agent and :context_budget keeps the final set inside the LLM’s effective-context sweet spot.
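For clarity, the same RRF computation in a few lines of Python. This is a sketch with hypothetical claim IDs and three hypothetical ranked lists; the production version is the SQL above:

```python
def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: each claim scores sum(1 / (k + rank)) across
    the lists it appears in (ranks are 1-based). Returns ids best-first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, claim_id in enumerate(ranking, start=1):
            scores[claim_id] = scores.get(claim_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c1", "c2", "c3"]   # from Cortex Search
graph    = ["c2", "c4", "c1"]   # from the recursive CTE
recency  = ["c2", "c5"]         # from the time-windowed scan
print(rrf([semantic, graph, recency]))  # c2 first: present in all three lists
```

Note how c2, ranked near the top of all three lists, beats c1 even though c1 is #1 on the semantic list: presence across paths outweighs a single first-place finish.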

One underrated property of this architecture is that everything is already in Snowflake. The entities, claims and evidence tables are tabular data in a warehouse. Which means the observability layer is the same warehouse, using the same SQL.

A few queries that fall out of this for free:

  • Extraction quality over time. Group evidence rows by model version and compute the rate of claims that end up DEPRECATED within 30 days of creation. A rising deprecation rate means the extraction model is producing more contradictions than before, which is a signal to roll back the model version or adjust the prompt.

  • Confidence calibration. Bucket claims by raw extraction confidence and compare to the fraction that ever reach the ACCEPTED state. A well-calibrated extractor shows monotonic improvement where higher-confidence claims get accepted more often. A flat or inverted curve means the confidence score isn’t carrying information and the downstream ranking should stop trusting it.

  • Graph density and coverage. Count entities per workspace, claims per entity and the distribution of predicate types. Sparse graphs retrieve poorly. Dense graphs with uneven predicate distributions suggest the extraction prompt is over-fitting to a subset of relationship types.

  • Retrieval attribution. When an agent produces a wrong answer, you can join back through the claim IDs it cited to the evidence rows and look at the original conversation segment. The trace goes: agent answer → claims used in context → evidence → source message. 
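The calibration check can be sketched as a simple bucketing computation. This is hypothetical Python over in-memory claim rows; in practice it is a SQL GROUP BY over the claims table:

```python
def calibration_curve(claims, buckets=((0.0, 0.5), (0.5, 0.8), (0.8, 1.01))):
    """For each confidence bucket, the fraction of claims that reached ACCEPTED.
    A well-calibrated extractor shows this fraction rising with confidence."""
    curve = []
    for lo, hi in buckets:
        in_bucket = [c for c in claims if lo <= c["confidence"] < hi]
        accepted = sum(1 for c in in_bucket if c["state"] == "ACCEPTED")
        rate = accepted / len(in_bucket) if in_bucket else None
        curve.append(((lo, hi), rate))
    return curve

claims = [
    {"confidence": 0.9,  "state": "ACCEPTED"},
    {"confidence": 0.95, "state": "ACCEPTED"},
    {"confidence": 0.6,  "state": "ACCEPTED"},
    {"confidence": 0.6,  "state": "PROPOSED"},
    {"confidence": 0.3,  "state": "DEPRECATED"},
]
print(calibration_curve(claims))  # acceptance rate rises monotonically: calibrated
```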

For a regulated environment, these queries are not optional. They are how you demonstrate that the knowledge graph is behaving as intended, that the extraction model’s errors are bounded and no stale or unauthorized data is reaching the agent. Because it’s all SQL against a warehouse, the same dashboards your BI team already builds apply to your AI system.

The controls that make this safe in production are Snowflake Horizon primitives, not application code: 

  • Row access policies enforce workspace isolation on the entities, claims and evidence tables. A policy that filters on the current role’s workspace ID means two tenants can share the same physical tables without ever seeing each other’s claims.

  • Dynamic data masking policies redact PII in claim text and entity properties based on the querying role, so an engineer viewing the knowledge graph for debugging sees redacted fields that the agent sees in the clear only when called with the right OAuth scope.

  • Object tags propagate sensitivity classifications from source messages through derived claims, so a claim extracted from a confidential conversation inherits the confidentiality marker automatically. 

  • Access history gives a three-hour-latency audit view of every read and write across all three tables (including the AI function calls themselves) joined by role, warehouse and query ID.

None of this is custom infrastructure. It is the same governance layer Snowflake applies to the rest of the warehouse, applied here to the knowledge graph tables. 

Return to the opening scenario, where an engineer asks about the auth service migration. The vector path retrieves AuthService and related entities by semantic similarity. The graph path traverses outward: UpstreamService (via DEPENDS_ON), the UpstreamDeprecation decision that blocks the migration (via BLOCKED_BY), the OAuth2Decision that applies to the service (via APPLIES_TO) and the architect who made it (via DECIDED_BY). All of this arrives as structured claims with confidence scores and evidence links: not hundreds of thousands of tokens of raw chat history, but a few hundred tokens of precise, trust-scored, relationship-aware context.

Agents as consumption layer

The knowledge graph is only half the system. Something has to consume it: turn a user question into retrieval calls, assemble context, invoke the LLM and return an answer. In a do-it-yourself architecture that means an orchestration service: a Python process that parses the query, calls Cortex Search, walks the graph, fuses results, builds a prompt, calls AI_COMPLETE and streams the response back. Another service to run, another surface to secure, another place for governance to diverge.

Cortex Agents collapse that layer into the warehouse. An agent is a declarative object that wires together a set of tools:

  • Cortex Search service over the claims table

  • Cortex Analyst semantic view over the entities table

  • Parameterized SQL queries for graph traversal 

  • A system prompt that tells the model how to use them 

The agent runtime handles tool selection, multi-step reasoning and response streaming. Inputs and outputs inherit the same entitlements and row access policies as the underlying tables. The audit trail is Access History.

This matters for the fifty-first conversation for a specific reason. An agent that has to gather context from a knowledge graph typically needs several tool calls in sequence. It needs to resolve the user’s question to an entity, pull that entity’s neighborhood, check for recent claims and then synthesize. In a custom orchestrator, each of those steps is code you write and debug. In Cortex Agents, each step is a tool the agent picks based on the query, with the sequencing handled by the managed runtime. For the memory-layer architecture in this article, the runtime is how the three retrieval paths get invoked: not a handwritten fusion script, but three tools the agent composes.

Snowflake Intelligence is the packaged end-user version: a mobile app and web interface that speaks to Cortex Agents, with integrations to Gmail, Slack, Jira and Salesforce so the agent’s actions can extend past the warehouse. For a platform team, the build-versus-buy decision is now: build a custom agent runtime on top of Cortex AI functions, deploy Cortex Agents as the runtime or stand up Snowflake Intelligence as a packaged product. The memory layer is the same in all three.

Conclusion

The architecture, end to end:

  • Conversations flow into a Snowflake-native extraction pipeline built on AI_EXTRACT, triggered by Streams and Tasks or declared as a dynamic table

  • Extracted entities and claims land in relational tables with VECTOR columns for semantic retrieval

  • AI_CLASSIFY handles lifecycle transitions between PROPOSED, ACCEPTED and DEPRECATED as new evidence arrives

  • A Cortex Search service indexes the claims table for hybrid vector-plus-keyword retrieval with semantic reranking

  • Graph traversal is a recursive CTE over the claims table

  • Retrieval fuses semantic, graph and recency paths with RRF and filters on lifecycle state

  • Cortex Agents consume the memory layer as declarative tools rather than through a custom orchestrator

  • Snowflake Horizon primitives (row access policies, dynamic masking, object tags, Access History) govern the whole stack using the same mechanisms as the rest of the warehouse

Three principles did the work.

First, the knowledge graph is a relational graph. We did not need a dedicated graph database, because our traversal depth, graph size and query patterns fit comfortably within what recursive CTEs over indexed tables can handle. The cost of a second database (operational complexity, governance duplication, cross-system consistency) would likely outweigh the benefit of native pointer traversal at our scale.

Second, extraction is a platform capability, not a custom service. By pushing entity and claim extraction into AI_EXTRACT, we removed a Python service, a set of prompt templates and a class of bugs from our operational footprint. The extraction runs inside the warehouse, under the warehouse’s governance, with the warehouse’s audit log. For a regulated environment, this is load-bearing.

Third, retrieval is trust-aware. The lifecycle state on claims (PROPOSED, ACCEPTED, DEPRECATED) is the mechanism that prevents stale knowledge from reaching the agent. Vector search alone has no concept of "this used to be true but isn’t anymore." The knowledge graph model does and we make the ranking policy explicit at the SQL level so the behavior is auditable.

What’s next?

Next we explore how to extend this pattern into domains where the entity model gets richer. A customer knowledge graph would add ownership, access control scopes and consent lifecycle states. An incident knowledge graph would add temporal precedence edges and causal traceback. A compliance knowledge graph would add attestation claims and regulatory citations as first-class objects. The core pattern (entities, claims, evidence, hybrid retrieval, lifecycle-aware ranking) stays the same, and Snowflake’s Cortex AI functions, the VECTOR type and Cortex Search give us a single platform where the extraction, the storage, the retrieval and the governance all live together.

The fifty-first conversation is the one the agent should get right because of what it learned from the first fifty. That requires a platform where that data model can live without a sprawl of specialized infrastructure. The combination of a relational graph and a warehouse-native AI toolkit gets us there.


Sachin Seth, Technical Writer

Sachin Seth is a data platform architect and analytics product builder known for his deep work benchmarking Databricks & Snowflake compute and delivering high-performance data applications at scale. He develops full-stack analytics solutions—ranging from billion-point time-series engines to portfolio optimization apps and real-time financial dashboards—blending Databricks, Snowflake, Rust, Arrow and modern web technologies. He writes to bring clarity, measurement and engineering rigor to the rapidly evolving world of Databricks & Snowflake and modern data platforms.
