Memory
Persistent state across sessions: short-term, long-term, and semantic memory.
Memory — Overview
The memory pattern enables an agent to persist information across conversations, building context over time. Short-term memory maintains state within a session; long-term memory stores and retrieves information across sessions using external storage.
Memory is the storage primitive — the durable tier of the broader Context Engineering frame. Read that foundation doc first if you're trying to decide what belongs in long-term memory vs. what should be summarized, pruned, or persisted to a scratchpad; this doc covers the storage mechanics once that decision is made.
Evolves from: Prompt Chaining — adds conversation state management, summarization, and persistent retrieval.
Architecture
Figure: The agent reads from both short-term (session) and long-term (persistent) memory. When short-term memory overflows the context window, a summarizer compresses it. Important information is stored in long-term memory for future sessions.
How It Works
- Receive input — A new user message arrives.
- Retrieve context — The agent queries long-term memory for relevant past information and reads recent short-term memory (conversation history).
- Reason and respond — The agent processes the input with the retrieved context and generates a response.
- Update short-term memory — The new exchange (input + response) is added to the session history.
- Store to long-term memory — Important information, decisions, or facts are extracted and stored persistently.
- Compress if needed — When the session history exceeds the context window, a summarizer compresses older messages into a summary.
Minimal Example
A coding assistant that remembers language preferences and project context across sessions.
from patterns.memory.code.python.memory_agent import MemoryAgent
agent = MemoryAgent(
llm=your_llm,
system="You are a personal coding assistant that adapts to each developer's preferences.",
)
# Session 1 — user provides context
agent.chat("I mostly work in TypeScript and I'm building a SaaS dashboard.")
agent.chat("I prefer React Query for data fetching and Zod for validation.")
print(agent.memory_snapshot)
# {'user_language': 'TypeScript', 'project_type': 'SaaS dashboard',
# 'prefers_react_query': 'true', 'prefers_zod': 'true'}
# Session 2 (new MemoryAgent instance, same LongTermStore) — memory is recalled
response = agent.chat("How should I handle form validation in my project?")
# Agent recalls TypeScript + Zod preference without being told again
# and tailors the response accordingly
Code variants
| Implementation | Language | Path |
|---|---|---|
| Framework-agnostic agent (MockLLM, working + long-term + semantic interfaces) | Python | `code/python/memory_agent.py` |
Vercel AI SDK (generateText answer + generateObject fact extraction) |
TypeScript | `code/typescript/vercel-ai-sdk/memory.ts` |
Both variants run the same two-turn smoke (user shares Python + agent project, asks a follow-up question, recalled facts shape the answer). The TypeScript variant uses the SDK's structured-output mode for the extract-and-store turn so the fact dictionary comes back typed.
Input / Output
- Input: User message + retrieved context from both memory types
- Output: Response informed by current and past interactions
- Short-term store: Recent conversation turns (message list)
- Long-term store: Persistent facts, preferences, decisions (vector store, database, or file)
Key Tradeoffs
| Strength | Limitation |
|---|---|
| Enables multi-session continuity | Storage and retrieval add complexity |
| Personalizes responses over time | Memory retrieval quality affects response quality |
| Handles conversations exceeding context window | Summarization can lose important details |
| Agents can learn from past interactions | Stale or incorrect memories can mislead the agent |
| More natural, human-like interaction | Memory management (what to store, what to forget) is hard |
When to Use
- Multi-turn conversations that span sessions
- Personal assistants that should remember user preferences
- Agents that need to learn from past interactions
- When conversation history exceeds the context window
- Tasks that build on previous work (iterative document editing, ongoing research)
When NOT to Use
- Single-turn interactions — no memory needed
- When all context fits in one prompt — don't add overhead
- When privacy requirements prevent storing conversation data
- Stateless processing tasks (classification, extraction)
Related Patterns
- Evolves from: Prompt Chaining — see evolution.md
- Combines with: ReAct (agent loop + memory), RAG (long-term memory can use the same vector store), Multi-Agent (shared memory between agents)
- Related to: RAG — RAG retrieves from a document store; Memory retrieves from interaction history. The retrieval mechanism is similar but the data source is different.
Deeper Dive
- Design — Memory types, storage strategies, retrieval patterns, summarization, forgetting policies
- Implementation — Pseudocode, context window management, vector store integration, testing
- Evolution — How memory evolves from prompt chaining
When NOT to use this pattern
- The interaction is single-turn — memory is a forgetting tax with no benefit.
- The user hasn't consented to long-term storage — privacy regression.
- You don't have a story for memory decay, invalidation, or auditing — stored memories fossilize and contradict reality.
Next steps
- Production version: see Blueprints → Deployments for the deployment agents that use this pattern.
- Generate a starter project: see Blueprint → Spec → Scaffold.
- Combine with other patterns: see the Composition guide.
Anti-compositions to watch for
Documented pairings where this pattern often fails in production. Source: composition/anti-compositions.
Memory + Multi-Agent without scoped writes
Memory's failure mode is poisoning. Multi-agent's failure mode is propagation. Composed without per-agent scopes, every worker can poison every other worker, and debugging requires reconstructing a multi-actor write history.
Instead:Give each agent its own memory scope (read-mostly cross-scope), or designate one memory-writer agent that owns all writes through an approval surface.
Memory + RAG over the same corpus and namespace
Memory is time-ordered, agent-owned context. RAG is topic-organized, externally-authored knowledge. Indexing them identically loses the type distinction and duplicates retrieval logic.
Instead:Same vector store infrastructure is fine; use separate namespaces with separate retrieval logic. Memory retrieval is recency-weighted, RAG retrieval is similarity-weighted.
Multi-Agent + Long-Term Memory without provenance tags
A poisoned worker poisons the long-term memory; all future agents inherit the poison. Without provenance there is no way to roll back selectively or down-weight a low-trust source.
Instead:Tag every memory entry with its source (`source: worker_X`). Apply per-source trust scoring and keep an audit log of writes.