PERSISTENT MEMORY FOR AI AGENTS

Your AI Remembers
Everything That Matters

Contexara gives AI agents a three-tier memory engine — hot session turns, long-term semantic memories, and crystallized episode summaries — so nothing gets lost between conversations.

Works via MCP · Python SDK · REST API · Runs where your agent runs

MEMORY STORE · namespace: assistant
profile

User is a senior backend engineer at a Series B startup

imp=5 · 2h ago

tech_pref

Prefers SQLite for local projects, avoids Postgres overhead

imp=4 · 3h ago

correction

Don't summarize output — user reads the diff directly

imp=5 · 5h ago

task

Building a memory layer for AI agents — Contexara v2

imp=3 · ttl=30d · 1d ago

episode

Last session: built hybrid FTS5+vector search, moved DBs to ~/.contexara/

crystallized · 1d ago

5 memories · 2 sessions ● hybrid search active

Three Tiers. Nothing Dropped.

Most memory systems pick one approach. Contexara runs three in parallel so your agent is never blind — whether it's mid-conversation or resuming a week-old session.

L0

Hot Session Turns

Every user/assistant exchange is stored verbatim in a hot SQLite DB with FTS5 full-text search. The last N turns are injected at the start of every conversation — your agent picks up exactly where it left off.

FTS5 search · verbatim recall
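The FTS5 mechanics behind verbatim recall can be sketched with Python's stdlib sqlite3 module. The table and column names here are illustrative, not Contexara's actual schema:

```python
import sqlite3

# In-memory stand-in for the hot session DB; schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE turns USING fts5(role, content)")
db.executemany(
    "INSERT INTO turns VALUES (?, ?)",
    [
        ("user", "Should we use SQLite or Postgres for the local cache?"),
        ("assistant", "SQLite keeps the local setup dependency-free."),
        ("user", "Agreed, let's avoid the Postgres overhead."),
    ],
)

# FTS5 MATCH gives ranked full-text recall over the verbatim turns.
rows = db.execute(
    "SELECT role, content FROM turns WHERE turns MATCH ? ORDER BY rank",
    ("postgres",),
).fetchall()
for role, content in rows:
    print(role, "->", content)
```

FTS5 tokenization is case-insensitive by default, so the query matches both "Postgres" mentions.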
L2

Semantic Memory Store

After every turn, an LLM extraction pass distills durable facts — preferences, corrections, tech choices, constraints — into typed, versioned memories. Hybrid FTS5 + cosine vector search returns the most relevant facts for every query.

9 memory kinds · never-delete versioning · hybrid search
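Hybrid ranking can be sketched in plain Python. The blend below is illustrative — squashing FTS5's lower-is-better rank into (0, 1] and the alpha weighting are assumptions, not Contexara's actual scoring formula:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(fts_rank, query_vec, memory_vec, alpha=0.5):
    # FTS5 rank is lower-is-better; squash it to (0, 1] so it can be
    # blended with cosine similarity. alpha weighting is illustrative.
    keyword = 1.0 / (1.0 + max(fts_rank, 0.0))
    semantic = cosine(query_vec, memory_vec)
    return alpha * keyword + (1 - alpha) * semantic

# Identical vectors plus a top keyword rank score a perfect 1.0.
print(hybrid_score(0.0, [1.0, 0.0], [1.0, 0.0]))
```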
L1

Episode Summaries

When a session ends, a three-pass LLM crystallizer compresses the full transcript into a structured episode — title, actions, outcomes, open items, and a zero-fact-loss facts list. The next session opens with this summary pre-injected.

fact audit pass · auto-injected · cold archive
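Based on the fields named above (title, actions, outcomes, open items, facts), a crystallized episode might look like the following — the exact schema is an assumption, not Contexara's documented format:

```python
# Hypothetical shape of a crystallized episode; field names follow the
# prose above, but the exact schema is an assumption.
episode = {
    "title": "Hybrid FTS5+vector search built",
    "actions": ["implemented FTS5 index", "added cosine vector search"],
    "outcomes": ["hybrid search active across both tiers"],
    "open_items": ["tune ranking weights"],
    # The zero-fact-loss list verified by the final audit pass.
    "facts": ["DBs moved to ~/.contexara/"],
}
print(episode["title"])
```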

How Every Turn Works

Contexara runs a deterministic lifecycle on every conversation turn — no agent decisions required.

1

Session Start

Last N raw turns and the previous episode summary are injected — agent is never blind.

2

Memory Retrieval

Hybrid FTS5 + cosine search surfaces the most relevant long-term memories before the agent responds.

3

Turn Ingestion

The full (user, assistant) pair is saved to L0. An LLM extraction pass distills new durable facts into L2.

4

Crystallization

After 60 min idle or a manual checkpoint, the session is crystallized into a structured episode with zero fact-loss verification.
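The four steps above can be wired together in a few lines. FakeMemory is a stand-in so this sketch runs on its own; its method names (context, retrieve, ingest, checkpoint) mirror the SDK calls shown in the integration section below:

```python
# Runnable sketch of the per-turn lifecycle. FakeMemory stands in for
# the real client so the control flow is self-contained.
class FakeMemory:
    def __init__(self):
        self.turns = []

    def context(self):                   # 1. last N turns + episode summary
        return self.turns[-5:]

    def retrieve(self, query, top_k=5):  # 2. relevant long-term facts
        return ["User prefers SQLite"][:top_k]

    def ingest(self, user, assistant):   # 3. save turn, extract new facts
        self.turns.append((user, assistant))

    def checkpoint(self):                # 4. crystallize on demand
        return {"title": "session crystallized"}

def run_turn(mem, user_text, model):
    ctx = mem.context()                  # inject history before answering
    facts = mem.retrieve(user_text)      # surface relevant memories
    reply = model(ctx, facts, user_text) # your LLM call goes here
    mem.ingest(user_text, reply)         # persist the completed turn
    return reply

mem = FakeMemory()
reply = run_turn(mem, "Which DB?", lambda ctx, facts, q: f"Given {facts[0]}: SQLite.")
print(reply)
```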

Two Ways to Integrate

Use the Python SDK for deterministic production agents, or drop the MCP server into any framework that supports tools.

Python SDK · pip install contexara
from contexara import ContextaraClient

# One client per agent / namespace
mem = ContextaraClient(namespace="my-agent")

# Session start — inject history + last episode
ctx = mem.context()

# Search memory before answering
facts = mem.retrieve("user preferences", top_k=5)

# After every response — always
mem.ingest(user_text, assistant_text)

# Explicit memory store
mem.store("User prefers concise bullet responses",
          kind="style", importance=4)

# Crystallize session on demand
mem.checkpoint()
MCP Server · stdio · any framework
# claude_desktop_config.json
{
  "mcpServers": {
    "contexara": {
      "command": "python",
      "args": ["-m", "contexara.mcp_server"]
    }
  }
}

# Tools exposed to your agent:
chat_context       # get last N turns + episode
chat_ingest        # store turn after every reply
memory_search      # hybrid search L2 store
memory_store       # explicit fact storage
session_checkpoint # crystallize now
episode_search     # search past sessions
last_episode       # most recent episode anchor

Structured, Not Flat

Contexara classifies every extracted memory into one of nine typed kinds so search, retrieval, and importance scoring are always precise.

profile imp 5

Name, role, company, background — who this person is.

correction imp 5

User corrected the agent's output, approach, or assumption.

preference imp 4

Stated likes and dislikes, workflows the user favours.

constraint imp 4

Hard limits — budget, team size, rules they operate under.

tech_preference imp 4

Specific tech: languages, libraries, tools preferred or avoided.

style imp 3

Communication style: format, detail level, tone preferences.

pattern imp 3

Recurring behaviour — how the user works, iterates, decides.

task imp 3 · ttl 30d

Current active work item. Expires automatically after 30 days.

note imp 1

Anything worth keeping that doesn't fit a specific kind.
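The 30-day ttl on task memories can be illustrated with a small expiry check — the field names and logic here are assumptions, not Contexara's schema:

```python
from datetime import datetime, timedelta, timezone

# Illustrative expiry check for ttl-bearing kinds like `task`;
# field names are assumptions, not Contexara's schema.
def is_expired(created_at, ttl_days):
    if ttl_days is None:  # most kinds never expire
        return False
    return datetime.now(timezone.utc) - created_at > timedelta(days=ttl_days)

fresh = datetime.now(timezone.utc) - timedelta(days=1)
stale = datetime.now(timezone.utc) - timedelta(days=45)
print(is_expired(fresh, 30), is_expired(stale, 30))  # False True
```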

Give Your Agents a Memory
They'll Actually Use

Contexara is in early access. Drop your email and we'll reach out when the hosted cloud tier is ready.

Local-first · No data leaves your machine until you opt in