Every AI coding tool now has "memory." But that word means very different things. Some store facts. Lore builds compounding knowledge — a local-first, fair source engine that treats context management and long-term memory as one pipeline, for whatever agent you already use. Facts accumulate. Knowledge compounds. Only one scales.
Building with a team? Folk Lore, shared team memory, is coming soon.
The short version
Not a knock on anyone else — just what Lore optimizes for that others, by design, don't.
Most memory tools store and retrieve, but leave your context window to the agent's own compaction — so it still loses track mid-session. Most context tools compress history but learn nothing from it. In Lore, compression is the memory pipeline: distillation feeds the gradient context manager, which feeds the knowledge curator, which feeds your project's knowledge base. One system, not two bolted together.
The whole engine runs on your machine. Messages, distillations, knowledge, and on-device embeddings live in a local SQLite database — zero added infrastructure, nothing to send to a server to make memory work. The engine is fair source, so you can read exactly how your memory is formed and stored. No black box.
Lore is a proxy on the API wire, so it works with the agent you already use — Claude Code, OpenCode, Pi, Codex, or anything speaking the Anthropic or OpenAI protocol — just by pointing its base URL at Lore. You don't move your workflow into a new terminal, IDE, or cloud control plane to get memory.
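As a rough sketch of what "pointing its base URL at Lore" can look like: the official Anthropic and OpenAI SDKs read their base URL from the `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` environment variables. The address below is a placeholder chosen for illustration, not Lore's documented default, and the helper names are hypothetical; substitute whatever address and path your Lore instance actually listens on.

```python
import os

# Placeholder address: an assumption for illustration, not Lore's documented default.
LORE_PROXY_URL = "http://127.0.0.1:8787"

OVERRIDES = ("ANTHROPIC_BASE_URL", "OPENAI_BASE_URL")


def env_with_lore(base_env=None, proxy_url=LORE_PROXY_URL):
    """Return a copy of the environment with SDK base URLs pointed at a local proxy."""
    env = dict(os.environ if base_env is None else base_env)
    for name in OVERRIDES:
        env[name] = proxy_url
    return env


def env_without_lore(env):
    """Undo the change: drop the overrides so the agent talks to the provider directly."""
    return {k: v for k, v in env.items() if k not in OVERRIDES}
```

Passing the result as `env=` to `subprocess.run` launches an agent through the proxy without touching the rest of your setup, and removing the two variables reverses it.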
The landscape
A fair map of the common approaches to agent memory — and where Lore sits.
| Approach | What it does well | The trade-off | Where Lore differs |
|---|---|---|---|
| Memory-only tools (store and retrieve past conversations) | Good at recalling things you explicitly saved or that were captured earlier. | Don't manage the live context window, so the agent can still get compacted mid-session and lose track of what it's doing right now. | Lore manages the context window and remembers, so memory arrives in-context at the right moment, not just when you ask. |
| Context-only tools (compress history to extend a session) | Keep a single long session alive without hitting the window limit. | Nothing is learned from the compression. Start a new session, project, or tool and you're back to zero. | Lore extracts durable knowledge from the compression, so every session makes the next one smarter. |
| Tool- or IDE-bound memory (memory built into one agent or editor) | Deep, polished integration inside that one tool. | Switch tools, providers, or machines and the memory stays behind. Your knowledge is tied to a product choice. | Lore sits on the API wire, so memory follows you across tools and providers: your constant, not theirs. |
| Self-hosted memory stacks (vector DB + workers as separate services) | Powerful hybrid retrieval, fully local, no cloud. | A cluster to stand up and maintain (a vector database, a queue or worker, extra runtimes), often bound to a single agent. | Lore does the same hybrid recall (vector + keyword + RRF) embedded in one process: no services to run, and it works with any agent. |
| Cloud agent platforms (memory inside a hosted control plane) | Org-wide retrieval and governance for fleets of agents run on the platform. | Memory exists for agents run through the platform; the engine is hosted and closed. You adopt the platform to get the memory. | Lore runs locally with the agent you already use. The engine is fair source; your knowledge is a plain file in your repo you can read and use without us. |
These are categories, not a scorecard against any one product. Many tools blend approaches, and the good ones are genuinely good at what they target. Lore's bet is that unifying the halves — locally, across any harness — is the combination most coding workflows actually want.
The depth question
The spectrum runs from simple fact storage to full memory architecture. Where your tool sits determines what breaks first.
At 200K tokens, fact-injection memory is still dumping everything into the prompt — competing with your actual conversation for space. A memory architecture manages compression layers: full fidelity for recent work, intelligent distillation for older context, knowledge extraction for durable insights. Lore's gradient context manager tracks real API token counts and model pricing to decide what to surface when — not "inject everything and hope."
You haven't touched the billing service in 6 weeks. Fact-based memory has expired your context — 28 days unused, gone. Architectural memory has compressed 47 sessions into a distilled summary, preserved your "NEVER change the payment retry logic" directive, and kept the architectural decisions searchable. You pick up exactly where you left off.
You use Claude Code for complex refactors and another tool for quick fixes. Fact-based memory only works inside one of them — switch tools and start over. Lore sits on the API wire: one local database, shared across every tool and every provider. Your memory is yours, not theirs.
The engine
Most memory systems dump facts into the system prompt. Lore manages the full context window — cost-aware, cache-optimized, calibrated in real time.
Cost-aware
Lore compares the cost of busting the prompt cache against the cost of continuing at current size — using real model pricing. It only compresses when the economics justify it. Background work runs on cheaper models via batch APIs at up to 50% off.
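The comparison can be sketched roughly as follows. This is a simplified model under stated assumptions, not Lore's actual decision logic: it ignores context growth and the cost of the distillation call itself, and the `Pricing` fields, function names, and the prices used in the usage note are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. Callers supply the values; none are built in."""
    cache_write: float
    cache_read: float


def cost_to_continue(tokens, turns, p):
    # Every remaining turn re-reads the cached prefix at its current size.
    return turns * tokens * p.cache_read / 1e6


def cost_to_compress(compressed_tokens, turns, p):
    # Compressing changes the prefix, so it is written to the cache once at the
    # write price, then read at the smaller size on the turns that follow.
    rewrite = compressed_tokens * p.cache_write / 1e6
    reads = max(turns - 1, 0) * compressed_tokens * p.cache_read / 1e6
    return rewrite + reads


def should_compress(tokens, compressed_tokens, turns, p):
    """Compress only when it is cheaper over the expected remaining turns."""
    return cost_to_compress(compressed_tokens, turns, p) < cost_to_continue(tokens, turns, p)
```

With an illustrative `Pricing(cache_write=3.75, cache_read=0.30)`, shrinking 150K tokens to 60K does not pay off if only one turn remains, but does over 20 turns. The break-even depends on how long the session is expected to continue.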
Gradient
A 4-layer system from full passthrough to emergency compression, calibrated dynamically from real API token counts. Each layer preserves the prompt cache as long as possible — no more "compaction wiped everything" moments. The longer you work, the bigger the advantage: +70% average recall over compaction at 2.3M tokens.
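A minimal sketch of how a layered scheme like this might choose its level. Only the first and last layers are named in the text; the two middle layer names, the thresholds, and the smoothing-based calibration are assumptions made for illustration.

```python
from enum import IntEnum


class Layer(IntEnum):
    PASSTHROUGH = 0  # named in the text: history is sent untouched
    TRIM = 1         # hypothetical middle layer
    DISTILL = 2      # hypothetical middle layer
    EMERGENCY = 3    # named in the text: emergency compression


# Hypothetical thresholds: fraction of the context window in use.
THRESHOLDS = (0.60, 0.80, 0.95)


class Calibrator:
    """Tracks how far a local token estimate drifts from the API-reported count."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.ratio = 1.0

    def observe(self, estimated, reported):
        # Exponential moving average of reported / estimated.
        if estimated > 0:
            self.ratio += self.alpha * (reported / estimated - self.ratio)

    def calibrated(self, estimated):
        return estimated * self.ratio


def pick_layer(estimated_tokens, window, cal):
    """Return the mildest layer whose threshold the calibrated usage stays under."""
    usage = cal.calibrated(estimated_tokens) / window
    for layer, limit in zip(Layer, THRESHOLDS):
        if usage < limit:
            return layer
    return Layer.EMERGENCY
```

The point of calibrating against reported counts is that a local estimate that runs low would otherwise hold the session in a milder layer for too long.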
Relevance-scored
Lore doesn't dump everything into the system prompt. On the first turn, only unconditional preferences are injected. After the first response, Lore reads the actual conversation and scores every knowledge entry against it — using vector similarity (local Nomic embeddings) or BM25 keyword matching — then injects only what's relevant. The injection is diff-pinned: it only updates when content changes materially, preserving the prompt cache.
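To make the scoring and pinning idea concrete, here is a small sketch that assumes embeddings are already computed. Cosine similarity is standard; the `top_k`, `min_score`, and `min_change` values, and the use of Jaccard distance to decide what counts as a "material" change, are assumptions for illustration rather than Lore's documented behavior.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_relevant(conversation_vec, entries, top_k=3, min_score=0.3):
    """entries maps entry id -> embedding. Returns the relevant ids, best first."""
    scored = sorted(
        ((cosine(conversation_vec, vec), eid) for eid, vec in entries.items()),
        reverse=True,
    )
    return [eid for score, eid in scored[:top_k] if score >= min_score]


def pinned_update(current_ids, candidate_ids, min_change=0.5):
    """Keep the existing injection unless the candidate set differs materially."""
    cur, cand = set(current_ids), set(candidate_ids)
    union = cur | cand
    if not union:
        return list(current_ids)
    change = 1 - len(cur & cand) / len(union)  # Jaccard distance between the sets
    return list(candidate_ids) if change >= min_change else list(current_ids)
```

Keeping the injected text byte-identical across small changes is what lets the cached prompt prefix stay valid from turn to turn.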
Active recall
Passive memory waits to be injected. Lore gives your agent a recall tool that searches knowledge entries, distillation observations, raw conversation messages, and cross-project knowledge in parallel, then fuses the results with reciprocal rank fusion (RRF) and broadens the query with LLM-powered query expansion. Your agent reaches for context when it needs it.
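Reciprocal rank scoring is easy to sketch: each source returns its own ranked list, and an item's fused score is the sum of `1 / (k + rank)` across the lists it appears in. The function below is a generic illustration, not Lore's code; `k = 60` is the constant commonly used for RRF, and the alphabetical tie-break is an arbitrary choice.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
```

For example, fusing a vector ranking `["e2", "e1", "e3"]` with a keyword ranking `["e1", "e4", "e2"]` puts `e1` first: it ranks near the top of both lists, while `e3` and `e4` each appear in only one.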
The cost story
A memory layer that makes extra LLM calls can quietly cost more than it saves. Lore is engineered the other way — every optimization is gated on real model pricing.
This is the big one. Every compaction destroys the entire prompt cache, so the next turn re-writes tens of thousands of tokens at full price. Lore's gradient compression never triggers a compaction — your prompt cache stays intact, turn after turn. Lore also caches in layers, the way a Dockerfile does: order the stable parts first and a change later never rebuilds them. Stable preferences sit in a long-lived 1-hour cache block at 10× cheaper reads; the live conversation rides the short-lived cache that follows — so editing the conversation never invalidates your preferences, just as touching your source doesn't reinstall your dependencies. The quality win and the cost win are the same mechanism.
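One way to picture the layered cache, as a sketch that assumes the Anthropic Messages API's `cache_control` field on system blocks, including its optional one-hour `ttl`. The function name and the two-block split are illustrative; how Lore actually places its cache breakpoints is not described here.

```python
def build_system_blocks(stable_preferences, session_knowledge):
    """Order system content from most stable to least stable.

    The long-lived block comes first, so changes to anything after it
    leave its cached prefix untouched.
    """
    return [
        {
            "type": "text",
            "text": stable_preferences,
            # Long-lived cache entry for content that rarely changes.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        },
        {
            "type": "text",
            "text": session_knowledge,
            # Default short-lived cache entry for content that changes more often.
            "cache_control": {"type": "ephemeral"},
        },
    ]
```

Rebuilding the blocks with different session knowledge leaves the first block identical, which is the Dockerfile property: the stable layer is reused and only the later layers are rebuilt.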
Long-running tool calls and idle gaps are where prompt caches expire, and a cold cache means an expensive rebuild on the next turn. Lore runs survival analysis on your inter-turn timing to keep the cache warm across those gaps with near-zero keepalive requests — and only when the expected savings beat the cost. A circuit breaker disables warming entirely if the math ever stops working.
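The keepalive decision can be sketched with an empirical survival function over past inter-turn gaps. This is a simplified illustration under stated assumptions, not Lore's model: it uses raw observed gaps with no smoothing or censoring, omits the circuit breaker, and the function names and cost figures in the usage note are hypothetical.

```python
def survival(gaps, t):
    """Empirical S(t): fraction of observed inter-turn gaps longer than t seconds."""
    if not gaps:
        return 0.0
    return sum(1 for g in gaps if g > t) / len(gaps)


def p_return_within(gaps, elapsed, window):
    """P(next turn arrives in (elapsed, elapsed + window] | still idle at elapsed)."""
    s_now = survival(gaps, elapsed)
    if s_now == 0:
        return 0.0
    return (s_now - survival(gaps, elapsed + window)) / s_now


def should_keepalive(gaps, elapsed, ttl, rebuild_cost, keepalive_cost):
    """Warm the cache only if the expected saving beats the cost of the ping."""
    expected_saving = p_return_within(gaps, elapsed, ttl) * rebuild_cost
    return expected_saving > keepalive_cost
```

With past gaps of `[30, 60, 400, 500, 3600, 7200]` seconds and a 300-second cache lifetime, a session that has been idle for 300 seconds has an estimated 50% chance of resuming before the next expiry, so a cheap keepalive is worth sending. After 3000 idle seconds the estimate drops to zero and warming stops.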
Background distillation and curation run at half price through batch APIs when available, on cheaper worker models A/B-tested for parity. You can set a daily spend cap — background work throttles as you approach it. And Lore reuses the plan you already have, including Claude Pro/Max, so memory doesn't show up as a surprise line item.
Ownership
No cloud dependency. No vendor lock-in. A local database you own and a fair source engine you can read.
Ownership
Messages, distillations, knowledge, and on-device embeddings live in a local SQLite database you own — not a vendor's cloud. Nothing leaves your machine unless you choose to share it. Stop using Lore tomorrow and your data is still there, in an open format. That's portability you can verify, not a promise to export.
No lock-in
Every memory system asks for some commitment. With a hosted platform, it's at the platform layer — route your agents through it or get nothing. With Lore, it's a one-line base-URL change you can undo at any time. Switch providers, switch tools, switch laptops — one local database travels with you.
Proven where it's hardest
On a real 5-day, 2.3M-token coding session, Lore averaged 4.0/5 on recall versus 2.4/5 for compaction (+70%), with 2.6x more perfect answers (13/20 vs 5/20). The early-session details that lossy summaries destroy are exactly what Lore's three-tier memory preserves. The longer you work, the bigger the gap.
# One line. Any agent.
$ lore run
# Local engine. Your data stays
# on your machine.
What "portable" really means.
Some tools describe their memory as "portable", but that usually means you can host the storage yourself, not that you can use the data without their engine. With Lore, the engine is fair source and the database is local SQLite, an open format any tool can read. Curated knowledge is also exported to .lore.md in your repo as human-readable markdown, but the real value is the local engine and the database you own.
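Because SQLite is an open format, any SQLite client can inspect such a file. The sketch below uses only Python's standard `sqlite3` module and lists whatever tables a database contains. It assumes nothing about Lore's schema or file location; pass the path to your own database file.

```python
import sqlite3


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Opening with `mode=ro` keeps the inspection read-only, so looking at the data cannot modify it.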
Teams
Individual memory works today. Team memory is coming with Folk Lore — built on the same local-first engine.
Every developer's Lore instance builds project knowledge locally. Folk Lore will connect these instances — shared conventions, gotchas, and decisions traveling across the team without anyone giving up local ownership of their data.
Not all knowledge should be shared equally. Folk Lore will add promotion workflows, team-scoped visibility, and approval gates — so your team controls what becomes shared memory, who can see it, and how it evolves.
Folk Lore will connect to your existing team structure — GitHub Teams, repo collaborators, org permissions. No parallel management system to maintain. Your team's AI memory governed by the same people and processes that govern your code.
Folk Lore · early access for teams
Lore is free and local today — install it now with curl -fsSL https://withlore.ai/install | bash.
Folk Lore brings your team's memory together — shared, searchable, always current.
Early access is rolling out. Be part of the myth.
Models will commoditize. The context harness will differentiate. Lore is that harness — the context engine every agent needs to build the future.