Why Lore: A different layer of the stack

What makes Lore different

Every AI coding tool now has "memory." But that word means very different things. Some store facts. Lore builds compounding knowledge — a local-first, fair source engine that treats context management and long-term memory as one pipeline, for whatever agent you already use. Facts accumulate. Knowledge compounds. Only one scales.


Building with a team? Folk Lore, shared team memory, is coming soon.

The short version

Three reasons to choose Lore

Not a knock on anyone else — just what Lore optimizes for that others, by design, don't.

Memory and context — one pipeline

Most memory tools store and retrieve, but leave your context window to the agent's own compaction — so it still loses track mid-session. Most context tools compress history but learn nothing from it. In Lore, compression is the memory pipeline: distillation feeds the gradient context manager, which feeds the knowledge curator, which feeds your project's knowledge base. One system, not two bolted together.

Local-first, fair source

The whole engine runs on your machine. Messages, distillations, knowledge, and on-device embeddings live in a local SQLite database — zero added infrastructure, nothing to send to a server to make memory work. The engine is fair source, so you can read exactly how your memory is formed and stored. No black box.

Your agent, no platform to adopt

Lore is a proxy on the API wire, so it works with the agent you already use — Claude Code, OpenCode, Pi, Codex, or anything speaking the Anthropic or OpenAI protocol — just by pointing its base URL at Lore. You don't move your workflow into a new terminal, IDE, or cloud control plane to get memory.
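As a rough illustration of what "pointing its base URL at Lore" can look like, the sketch below builds the environment for launching an agent. `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` are the variables the official Anthropic and OpenAI SDKs read. The address is a placeholder rather than Lore's documented endpoint, and `env_via_proxy` is a hypothetical helper; your install may set these for you.

```python
import os

# Placeholder address (assumption): check your Lore install for the real host,
# port, and any path suffix the proxy expects.
HYPOTHETICAL_LORE_URL = "http://127.0.0.1:8787"


def env_via_proxy(base_env: dict[str, str], proxy_url: str) -> dict[str, str]:
    """Return a copy of base_env with the SDK base URLs pointed at the proxy."""
    env = dict(base_env)
    env["ANTHROPIC_BASE_URL"] = proxy_url  # read by the Anthropic SDKs
    env["OPENAI_BASE_URL"] = proxy_url     # read by the OpenAI SDKs
    return env


if __name__ == "__main__":
    env = env_via_proxy(dict(os.environ), HYPOTHETICAL_LORE_URL)
    print(env["ANTHROPIC_BASE_URL"])
```

Undoing the change is the same operation in reverse: drop the two variables and the agent talks to its provider directly again.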

The landscape

How the approaches compare

A fair map of the common approaches to agent memory — and where Lore sits.

Approach	What it does well	The trade-off	Where Lore differs
Memory-only tools (store and retrieve past conversations)	Good at recalling things you explicitly saved or that were captured earlier.	Don't manage the live context window: the agent can still get compacted mid-session and lose track of what it's doing right now.	Lore manages the context window and remembers, so memory arrives in-context at the right moment, not just when you ask.
Context-only tools (compress history to extend a session)	Keep a single long session alive without hitting the window limit.	Nothing is learned from the compression. Start a new session, project, or tool and you're back to zero.	Lore extracts durable knowledge from the compression, so every session makes the next one smarter.
Tool- or IDE-bound memory (memory built into one agent or editor)	Deep, polished integration inside that one tool.	Switch tools, providers, or machines and the memory stays behind. Your knowledge is tied to a product choice.	Lore sits on the API wire, so memory follows you across tools and providers. It is your constant, not theirs.
Self-hosted memory stacks (vector DB + workers as separate services)	Powerful hybrid retrieval, fully local, no cloud.	A cluster to stand up and maintain (a vector database, a queue or worker, extra runtimes), often bound to a single agent.	Lore does the same hybrid recall (vector + keyword + RRF) embedded in one process: no services to run, and it works with any agent.
Cloud agent platforms (memory inside a hosted control plane)	Org-wide retrieval and governance for fleets of agents run on the platform.	Memory exists for agents run through the platform; the engine is hosted and closed. You adopt the platform to get the memory.	Lore runs locally with the agent you already use. The engine is fair source; your knowledge is a plain file in your repo you can read and use without us.

These are categories, not a scorecard against any one product. Many tools blend approaches, and the good ones are genuinely good at what they target. Lore's bet is that unifying the halves — locally, across any harness — is the combination most coding workflows actually want.

The depth question

Not all memory is created equal

The spectrum runs from simple fact storage to full memory architecture. Where your tool sits determines what breaks first.

Fact Storage

Store key-value facts
Inject into prompt
TTL-based expiry
No search or recall
Locked to one tool

Memory Architecture

Three tiers: temporal, distillation, knowledge
Cost-aware gradient context management
Archive, not delete
Active recall across multiple search sources
Relevance-scored injection per turn

The context cliff

At 200K tokens, fact-injection memory is still dumping everything into the prompt — competing with your actual conversation for space. A memory architecture manages compression layers: full fidelity for recent work, intelligent distillation for older context, knowledge extraction for durable insights. Lore's gradient context manager tracks real API token counts and model pricing to decide what to surface when — not "inject everything and hope."

The return moment

You haven't touched the billing service in 6 weeks. Fact-based memory has expired your context — 28 days unused, gone. Architectural memory has compressed 47 sessions into a distilled summary, preserved your "NEVER change the payment retry logic" directive, and kept the architectural decisions searchable. You pick up exactly where you left off.

The multi-tool reality

You use Claude Code for complex refactors and another tool for quick fixes. Fact-based memory only works inside one of them — switch tools and start over. Lore sits on the API wire: one local database, shared across every tool and every provider. Your memory is yours, not theirs.

The engine

Active context management, not passive injection

Most memory systems dump facts into the system prompt. Lore manages the full context window — cost-aware, cache-optimized, calibrated in real time.

Cost-aware

Compression only when it pays for itself

Lore compares the cost of busting the prompt cache against the cost of continuing at current size — using real model pricing. It only compresses when the economics justify it. Background work runs on cheaper models via batch APIs at up to 50% off.
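To make that comparison concrete, here is a minimal sketch of the break-even check, assuming a simplified model: the context stays the same size for the remaining turns, and the cost of the distillation call itself is ignored. The `Pricing` fields, function names, and numbers are illustrative assumptions, not Lore's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-token prices in dollars. Values used below are illustrative."""
    cache_read: float   # price to read one cached input token
    cache_write: float  # price to write one input token into the cache


def cost_to_continue(context_tokens: int, turns: int, p: Pricing) -> float:
    """Keep the current prefix: every remaining turn re-reads it from cache."""
    return context_tokens * p.cache_read * turns


def cost_to_compress(compressed_tokens: int, turns: int, p: Pricing) -> float:
    """Compress now: one cache write of the smaller prefix, then cached reads."""
    if turns <= 0:
        return 0.0
    first_turn = compressed_tokens * p.cache_write
    later_turns = compressed_tokens * p.cache_read * (turns - 1)
    return first_turn + later_turns


def should_compress(context_tokens: int, compressed_tokens: int,
                    turns: int, p: Pricing) -> bool:
    """Compress only when it is cheaper over the expected remaining turns."""
    return (cost_to_compress(compressed_tokens, turns, p)
            < cost_to_continue(context_tokens, turns, p))


if __name__ == "__main__":
    prices = Pricing(cache_read=0.30e-6, cache_write=3.75e-6)
    for remaining in (1, 10):
        print(remaining, should_compress(150_000, 40_000, remaining, prices))
```

With these example prices, compressing with one turn left costs more than it saves, while ten remaining turns tip the balance the other way. That is the sense in which compression has to pay for itself.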

Gradient

Progressive compression, not a cliff

A 4-layer system from full passthrough to emergency compression, calibrated dynamically from real API token counts. Each layer preserves the prompt cache as long as possible — no more "compaction wiped everything" moments. The longer you work, the bigger the advantage: +70% average recall over compaction at 2.3M tokens.
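One way to picture "calibrated dynamically from real API token counts" is a running correction factor applied to local token estimates, which then selects a layer. The sketch below is an assumption-heavy illustration: the thresholds, the smoothing factor, and both function names are hypothetical, and only the endpoints (layer 1 as full passthrough, layer 4 as emergency compression) come from the description above.

```python
# Hypothetical thresholds: fraction of the context window in use -> layer.
LAYER_THRESHOLDS = [(0.60, 1), (0.75, 2), (0.90, 3)]


def calibrate(ratio: float, estimated: int, actual: int,
              alpha: float = 0.2) -> float:
    """Blend the newest actual/estimated token ratio into a running correction."""
    if estimated <= 0:
        return ratio
    return (1 - alpha) * ratio + alpha * (actual / estimated)


def pick_layer(estimated_tokens: int, ratio: float, window: int) -> int:
    """Layer 1 is full passthrough; layer 4 is emergency compression."""
    usage = estimated_tokens * ratio / window
    for limit, layer in LAYER_THRESHOLDS:
        if usage < limit:
            return layer
    return 4


if __name__ == "__main__":
    ratio = calibrate(1.0, estimated=1_000, actual=1_200)
    print(pick_layer(120_000, ratio, window=200_000))
```

The point of the gradient is that each step up is small, so a session moves through the layers gradually instead of hitting a single compaction event.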

Relevance-scored

The right knowledge at the right turn

Lore doesn't dump everything into the system prompt. On the first turn, only unconditional preferences are injected. After the first response, Lore reads the actual conversation and scores every knowledge entry against it — using vector similarity (local Nomic embeddings) or BM25 keyword matching — then injects only what's relevant. The injection is diff-pinned: it only updates when content changes materially, preserving the prompt cache.
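The per-turn scoring and diff-pinning described above can be sketched as follows, assuming cosine similarity over precomputed embedding vectors. The threshold, the limit, and the rule that "materially changed" means "a different set of entries" are simplifying assumptions for illustration, not Lore's actual criteria.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_relevant(query_vec: list[float], entries: dict[str, list[float]],
                    threshold: float = 0.5, limit: int = 3) -> list[str]:
    """Return keys of the best-scoring entries at or above the threshold."""
    scored = [(cosine(query_vec, vec), key) for key, vec in entries.items()]
    kept = [(score, key) for score, key in scored if score >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by key
    return [key for _, key in kept[:limit]]


def next_injection(current: list[str], candidate: list[str]) -> list[str]:
    """Diff-pinning: keep the existing injection unless the selected set changed."""
    return current if set(current) == set(candidate) else candidate


if __name__ == "__main__":
    knowledge = {"retry-policy": [1.0, 0.0], "css-style": [0.0, 1.0]}
    picked = select_relevant([0.9, 0.1], knowledge)
    print(next_injection([], picked))
```

Keeping the previous injection byte-for-byte when nothing relevant changed is what lets the cached prompt prefix survive from one turn to the next.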

Active recall

The agent searches its own memory

Passive memory waits to be injected. Lore gives your agent a recall tool that searches knowledge entries, distillation observations, raw conversation messages, and cross-project knowledge in parallel — fused with reciprocal rank scoring and LLM-powered query expansion. Your agent reaches for context when it needs it.
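Reciprocal rank fusion itself is compact enough to show in full. The sketch below merges ranked result lists from several sources; `k = 60` is the conventional constant from the RRF literature, not a value confirmed for Lore, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each appearance adds 1 / (k + rank) to an item's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]
    keyword_hits = ["b", "c", "d"]
    message_hits = ["b", "a"]
    print(reciprocal_rank_fusion([vector_hits, keyword_hits, message_hits]))
```

Because RRF only uses ranks, it can combine vector and keyword results without having to normalize their raw scores against each other.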

The cost story

Memory that pays for itself

A memory layer that makes extra LLM calls can quietly cost more than it saves. Lore is engineered the other way — every optimization is gated on real model pricing.

No compaction, no cache bust

This is the big one. Every compaction destroys the entire prompt cache, so the next turn re-writes tens of thousands of tokens at full price. Lore's gradient compression never triggers a compaction — your prompt cache stays intact, turn after turn. Lore also caches in layers, the way a Dockerfile does: order the stable parts first and a change later never rebuilds them. Stable preferences sit in a long-lived 1-hour cache block at 10× cheaper reads; the live conversation rides the short-lived cache that follows — so editing the conversation never invalidates your preferences, just as touching your source doesn't reinstall your dependencies. The quality win and the cost win are the same mechanism.
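The Dockerfile analogy can be checked with a toy model of prefix caching, in which a prompt block stays cached only while every block before it is unchanged. This is a simplified sketch of how provider prompt caches generally behave, with hypothetical block names, not a description of Lore's code.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count leading blocks that are identical, i.e. still served from cache."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Stable parts first: editing the conversation keeps two blocks cached.
    stable_first = ["preferences", "knowledge", "conversation v1"]
    stable_first_edited = ["preferences", "knowledge", "conversation v2"]
    print(cached_prefix_len(stable_first, stable_first_edited))

    # Volatile part first: the same edit invalidates everything after it.
    volatile_first = ["conversation v1", "preferences", "knowledge"]
    volatile_first_edited = ["conversation v2", "preferences", "knowledge"]
    print(cached_prefix_len(volatile_first, volatile_first_edited))
```

The two orderings contain the same content; only the stable-first one keeps the preferences and knowledge blocks cached after the conversation changes.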

Warm across long tool calls — only when it pays

Long-running tool calls and idle gaps are where prompt caches expire, and a cold cache means an expensive rebuild on the next turn. Lore runs survival analysis on your inter-turn timing to keep the cache warm across those gaps with near-zero keepalive requests — and only when the expected savings beat the cost. A circuit breaker disables warming entirely if the math ever stops working.
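A simplified version of that decision can be sketched with an empirical survival function over past inter-turn gaps. Everything here is an assumption for illustration: the function names, the single-keepalive model, and the prices. It asks one question at the moment the cache is about to expire: is the chance that the next turn arrives during the extra warm window large enough to justify paying for one cached read now?

```python
def survival(gaps: list[float], t: float) -> float:
    """Empirical S(t): fraction of observed inter-turn gaps longer than t seconds."""
    if not gaps:
        return 0.0
    return sum(gap > t for gap in gaps) / len(gaps)


def should_warm(gaps: list[float], ttl: float, horizon: float, tokens: int,
                read_price: float, write_price: float) -> bool:
    """Send a keepalive at the TTL only if expected savings beat its cost."""
    still_waiting = survival(gaps, ttl)
    if still_waiting == 0.0:
        return False  # history says turns never take this long
    # P(next turn lands in (ttl, horizon] | no turn has arrived by ttl)
    p_return = (still_waiting - survival(gaps, horizon)) / still_waiting
    keepalive_cost = tokens * read_price
    expected_saving = p_return * tokens * (write_price - read_price)
    return expected_saving > keepalive_cost


if __name__ == "__main__":
    history = [10, 20, 400, 450, 500, 2000]  # seconds between turns
    print(should_warm(history, ttl=300, horizon=600, tokens=100_000,
                      read_price=0.30e-6, write_price=3.75e-6))
```

A circuit breaker in this picture would simply stop calling `should_warm` after the realized savings fall below the keepalive spend for some period.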

Cheaper by default, and capped

Background distillation and curation run at half price through batch APIs when available, on cheaper worker models A/B-tested for parity. You can set a daily spend cap — background work throttles as you approach it. And Lore reuses the plan you already have, including Claude Pro/Max, so memory doesn't show up as a surprise line item.

Ownership

Your data, your machine

No cloud dependency. No vendor lock-in. A local database you own and a fair source engine you can read.

Ownership

Your data lives on your machine

Messages, distillations, knowledge, and on-device embeddings live in a local SQLite database you own — not a vendor's cloud. Nothing leaves your machine unless you choose to share it. Stop using Lore tomorrow and your data is still there, in an open format. That's portability you can verify, not a promise to export.

No lock-in

Where the commitment lives matters

Every memory system asks for some commitment. With a hosted platform, it's at the platform layer — route your agents through it or get nothing. With Lore, it's a one-line base-URL change you can undo at any time. Switch providers, switch tools, switch laptops — one local database travels with you.

Proven where it's hardest

The advantage grows with the session

On a real 5-day, 2.3M-token coding session, Lore averaged 4.0/5 on recall versus 2.4/5 for compaction (+70%), with 2.6x as many perfect answers (13/20 vs 5/20). The early-session details that lossy summaries destroy are exactly what Lore's three-tier memory preserves. The longer you work, the bigger the gap.

# One line. Any agent.
$ lore run

# Local engine. Your data stays on your machine.

What "portable" really means. Some tools describe their memory as "portable" — but that usually means you can host the storage yourself, not that you can use the data without their engine. With Lore, the engine is fair source and the database is local SQLite — an open format any tool can read. Curated knowledge is also exported to .lore.md in your repo as human-readable markdown, but the real value is the local engine and the database you own.
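"An open format any tool can read" is easy to verify with nothing but a standard library. The sketch below lists the tables in a SQLite file, opened read-only. The database path in the example is a placeholder, since the actual location and schema of Lore's database are not stated here.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    uri = f"file:{db_path}?mode=ro"  # read-only: never creates or modifies the file
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Placeholder path (assumption): substitute wherever your install keeps its data.
    try:
        print(list_tables("lore.db"))
    except sqlite3.OperationalError as exc:
        print(f"could not open database: {exc}")
```

The same file opens in the `sqlite3` command-line shell or any SQLite browser, with or without Lore installed.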

Teams

Team memory, without giving up the local-first model

Individual memory works today. Team memory is coming with Folk Lore — built on the same local-first engine.

Local-first, team-ready

Every developer's Lore instance builds project knowledge locally. Folk Lore will connect these instances — shared conventions, gotchas, and decisions traveling across the team without anyone giving up local ownership of their data.

Knowledge promotion and access controls

Not all knowledge should be shared equally. Folk Lore will add promotion workflows, team-scoped visibility, and approval gates — so your team controls what becomes shared memory, who can see it, and how it evolves.

Integrates with the tools you already use

Folk Lore will connect to your existing team structure — GitHub Teams, repo collaborators, org permissions. No parallel management system to maintain. Your team's AI memory governed by the same people and processes that govern your code.

Folk Lore · early access for teams

Your team's memory, together.

Lore is free and local today — install it now with curl -fsSL https://withlore.ai/install | bash. Folk Lore brings your team's memory together — shared, searchable, always current. Early access is rolling out. Be part of the myth.

Models will commoditize. The context harness will differentiate. Lore is that harness — the context engine every agent needs to build the future.
