Store, compress, and retrieve long-term memories with semantic lossless compression. Works across Claude, Cursor, LM Studio, and more.
Works with any AI platform that supports MCP or Python integration
Claude Desktop | Cursor | LM Studio | Cherry Studio | PyPI Package | + Any MCP Client
🇨🇳 中文 • 🇯🇵 日本語 • 🇰🇷 한국어 • 🇪🇸 Español • 🇫🇷 Français • 🇩🇪 Deutsch • 🇧🇷 Português • 🇷🇺 Русский • 🇸🇦 العربية • 🇮🇹 Italiano • 🇻🇳 Tiếng Việt • 🇹🇷 Türkçe
Overview • Quick Start • Docker • Cross-Session Memory • MCP Server • Evaluation • Citation
Install from PyPI with `pip install simplemem`. View Package Usage Guide →
SimpleMem achieves superior F1 score (43.24%) with minimal token cost (~550), occupying the ideal top-left position.
SimpleMem is an efficient memory framework based on semantic lossless compression that addresses the fundamental challenge of efficient long-term memory for LLM agents. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through a three-stage pipeline:
1. **Semantic Structured Compression**: distills unstructured interactions into compact, multi-view indexed memory units
2. **Online Semantic Synthesis**: an intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy
3. **Intent-Aware Retrieval Planning**: infers search intent to dynamically determine retrieval scope and construct precise context efficiently
The SimpleMem Architecture: (1) Semantic Structured Compression filters low-utility dialogue and converts informative windows into compact, context-independent memory units. (2) Online Semantic Synthesis consolidates related fragments during writing, maintaining a compact and coherent memory topology. (3) Intent-Aware Retrieval Planning infers search intent to adapt retrieval scope and query forms, enabling parallel multi-view retrieval and token-efficient context construction.
Speed Comparison Demo
SimpleMem vs. Baseline: Real-time speed comparison demonstration
LoCoMo-10 Benchmark Results (GPT-4.1-mini)
| Model | ⏱️ Construction Time | 🔎 Retrieval Time | ⚡ Total Time | 🎯 Average F1 |
|---|---|---|---|---|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| SimpleMem ⭐ | 92.6s | 388.3s | 480.9s | 43.24% |
💡 Key Advantages:
- 🏆 Highest F1 Score: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
- ⚡ Fastest Retrieval: 388.3s (32.7% faster than LightMem, 51.3% faster than A-Mem)
- 🚀 Fastest End-to-End: 480.9s total processing time (12.3× faster than A-Mem)
SimpleMem applies an implicit semantic density gating mechanism integrated into the LLM generation process to filter redundant interaction content. The system reformulates raw dialogue streams into compact memory units — self-contained facts with resolved coreferences and absolute timestamps. Each unit is indexed through three complementary representations for flexible retrieval:
| 🔍 Layer | 📊 Type | 🎯 Purpose | 🛠️ Implementation |
|---|---|---|---|
| Semantic | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| Lexical | Sparse | Exact term matching | BM25-style keyword index |
| Symbolic | Metadata | Structured filtering | Timestamps, entities, persons |
✨ Example Transformation:
- Input: "He'll meet Bob tomorrow at 2pm" [❌ relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00" [✅ absolute, atomic]
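To make the three-layer index concrete, here is a minimal, self-contained sketch. The class and field names (`MemoryUnit`, `MultiViewIndex`) are illustrative, not SimpleMem's actual API, and the dense layer is simulated with token overlap rather than real 1024-d embeddings:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryUnit:
    """A self-contained fact with resolved coreferences and an absolute timestamp."""
    uid: str
    text: str
    timestamp: str                      # absolute ISO-8601 time
    entities: list = field(default_factory=list)

class MultiViewIndex:
    """Toy stand-ins for the three views: the dense layer is simulated with
    token overlap, the lexical layer is an inverted index, and the symbolic
    layer filters on entity metadata."""

    def __init__(self):
        self.units = {}
        self.inverted = {}              # lexical view: term -> set of uids

    def add(self, unit):
        self.units[unit.uid] = unit
        for term in unit.text.lower().split():
            self.inverted.setdefault(term, set()).add(unit.uid)

    def semantic(self, query):
        # dense stand-in: rank by token overlap (a real system uses embeddings)
        q = set(query.lower().split())
        scored = [(len(q & set(u.text.lower().split())), uid)
                  for uid, u in self.units.items()]
        return [uid for score, uid in sorted(scored, reverse=True) if score > 0]

    def lexical(self, term):
        return self.inverted.get(term.lower(), set())

    def symbolic(self, entity):
        return {uid for uid, u in self.units.items() if entity in u.entities}

idx = MultiViewIndex()
idx.add(MemoryUnit("m1", "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00",
                   "2025-11-16T14:00:00", entities=["Alice", "Bob", "Starbucks"]))
print(idx.lexical("Starbucks"))        # {'m1'}
```

The point of the sketch is that one unit is written once but reachable through three independent access paths, which is what makes the later parallel multi-view retrieval possible.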
Unlike traditional systems that rely on asynchronous background maintenance, SimpleMem performs synthesis on-the-fly during the write phase. Related memory units are synthesized into higher-level abstract representations within the current session scope, allowing repetitive or structurally similar experiences to be denoised and compressed immediately.
✨ Example Synthesis:
- Fragment 1: "User wants coffee"
- Fragment 2: "User prefers oat milk"
- Fragment 3: "User likes it hot"
+ Consolidated: "User prefers hot coffee with oat milk"
This proactive synthesis ensures the memory topology remains compact and free of redundant fragmentation.
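The write-time merge can be sketched in a few lines. This is a deliberately naive stand-in: the real system delegates the merge to an LLM that produces an abstract representation, whereas here fragments sharing a subject are simply joined, and `synthesize_on_write` and `subject_of` are hypothetical names:

```python
def synthesize_on_write(store, new_fragment, subject_of):
    """Toy online synthesis: when a new fragment shares a subject with an
    existing unit, merge them into one consolidated unit instead of appending.
    A real implementation would ask an LLM for an abstract consolidation."""
    subject = subject_of(new_fragment)
    for i, unit in enumerate(store):
        if subject_of(unit) == subject:
            # merge into the existing unit during the write phase
            store[i] = unit + "; " + new_fragment.split(" ", 1)[1]
            return store
    store.append(new_fragment)
    return store

store = []
for frag in ["User wants coffee", "User prefers oat milk", "User likes it hot"]:
    synthesize_on_write(store, frag, subject_of=lambda s: s.split()[0])
print(store)  # ['User wants coffee; prefers oat milk; likes it hot']
```

Because the merge happens inside the write path, the store never accumulates the three separate fragments in the first place, which is the property the paragraph above describes.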
Instead of fixed-depth retrieval, SimpleMem leverages the reasoning capabilities of the LLM to generate a comprehensive retrieval plan. Given a query, the planning module infers latent search intent to dynamically determine retrieval scope and depth:
The system then executes parallel multi-view retrieval across semantic, lexical, and symbolic indexes, and merges results through ID-based deduplication:
- 🔹 Simple Queries
- 🔸 Complex Queries
📈 Result: 43.24% F1 score with 30× fewer tokens than full-context methods.
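The parallel multi-view retrieval with ID-based deduplication can be sketched as follows. The function name and the `(unit_id, score)` hit format are assumptions for illustration; in the real system the three retrievers would be the semantic, lexical, and symbolic indexes:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_parallel(query, retrievers):
    """Run each view's retriever concurrently, then merge the hit lists with
    ID-based deduplication (first occurrence of a unit ID wins)."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = [pool.submit(r, query) for r in retrievers]
        results = [f.result() for f in futures]   # preserves submission order
    seen, merged = set(), []
    for hits in results:
        for uid, score in hits:
            if uid not in seen:
                seen.add(uid)
                merged.append((uid, score))
    return merged

# toy retrievers standing in for the semantic, lexical, and symbolic views
semantic = lambda q: [("m1", 0.9), ("m2", 0.7)]
lexical  = lambda q: [("m2", 0.8), ("m3", 0.6)]
print(retrieve_parallel("meeting time", [semantic, lexical]))
# [('m1', 0.9), ('m2', 0.7), ('m3', 0.6)]
```

Deduplicating by unit ID rather than by text is what lets the three views disagree on scores while still contributing each unit to the context only once.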
| System | LoCoMo Score | SimpleMem Advantage |
|---|---|---|
| SimpleMem | 48 | — |
| Claude-Mem | 29.3 | +64% |
| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|---|---|---|---|
| MultiHop | 43.46% | 30.14% | +43.8% |
| Temporal | 58.62% | 48.91% | +19.9% |
| SingleHop | 51.12% | 41.3% | +23.8% |
| Metric | SimpleMem | Mem0 | Notes |
|---|---|---|---|
| Average F1 | 25.23% | 23.77% | Competitive with 99× smaller model |
To use a custom endpoint (e.g., Qwen or Azure), set `OPENAI_BASE_URL` in `config.py`.
# 📥 Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem
# 📦 Install dependencies
pip install -r requirements.txt
# ⚙️ Configure API settings
cp config.py.example config.py
# Edit config.py with your API key and preferences
# config.py
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None # or custom endpoint for Qwen/Azure
LLM_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B" # State-of-the-art retrieval
The MCP Server can be run in Docker for a consistent, isolated environment. Data (LanceDB and user DB) is persisted in a host volume.
# From the repository root
docker compose up -d
Data is stored in ./data on the host (created automatically).
cp .env.example .env
# Edit .env: set JWT_SECRET_KEY, ENCRYPTION_KEY, LLM_PROVIDER, model URLs, etc.
docker compose --env-file .env up -d
When LLM_PROVIDER=ollama and Ollama runs on your machine (not in Docker), set in .env:
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434/v1
On Linux, host.docker.internal is enabled automatically via the Compose file.
docker compose logs -f simplemem # Follow logs
docker compose down # Stop and remove containers
📖 For self-hosting the MCP server (Docker or bare metal), see MCP Documentation.
At a high level, SimpleMem works as a long-term memory system for LLM-based agents. The workflow consists of three simple steps: add dialogues, finalize memory construction, and query with intent-aware retrieval.
This design allows LLM agents to maintain context, recall past information efficiently, and avoid repeatedly processing redundant history.
from main import SimpleMemSystem
# 🚀 Initialize system
system = SimpleMemSystem(clear_db=True)
# 💬 Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")
# ✅ Finalize atomic encoding
system.finalize()
# 🔎 Query with intent-aware retrieval (Stage 3: Intent-Aware Retrieval Planning)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"
For large-scale dialogue processing, enable parallel mode:
system = SimpleMemSystem(
clear_db=True,
enable_parallel_processing=True, # ⚡ Parallel memory building
max_parallel_workers=8,
enable_parallel_retrieval=True, # 🔍 Parallel query execution
max_retrieval_workers=4
)
💡 Pro Tip: Parallel processing significantly reduces latency for batch operations!
If you encounter issues while setting up or running SimpleMem for the first time, check the following common cases:
- Ensure `config.py` exists (copy it from `config.py.example`)
- Verify `OPENAI_BASE_URL` is configured correctly
- Check your Python version with `python --version`
SimpleMem-Cross extends SimpleMem with persistent cross-conversation memory capabilities. Agents can recall context, decisions, and observations from previous sessions — enabling continuity across conversations without manual context re-injection.
| Feature | Description |
|---|---|
| Session Lifecycle | Full session management with start/record/stop/end lifecycle |
| Automatic Context Injection | Token-budgeted context from previous sessions injected at session start |
| Event Collection | Record messages, tool uses, file changes with automatic redaction |
| Observation Extraction | Heuristic extraction of decisions, discoveries, and learnings |
| Provenance Tracking | Every memory entry links back to source evidence |
| Consolidation | Decay, merge, and prune old memories to maintain quality |
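The consolidation step's decay-and-prune behavior can be illustrated with a short sketch. The function name, field names (`score`, `created_at`), and the exponential half-life model are assumptions for illustration, not SimpleMem's actual schema or policy:

```python
import time

def consolidate(entries, half_life_days=30.0, prune_below=0.1, now=None):
    """Toy decay-and-prune pass: each entry's relevance score halves every
    `half_life_days`; entries that decay below `prune_below` are dropped."""
    now = now if now is not None else time.time()
    kept = []
    for e in entries:
        age_days = (now - e["created_at"]) / 86400
        decayed = e["score"] * 0.5 ** (age_days / half_life_days)
        if decayed >= prune_below:
            kept.append({**e, "score": decayed})
    return kept

now = 1_700_000_000
entries = [
    {"id": "a", "score": 1.0, "created_at": now - 15 * 86400},   # 15 days old
    {"id": "b", "score": 0.2, "created_at": now - 120 * 86400},  # 120 days old
]
print([e["id"] for e in consolidate(entries, now=now)])  # ['a']
```

A merge step (combining near-duplicate entries before decay) would slot in before the loop; the sketch only shows decay and pruning to keep the mechanism visible.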
import asyncio

from cross.orchestrator import create_orchestrator

async def main():
    orch = create_orchestrator(project="my-project")

    # Start session — previous context is injected automatically
    result = await orch.start_session(
        content_session_id="session-001",
        user_prompt="Continue building the REST API",
    )
    print(result["context"])  # Relevant context from previous sessions

    # Record events during the session
    await orch.record_message(result["memory_session_id"], "User asked about JWT")
    await orch.record_tool_use(
        result["memory_session_id"],
        tool_name="read_file",
        tool_input="auth/jwt.py",
        tool_output="class JWTHandler: ...",
    )

    # Finalize — extracts observations, generates summary, stores memories
    report = await orch.stop_session(result["memory_session_id"])
    print(f"Stored {report.entries_stored} memory entries")

    await orch.end_session(result["memory_session_id"])
    orch.close()

asyncio.run(main())
      Agent Frameworks (Claude Code / Cursor / custom)
                        |
         +--------------+--------------+
         |                             |
 Hook/Lifecycle Adapter      HTTP/MCP API (FastAPI)
         |                             |
         +--------------+--------------+
                        |
              CrossMemOrchestrator
                        |
      +-----------------+------------------+
      |                 |                  |
Session Manager  Context Injector    Consolidation
   (SQLite)     (budgeted bundle)  (decay/merge/prune)
      |                 |                  |
      +---------+-------+                  |
                |                          |
  Cross-Session Vector Store (LanceDB) <---+
| Module | Description |
|---|---|
| `cross/types.py` | Pydantic models, enums, records |
| `cross/storage_sqlite.py` | SQLite backend for sessions, events, observations |
| `cross/storage_lancedb.py` | LanceDB vector store with provenance |
| `cross/hooks.py` | Lifecycle hooks (SessionStart/ToolUse/End) |
| `cross/collectors.py` | Event collection with 3-tier redaction |
| `cross/session_manager.py` | Full session lifecycle orchestration |
| `cross/context_injector.py` | Token-budgeted context builder |
| `cross/orchestrator.py` | Top-level facade and factory |
| `cross/api_http.py` | FastAPI REST endpoints |
| `cross/api_mcp.py` | MCP tool definitions |
| `cross/consolidation.py` | Memory maintenance worker |
📖 For detailed API documentation, see Cross-Session README
SimpleMem is available as a cloud-hosted memory service via the Model Context Protocol (MCP), enabling seamless integration with AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients.
🌐 Cloud Service: mcp.simplemem.cloud — or self-host the MCP server locally using Docker.
| Feature | Description |
|---|---|
| Streamable HTTP | MCP 2025-03-26 protocol with JSON-RPC 2.0 |
| Multi-tenant Isolation | Per-user data tables with token authentication |
| Hybrid Retrieval | Semantic search + keyword matching + metadata filtering |
| Production Optimized | Faster response times with OpenRouter integration |
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
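Beyond configuring a client, it can help to see the shape of a raw request to the endpoint. The sketch below only builds the JSON-RPC 2.0 payload and headers implied by the config above; `mcp_request` is a hypothetical helper, `tools/list` is a standard MCP method, and actually sending the request requires an HTTP client and a valid token:

```python
import json

def mcp_request(method, params=None, req_id=1, token="YOUR_TOKEN"):
    """Build the URL, headers, and JSON-RPC 2.0 body for a call to the
    SimpleMem MCP endpoint (shape taken from the client config above)."""
    body = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        body["params"] = params
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return "https://mcp.simplemem.cloud/mcp", headers, json.dumps(body)

url, headers, payload = mcp_request("tools/list")
print(payload)  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

Any HTTP client that can POST this payload with these headers can talk to the server; the MCP client config above is just a declarative way of supplying the same URL and Authorization header.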
📖 For detailed setup instructions and self-hosting guide, see MCP Documentation
# 🎯 Full LoCoMo benchmark
python test_locomo10.py
# 📉 Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5
# 💾 Custom output file
python test_locomo10.py --result-file my_results.json
To reproduce the reported results, use the exact configurations in `config.py`.
If you use SimpleMem in your research, please cite:
@article{simplemem2025,
  title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
  author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2601.02553},
  year={2025},
  url={https://github.com/aiming-lab/SimpleMem}
}
This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank the following projects and teams: