Production-Ready Memory Service for LLM Agents via Model Context Protocol (MCP)
SimpleMem MCP Server is a cloud-hosted long-term memory service for LLM agents that implements the MCP Streamable HTTP transport (protocol revision 2025-03-26). It lets MCP-compatible clients such as Claude and Cursor store, retrieve, and query conversational memories with ease.
```
                 SimpleMem MCP Server
┌──────────────────────────────────────────────────────┐
│                HTTP Server (FastAPI)                 │
│  ┌─────────┐  ┌──────────┐  ┌───────────────────┐    │
│  │ Web UI  │  │ REST API │  │ MCP Streamable    │    │
│  │  (/)    │  │ (/api/*) │  │ HTTP (/mcp)       │    │
│  └─────────┘  └──────────┘  └───────────────────┘    │
└──────────────────────────┬───────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────┐
│                Token Authentication                  │
│          (JWT + AES-256 Encrypted API Keys)          │
└──────────────────────────┬───────────────────────────┘
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
  ┌──────────┐       ┌──────────┐       ┌──────────┐
  │  User A  │       │  User B  │       │  User C  │
  │  Table   │       │  Table   │       │  Table   │
  └──────────┘       └──────────┘       └──────────┘
  ────────────────────  LanceDB  ────────────────────
                           │
                           ▼
┌──────────────────────────────────────────────────────┐
│             OpenRouter API Integration               │
│  LLM: openai/gpt-4.1-mini  Embed: qwen/qwen3-embed-4b│
└──────────────────────────────────────────────────────┘
```
The easiest way to use SimpleMem is via our hosted service at https://mcp.simplemem.cloud
To self-host instead, set up a local environment:

```bash
cd MCP
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
```bash
# Recommended settings for production
export JWT_SECRET_KEY="your-secure-random-secret-key"
export ENCRYPTION_KEY="your-32-byte-encryption-key!!"

python run.py
```
Output:
```
============================================================
  SimpleMem MCP Server
  Multi-tenant Memory Service for LLM Agents
============================================================
  Web UI:   http://localhost:8000/
  REST API: http://localhost:8000/api/
  MCP:      http://localhost:8000/mcp
------------------------------------------------------------
```
| Item | Value |
|---|---|
| Protocol Version | 2025-03-26 |
| Transport | Streamable HTTP |
| Message Format | JSON-RPC 2.0 |
| Authentication | Bearer Token |
| Endpoint | Method | Description |
|---|---|---|
| /mcp | POST | Send JSON-RPC messages (requests, notifications) |
| /mcp | GET | Server-to-client SSE stream |
| /mcp | DELETE | Terminate session |
All MCP requests require a Bearer token in the Authorization header:
```
Authorization: Bearer <your-token>
```
After initialization, include the session ID header:
```
Mcp-Session-Id: <session-id>
```
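Putting the two headers together, an authenticated request after initialization can be sketched in Python as below. This is an illustration, not the official client; the token and session ID are placeholders, and the `Accept` header follows the Streamable HTTP convention of accepting both JSON and SSE responses.

```python
import json
import urllib.request

def build_mcp_request(token: str, session_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated MCP POST request."""
    return urllib.request.Request(
        "https://mcp.simplemem.cloud/mcp",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
            "Mcp-Session-Id": session_id,
        },
        method="POST",
    )

req = build_mcp_request("YOUR_TOKEN", "abc123",
                        {"jsonrpc": "2.0", "id": 1, "method": "ping"})
print(req.get_header("Authorization"))  # Bearer YOUR_TOKEN
```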
| Tool | Description |
|---|---|
| memory_add | Add a single dialogue to memory (auto-extracts facts, resolves pronouns, anchors timestamps) |
| memory_add_batch | Add multiple dialogues at once |
| memory_query | Query memories and generate AI-synthesized answers (with planning + hybrid retrieval + reflection) |
| memory_retrieve | Retrieve relevant memory entries (returns raw data) |
| memory_stats | Get memory statistics |
| memory_clear | Clear all memories (irreversible) |
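Each tool is invoked through an ordinary JSON-RPC 2.0 `tools/call` request. A sketch of a `memory_add` payload follows; the `dialogue` argument name is an assumption for illustration, not a verified schema.

```python
import json

# Hypothetical memory_add call; the exact argument schema is defined by the server.
payload = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "memory_add",
        "arguments": {
            "dialogue": "I'll meet Bob at Starbucks tomorrow at 3pm",
        },
    },
}
print(json.dumps(payload, indent=2))
```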
Add to your MCP JSON settings:
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
```
Dialogue Input                 Processing                  Memory Storage
───────────────────────────────────────────────────────────────────────────
"I'll meet Bob        ┌─────────────────┐
 at Starbucks         │ LLM Processing  │
 tomorrow at 3pm" ──▶ │                 │ ──────────────▶ Atomic Fact
                      └─────────────────┘
                              │
                              ▼
                  ┌─────────────────────────┐
                  │ Atomic Fact:            │
                  │ "User will meet Bob at  │
                  │  Starbucks on           │
                  │  2025-01-15 at 15:00"   │
                  │                         │
                  │ persons: [User, Bob]    │
                  │ location: Starbucks     │
                  │ timestamp: 2025-01-15   │
                  │ topic: Meeting          │
                  └───────────┬─────────────┘
                              │
                              ▼
                  ┌─────────────────────────┐
                  │       Embedding         │
                  │    (qwen3-embed-4b)     │
                  └───────────┬─────────────┘
                              │
                              ▼
                  ┌─────────────────────────┐
                  │  LanceDB Vector Store   │
                  └─────────────────────────┘
```
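The record produced by this pipeline can be modeled as a small structure whose fields mirror the diagram. A sketch (the class name and exact field types are illustrative, and the embedding vector is left empty rather than computed):

```python
from dataclasses import dataclass, field

@dataclass
class AtomicFact:
    """One extracted fact plus metadata, as in the storage pipeline above."""
    text: str
    persons: list
    location: str
    timestamp: str
    topic: str
    vector: list = field(default_factory=list)  # filled in by the embedding model

fact = AtomicFact(
    text="User will meet Bob at Starbucks on 2025-01-15 at 15:00",
    persons=["User", "Bob"],
    location="Starbucks",
    timestamp="2025-01-15",
    topic="Meeting",
)
```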
```
User Question: "When am I meeting Bob?"
        │
        ▼
┌───────────────────────────────┐
│ 1. Query Complexity Analysis  │
│    - Type: Temporal query     │
│    - Entity: Bob              │
│    - Complexity: 0.3 (simple) │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ 2. Generate Search Queries    │
│    → "Bob meeting time"       │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ 3. Hybrid Retrieval           │
│    - Semantic (vector)        │
│    - Keyword (BM25)           │
│    - Metadata (persons)       │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ 4. Answer Generation          │
│    Context + Question → LLM   │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│ Response:                     │
│ {                             │
│   "answer": "15 January 2025  │
│              at 3:00 PM at    │
│              Starbucks",      │
│   "confidence": "high",       │
│   "contexts_used": 1          │
│ }                             │
└───────────────────────────────┘
```
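Step 3 merges three ranked result lists (semantic, keyword, metadata). One common way to fuse such rankings is reciprocal rank fusion; the server's actual fusion method is not documented here, so the following is only a sketch of the idea:

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of memory ids; ids that rank
    high in any list accumulate a larger score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m3", "m1", "m7"]   # vector search hits
keyword  = ["m1", "m9"]         # BM25 hits
metadata = ["m1", "m3"]         # e.g. persons contains "Bob"
print(reciprocal_rank_fusion([semantic, keyword, metadata])[0])  # m1
```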
| Option | Default | Description |
|---|---|---|
| window_size | 20 | Number of dialogues per processing batch |
| semantic_top_k | 25 | Semantic search result count |
| keyword_top_k | 5 | Keyword search result count |
| enable_planning | true | Enable query planning |
| enable_reflection | true | Enable reflection iteration |
| max_reflection_rounds | 2 | Maximum reflection rounds |
| llm_model | openai/gpt-4.1-mini | LLM model |
| embedding_model | qwen/qwen3-embedding-4b | Embedding model |
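To make `window_size` concrete: incoming dialogues are grouped into fixed-size processing windows. A minimal sketch of that chunking (the function name is illustrative, not part of the server's API):

```python
def batch_dialogues(dialogues: list, window_size: int = 20) -> list:
    """Split dialogues into fixed-size processing windows."""
    return [dialogues[i:i + window_size]
            for i in range(0, len(dialogues), window_size)]

windows = batch_dialogues([f"turn {i}" for i in range(45)])
print([len(w) for w in windows])  # [20, 20, 5]
```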
```bash
# Development mode (auto-reload)
python run.py --reload

# Specify a port
python run.py --port 3000

# View help
python run.py --help
```
MIT License
Built upon the SimpleMem research implementation, refactored and optimized for production deployment with multi-tenant support, faster processing, and comprehensive user isolation.