AI & Tech·Jun 3, 2026

Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Article URL: Comments URL: Points: 4 # Comments: 0

Hacker News · 5 min read · Single source
The gist
5-point summary · 1 min


  • Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required.
  • It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API.
  • It ships as a single static binary with zero cloud dependency.
  • On POST /retrieve, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a string.
  • See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval, no cloud required.

What is mnemo?

Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically, in under 50 ms.

It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.

How it works

    your app
       │
       ▼
    POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                          │
    POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
       │
       ▼
     ──► inject into your LLM prompt

  • You POST raw text to /ingest (a conversation turn, a document, a note).
  • mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
  • Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
  • On POST /retrieve, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a string.
  • You inject that string into your LLM's system prompt. Done.

Quickstart

Path A: Docker + Ollama (fully free, recommended)

    git clone https://github.com/zaydmulani09/mnemo
    cd mnemo
    docker compose up -d

    # Pull the llama3 model the first time (~4 GB)
    docker exec mnemo-ollama ollama pull llama3

    # Verify everything is healthy
    curl http://localhost:8080/health

Path B: Binary (Ollama or OpenAI running separately)

    cargo install --path crates/mnemo-api

    # With Ollama
    export =http://localhost:11434/v1
    mnemo-api

    # With OpenAI
    export =https://api.openai.com/v1
    export =sk-...
    export =gpt-4o-mini
    export =openai
    mnemo-api

Path C: Python SDK

    pip install mnemo-sdk

    from mnemo import MnemoClient

    client = MnemoClient()  # server at http://localhost:8080

    # Store a memory
    client.ingest("I'm building a Rust vector database called vecdb")

    # Get context for injection into your next LLM prompt
    print(client.get_context("what am I working on?"))

API Reference

All endpoints accept and return application/json. Base URL: http://localhost:8080.

Method  Path                     Description                         Request body                  Response
GET     /health                  Server + DB + LLM status            -                             HealthResponse
POST    /ingest                  Store text, extract entities        IngestRequest                 IngestResponse
POST    /retrieve                Retrieve ranked memory context      RetrievalQuery                RetrievalResult
GET     /entities                List entities (paginated)           ?limit&offset                 Entity[]
GET     /entities/:id            Get entity by UUID                  -                             Entity
DELETE  /entities/:id            Delete entity (cascades)            -                             {"deleted":true}
GET     /entities/:id/neighbors  Knowledge graph neighbors           ?depth (max 5)                GraphNode[]
GET     /chunks                  List memory chunks (paginated)      ?limit&offset&                MemoryChunk[]
GET     /chunks/:id              Get chunk by UUID                   -                             MemoryChunk
DELETE  /chunks/:id              Delete chunk                        -                             {"deleted":true}
POST    /search                  Full-text search entities + chunks  {"query","limit"}             {"entities","chunks"}
DELETE  /wipe                    Delete all memory (irreversible)    header: X-Confirm-Wipe: true  {"wiped":true}
GET     /stats                   Entity/chunk/graph counts + uptime  -                             StatsResponse

Key request/response types:

Full endpoint documentation with curl examples: docs/api.md

Configuration

Environment variables

Variable  Default                    Description
          mnemo.db                   SQLite database file path
          8080                       API server port
          http://localhost:11434/v1  OpenAI-compatible LLM base URL
          llama3                     Model name for entity extraction
          ollama                     API key (any value works for Ollama)
          ollama                     Provider type: ollama, openai, anthropic, custom

TOML config file

Pass --config path/to/config.toml to mnemo-api. See mnemo.example.toml:

    db_path = "mnemo.db"
    port = 8080

    [llm]
    provider = "ollama"
    base_url = "http://localhost:11434/v1"
    model = "llama3"
    api_key = "ollama"
    = 30
    max_retries = 3
    max_tokens = 2048
    temperature = 0.1

Environment variables take precedence over TOML values. The active config source is reported in GET /health →.

CLI

Install:

    cargo install --path crates/mnemo-cli

Usage:

    # Store a memory
    mnemo ingest "I use Neovim and prefer dark mode"

    # Retrieve relevant context
    mnemo search "what editor do I use?"

    # List all extracted entities
    mnemo entities

    # Show entity detail + graph neighbors
    mnemo entity --neighbors

    # List memory chunks
    mnemo chunks

    # Server health
    mnemo health

    # Memory statistics
    mnemo stats

    # Delete everything (prompts for confirmation)
    mnemo wipe

    # Skip confirmation prompt
    mnemo wipe --yes

    # Point at a non-default server
    mnemo --server http://192.168.1.10:8080 stats

Python SDK

Install:

    pip install mnemo-sdk

See sdk/python/README.md for the full API reference.

Async example:

    import asyncio
    from mnemo import AsyncMnemoClient

    async def main():
        async with AsyncMnemoClient() as client:
            await client.ingest(
                "Alice is a principal engineer at Stripe working on payment infrastructure.",
                ="session-001",
            )
            context = await client.get_context(
                "what does Alice work on?",
                ="session-001",
            )
            print(context)

    asyncio.run(main())

A working standalone example: examples/.py

Architecture

Four Rust crates wired together:

Crate        Type  Role
mnemo-core   lib   Entity extraction, graph ops, retrieval engine, DB layer
mnemo-api    bin   Axum REST API, a thin handler layer over mnemo-core
mnemo-cli    bin   CLI tool using blocking reqwest against the API
mnemo-bench  bin   Performance benchmarks (12 suites)

Full architecture documentation: docs/architecture.md

Performance

Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. These are debug-build numbers; a release build (--release) is 3–5× faster.

Operation                 Avg latency  Throughput
Entity insert (SQLite)    ~0.12 ms     ~8,300 ops/s
Entity lookup by ID       ~0.08 ms     ~12,500 ops/s
Chunk insert              ~0.14 ms     ~7,100 ops/s
Full-text chunk search    ~0.28 ms     ~3,500 ops/s
Graph neighbor (depth=1)  ~0.21 ms     ~4,700 ops/s
Graph neighbor (depth=2)  ~0.89 ms     ~1,100 ops/s
Full retrieval pipeline   ~4.2 ms      ~238 ops/s

Run cargo run -p mnemo-bench to benchmark on your hardware.

Testing

Rust

    cargo test --workspace    # run all 122 tests
    make coverage             # HTML coverage report (requires cargo-llvm-cov)
    make coverage-summary     # summary to stdout

Python SDK

    cd sdk/python && pytest tests/ -v

Benchmarks

    cargo run -p mnemo-bench                     # all 12 benchmarks
    cargo run -p mnemo-bench -- --filter graph   # graph benchmarks only
    cargo run -p mnemo-bench -- --json out.json  # save results to JSON

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks

Contributing

PRs welcome. Please run make fmt && make lint before submitting. Open an issue first for large changes. See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.

License

MIT. See LICENSE.
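To make the six retrieval stages listed under "How it works" concrete, here is a minimal in-memory sketch in Python. It is not mnemo's implementation: the toy data, the function names bfs_expand and retrieve, the term-overlap matching, and the ranking rule are all assumptions made for illustration, and the real service runs these stages against SQLite and petgraph in Rust.

```python
from collections import deque

# Toy data standing in for mnemo's SQLite tables and petgraph (all hypothetical).
CHUNKS = [
    "Alice is a principal engineer at Stripe",
    "Alice uses Neovim with a dark theme",
    "Bob maintains the billing dashboard",
]
ENTITIES = ["Alice", "Stripe", "Neovim", "Bob"]
EDGES = [  # (source, relation, target) triples
    ("Alice", "works_at", "Stripe"),
    ("Alice", "uses", "Neovim"),
]


def bfs_expand(seeds, edges, max_depth):
    """Return {entity: hop distance} for every entity within max_depth of the seeds."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in adjacency.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


def retrieve(query, max_depth=1, allowed_relations=None, limit=5):
    terms = {t.strip("?.,!").lower() for t in query.split()}
    terms.discard("")
    # 1. Full-text chunk search (naive term overlap instead of a real index).
    chunk_hits = [(sum(t in c.lower().split() for t in terms), c) for c in CHUNKS]
    chunk_hits = [(score, c) for score, c in chunk_hits if score > 0]
    # 2. Entity name search.
    seeds = [e for e in ENTITIES if e.lower() in terms]
    # 3. Graph expansion: BFS outward from the matched entities.
    depth = bfs_expand(seeds, EDGES, max_depth)
    # 4. Relation filter: keep edges between reached entities, optionally by type.
    facts = [
        (src, rel, dst)
        for src, rel, dst in EDGES
        if src in depth and dst in depth
        and (allowed_relations is None or rel in allowed_relations)
    ]
    # 5. Score + rank: chunks by term overlap, facts by hop distance (assumed rule).
    ranked = sorted(chunk_hits, key=lambda hit: -hit[0])[:limit]
    facts.sort(key=lambda f: (max(depth[f[0]], depth[f[2]]), f))
    # 6. Assemble one context string.
    lines = [f"{s} {r} {d}" for s, r, d in facts] + [c for _score, c in ranked]
    return "\n".join(lines)


print(retrieve("what does Alice work on?"))
```

The string returned by retrieve plays the role of the context you would prepend to a system prompt. Passing max_depth=0 disables graph expansion, so only chunk matches come back.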
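The ingest step says entities are deduplicated by name+type and that aliases are merged. One way this could look is sketched below in Python. The Entity class, the upsert helper, and the case-insensitive, whitespace-trimmed key are assumptions for illustration only; the source does not say how mnemo normalizes names.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str
    aliases: set = field(default_factory=set)


def dedup_key(name, type_):
    # Assumption: names and types compare case-insensitively after trimming.
    return (name.strip().lower(), type_.strip().lower())


def upsert(store, name, type_, aliases=()):
    """Insert a new entity, or merge aliases into the one sharing the same key."""
    key = dedup_key(name, type_)
    existing = store.get(key)
    if existing is None:
        store[key] = Entity(name.strip(), type_.strip(), set(aliases))
        return store[key]
    existing.aliases.update(aliases)
    # Keep a differently spelled variant of the name as an alias.
    if name.strip() != existing.name:
        existing.aliases.add(name.strip())
    return existing


store = {}
upsert(store, "Alice", "person", {"Al"})
upsert(store, " alice ", "Person", {"A. Smith"})  # same key, so it merges
upsert(store, "Alice", "tool")                    # different type, so it is new
print(len(store), sorted(store[("alice", "person")].aliases))
```

Keying on the (name, type) pair means a person and a tool that share a name stay separate entities, while repeated mentions of the same person accumulate aliases instead of creating duplicates.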

Integrity note  ·  Xela does not rewrite or paraphrase article content. The excerpt above is the source publication's own words, sanitized for display. For the full piece — including any quotes, charts, or images — read it at Hacker News. Xela's rewritten version is off for this story, so there's no editorial angle attached — you're getting the source's reporting unfiltered. When the rewrite is on, we add a What this means block underneath with the operator/trader takeaway.
