From f1659b93f54e591729c7f74b5c00bb97ac07c287 Mon Sep 17 00:00:00 2001
From: Cal Corum <calcorum@users.noreply.github.com>
Date: Thu, 19 Feb 2026 16:03:16 -0600
Subject: [PATCH] store: Embeddings mtime-based cache: 6x faster semantic
 recall

---
 ...-cache-6x-faster-semantic-recall-f3790f.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 graph/solutions/embeddings-mtime-based-cache-6x-faster-semantic-recall-f3790f.md

diff --git a/graph/solutions/embeddings-mtime-based-cache-6x-faster-semantic-recall-f3790f.md b/graph/solutions/embeddings-mtime-based-cache-6x-faster-semantic-recall-f3790f.md
new file mode 100644
index 00000000000..f57f5d5f4dc
--- /dev/null
+++ b/graph/solutions/embeddings-mtime-based-cache-6x-faster-semantic-recall-f3790f.md
@@ -0,0 +1,27 @@
+---
+id: f3790fff-ab75-44b2-8d8b-7dd0953c05dc
+type: solution
+title: "Embeddings mtime-based cache: 6x faster semantic recall"
+tags: [cognitive-memory, python, performance, caching]
+importance: 0.8
+confidence: 0.8
+created: "2026-02-19T22:03:16.251366+00:00"
+updated: "2026-02-19T22:03:16.251366+00:00"
+---
+
+Added an mtime-based cache for `_embeddings.json` (24 MB; 439 entries × 4096-dim vectors) in `CognitiveMemoryClient`.
+
+**Problem:** Every `semantic_recall()` call re-parsed the 24MB JSON file from disk.
+
+**Solution:** Added `_embeddings_cache` and `_embeddings_mtime` instance attributes. The new `_load_embeddings_cached()` method calls `stat()` to read the file's mtime (nearly free) and re-parses the JSON only when the mtime has changed. Since the embed cron runs hourly, a long-lived client re-parses the file at most once per hour, plus the initial cold load.
+
+**Performance results:**
+| Call | Before | After |
+|------|--------|-------|
+| Semantic (cold) | 1,328ms | 389ms |
+| Semantic (warm) | 1,328ms | 208ms |
+| Keyword only | 3ms | 2ms |
+
+With a warm cache, the remaining ~200 ms goes to the Ollama embedding API call for the query text plus the cosine-similarity computation. Repeat calls against an unchanged file skip the JSON parse entirely.
+
+**Files changed:** `client.py` (added `_load_embeddings_cached()`, updated `semantic_recall()` to use it, and added the cache attributes to `__init__`).
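
A minimal Python sketch of the caching pattern the note describes follows. It is not the actual `client.py` code. Only `_embeddings_cache`, `_embeddings_mtime`, and `_load_embeddings_cached()` come from the note. `EmbeddingsCacheSketch`, `_embeddings_path`, and `parse_count` are hypothetical names added for illustration. The sketch also assumes that `_embeddings_json` holds a single JSON document and that the client knows its path.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


class EmbeddingsCacheSketch:
    """Hypothetical stand-in for the cache logic added to CognitiveMemoryClient."""

    def __init__(self, embeddings_path: Path) -> None:
        self._embeddings_path = Path(embeddings_path)
        # Cache attributes named after the note; their types here are assumptions.
        self._embeddings_cache: Any = None
        self._embeddings_mtime: Optional[int] = None  # st_mtime_ns at the last parse
        self.parse_count = 0  # illustration only: counts real JSON parses

    def _load_embeddings_cached(self) -> Any:
        """Return parsed embeddings, re-reading the file only if its mtime changed."""
        try:
            mtime = os.stat(self._embeddings_path).st_mtime_ns  # cheap metadata call
        except FileNotFoundError:
            # No embeddings file (yet): forget any stale cache.
            self._embeddings_cache = None
            self._embeddings_mtime = None
            return None
        if mtime == self._embeddings_mtime:
            return self._embeddings_cache  # cache hit: file unchanged since last parse
        # Cache miss: do the expensive full parse.
        with open(self._embeddings_path, encoding="utf-8") as f:
            self._embeddings_cache = json.load(f)
        # If the file is rewritten between stat() and open(), the stored mtime is
        # older than the contents, so the next call simply re-parses.
        self._embeddings_mtime = mtime
        self.parse_count += 1
        return self._embeddings_cache


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "_embeddings.json"
        path.write_text(json.dumps({"memory-1": [0.1, 0.2]}), encoding="utf-8")
        client = EmbeddingsCacheSketch(path)
        client._load_embeddings_cached()  # parses the file
        client._load_embeddings_cached()  # served from the cache
        print(client.parse_count)  # prints 1
```

The sketch compares the integer `st_mtime_ns` rather than the float `st_mtime`, which avoids precision loss in the comparison. It still inherits the filesystem's timestamp granularity, though. A rewrite that lands within the same timestamp tick would not be detected, which is harmless when the only writer is an hourly cron.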