claude-memory/graph/solutions/embeddings-mtime-based-cache-6x-faster-semantic-recall-f3790f.md

---
id: f3790fff-ab75-44b2-8d8b-7dd0953c05dc
type: solution
title: "Embeddings mtime-based cache: 6x faster semantic recall"
tags: [cognitive-memory, python, performance, caching]
importance: 0.8
confidence: 0.8
created: "2026-02-19T22:03:16.251366+00:00"
updated: "2026-02-19T22:03:16.251366+00:00"
---
Added mtime-based caching for `_embeddings.json` (24MB, 439 entries × 4096-dim vectors) in CognitiveMemoryClient.
**Problem:** Every `semantic_recall()` call re-parsed the 24MB JSON file from disk.
**Solution:** Added `_embeddings_cache` and `_embeddings_mtime` instance attributes. A new `_load_embeddings_cached()` method calls `stat()` to check the mtime (nearly free) and re-parses the file only when it has changed. Since the embed cron runs hourly, the parse happens at most once per hour.
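A minimal sketch of the mechanism, assuming a plain JSON file on disk (class and attribute names follow the description above; the constructor signature is assumed):

```python
import json
import os

class CognitiveMemoryClient:
    """Sketch of the mtime-based embeddings cache."""

    def __init__(self, embeddings_path="_embeddings.json"):
        self._embeddings_path = embeddings_path
        self._embeddings_cache = None   # parsed JSON, kept between calls
        self._embeddings_mtime = None   # file mtime at the time of the last parse

    def _load_embeddings_cached(self):
        # stat() is nearly free compared to parsing a 24MB JSON file
        mtime = os.stat(self._embeddings_path).st_mtime
        if self._embeddings_cache is None or mtime != self._embeddings_mtime:
            with open(self._embeddings_path) as f:
                self._embeddings_cache = json.load(f)
            self._embeddings_mtime = mtime
        return self._embeddings_cache
```

On every call after the first, the method returns the already-parsed dict unless the embed cron has rewritten the file since the last load.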
**Performance results:**
| Call | Before | After |
|------|--------|-------|
| Semantic (cold) | 1,328ms | 389ms |
| Semantic (warm) | 1,328ms | 208ms |
| Keyword only | 3ms | 2ms |
The remaining ~200ms on warm cache is the Ollama embedding API call for the query text + cosine similarity computation. The JSON parse overhead is completely eliminated on repeat calls.
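The similarity step amounts to scoring the query vector against each cached entry. A stdlib-only sketch (function names are illustrative, not the ones in `client.py`; the real code may well vectorize this):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); 0.0 for zero-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def rank_entries(query_vec, entries):
    # entries: {entry_id: vector}; returns ids sorted best-match first
    scored = [(cosine_similarity(query_vec, vec), key)
              for key, vec in entries.items()]
    return [key for _, key in sorted(scored, reverse=True)]
```

At 439 entries x 4096 dims this is a fixed per-call cost, which is why the warm-cache floor sits around 200ms rather than dropping to the keyword-only level.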
**Files changed:** `client.py` — added `_load_embeddings_cached()`, updated `semantic_recall()` to use it, added cache attrs to `__init__`.