| id | type | title | tags | importance | confidence | created | updated |
|---|---|---|---|---|---|---|---|
| f3790fff-ab75-44b2-8d8b-7dd0953c05dc | solution | Embeddings mtime-based cache: 6x faster semantic recall | | 0.8 | 0.8 | 2026-02-19T22:03:16.251366+00:00 | 2026-02-19T22:03:16.251366+00:00 |
Added mtime-based caching for _embeddings.json (24MB, 439 entries × 4096-dim vectors) in CognitiveMemoryClient.
Problem: Every semantic_recall() call re-parsed the 24MB JSON file from disk.
Solution: Added _embeddings_cache and _embeddings_mtime instance attributes. A new _load_embeddings_cached() method calls stat() to check the mtime (nearly free) and only re-parses when the file has changed. Since the embed cron runs hourly, the parse happens at most once per hour.
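The caching pattern above can be sketched roughly as follows (the attribute and method names come from the note; the file path and class constructor are hypothetical, and the real client.py surely has more state):

```python
import json
import os


class CognitiveMemoryClient:
    """Minimal sketch of the mtime-based embeddings cache."""

    def __init__(self, embeddings_path="_embeddings.json"):
        self._embeddings_path = embeddings_path
        self._embeddings_cache = None   # parsed JSON, kept across calls
        self._embeddings_mtime = None   # file mtime at last parse

    def _load_embeddings_cached(self):
        # stat() is nearly free compared to parsing a 24MB JSON file
        mtime = os.stat(self._embeddings_path).st_mtime
        if self._embeddings_cache is None or mtime != self._embeddings_mtime:
            with open(self._embeddings_path) as f:
                self._embeddings_cache = json.load(f)
            self._embeddings_mtime = mtime
        return self._embeddings_cache
```

Comparing the stored mtime rather than using a TTL means the cache is never stale: the moment the hourly embed cron rewrites the file, the next call re-parses it.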
Performance results:
| Call | Before | After |
|---|---|---|
| Semantic (cold) | 1,328ms | 389ms |
| Semantic (warm) | 1,328ms | 208ms |
| Keyword only | 3ms | 2ms |
The remaining ~200ms on a warm cache is the Ollama embedding API call for the query text plus the cosine-similarity scan over the 439 cached vectors. The JSON parse overhead is completely eliminated on repeat calls.
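The similarity scan that accounts for part of the warm-cache time presumably looks something like this (a plain-Python sketch; the actual function names in client.py and the exact layout of _embeddings.json are assumptions):

```python
import math


def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_by_similarity(query_vec, embeddings, top_k=5):
    # embeddings: {entry_id: vector}, as loaded from the cached JSON
    scored = [(cosine_similarity(query_vec, vec), entry_id)
              for entry_id, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]
```

At 439 entries of 4096 dims this is a few million multiply-adds per query, which fits the observed ~200ms floor once parsing is out of the picture.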
Files changed: client.py — added _load_embeddings_cached(), updated semantic_recall() to use it, added cache attrs to __init__.