| id | type | title | tags | importance | confidence | created | updated |
|---|---|---|---|---|---|---|---|
| f3790fff-ab75-44b2-8d8b-7dd0953c05dc | solution | Embeddings mtime-based cache: 6x faster semantic recall | | 0.8 | 0.8 | 2026-02-19T22:03:16.251366+00:00 | 2026-02-19T22:03:16.251366+00:00 |
Added mtime-based caching for _embeddings.json (24MB, 439 entries × 4096-dim vectors) in CognitiveMemoryClient.
Problem: Every semantic_recall() call re-parsed the 24MB JSON file from disk.
Solution: Added _embeddings_cache and _embeddings_mtime instance attributes. A new _load_embeddings_cached() method calls stat() to check the mtime (nearly free) and only re-parses when the file has changed. Since the embed cron runs hourly, the parse happens at most once per hour.
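The caching pattern above can be sketched roughly as follows (the attribute and method names come from the note; the file path and class constructor are hypothetical, and the real client.py surely has more state):

```python
import json
import os


class CognitiveMemoryClient:
    """Minimal sketch of the mtime-based embeddings cache."""

    def __init__(self, embeddings_path="_embeddings.json"):
        self._embeddings_path = embeddings_path
        self._embeddings_cache = None   # parsed JSON, kept across calls
        self._embeddings_mtime = None   # file mtime at last parse

    def _load_embeddings_cached(self):
        # stat() is nearly free compared to parsing a 24MB JSON file
        mtime = os.stat(self._embeddings_path).st_mtime
        if self._embeddings_cache is None or mtime != self._embeddings_mtime:
            with open(self._embeddings_path) as f:
                self._embeddings_cache = json.load(f)
            self._embeddings_mtime = mtime
        return self._embeddings_cache
```

Comparing the stored mtime rather than using a TTL means the cache is never stale: the moment the hourly embed cron rewrites the file, the next call re-parses it.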
Performance results:
| Call | Before | After |
|---|---|---|
| Semantic (cold) | 1,328ms | 389ms |
| Semantic (warm) | 1,328ms | 208ms |
| Keyword only | 3ms | 2ms |
The remaining ~200ms on a warm cache is the Ollama embedding API call for the query text plus the cosine-similarity scan over the 439 cached vectors. The JSON parse overhead is completely eliminated on repeat calls.
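The similarity scan that accounts for part of the warm-cache time presumably looks something like this (a plain-Python sketch; the actual function names in client.py and the exact layout of _embeddings.json are assumptions):

```python
import math


def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_by_similarity(query_vec, embeddings, top_k=5):
    # embeddings: {entry_id: vector}, as loaded from the cached JSON
    scored = [(cosine_similarity(query_vec, vec), entry_id)
              for entry_id, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]
```

At 439 entries of 4096 dims this is a few million multiply-adds per query, which fits the observed ~200ms floor once parsing is out of the picture.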
Files changed: client.py — added _load_embeddings_cached(), updated semantic_recall() to use it, added cache attrs to __init__.