store: Embeddings mtime-based cache: 6x faster semantic recall
parent 0b1cc40123
commit f1659b93f5
@@ -0,0 +1,27 @@
---
id: f3790fff-ab75-44b2-8d8b-7dd0953c05dc
type: solution
title: "Embeddings mtime-based cache: 6x faster semantic recall"
tags: [cognitive-memory, python, performance, caching]
importance: 0.8
confidence: 0.8
created: "2026-02-19T22:03:16.251366+00:00"
updated: "2026-02-19T22:03:16.251366+00:00"
---

Added mtime-based caching for `_embeddings.json` (24MB, 439 entries × 4096-dim vectors) in CognitiveMemoryClient.
**Problem:** Every `semantic_recall()` call re-parsed the 24MB JSON file from disk.
**Solution:** Added `_embeddings_cache` and `_embeddings_mtime` instance attributes. A new `_load_embeddings_cached()` method calls `stat()` to check the mtime (nearly free) and re-parses only when the file has changed. Since the embed cron runs hourly, the parse happens at most once per hour.
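The mtime check described above can be sketched as follows. This is a minimal illustration, not the actual `client.py` code: the constructor signature and the embeddings file layout are assumptions.

```python
import json
import os


class CognitiveMemoryClient:
    """Sketch of the mtime-based embeddings cache (hypothetical shape)."""

    def __init__(self, embeddings_path="_embeddings.json"):
        self._embeddings_path = embeddings_path
        self._embeddings_cache = None   # parsed embeddings, kept across calls
        self._embeddings_mtime = None   # file mtime at the time of last parse

    def _load_embeddings_cached(self):
        # stat() is nearly free compared to parsing a 24MB JSON file
        mtime = os.stat(self._embeddings_path).st_mtime
        if self._embeddings_cache is None or mtime != self._embeddings_mtime:
            with open(self._embeddings_path) as f:
                self._embeddings_cache = json.load(f)
            self._embeddings_mtime = mtime
        return self._embeddings_cache
```

On a warm call the method reduces to a single `stat()` plus an equality check, so the 24MB parse cost is paid only after the hourly embed cron rewrites the file.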
**Performance results:**

| Call | Before | After |
|------|--------|-------|
| Semantic (cold) | 1,328ms | 389ms |
| Semantic (warm) | 1,328ms | 208ms |
| Keyword only | 3ms | 2ms |

The remaining ~200ms on the warm path is the Ollama embedding API call for the query text plus the cosine-similarity computation; the JSON parse overhead is eliminated entirely on repeat calls.
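The warm-path similarity step can be sketched like this. The function name, the `id -> vector` mapping shape, and the use of NumPy are all illustrative assumptions, not the actual `semantic_recall()` implementation.

```python
import numpy as np


def rank_by_cosine(query_vec, embeddings, top_k=5):
    """Rank stored entries by cosine similarity to the query embedding.

    embeddings: mapping of entry id -> embedding vector (list of floats),
    e.g. 4096-dim vectors as loaded from _embeddings.json (assumed shape).
    """
    ids = list(embeddings)
    matrix = np.array([embeddings[i] for i in ids], dtype=np.float32)
    q = np.asarray(query_vec, dtype=np.float32)
    # cosine similarity = dot product divided by the product of L2 norms
    sims = (matrix @ q) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-9)
    order = np.argsort(sims)[::-1][:top_k]
    return [(ids[i], float(sims[i])) for i in order]
```

Vectorizing the comparison as one matrix-vector product keeps the per-call cost low even at 439 entries x 4096 dimensions.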
**Files changed:** `client.py` — added `_load_embeddings_cached()`, updated `semantic_recall()` to use it, added cache attrs to `__init__`.