| id | type | title | tags | importance | confidence | created | updated |
|---|---|---|---|---|---|---|---|
| 8ff29b9a-fa48-42d3-97e0-c4aaaab8f164 | decision | Switched embedding model from nomic-embed-text to qwen3-embedding:8b | | 0.8 | 0.8 | 2026-02-19T20:53:09.200553+00:00 | 2026-02-19T20:53:09.200553+00:00 |
Upgraded the Ollama embedding model from `nomic-embed-text` (137M params, 768d, F16, ~52 BEIR) to `qwen3-embedding:8b` (7.6B params, 4096d, Q4_K_M, ~70+ MTEB). Rationale: nomic is now low-to-mid tier, while `qwen3-embedding:8b` ranks #1 on MTEB and beats all cloud options, including OpenAI (64.6), Google Gemini (68.3), and Voyage AI (66-67). It runs locally on an RTX 4080 SUPER (~5.7GB VRAM), is completely private, and has zero API cost. A full re-embed of 430 memories takes ~27 seconds — comparable to nomic despite ~55x more parameters, because embedding is a single forward pass and Q4_K_M quantization keeps it fast. `_embeddings.json` grows from ~5MB to ~23MB (4096 vs 768 dimensions). Config changed via `claude-memory config --ollama-model "qwen3-embedding:8b"`.