store: Switched embedding model from nomic-embed-text to qwen3-embedding:8b

This commit is contained in:
Cal Corum 2026-02-19 14:53:09 -06:00
parent 05cb6f8c40
commit e9494ea416


@@ -0,0 +1,12 @@
---
id: 8ff29b9a-fa48-42d3-97e0-c4aaaab8f164
type: decision
title: "Switched embedding model from nomic-embed-text to qwen3-embedding:8b"
tags: [cognitive-memory, ollama, embedding, decision, performance]
importance: 0.8
confidence: 0.8
created: "2026-02-19T20:53:09.200553+00:00"
updated: "2026-02-19T20:53:09.200553+00:00"
---
Upgraded Ollama embedding model from nomic-embed-text (137M params, 768d, F16, ~52 BEIR) to qwen3-embedding:8b (7.6B params, 4096d, Q4_K_M, ~70+ MTEB). Rationale: nomic is now low-to-mid tier; qwen3-embedding:8b is #1 on MTEB and beats all cloud options including OpenAI (64.6), Google Gemini (68.3), and Voyage AI (66-67). Runs locally on RTX 4080 SUPER (~5.7GB VRAM), completely private, zero API cost. Full re-embed of 430 memories takes ~27 seconds — comparable to nomic despite 55x more parameters, because embedding is a single forward pass and Q4_K_M quantization keeps it fast. _embeddings.json grows from ~5MB to ~23MB (4096 vs 768 dimensions). Config changed via `claude-memory config --ollama-model "qwen3-embedding:8b"`.
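The ~5MB → ~23MB growth of `_embeddings.json` follows directly from the dimension change. A minimal back-of-the-envelope sketch, assuming embeddings are serialized as JSON float arrays at roughly 13 characters per value (an assumed average, not measured from the actual file):

```python
# Sketch: estimate _embeddings.json size before/after the dimension change.
# BYTES_PER_FLOAT is an assumption about average serialized float width
# (digits plus separator); MEMORIES matches the 430 memories in the note.
BYTES_PER_FLOAT = 13
MEMORIES = 430

def estimated_mb(dims: int) -> float:
    """Approximate JSON file size in MB for MEMORIES vectors of `dims` floats."""
    return MEMORIES * dims * BYTES_PER_FLOAT / 1e6

old = estimated_mb(768)   # nomic-embed-text
new = estimated_mb(4096)  # qwen3-embedding:8b
print(f"768d: ~{old:.1f} MB, 4096d: ~{new:.1f} MB, ratio {new / old:.2f}x")
```

The estimate lands near the observed ~5MB and ~23MB figures; the growth ratio is simply 4096/768 ≈ 5.3x, since the number of memories is unchanged.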