
---
title: Ollama Model Testing Log
description: Testing log tracking Ollama model evaluations with performance observations, VRAM requirements, and suitability ratings for different use cases on a 16GB GPU workstation.
type: reference
domain: development
tags:
  - ollama
  - llm
  - model-testing
  - vram
  - gpu
  - deepseek
  - glm
---
# Ollama Model Testing Log

Tracks models tested, performance observations, and suitability for different use cases.


## Quick Summary

| Model | Date Tested | Primary Use Case | Rating | Notes |
|-------|-------------|------------------|--------|-------|
| GLM-4.7:cloud | 2026-02-04 | General purpose | | Cloud-hosted, fast, good reasoning |
| deepseek-v3.1:671b-cloud | 2026-02-04 | Complex reasoning | | Cloud, very capable, slower response |

## Model Testing Details

### GLM-4.7:cloud

**Date Tested:** 2026-02-04

**Model Info:**

- Size/Parameters: Unknown (cloud)
- Quantization: N/A (cloud)
- Base Model: GLM-4.7 by Zhipu AI

**Performance:**

- Response Speed: Fast
- RAM/VRAM Usage: Cloud (local minimal)
- Context Window: 128k

**Testing Use Cases:**

- Code generation
- General Q&A
- Creative writing
- Data analysis
- Task planning
- Other:

**Observations:**

- Strengths: Fast response, good at general reasoning
- Weaknesses: Cloud dependency
- Resource requirements: Minimal local resources
- Output quality: Solid for most tasks
- When to use this model: Daily tasks, coding help, general assistance

**Verdict:**


### deepseek-v3.1:671b-cloud

**Date Tested:** 2026-02-04

**Model Info:**

- Size/Parameters: 671B (cloud)
- Quantization: N/A (cloud)
- Base Model: DeepSeek-V3.1 by DeepSeek

**Performance:**

- Response Speed: Moderate (expected for a 671B model)
- RAM/VRAM Usage: Cloud (local minimal)
- Context Window: 128k+

**Testing Use Cases:**

- Code generation
- General Q&A
- Creative writing
- Data analysis
- Task planning
- Other:

**Observations:**

- Strengths: Very capable, excellent reasoning, great with complex tasks
- Weaknesses: Slower response, cloud dependency
- Resource requirements: Minimal local resources
- Output quality: Top-tier, handles complex multi-step reasoning well
- When to use this model: Complex coding tasks, deep analysis, planning

**Verdict:**


## Models to Test

### Local Models (16GB GPU Compatible)

**Small & Fast (2-6GB VRAM at Q4):**

- phi3:mini - 3.8B params, great for quick tasks (~2.2GB)
- llama3.1:8b - 8B params, excellent all-rounder (~4.7GB)
- qwen2.5:7b - 7B params, strong reasoning (~4.3GB)
- gemma2:9b - 9B params, Google's small model (~5.5GB)

**Medium Capability (6-10GB VRAM at Q4):**

- mistral:7b - 7B params, classic workhorse (~4.1GB)
- llama3.1:14b - 14B params, higher quality (~8.2GB) (note: Llama 3.1 ships in 8B/70B/405B; this tag may not exist)
- qwen2.5:14b - 14B params, strong multilingual (~8.1GB)

**Specialized:**

- deepseek-coder-v2:lite - 16B params, optimized for coding (~8.7GB)
- codellama:7b - 7B params, coding specialist (~4.1GB)
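Given the approximate Q4 footprints above, choosing which model to pull reduces to a budget check against free VRAM. A minimal sketch (the `pick_model` helper, candidate list, and 2GB headroom figure are illustrative assumptions, not part of any Ollama API):

```python
# Shortlist from the lists above: (tag, approx Q4 VRAM in GB), largest first.
CANDIDATES = [
    ("deepseek-coder-v2:lite", 8.7),
    ("qwen2.5:14b", 8.1),
    ("llama3.1:8b", 4.7),
    ("phi3:mini", 2.2),
]

def pick_model(free_gb: float, headroom_gb: float = 2.0) -> str:
    """Return the largest shortlisted model that fits the free VRAM budget.

    headroom_gb reserves space for the context KV cache and system overhead.
    """
    for tag, size_gb in CANDIDATES:
        if size_gb + headroom_gb <= free_gb:
            return tag
    return CANDIDATES[-1][0]  # fall back to the smallest model

print(pick_model(16.0))  # on a 16GB card, the whole shortlist fits
```

The chosen tag could then be fed to `ollama pull`/`ollama run`; on NVIDIA hardware the free-VRAM figure could come from `nvidia-smi --query-gpu=memory.free`.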

## General Notes

Overall observations, preferences, and patterns discovered during testing.

**Initial Impressions:**

- Cloud models (GLM-4.7, DeepSeek-V3.1) provide excellent quality without local resources
- Planning to test local models for privacy, offline use, and quality/speed trade-off comparisons
- Focus will be on models that fit comfortably in 16GB VRAM for smooth performance

**VRAM Estimates at Q4 Quantization:**

- 3B-4B models: ~2-3GB
- 7B-8B models: ~4-5GB
- 14B models: ~8-9GB
- Leaves room for context window and system overhead
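The per-size estimates above follow a simple rule of thumb: Q4_K-style quantization stores roughly 4.6 bits (about 0.58 bytes) per weight, with the KV cache and runtime overhead on top. A rough sketch under that assumption (the 0.58 constant is an approximation fitted to the figures in this log; real footprints vary by quant variant and architecture):

```python
def q4_weight_gb(params_billions: float, bytes_per_param: float = 0.58) -> float:
    """Approximate Q4 weight memory in GB: ~0.58 bytes per parameter.

    Context KV cache and runtime overhead are NOT included, hence the
    headroom noted in the bullet list above.
    """
    return round(params_billions * bytes_per_param, 1)

# Compare against the shortlist figures (phi3:mini ~2.2GB, llama3.1:8b ~4.7GB,
# qwen2.5:14b ~8.1GB):
for tag, params_b in [("phi3:mini", 3.8), ("llama3.1:8b", 8), ("qwen2.5:14b", 14)]:
    print(f"{tag}: ~{q4_weight_gb(params_b)}GB")
```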

**Last Updated:** 2026-02-04