claude-home/ollama-benchmark-results.md
Cal Corum 4b7eca8a46
2026-03-12 09:00:44 -05:00

---
title: Ollama Benchmark Results
description: Scoring tables for local and cloud LLM models tested via Ollama across code generation, code analysis, reasoning, data analysis, and planning categories.
type: reference
domain: development
tags:
  - ollama
  - llm
  - benchmarks
  - model-evaluation
  - deepseek
  - llama
  - glm
---
Ollama Model Benchmark Results

Summary Table

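| Model | Code Gen | Code Analysis | Reasoning | Data Analysis | Planning | Overall |
|---|---|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | | | |
| llama3.1:8b | | | | | | |
| glm-4.7:cloud | | | | | | |
| deepseek-v3.1:671b-cloud | | | | | | |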
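
The Overall column can be derived from the five category columns once they are filled in. A minimal sketch, assuming Overall is the unweighted mean of whichever category scores are present; the source does not state the scoring formula, so both the averaging rule and the helper name `overall_score` are hypothetical:

```python
from statistics import mean
from typing import Optional

# Column names taken from the summary table above.
CATEGORIES = ["Code Gen", "Code Analysis", "Reasoning", "Data Analysis", "Planning"]


def overall_score(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Return the unweighted mean of the filled-in category scores.

    Assumption: every category counts equally. Categories that are
    missing or None are skipped; if nothing is scored, return None so
    the Overall cell stays empty rather than showing a misleading 0.
    """
    filled = [scores[c] for c in CATEGORIES if scores.get(c) is not None]
    return round(mean(filled), 2) if filled else None
```

Skipping empty categories keeps a partially benchmarked model comparable, but it also means an Overall value built from two categories is not directly comparable to one built from five; a weighted scheme may suit the benchmark better.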

Detailed Results by Category

Code Generation

Simple Python Function

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |
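
The Response Time column can be filled from the timing metadata that Ollama returns with a non-streaming `/api/generate` response. A minimal sketch, assuming the response JSON has already been parsed into a dict and contains the `total_duration`, `eval_count`, and `eval_duration` fields (durations in nanoseconds); the helper name `summarize_timing` is hypothetical:

```python
from typing import Optional

NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_timing(response: dict) -> dict[str, Optional[float]]:
    """Convert Ollama timing metadata into seconds and tokens per second.

    Assumption: `response` is the parsed JSON body of a completed,
    non-streaming generate call. Tokens per second is None when the
    generation duration is missing or zero.
    """
    total_seconds = response["total_duration"] / NANOSECONDS_PER_SECOND
    eval_duration = response.get("eval_duration", 0)
    eval_count = response.get("eval_count", 0)
    tokens_per_second = (
        eval_count * NANOSECONDS_PER_SECOND / eval_duration
        if eval_duration
        else None
    )
    return {
        "total_seconds": round(total_seconds, 2),
        "tokens_per_second": (
            round(tokens_per_second, 2) if tokens_per_second is not None else None
        ),
    }
```

Recording both values is a design choice: total seconds reflects what a user waits for, including model load, while tokens per second is easier to compare across prompts of different lengths.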

Class with Error Handling

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Async API Handler

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Algorithm Challenge

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Analysis & Refactoring

Bug Finding

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Refactoring

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Explanation

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

General Reasoning

Logic Problem

| Model | Accuracy | Reasoning Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

System Design

| Model | Accuracy | Detail Level | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Technical Explanation

| Model | Accuracy | Clarity | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Data Analysis

Data Processing

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Data Transformation

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Planning & Task Breakdown

Project Planning

| Model | Completeness | Practicality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Debugging Strategy

| Model | Logical Flow | Practicality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Key Findings

Strengths by Model

deepseek-coder-v2:lite

llama3.1:8b

glm-4.7:cloud

deepseek-v3.1:671b-cloud

Weaknesses by Model

deepseek-coder-v2:lite

llama3.1:8b

glm-4.7:cloud

deepseek-v3.1:671b-cloud

Best Model for Each Category

| Category | Winner | Runner-up |
|---|---|---|
| Code Generation | | |
| Code Analysis | | |
| Reasoning | | |
| Data Analysis | | |
| Planning | | |
| Overall (Score) | | |
| Speed (if relevant) | | |

Last Updated: YYYY-MM-DD