claude-home/ollama-benchmark-results.md
Cal Corum b186107b97 Add Ollama benchmark results and model testing notes
Document local LLM benchmark results, testing methodology, and
model comparison notes for Ollama deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 22:26:04 -06:00

Ollama Model Benchmark Results

Summary Table

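| Model | Code Gen | Code Analysis | Reasoning | Data Analysis | Planning | Overall |
|---|---|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | | | |
| llama3.1:8b | | | | | | |
| glm-4.7:cloud | | | | | | |
| deepseek-v3.1:671b-cloud | | | | | | |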

Detailed Results by Category

Code Generation

Simple Python Function

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |
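
The Response Time column can be filled from the timing fields Ollama returns with each completion. The following is a minimal sketch, not the methodology used for this benchmark: it assumes a local Ollama server on the default port, and the helper names `summarize_timing` and `time_prompt` are hypothetical.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def summarize_timing(response: dict) -> dict:
    """Convert Ollama's nanosecond timing fields into seconds and tokens/sec."""
    total_seconds = response["total_duration"] / 1e9
    eval_seconds = response["eval_duration"] / 1e9
    tokens = response["eval_count"]
    return {
        "total_seconds": round(total_seconds, 2),
        # Guard against a zero duration so the sketch never divides by zero.
        "tokens_per_second": round(tokens / eval_seconds, 1) if eval_seconds > 0 else 0.0,
    }


def time_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming prompt to Ollama and return its timing summary."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return summarize_timing(json.load(reply))
```

Calling `time_prompt("llama3.1:8b", "Write a function that reverses a string.")` against a running server would return a small dictionary suitable for the Response Time and Notes columns. Averaging several runs per prompt would likely give steadier numbers than a single call.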

Class with Error Handling

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Async API Handler

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Algorithm Challenge

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Analysis & Refactoring

Bug Finding

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Refactoring

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Code Explanation

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

General Reasoning

Logic Problem

| Model | Accuracy | Reasoning Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

System Design

| Model | Accuracy | Detail Level | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Technical Explanation

| Model | Accuracy | Clarity | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Data Analysis

Data Processing

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Data Transformation

| Model | Accuracy | Code Quality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Planning & Task Breakdown

Project Planning

| Model | Completeness | Practicality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Debugging Strategy

| Model | Logical Flow | Practicality | Response Time | Notes |
|---|---|---|---|---|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

Key Findings

Strengths by Model

deepseek-coder-v2:lite

llama3.1:8b

glm-4.7:cloud

deepseek-v3.1:671b-cloud

Weaknesses by Model

deepseek-coder-v2:lite

llama3.1:8b

glm-4.7:cloud

deepseek-v3.1:671b-cloud

Best Model for Each Category

| Category | Winner | Runner-up |
|---|---|---|
| Code Generation | | |
| Code Analysis | | |
| Reasoning | | |
| Data Analysis | | |
| Planning | | |
| Overall (Score) | | |
| Speed (if relevant) | | |
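
Once the category scores are filled in, the Winner and Runner-up columns can be derived mechanically. The sketch below is one possible approach and rests on assumptions: that higher scores are better, and that ties are broken alphabetically by model name. The function name `pick_winners` and the model names and scores in the usage example are made-up placeholders, not benchmark results.

```python
from typing import Dict, Optional, Tuple


def pick_winners(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return (winner, runner_up) per category from {category: {model: score}}."""
    results = {}
    for category, by_model in scores.items():
        # Sort by descending score; break ties alphabetically by model name.
        ranked = sorted(by_model, key=lambda model: (-by_model[model], model))
        winner = ranked[0] if ranked else None
        runner_up = ranked[1] if len(ranked) > 1 else None
        results[category] = (winner, runner_up)
    return results


# Placeholder data for illustration only; these are NOT measured results.
placeholder_scores = {
    "Code Generation": {"model-a": 7, "model-b": 9, "model-c": 8},
    "Planning": {"model-a": 6},
}
print(pick_winners(placeholder_scores))
```

A category with only one scored model gets `None` as its runner-up, which keeps the table cell empty instead of guessing.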

Last Updated: YYYY-MM-DD