---
title: "Ollama Benchmark Results"
description: "Scoring tables for local and cloud LLM models tested via Ollama across code generation, code analysis, reasoning, data analysis, and planning categories."
type: reference
domain: development
tags: [ollama, llm, benchmarks, model-evaluation, deepseek, llama, glm]
---

# Ollama Model Benchmark Results

## Summary Table

| Model | Code Gen | Code Analysis | Reasoning | Data Analysis | Planning | Overall |
|-------|----------|--------------|-----------|--------------|----------|---------|
| deepseek-coder-v2:lite | | | | | | |
| llama3.1:8b | | | | | | |
| glm-4.7:cloud | | | | | | |
| deepseek-v3.1:671b-cloud | | | | | | |

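This page does not define how the Overall column is derived. A minimal sketch, assuming every category cell holds a number on one shared scale and Overall is the unweighted mean of the five categories (neither rule is stated in this document), could compute and render a Summary Table row like this; `overall_score` and `summary_row` are hypothetical helper names:

```python
from statistics import mean

# Category columns of the Summary Table, in table order.
CATEGORIES = ("Code Gen", "Code Analysis", "Reasoning", "Data Analysis", "Planning")


def overall_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the five category scores.

    Assumption: all categories share one numeric scale and equal weight.
    A missing category raises KeyError instead of being silently skipped.
    """
    return round(mean(scores[c] for c in CATEGORIES), 2)


def summary_row(model: str, scores: dict[str, float]) -> str:
    """Render one Markdown row in the Summary Table's column order."""
    cells = [f"{scores[c]:g}" for c in CATEGORIES]
    cells.append(f"{overall_score(scores):g}")
    return "| " + " | ".join([model, *cells]) + " |"
```

If some categories should count more than others, a weighted mean would replace `mean`; whichever rule is chosen is worth recording next to the table so Overall scores stay comparable between runs.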
---

## Detailed Results by Category

### Code Generation

**Simple Python Function**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Class with Error Handling**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Async API Handler**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Algorithm Challenge**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Code Analysis & Refactoring

**Bug Finding**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Code Refactoring**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Code Explanation**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### General Reasoning

**Logic Problem**

| Model | Accuracy | Reasoning Quality | Response Time | Notes |
|-------|----------|-------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**System Design**

| Model | Accuracy | Detail Level | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Technical Explanation**

| Model | Accuracy | Clarity | Response Time | Notes |
|-------|----------|---------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Data Analysis

**Data Processing**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Data Transformation**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Planning & Task Breakdown

**Project Planning**

| Model | Completeness | Practicality | Response Time | Notes |
|-------|--------------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Debugging Strategy**

| Model | Logical Flow | Practicality | Response Time | Notes |
|-------|--------------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

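The Response Time columns above could be filled from Ollama's REST API. A minimal sketch, assuming the models are reached through an Ollama server on its default address (`http://localhost:11434`) with a non-streaming `/api/generate` call; `run_prompt` and `timing_from_response` are hypothetical helper names. Ollama reports `total_duration`, `load_duration`, and `eval_duration` in nanoseconds alongside `eval_count` (generated tokens). Whether cloud-hosted models populate every one of these fields is not verified here, so the sketch treats a missing field as zero:

```python
import json
import urllib.request

# Assumption: Ollama server on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def timing_from_response(resp: dict) -> dict[str, float]:
    """Convert timing fields from a non-streaming /api/generate reply.

    Missing fields count as zero (assumption: some backends may omit them).
    """
    total_s = resp.get("total_duration", 0) / NS_PER_S
    load_s = resp.get("load_duration", 0) / NS_PER_S
    eval_s = resp.get("eval_duration", 0) / NS_PER_S
    tokens = resp.get("eval_count", 0)
    return {
        "total_s": round(total_s, 2),  # total time Ollama reports for the request
        "load_s": round(load_s, 2),    # portion spent loading the model
        "tokens_per_s": round(tokens / eval_s, 1) if eval_s else 0.0,
    }


def run_prompt(model: str, prompt: str, url: str = OLLAMA_URL) -> dict[str, float]:
    """Send one benchmark prompt with streaming off and return its timings."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as reply:
        return timing_from_response(json.load(reply))
```

For example, `run_prompt("llama3.1:8b", prompt_text)` would time one prompt. A call to a model that is not yet in memory includes load time, so reporting `total_s` minus `load_s`, or discarding a warm-up call, keeps loading from skewing the Response Time column.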
---

## Key Findings

### Strengths by Model

**deepseek-coder-v2:lite**

-

**llama3.1:8b**

-

**glm-4.7:cloud**

-

**deepseek-v3.1:671b-cloud**

-

### Weaknesses by Model

**deepseek-coder-v2:lite**

-

**llama3.1:8b**

-

**glm-4.7:cloud**

-

**deepseek-v3.1:671b-cloud**

-

### Best Model for Each Category

| Category | Winner | Runner-up |
|----------|--------|-----------|
| Code Generation | | |
| Code Analysis | | |
| Reasoning | | |
| Data Analysis | | |
| Planning | | |
| Overall (Score) | | |
| Speed (if relevant) | | |

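Filling this table is a ranking step over per-model scores. A minimal sketch, assuming each category has one numeric score per model (for example, that model's Summary Table cell); `pick_top_two` and `best_model_row` are hypothetical helper names. For the Speed row, the sketch assumes speed is recorded as a response time, where lower is better, and ranks ascending when `higher_is_better=False`:

```python
def pick_top_two(scores: dict[str, float], higher_is_better: bool = True) -> tuple[str, str]:
    """Return (winner, runner_up) for one category.

    sorted() is stable even with reverse=True, so tied models keep
    their input order.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two models")
    ranked = sorted(scores, key=scores.__getitem__, reverse=higher_is_better)
    return ranked[0], ranked[1]


def best_model_row(category: str, scores: dict[str, float], higher_is_better: bool = True) -> str:
    """Render one row of the Best Model for Each Category table."""
    winner, runner_up = pick_top_two(scores, higher_is_better)
    return f"| {category} | {winner} | {runner_up} |"
```

Ties could instead be broken with a secondary key such as response time; this sketch leaves them in input order.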
---

*Last Updated: YYYY-MM-DD*