---
title: "Ollama Benchmark Results"
description: "Scoring tables for local and cloud LLM models tested via Ollama across code generation, code analysis, reasoning, data analysis, and planning categories."
type: reference
domain: development
tags: [ollama, llm, benchmarks, model-evaluation, deepseek, llama, glm]
---

# Ollama Model Benchmark Results

## Summary Table

| Model | Code Gen | Code Analysis | Reasoning | Data Analysis | Planning | Overall |
|-------|----------|---------------|-----------|---------------|----------|---------|
| deepseek-coder-v2:lite | | | | | | |
| llama3.1:8b | | | | | | |
| glm-4.7:cloud | | | | | | |
| deepseek-v3.1:671b-cloud | | | | | | |

---

## Detailed Results by Category

### Code Generation

**Simple Python Function**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Class with Error Handling**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Async API Handler**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Algorithm Challenge**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Code Analysis & Refactoring

**Bug Finding**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Code Refactoring**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Code Explanation**

| Model | Accuracy | Explanation Quality | Response Time | Notes |
|-------|----------|---------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### General Reasoning

**Logic Problem**

| Model | Accuracy | Reasoning Quality | Response Time | Notes |
|-------|----------|-------------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**System Design**

| Model | Accuracy | Detail Level | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Technical Explanation**

| Model | Accuracy | Clarity | Response Time | Notes |
|-------|----------|---------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Data Analysis

**Data Processing**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Data Transformation**

| Model | Accuracy | Code Quality | Response Time | Notes |
|-------|----------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

### Planning & Task Breakdown

**Project Planning**

| Model | Completeness | Practicality | Response Time | Notes |
|-------|--------------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

**Debugging Strategy**

| Model | Logical Flow | Practicality | Response Time | Notes |
|-------|--------------|--------------|---------------|-------|
| deepseek-coder-v2:lite | | | | |
| llama3.1:8b | | | | |
| glm-4.7:cloud | | | | |
| deepseek-v3.1:671b-cloud | | | | |

---

## Key Findings

### Strengths by Model

**deepseek-coder-v2:lite**

-

**llama3.1:8b**

-

**glm-4.7:cloud**

-

**deepseek-v3.1:671b-cloud**

-

### Weaknesses by Model

**deepseek-coder-v2:lite**

-

**llama3.1:8b**

-

**glm-4.7:cloud**

-

**deepseek-v3.1:671b-cloud**

-

### Best Model for Each Category

| Category | Winner | Runner-up |
|----------|--------|-----------|
| Code Generation | | |
| Code Analysis | | |
| Reasoning | | |
| Data Analysis | | |
| Planning | | |
| Overall (Score) | | |
| Speed (if relevant) | | |

---

*Last Updated: YYYY-MM-DD*