---
title: "llama.cpp Installation and Setup"
description: "llama.cpp b8680 Vulkan build installation on workstation with RTX 4080 Super, including model download workflow."
type: reference
domain: workstation
tags: [llama-cpp, vulkan, nvidia, gguf, local-inference]
---

## Installation

Installed from the pre-built release binary (no CUDA build is published for Linux, so Vulkan is the correct choice for NVIDIA GPUs):

```bash
# Extract to /opt
sudo mkdir -p /opt/llama.cpp
sudo tar -xzf llama-b8680-bin-ubuntu-vulkan-x64.tar.gz -C /opt/llama.cpp --strip-components=1

# Symlink all binaries into PATH
for bin in /opt/llama.cpp/llama-*; do
    sudo ln -sf "$bin" "/usr/local/bin/$(basename "$bin")"
done
```

**Version**: b8680
**Backends loaded**: Vulkan (GPU), CPU (Zen 4, for the 7800X3D), RPC
**Source**: https://github.com/ggml-org/llama.cpp/releases
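
The `--strip-components=1` flag in the extraction step drops the tarball's top-level directory so the contents land directly under `/opt/llama.cpp`. A throwaway demonstration of the same behavior (hypothetical temp paths, not the real release archive):

```bash
# Build a tiny tarball shaped like the release archive, then extract it
# with --strip-components=1 and observe the leading directory is removed.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/llama-b8680-bin-ubuntu-vulkan-x64/bin"
touch "$tmp/src/llama-b8680-bin-ubuntu-vulkan-x64/bin/llama-cli"
tar -czf "$tmp/pkg.tar.gz" -C "$tmp/src" llama-b8680-bin-ubuntu-vulkan-x64
mkdir -p "$tmp/opt"
# Without --strip-components the files would land under
# $tmp/opt/llama-b8680-bin-ubuntu-vulkan-x64/; with it, one leading
# path component is dropped:
tar -xzf "$tmp/pkg.tar.gz" -C "$tmp/opt" --strip-components=1
ls "$tmp/opt/bin/llama-cli"
```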

## Release Binary Options (Linux x64)

| Build | Use case |
|-------|----------|
| `ubuntu-x64` | CPU only |
| `ubuntu-vulkan-x64` | NVIDIA/AMD GPU via Vulkan |
| `ubuntu-rocm-x64` | AMD GPU via ROCm |
| `ubuntu-openvino-x64` | Intel CPU/GPU/NPU |

No pre-built CUDA binary exists; Vulkan is the NVIDIA option. For native CUDA, build from source with `-DGGML_CUDA=ON`.
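
If native CUDA is ever needed, the source build is roughly the standard CMake flow (a sketch; `-DGGML_CUDA=ON` is from the note above, the remaining steps are the usual llama.cpp build and assume the CUDA toolkit is installed):

```bash
# Sketch: build llama.cpp from source with CUDA enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# Resulting binaries are under build/bin/
```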

## Models

Stored in `/home/cal/Models/`.

| Model | File | Size |
|-------|------|------|
| Qwen3.5-9B Q4_K_M | `Qwen3.5-9B-Q4_K_M.gguf` | 5.3 GB |

## Downloading Models

The built-in `-hf` downloader can stall. Use `curl` with resume support instead:

```bash
curl -L -C - --progress-bar \
  -o /home/cal/Models/<model>.gguf \
  "https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf"
```

`-C -` enables resume if the download is interrupted.
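
For flaky connections, the same command can be wrapped in a retry loop; since `-C -` resumes from the existing partial file, each iteration picks up where the last one stopped (a sketch; `OUT` and `URL` are placeholders, not real values from this setup):

```bash
OUT=/home/cal/Models/model.gguf                                 # placeholder path
URL="https://huggingface.co/org/repo/resolve/main/model.gguf"   # placeholder URL
# Loop until curl exits 0 (download complete); -C - resumes each retry
until curl -L -C - --progress-bar -o "$OUT" "$URL"; do
    echo "Download interrupted, resuming..." >&2
    sleep 2
done
```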

## Running

```bash
# Full GPU offload
llama-cli -m /home/cal/Models/Qwen3.5-9B-Q4_K_M.gguf -ngl 99

# Server mode
llama-server -m /home/cal/Models/Qwen3.5-9B-Q4_K_M.gguf -ngl 99 --port 8080
```
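
`llama-server` exposes an OpenAI-compatible HTTP API. A quick smoke test against the server started above (assumes it is running on port 8080; the prompt and `max_tokens` value are arbitrary):

```bash
# Chat completion via the OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello"}],"max_tokens":32}'
```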