Benchmark script for llama.cpp & results for AMD RX 7900 XTX - using Vulkan



Mike Key 6d88857a22 Update for ReBAR/coopmat fixes - 4x performance improvement		2025-12-11 14:50:02 -07:00
benchmark-results	Update for ReBAR/coopmat fixes - 4x performance improvement	2025-12-11 14:50:02 -07:00
benchmark-vulkan.sh	Update for ReBAR/coopmat fixes - 4x performance improvement	2025-12-11 14:50:02 -07:00
LICENSE	init commit	2025-12-11 09:15:46 -07:00
README.md	Update for ReBAR/coopmat fixes - 4x performance improvement	2025-12-11 14:50:02 -07:00

README.md

AMD RX 7900 XTX Vulkan Benchmarks for llama.cpp

Benchmark results for dual AMD Radeon RX 7900 XTX GPUs using the Vulkan backend with llama.cpp.

Results are formatted for the Vulkan Scoreboard discussion (#10879).

Latest Results (2025-12-11)

Single GPU (Compute Card)

model	size	params	backend	ngl	n_batch	fa	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	100	512	1	pp512	3290.92 ± 37.67
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	100	512	1	tg128	172.86 ± 0.23

Dual GPU

model	size	params	backend	ngl	n_batch	fa	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	100	512	1	pp512	2546.13 ± 18.25
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	100	512	1	tg128	129.80 ± 0.34

Note: For this 7B model, dual GPU is about 23-25% slower than single GPU (pp512: 2546 vs. 3291 t/s; tg128: 130 vs. 173 t/s) due to PCIe transfer overhead. Dual GPU is mainly useful for larger models (70B+) that need the combined 48 GB of VRAM.

Build: a81a56957 (7361)

Optimal Settings for RDNA3

As of late 2025, the recommended settings for 7900 XTX are:

# Coopmat works now - no need to disable it
GGML_VK_VISIBLE_DEVICES=1 llama-bench -m model.gguf -ngl 100 -fa 1 -b 512

Setting	Value	Notes
Coopmat	Enabled	`KHR_coopmat` now works correctly on RDNA3
Flash Attention	`-fa 1`	~5% improvement on token generation
Batch Size	`-b 512`	Optimal for 7900 XTX
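
To check whether `-b 512` is also the best batch size on your own system, a sweep can be sketched as below. This is a minimal sketch, not part of benchmark-vulkan.sh: the `sweep_batch` function and the `DRY_RUN` variable are hypothetical names introduced here, and it assumes `llama-bench` accepts a comma-separated list of values for `-b`.

```shell
#!/bin/sh
# sweep_batch MODEL BATCH... : run llama-bench once over several batch sizes.
# Hypothetical helper; set DRY_RUN=1 to print the command instead of running it.
sweep_batch() {
  model=$1
  shift
  # Join the remaining arguments with commas: 256 512 1024 -> 256,512,1024
  batches=$(printf '%s,' "$@")
  batches=${batches%,}
  set -- llama-bench -m "$model" -ngl 100 -fa 1 -b "$batches"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage:
#   GGML_VK_VISIBLE_DEVICES=1 sweep_batch /path/to/model.gguf 256 512 1024
```

Compare the pp512 rows across batch sizes; the tg128 rows are expected to change little, since token generation processes one token at a time.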

Historical Note

Earlier in 2025, RDNA3 required workarounds (GGML_VK_DISABLE_COOPMAT=1, -b 256) due to driver/llama.cpp issues. These are no longer needed with Mesa 25.3+ and recent llama.cpp builds.

Critical: BIOS & Hardware Setup

Getting full performance from the 7900 XTX requires proper BIOS configuration and slot placement. Without the two settings below, you may see 50-75% lower performance.

1. Enable ReBAR (Resizable BAR)

In BIOS:

Settings → IO Ports → Above 4G Decoding = Enabled
Settings → IO Ports → Re-Size BAR Support = Enabled

Verify with:

lspci -v -s <GPU_BUS_ID> | grep -i size
# Should show "size=32G" not "size=256M"
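
The check above can be turned into a small filter that reads `lspci -v` output and prints a verdict. This is a sketch under one assumption: it treats any memory region of 1 GiB or more as evidence that the BAR was resized, which matches the 32G vs. 256M contrast above but is a heuristic, not an official test. The function name `rebar_state` is hypothetical.

```shell
#!/bin/sh
# Classify ReBAR state from `lspci -v` text on stdin.
# Heuristic (an assumption): a region sized in whole gigabytes means ReBAR is active.
rebar_state() {
  # Pull every "size=..." token out of the memory region lines.
  sizes=$(grep -oE 'size=[0-9]+[KMG]' | cut -d= -f2)
  if printf '%s\n' "$sizes" | grep -qE '^[0-9]+G$'; then
    echo "rebar-on"
  else
    echo "rebar-off"
  fi
}

# Usage (replace <GPU_BUS_ID> with your device's address):
#   lspci -v -s <GPU_BUS_ID> | rebar_state
```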

2. Use Direct CPU PCIe Lanes

Not all PCIe slots are equal. On X570/X670 motherboards:

Top slot = Direct CPU lanes (x16 @ 16GT/s) ← Use this for compute
Bottom slots = Often routed through chipset (much slower)

Verify with:

sudo lspci -vvv -s <GPU_BUS_ID> | grep -iE "LnkCap:|LnkSta:"
# Should show: Speed 16GT/s, Width x16
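
The same link check can be scripted so that it reports the negotiated speed and width when they fall short. This is a minimal sketch: `link_status` is a hypothetical helper name, and the target of 16GT/s at x16 is taken from the expected output above.

```shell
#!/bin/sh
# Read `lspci -vvv` text on stdin and compare the negotiated link (LnkSta)
# against the expected 16GT/s x16.
link_status() {
  # "LnkSta:" with the colon skips the separate "LnkSta2:" line.
  line=$(grep -m1 'LnkSta:')
  speed=$(printf '%s\n' "$line" | sed -nE 's/.*Speed ([0-9.]+)GT\/s.*/\1/p')
  width=$(printf '%s\n' "$line" | sed -nE 's/.*Width x([0-9]+).*/\1/p')
  if [ "$speed" = "16" ] && [ "$width" = "16" ]; then
    echo "ok"
  else
    echo "degraded: ${speed}GT/s x${width}"
  fi
}

# Usage:
#   sudo lspci -vvv -s <GPU_BUS_ID> | link_status
```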

Performance Impact

Configuration	pp512	tg128
ReBAR off, bad PCIe	~760	~86
ReBAR on, x16 lanes	~3290	~173

That's roughly 4.3x the prompt processing throughput and 2x the token generation throughput!
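
The ratios come straight from the table. As a quick arithmetic check (`speedup` is a hypothetical helper name, and the inputs are the rounded values from the table):

```shell
#!/bin/sh
# speedup AFTER BEFORE : print the throughput ratio to one decimal place.
speedup() {
  awk -v after="$1" -v before="$2" 'BEGIN { printf "%.1fx\n", after / before }'
}

speedup 3290 760   # pp512: prints 4.3x
speedup 173 86     # tg128: prints 2.0x
```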

System Configuration

Component	Details
CPU	AMD Ryzen 7 5800X
Motherboard	Gigabyte X570 AORUS XTREME
GPUs	2x AMD Radeon RX 7900 XTX (24GB each)
OS	Arch Linux 6.17.9-zen1-1-zen
Vulkan Driver	RADV (Mesa 25.3.1-arch1.2)
ReBAR	Enabled (32GB BAR)

Device Info:

AMD Radeon RX 7900 XTX (RADV NAVI31) | fp16: 1 | warp size: 64 | int dot: 1 | matrix cores: KHR_coopmat

GPU Slot Configuration

Slot	PCI Bus	PCIe Link	Role
Top (slot 1)	0f:00.0	x16 @ 16GT/s (CPU direct)	Compute (headless)
Bottom (slot 3)	06:00.0	x16 @ 16GT/s (chipset)	Display

Usage

Quick Start

./benchmark-vulkan.sh

With 70B Model

MODEL_70B=/path/to/70B-model.gguf ./benchmark-vulkan.sh

Manual Benchmark

# Single GPU (use GPU 1 for compute if display is on GPU 0)
GGML_VK_VISIBLE_DEVICES=1 llama-bench \
  -m /path/to/model.gguf -ngl 100 -fa 1 -b 512

# Dual GPU
GGML_VK_VISIBLE_DEVICES=0,1 llama-bench \
  -m /path/to/model.gguf -ngl 100 -fa 1 -b 512
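
To compare runs quickly, the t/s figures can be pulled out of the llama-bench output. The sketch below assumes the default output is a pipe-delimited table whose last two columns are `test` and `t/s`, as in the results tables above. `bench_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Read llama-bench table output on stdin and print "<test> <mean t/s>" per result row.
bench_summary() {
  awk -F'|' '
    NF >= 3 {
      test = $(NF-2)
      gsub(/^[ \t]+|[ \t]+$/, "", test)     # trim padding around the test name
      if (test ~ /^(pp|tg)[0-9]+$/) {       # skips the header and separator rows
        split($(NF-1), v, " ")              # "3290.92 ± 37.67" -> v[1] is the mean
        print test, v[1]
      }
    }'
}

# Usage:
#   GGML_VK_VISIBLE_DEVICES=1 llama-bench -m /path/to/model.gguf -ngl 100 -fa 1 -b 512 | bench_summary
```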

Check GPU Mapping

vulkaninfo 2>/dev/null | grep -E "(deviceName|pciBus)"

Troubleshooting

Low performance (~500-1000 t/s instead of ~3000+ t/s)

Check ReBAR: `lspci -v -s <BUS> | grep size` should show 32G
Check PCIe link: `sudo lspci -vvv -s <BUS> | grep LnkSta` should show x16 @ 16GT/s
Check power state: `cat /sys/class/drm/card*/device/power_state` (D0 = awake, D3hot = asleep)
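
With two cards installed, it helps to see which card is in which state before waking one. The sketch below labels each `power_state` value with its card name. `gpu_power_states` is a hypothetical helper, and the optional directory argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Print "<card> <power state>" for every DRM card that exposes power_state.
# Optional first argument: sysfs DRM directory (defaults to /sys/class/drm).
gpu_power_states() {
  root=${1:-/sys/class/drm}
  for f in "$root"/card*/device/power_state; do
    [ -r "$f" ] || continue          # skip entries without a readable power_state
    card=${f#"$root"/}               # strip the leading directory
    card=${card%%/*}                 # keep only the "cardN" component
    printf '%s %s\n' "$card" "$(cat "$f")"
  done
}

# Usage:
#   gpu_power_states
```

Match the reported card name to the `cardN` path used in the wake command.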

Wake sleeping GPU

sudo bash -c 'echo on > /sys/class/drm/card0/device/power/control'

Building Vulkan Backend

cd llama.cpp
cmake -B build-vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan --config Release -j $(nproc)

License

MIT License