AMD discrete GPU

hal0 supports AMD discrete GPUs through two paths: Vulkan (the default) and ROCm (opt-in via the rocm toolbox). Both toolbox images live at ghcr.io/hal0ai/hal0-toolbox-vulkan and ghcr.io/hal0ai/hal0-toolbox-rocm with pinned digests.

Where this fits

AMD discrete is a supported target, not the reference platform. The Strix Halo page covers the iGPU and unified-memory story; this page is for desktop and workstation cards, where you trade unified memory for raw VRAM bandwidth.

Tier-1 cards

Radeon RX 7900 XTX (24 GB GDDR6)
Radeon RX 7900 XT (20 GB GDDR6)

Both run Q4 7B–14B class models comfortably with room for a small embed slot. Q4 30B-A3B MoE works with shorter context. Q4 70B needs partial CPU offload.

Tier-2 cards

Radeon RX 7800 XT (16 GB GDDR6)
Radeon RX 7700 XT (12 GB GDDR6)
Radeon PRO W7800 / W7900 (32 / 48 GB GDDR6, workstation parts)

The 16 GB and below cards run one slot at a time as a rule: chat-only with a Q4 7B–13B, or a smaller chat model alongside an embed.

Vulkan path (default)

The Vulkan toolbox is the same one Strix Halo uses. No ROCm headers required, no kernel modules beyond amdgpu, no version-pinning hell. The hardware probe picks Vulkan automatically when it sees an AMD discrete GPU.

The trade is throughput. Vulkan llama.cpp on AMD discrete is solid but typically lags ROCm builds by a meaningful margin on chat throughput. For a daily-driver chat box it’s fine; for raw benchmarks you’ll want ROCm.

ROCm path (opt-in)

The rocm toolbox is published. Opting in is one slot config change:

hal0 slot swap primary --provider llama-cpp-rocm

The slot lifecycle handles the toolbox swap atomically: unloading → starting → warming → ready. ROCm needs /dev/kfd present in the container; without it the provider falls back to CPU.

Recommended loadouts

These are repeated from the Strix Halo loadouts page with the discrete-card framing.

RX 7900 XTX / XT (20–24 GB VRAM)

primary: Qwen3-30B-A3B-Instruct-2507-Q4_K_M (~18.6 GB) fits with shorter context, or gemma-3-12b-it-Q4_K_M (~6.6 GB) for a longer window.
embed: small Q4 embed only (nomic-embed-text-v2-moe ~140 MB).
Q4 70B requires partial CPU offload. Works, but drops well below VRAM-resident speeds.

RX 7800 XT / 7700 XT (12–16 GB VRAM)

primary: Hermes-4-14B-Q4_K_M (~9 GB) or gemma-3-12b-it-Q4_K_M (~6.6 GB) with several GB for context.
embed: nomic-embed-text-v2-moe-Q4_K_M (~140 MB) on the 16 GB variant; skip on 12 GB.

Workstation cards (W7800/W7900, 32–48 GB VRAM)

Closer to Strix Halo territory. Q4 70B fits cleanly, with room for embed. Concurrent slots become realistic.

Installation notes

The standard one-liner from the install page handles everything for the Vulkan path:

curl -fsSL https://hal0.dev/install.sh | bash

You’ll want:

A recent Mesa with RADV Vulkan installed.
The amdgpu kernel module loaded (dmesg | grep amdgpu).
The service user in the render group for /dev/dri/* access.

The hardware probe detects the GPU’s VRAM correctly and sizes slot fit warnings to it.

Deployment shapes

Bare-metal Linux. No surprises. Install, the probe finds the card, you’re done.
Privileged LXC on Proxmox. Workable for both Vulkan and ROCm. Pass through /dev/dri/* and (for ROCm) /dev/kfd with matching cgroup allow entries, add the hal0 service user to render and video inside the container. ROCm in particular wants /dev/kfd present or it falls back to CPU.
VM with PCIe passthrough. Works. Make sure the card’s IOMMU group isn’t dragging the host’s other PCIe devices along, and pin vCPUs to physical cores on the same NUMA node as the card if you care about consistent throughput.

Troubleshooting

No GPU detected by probe. Run vulkaninfo --summary to confirm the Vulkan runtime sees the card. If that’s empty, fix the host Vulkan install before troubleshooting hal0.

Slot won’t start, journal mentions permission denied on /dev/dri. Add the service user to the render group:

sudo usermod -aG render hal0
sudo systemctl restart hal0-api

Throughput much lower than expected. Check that the card is not power-limited (cat /sys/class/drm/card0/device/power_dpm_force_performance_level), that the toolbox is the Vulkan build (hal0 slot list --json), and that you’re not running with a larger context window than the model documentation suggests.