AMD discrete GPU

hal0 supports AMD discrete GPUs through two paths: Vulkan (the default) and ROCm (opt-in via the rocm toolbox). The toolbox images are published at ghcr.io/hal0ai/hal0-toolbox-vulkan and ghcr.io/hal0ai/hal0-toolbox-rocm, each with a pinned digest.

AMD discrete is a supported target, not the reference platform. The Strix Halo page covers the iGPU and unified-memory story; this page is for desktop and workstation cards, where you trade unified memory for raw VRAM bandwidth.

  • Radeon RX 7900 XTX (24 GB GDDR6)
  • Radeon RX 7900 XT (20 GB GDDR6)

Both run Q4 7B–14B class models comfortably with room for a small embed slot. Q4 30B-A3B MoE works with shorter context. Q4 70B needs partial CPU offload.

  • Radeon RX 7800 XT (16 GB GDDR6)
  • Radeon RX 7700 XT (12 GB GDDR6)
  • Radeon PRO W7800 / W7900 (32 / 48 GB GDDR6, workstation parts)

Cards with 16 GB or less run one slot at a time as a rule: chat-only with a Q4 7B–14B model, or a smaller chat model alongside an embed.

The Vulkan toolbox is the same one Strix Halo uses. No ROCm headers required, no kernel modules beyond amdgpu, no version-pinning hell. The hardware probe picks Vulkan automatically when it sees an AMD discrete GPU.

The trade is throughput. Vulkan llama.cpp on AMD discrete is solid but typically lags ROCm builds by a meaningful margin on chat throughput. For a daily-driver chat box it’s fine; for raw benchmarks you’ll want ROCm.

The rocm toolbox is published. Opting in is one slot config change:

hal0 slot swap primary --provider llama-cpp-rocm

The slot lifecycle handles the toolbox swap atomically: unloading → starting → warming → ready. ROCm needs /dev/kfd present in the container; without it the provider falls back to CPU.
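
Because a missing /dev/kfd leads to a CPU fallback rather than a hard failure, it can be worth checking the device nodes before swapping providers. The following is a minimal sketch for a POSIX shell; the function name rocm_nodes_ready and its optional root-prefix argument are illustrative assumptions, not part of hal0.

```shell
# Hypothetical helper, not part of hal0: succeeds only if the device nodes
# ROCm needs are visible. The optional argument is a root prefix, handy for
# pointing the check at a container rootfs or a test fixture.
rocm_nodes_ready() {
  root=${1:-}
  if [ ! -e "$root/dev/kfd" ]; then
    echo "missing $root/dev/kfd: the ROCm provider would fall back to CPU" >&2
    return 1
  fi
  # At least one DRM render node (renderD128, renderD129, ...) should exist.
  for node in "$root"/dev/dri/renderD*; do
    [ -e "$node" ] && return 0
  done
  echo "no render node under $root/dev/dri" >&2
  return 1
}

# Usage: only swap when the check passes.
# rocm_nodes_ready && hal0 slot swap primary --provider llama-cpp-rocm
```

Run the check in the same environment the slot runs in (inside the container, if there is one), since that is where the nodes have to be visible.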

The loadouts below are repeated from the Strix Halo loadouts page, reframed for discrete cards.

RX 7900 XTX / XT (20–24 GB VRAM)

  • primary: Qwen3-30B-A3B-Instruct-2507-Q4_K_M (~18.6 GB) fits with shorter context, or gemma-3-12b-it-Q4_K_M (~6.6 GB) for a longer window.
  • embed: small Q4 embed only (nomic-embed-text-v2-moe ~140 MB).
  • Q4 70B requires partial CPU offload. Works, but drops well below VRAM-resident speeds.

RX 7800 XT / 7700 XT (12–16 GB VRAM)

  • primary: Hermes-4-14B-Q4_K_M (~9 GB) or gemma-3-12b-it-Q4_K_M (~6.6 GB) with several GB for context.
  • embed: nomic-embed-text-v2-moe-Q4_K_M (~140 MB) on the 16 GB variant; skip on 12 GB.

Workstation cards (W7800/W7900, 32–48 GB VRAM)

  • Closer to Strix Halo territory. Q4 70B fits cleanly, with room for embed. Concurrent slots become realistic.
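
The loadouts above come down to simple arithmetic: the model files plus some headroom for context have to fit in VRAM. The sketch below illustrates that sum using the sizes quoted on this page. It is not how the hal0 probe computes its fit warnings; the function name fits_vram and the fixed context reserve are assumptions made for the example.

```shell
# Hypothetical helper: fits_vram VRAM_GB RESERVE_GB MODEL_GB [MODEL_GB ...]
# Adds the model sizes to a reserve for context and compares against VRAM.
# Exits 0 when the loadout fits and 1 when it does not.
fits_vram() {
  vram_gb=$1
  reserve_gb=$2
  shift 2
  awk -v vram="$vram_gb" -v reserve="$reserve_gb" 'BEGIN {
    total = reserve
    for (i = 1; i < ARGC; i++) total += ARGV[i]   # remaining args are model sizes
    printf "%.2f GB needed, %.2f GB available\n", total, vram
    exit (total > vram)
  }' "$@"
}

# Usage, with a 3 GB context reserve (an assumed figure):
# fits_vram 24 3 18.6 0.14   # 24 GB card: Qwen3-30B-A3B plus embed
# fits_vram 16 3 9 0.14      # 16 GB card: Hermes-4-14B plus embed
```

The real context cost depends on the model and the window you configure, so treat the reserve as a knob to tune rather than a constant.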

The standard one-liner from the install page handles everything for the Vulkan path:

curl -fsSL https://hal0.dev/install.sh | bash

You’ll want:

  • A recent Mesa with RADV Vulkan installed.
  • The amdgpu kernel module loaded (dmesg | grep amdgpu).
  • The service user in the render group for /dev/dri/* access.

The hardware probe detects the GPU’s VRAM correctly and sizes slot fit warnings to it.
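
The three prerequisites listed above can be checked in one pass before running the installer. This is a rough sketch, not something hal0 ships: the helper names are made up for the example, and it assumes the service user is called hal0 unless another name is passed.

```shell
# Hypothetical preflight helpers; hal0 itself does not ship these.

# Succeeds if word $2 appears in the space-separated list $1.
has_word() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reports each unmet prerequisite and returns non-zero if any is missing.
vulkan_preflight() {
  user=${1:-hal0}
  status=0
  # vulkaninfo is only the inspection tool; without it RADV cannot be verified.
  command -v vulkaninfo >/dev/null 2>&1 || { echo "vulkaninfo not found"; status=1; }
  # /sys/module/amdgpu exists while the amdgpu driver is loaded.
  [ -d /sys/module/amdgpu ] || { echo "amdgpu kernel module not loaded"; status=1; }
  user_groups=$(id -nG "$user" 2>/dev/null) || user_groups=""
  has_word "$user_groups" render || { echo "$user is not in the render group"; status=1; }
  return "$status"
}

# Usage:
# vulkan_preflight hal0 && echo "prerequisites look fine"
```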

  • Bare-metal Linux. No surprises. Install, the probe finds the card, you’re done.
  • Privileged LXC on Proxmox. Workable for both Vulkan and ROCm. Pass through /dev/dri/* and (for ROCm) /dev/kfd with matching cgroup allow entries, and add the hal0 service user to the render and video groups inside the container. ROCm in particular needs /dev/kfd present, or it falls back to CPU.
  • VM with PCIe passthrough. Works. Make sure the card’s IOMMU group isn’t dragging the host’s other PCIe devices along, and pin vCPUs to physical cores on the same NUMA node as the card if you care about consistent throughput.
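
For the LXC case, the cgroup allow entries have to name each device's major and minor numbers, and the major number for /dev/kfd is not fixed across hosts. The sketch below prints an entry from the actual device node. It assumes a cgroup v2 host (hence the lxc.cgroup2.devices.allow key) and GNU stat; the function name is illustrative.

```shell
# Hypothetical helper: print a cgroup allow entry for a character device,
# using the node's real major:minor numbers.
lxc_allow_entry() {
  dev=$1
  [ -c "$dev" ] || { echo "$dev is not a character device" >&2; return 1; }
  # stat prints major (%t) and minor (%T) in hex; convert both to decimal.
  major=$(printf '%d' "0x$(stat -c '%t' "$dev")")
  minor=$(printf '%d' "0x$(stat -c '%T' "$dev")")
  echo "lxc.cgroup2.devices.allow: c $major:$minor rwm"
}

# Usage, on the Proxmox host:
# lxc_allow_entry /dev/kfd
# lxc_allow_entry /dev/dri/renderD128
```

The printed lines go into the container's config file next to the bind mounts for the same nodes.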

No GPU detected by probe. Run vulkaninfo --summary to confirm the Vulkan runtime sees the card. If that’s empty, fix the host Vulkan install before troubleshooting hal0.
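
To pull just the device names out of the summary, a small filter is enough. This sketch assumes the summary prints one "deviceName = ..." line per device, which is the layout current vulkaninfo releases use; the function name is made up for the example.

```shell
# Hypothetical helper: read `vulkaninfo --summary` output on stdin and
# print one device name per line.
vulkan_device_names() {
  sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Usage:
# vulkaninfo --summary | vulkan_device_names
```

If the only name printed is llvmpipe, Vulkan is running on Mesa's CPU rasterizer and RADV is not seeing the card.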

Slot won’t start, journal mentions permission denied on /dev/dri. Add the service user to the render group:

sudo usermod -aG render hal0
sudo systemctl restart hal0-api

Throughput much lower than expected. Check that the card is not pinned to a low performance level (cat /sys/class/drm/card0/device/power_dpm_force_performance_level; the card index can differ on multi-GPU hosts), that the slot is running a GPU toolbox (Vulkan, or ROCm if you opted in) rather than a CPU fallback (hal0 slot list --json), and that the context window is no larger than the model documentation suggests.
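
On hosts with more than one GPU, or with an iGPU alongside the discrete card, card0 may not be the card you care about. The sketch below prints the performance level for every card that exposes one. The function name and the optional sysfs-root argument are illustrative assumptions.

```shell
# Hypothetical helper: print the DPM performance level of every DRM card.
# The optional argument overrides the sysfs root (useful for testing).
dpm_levels() {
  sysfs=${1:-/sys}
  for f in "$sysfs"/class/drm/card*/device/power_dpm_force_performance_level; do
    [ -r "$f" ] || continue            # skip entries without the attribute
    card=${f#"$sysfs"/class/drm/}      # strip the leading path
    card=${card%%/*}                   # keep only the cardN component
    echo "$card: $(cat "$f")"
  done
}

# Usage:
# dpm_levels
```

A value of auto is the normal default; low means the card is being held at its lowest clocks.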