Load your first model
A fresh hal0 install boots into the FirstRun wizard. Open the
dashboard at http://localhost:8080 and the wizard owns the screen
until the primary slot has a model and reaches ready.
What the wizard does
The wizard is a guarded route at /firstrun. Until the API reports
first_run: false, every other dashboard navigation redirects back to
it. There’s no point operating an empty box.
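The guard described above can be pictured as a single routing decision. This is a minimal sketch, not hal0's actual dashboard code: the function name `resolve_route` is hypothetical, and the only facts taken from this page are the /firstrun route and the first_run flag reported by the API.

```python
FIRST_RUN_ROUTE = "/firstrun"  # the guarded wizard route named on this page


def resolve_route(requested: str, first_run: bool) -> str:
    """Return the route the dashboard should render (hypothetical helper).

    While the API still reports first_run: true, every navigation is
    redirected to the wizard. Once it reports false, the requested route
    is returned unchanged. What happens to /firstrun itself after setup
    is not specified on this page, so this sketch leaves it alone.
    """
    if first_run and requested != FIRST_RUN_ROUTE:
        return FIRST_RUN_ROUTE
    return requested
```

Keeping the decision in one pure function means the redirect rule depends only on the flag the API reports, not on which page the operator tried to open.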
It owns a linear sequence: confirm the password, confirm the detected
hardware and model storage, pick a primary chat model, pick which
capabilities to bring up (embed, voice, image), accept any
click-through licenses, then stream the pulls into their slots. When
primary reaches ready the wizard hands you the rest of the
dashboard.
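As a rough illustration of that sequence, the sketch below lists the steps the wizard would show for a given state. It assumes the skip conditions stated in the step list (password already set, no gated model selected, no license required). `WizardState`, its field names, and the step identifiers are all hypothetical, not hal0 types.

```python
from dataclasses import dataclass


@dataclass
class WizardState:
    password_set: bool          # mirrors /api/auth/status.password_set
    any_gated_model: bool       # hypothetical: at least one selected model is gated
    any_license_required: bool  # hypothetical: at least one model has a click-through license


def plan_steps(state: WizardState) -> list[str]:
    """Return the ordered step names the wizard would render (illustrative only)."""
    steps: list[str] = []
    if not state.password_set:
        steps.append("password")          # auto-skipped when already set
    steps += ["hardware_storage", "primary_model", "capabilities"]
    if state.any_gated_model:
        steps.append("hf_token")          # only rendered for gated models
    if state.any_license_required:
        steps.append("licenses")          # skipped if nothing requires it
    steps += ["install", "done"]
    return steps
```

Because the gated and license flags depend on what was picked in the model and capability steps, a real implementation would presumably recompute the plan after each selection rather than once up front.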
The steps
- Password. Optional. Auto-skips when /api/auth/status.password_set is already true.
- Hardware + storage. Confirms what the probe wrote to /etc/hal0/hardware.json and which directories the registry will scan and pull into.
- Primary chat model. A curated list, hardware-aware. Fit warnings appear inline next to anything that would offload heavily. The default highlight is Phi-3-mini-4k-instruct-q4, a 2.4 GB Q4 GGUF that pulls in roughly 10 seconds on a decent connection and fits anywhere hal0 runs. Strix Halo operators should jump straight to a Q4 7B-class chat model or a Q4 MoE 30B; see recommended loadouts.
- Capabilities. Embed, voice (STT + TTS), and image generation, each with smart defaults. Rerank is a sub-disclosure inside the embed row, locked off by default.
- Hugging Face token. Conditional. Only renders when at least one selected model is gated.
- License acceptance. Aggregated across every selected model. Skipped if nothing requires it.
- Install. Parallel pulls plus capability registration, with per-row retry. Progress streams over SSE: bytes, percent, and the slot state walking through pulling → starting → warming → ready.
- Done. Links to the dashboard, OpenWebUI, and settings.
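The Install step's progress stream can be followed with a small Server-Sent Events parser. The sketch below is an assumption-heavy illustration: the payload is assumed to be JSON, and the field names `slot`, `state`, `bytes`, and `percent` are hypothetical, since this page does not document the event schema or the endpoint URL. Only the state names come from the text above.

```python
import json
from typing import Iterable, Iterator

# State names as listed in the Install step above.
SLOT_STATES = ["pulling", "starting", "warming", "ready"]


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from an iterable of text lines.

    Follows the basic SSE framing: `data:` lines accumulate, a blank line
    dispatches the event, and other lines (comments, `event:`, `id:`) are
    ignored here. An unterminated trailing event is dropped.
    """
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            buf.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buf:
            yield json.loads("\n".join(buf))
            buf = []


def follow_slot(events: Iterable[dict], slot: str = "primary") -> list[str]:
    """Return the distinct states seen for one slot, in order, stopping at ready."""
    seen: list[str] = []
    for ev in events:
        if ev.get("slot") != slot:   # hypothetical field name
            continue
        state = ev.get("state")      # hypothetical field name
        if state and (not seen or seen[-1] != state):
            seen.append(state)
        if state == "ready":
            break
    return seen
```

In practice the lines would come from a streaming HTTP response. Feeding the parser any iterable of strings keeps it testable without a running hal0 instance.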
When the pull fails
The wizard surfaces errors inline. Common ones:
- No disk space. /var/lib/hal0/models/ ran out mid-pull. Free the space and retry; partial downloads resume.
- Hugging Face rate-limit. Anonymous pulls hit a rate cap on popular weights. Export HF_TOKEN (or set it in /etc/hal0/api.env) and retry.
- License not accepted on Hugging Face. Some gated models require acceptance on the HF side before the API will serve the files. The error message links out to the model page.
The slot stays in error with details in
/var/lib/hal0/slots/primary/state.json until you retry. Nothing
hidden.
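The disk-space failure above can often be caught before starting a pull. The following is a minimal sketch using Python's standard `shutil.disk_usage`. The helper name `has_room_for` and the 10% headroom are assumptions for illustration, not hal0 behavior. The default path is the models directory named on this page.

```python
import shutil
from pathlib import Path

MODELS_DIR = Path("/var/lib/hal0/models")  # models directory named on this page


def has_room_for(size_bytes: int, directory: Path = MODELS_DIR,
                 headroom: float = 0.10) -> bool:
    """Return True if the filesystem holding `directory` can take the download.

    `headroom` is an assumed safety margin (10% of the model size), not a
    hal0 setting. Raises FileNotFoundError if `directory` does not exist.
    """
    free = shutil.disk_usage(directory).free
    return free >= int(size_bytes * (1 + headroom))


# Example: check for the 2.4 GB default model before retrying a pull.
# has_room_for(int(2.4e9))
```

A check like this only reduces the chance of running out mid-pull. Other writers on the same filesystem can still consume the space, so the wizard's resume-on-retry behavior remains the real safety net.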
Picking something other than the default
The wizard’s curated list is a starting point. After it’s done, the Models page in the dashboard is where you live.
Each row carries the model’s capability tags, on-disk size, and the backend that owns it. From here you can pull more weights, assign a model to a slot, or hand the same job to the CLI:
hal0 model list
hal0 slot swap primary --model qwen2.5-coder-7b-instruct-q4_k_m
See recommended loadouts for a hardware-by-hardware breakdown of what fits where.