Hugging Face pulls
hal0 pulls model weights through POST /api/models/{id}/pull. What’s
wired up today:
- FLM tags (the Ollama-style family:size ids served by the FLM
toolbox) pull through the same route. is_flm_tag() detects them and
shells out to flm pull <tag> inside the toolbox image; NPU-capable
registry rows propagate pullable=True so the dashboard exposes the
pull control.
- Arbitrary Hugging Face repo refs still return 501 from the
dashboard’s streaming pull path. The plumbing exists (SHA-256
atomic install, polled progress), but the HF-side fetch for
arbitrary repos isn’t enabled yet. In the meantime, pre-stage
weights manually into /mnt/ai-models/local/ and register the entry
in /var/lib/hal0/registry/registry.toml; see Model registry.
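The split between FLM tags and Hugging Face repo refs can be illustrated with a small sketch. The real is_flm_tag() is not reproduced here: the pattern below is only an assumption about what an Ollama-style family:size id looks like, and looks_like_flm_tag and classify are hypothetical names.

```python
import re

# Assumed shape of an Ollama-style tag: "<family>:<size>", e.g. "qwen3:4b".
# The real is_flm_tag() may accept a different character set.
_TAG_RE = re.compile(r"[a-z0-9][a-z0-9._-]*:[a-z0-9][a-z0-9._-]*")


def looks_like_flm_tag(ref: str) -> bool:
    """Return True if ref has the family:size shape (hypothetical check)."""
    return _TAG_RE.fullmatch(ref) is not None


def classify(ref: str) -> str:
    """Route a model reference to a pull path (illustrative labels only)."""
    if looks_like_flm_tag(ref):
        return "flm"
    if "/" in ref:
        # Something like "org/repo": treated here as a Hugging Face repo ref,
        # which is the path that currently returns 501.
        return "hf"
    return "unknown"
```

A shape check like this says nothing about whether the tag exists in FLM's own namespace; that still has to be validated against the toolbox.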
The pull surfaces as a slot-level state transition
(offline → pulling) when triggered by a swap, and the dashboard /
CLI poll GET /api/models/{id}/pull/status for byte-level progress.
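A client-side polling loop might look like the sketch below. It takes the HTTP call as an injected function so the loop itself stays testable; the "state" key and its terminal values are assumptions for illustration, not the documented response schema.

```python
import time
from typing import Callable, Dict, Iterator


def poll_pull(
    fetch_status: Callable[[], Dict],
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 3600,
) -> Iterator[Dict]:
    """Yield status snapshots until the job reports a terminal state.

    fetch_status is expected to perform GET /api/models/{id}/pull/status
    and return the decoded JSON body. The "state" key and the values
    "done", "failed" and "cancelled" are assumed, not confirmed.
    """
    terminal = {"done", "failed", "cancelled"}
    for _ in range(max_polls):
        status = fetch_status()
        yield status
        if status.get("state") in terminal:
            return
        sleep(interval_s)
    raise TimeoutError("pull did not reach a terminal state")
```

In practice fetch_status would wrap an HTTP GET against the hal0 API; bounding the loop with max_polls keeps a stalled job from hanging the caller forever.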
Pulling an FLM tag
Three ways once the FLM toolbox image and host cache are present:
- Dashboard. The Models view shows FLM rows (e.g. qwen3:4b) with a
Pull button.
- CLI. Run hal0 model pull qwen3:4b.
- Slot swap. Run hal0 slot swap primary --model qwen3:4b. If the FLM
cache doesn’t have it, the slot transitions through pulling before
warming.
The FLM toolbox container bind-mounts the host FLM cache to
/var/lib/hal0/.config/flm/models (the HOME of the non-root hal0
user inside the image). Probe and pull both depend on that mount;
without it flm list -j returns empty and flm pull writes to a
throwaway layer.
FLM has its own model tag namespace (model_list.json) — you can’t
run arbitrary GGUFs through it. Per-(backend, model) validation
runs at registry probe time.
Where the bytes land
The model root is /mnt/ai-models (rw ZFS on the hal0 LXC). Pulled
weights go to /mnt/ai-models/local/<file>; FLM tags land in the
FLM cache directory mounted from the host. The on-disk index is
/var/lib/hal0/registry/registry.toml, which survives hal0 update
(only /usr/lib/hal0/current/ gets swapped). Successful pulls are
verified by SHA-256 and written atomically; a failed pull leaves no
partial registry entry.
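The verify-then-install pattern described above can be sketched as follows. This is a generic illustration of SHA-256 verification with an atomic rename, not hal0's actual implementation; install_verified and its parameters are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


def install_verified(chunks: Iterable[bytes], dest: Path, expected_sha256: str) -> None:
    """Stream chunks to a temp file, verify SHA-256, then rename into place."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # The temp file lives in the destination directory so os.replace stays
    # on one filesystem, which is what makes the final rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".part"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if digest.hexdigest() != expected_sha256.lower():
            raise ValueError("SHA-256 mismatch; refusing to install")
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the partial file so a failed pull leaves nothing behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because the destination path only ever appears through the final rename, a reader of the model directory sees either the complete verified file or nothing. Writing the registry entry only after this function returns would give the "no partial registry entry" behavior.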
Progress polling
The dashboard and CLI poll GET /api/models/{id}/pull/status for:
- Total bytes
- Bytes received
- Throughput (bytes / second)
- Elapsed time
- ETA
POST .../pull/cancel aborts an in-flight job. The slot stays in
pulling until the file is fully verified; only then does it
transition to warming.
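Throughput and ETA follow from the other three fields. The sketch below uses a plain average rate over the whole transfer; the server may well smooth the rate differently, and the function name and argument names are hypothetical.

```python
from typing import Optional, Tuple


def throughput_and_eta(
    total_bytes: int, received_bytes: int, elapsed_s: float
) -> Tuple[float, Optional[float]]:
    """Return (bytes per second, seconds remaining or None if unknown)."""
    if elapsed_s <= 0 or received_bytes <= 0:
        # Nothing received yet: no meaningful rate, so no ETA.
        return 0.0, None
    rate = received_bytes / elapsed_s
    remaining = max(total_bytes - received_bytes, 0)
    return rate, remaining / rate
```

For example, 250 of 1000 bytes after 5 seconds gives 50 bytes per second and 15 seconds remaining. An ETA of zero only means all bytes have arrived; per the paragraph above, the slot still waits for verification before leaving pulling.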
Status today
| Source | Pull route | State |
|---|---|---|
| FLM tag (family:size) | POST /api/models/{id}/pull → flm pull in toolbox | Live |
| Arbitrary Hugging Face repo ref | POST /api/models/{id}/pull (HF path) | Returns 501 |
| Pre-staged file in /mnt/ai-models/local/ | direct registry edit | Live (manual) |
Coming soon — outline
- Streaming HF pulls for arbitrary repo refs (the 501 path).
- Repo authentication for gated models (HF_TOKEN plumbing).
- Multi-file pulls (sharded GGUFs).
- Resume on interrupt.
- Disk-space pre-flight warning.
- Mirror configuration for self-hosted HF caches.