Hugging Face pulls

hal0 pulls model weights through POST /api/models/{id}/pull. What’s wired up today:

FLM tags (the Ollama-style family:size ids served by the FLM toolbox) pull through the same route. is_flm_tag() detects them and shells flm pull <tag> inside the toolbox image; NPU-capable registry rows propagate pullable=True so the dashboard exposes the pull control.
Arbitrary Hugging Face repo refs still return 501 from the dashboard’s streaming pull path. The plumbing exists (SHA-256 atomic install, polled progress), but the HF-side fetch for arbitrary repos isn’t enabled yet. Pre-stage weights manually into /mnt/ai-models/local/ and register the entry in /var/lib/hal0/registry/registry.toml in the meantime; see Model registry.

The pull surfaces as a slot-level state transition (offline → pulling) when triggered by a swap, and the dashboard / CLI poll GET /api/models/{id}/pull/status for byte-level progress.

Pulling an FLM tag

Three ways once the FLM toolbox image and host cache are present:

Dashboard. The Models view shows FLM rows (e.g. qwen3:4b) with a Pull button.
CLI.
Terminal window
```
hal0 model pull qwen3:4b
```
Slot swap.
Terminal window
```
hal0 slot swap primary --model qwen3:4b
```
If the FLM cache doesn’t have it, the slot transitions through pulling before warming.

The FLM toolbox container bind-mounts the host FLM cache to /var/lib/hal0/.config/flm/models (the HOME of the non-root hal0 user inside the image). Probe and pull both depend on that mount; without it flm list -j returns empty and flm pull writes to a throwaway layer.

FLM has its own model tag namespace (model_list.json) — you can’t run arbitrary GGUFs through it. Per-(backend, model) validation runs at registry probe time.

Where the bytes land

The model root is /mnt/ai-models (rw ZFS on the hal0 LXC). Pulled weights go to /mnt/ai-models/local/<file>; FLM tags land in the FLM cache directory mounted from the host. The on-disk index is /var/lib/hal0/registry/registry.toml, which survives hal0 update (only /usr/lib/hal0/current/ gets swapped). Successful pulls are verified by SHA-256 and written atomically; a failed pull leaves no partial registry entry.

Progress polling

The dashboard and CLI poll GET /api/models/{id}/pull/status for:

Total bytes
Bytes received
Throughput (bytes / second)
Elapsed time
ETA

POST .../pull/cancel aborts an in-flight job. The slot stays in pulling until the file is fully verified; only then does it transition to warming.

Status today

Source	Pull route	State
FLM tag (`family:size`)	`POST /api/models/{id}/pull` → `flm pull` in toolbox	Live
Arbitrary Hugging Face repo ref	`POST /api/models/{id}/pull` (HF path)	Returns 501
Pre-staged file in `/mnt/ai-models/local/`	direct registry edit	Live (manual)

Coming soon — outline

Streaming HF pulls for arbitrary repo refs (the 501 path).
Repo authentication for gated models (HF_TOKEN plumbing).
Multi-file pulls (sharded GGUFs).
Resume on interrupt.
Disk-space pre-flight warning.
Mirror configuration for self-hosted HF caches.