
Hugging Face pulls

hal0 pulls model weights through POST /api/models/{id}/pull. What’s wired up today:

  • FLM tags (the Ollama-style family:size IDs served by the FLM toolbox) pull through this same route. is_flm_tag() detects them and shells out to flm pull <tag> inside the toolbox container. NPU-capable registry rows propagate pullable=True, so the dashboard exposes the pull control.
  • Arbitrary Hugging Face repo refs still return 501 from the dashboard’s streaming pull path. The plumbing exists (SHA-256-verified atomic install, polled progress), but the HF-side fetch for arbitrary repos isn’t enabled yet. In the meantime, pre-stage weights manually into /mnt/ai-models/local/ and register the entry in /var/lib/hal0/registry/registry.toml; see Model registry.
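For illustration, here is a minimal sketch of how a family:size tag can be told apart from a Hugging Face repo ref by shape alone. It is not hal0’s is_flm_tag(). The looks_like_flm_tag name and the regex are assumptions, and a real check would also consult FLM’s own tag list (model_list.json).

```python
import re

# Hypothetical shape check: "family:size", e.g. "qwen3:4b" or "llama3.2:1b".
# The character classes exclude "/", so "owner/repo" refs never match.
_TAG_SHAPE = re.compile(r"[a-z0-9][a-z0-9._-]*:[a-z0-9][a-z0-9._-]*", re.IGNORECASE)


def looks_like_flm_tag(model_id: str) -> bool:
    """Return True if model_id has the Ollama-style family:size shape."""
    return _TAG_SHAPE.fullmatch(model_id) is not None


print(looks_like_flm_tag("qwen3:4b"))            # True
print(looks_like_flm_tag("Qwen/Qwen3-4B-GGUF"))  # False
```

Shape alone only decides which pull path to try; whether FLM actually serves the tag is still settled by per-(backend, model) validation at probe time.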

When a swap triggers the pull, it surfaces as a slot-level state transition (offline → pulling). The dashboard and CLI poll GET /api/models/{id}/pull/status for byte-level progress.
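A minimal polling sketch, assuming the status endpoint returns JSON with hypothetical total_bytes and bytes_received keys (the real field names aren’t documented here). The base URL is a parameter because the host and port are deployment-specific. The fetch argument is injectable, so the loop can be exercised without a live server.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict


def get_json(url: str) -> Dict:
    """GET url and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def status_url(base_url: str, model_id: str) -> str:
    # ":" is legal in a URL path segment, so "qwen3:4b" is kept as-is;
    # other unsafe characters (including "/") are percent-encoded.
    model = urllib.parse.quote(model_id, safe=":")
    return f"{base_url.rstrip('/')}/api/models/{model}/pull/status"


def poll_pull(base_url: str, model_id: str,
              fetch: Callable[[str], Dict] = get_json,
              interval_s: float = 1.0, max_polls: int = 3600) -> Dict:
    """Poll pull/status until the byte counters report a finished transfer."""
    url = status_url(base_url, model_id)
    for _ in range(max_polls):
        status = fetch(url)
        # Hypothetical field names; adjust to the real response schema.
        total = status.get("total_bytes", 0)
        received = status.get("bytes_received", 0)
        print(f"{model_id}: {received}/{total} bytes")
        if total and received >= total:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"{model_id}: pull not finished after {max_polls} polls")
```

Usage would look like poll_pull("http://<hal0-host>:<port>", "qwen3:4b"). Reaching total_bytes only means the transfer finished. The slot stays in pulling until SHA-256 verification passes.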

Once the FLM toolbox image and host cache are present, there are three ways to trigger an FLM pull:

  • Dashboard. The Models view shows FLM rows (e.g. qwen3:4b) with a Pull button.
  • CLI.
    hal0 model pull qwen3:4b
  • Slot swap.
    hal0 slot swap primary --model qwen3:4b
    If the model isn’t already in the FLM cache, the slot transitions through pulling before warming.

The FLM toolbox container bind-mounts the host FLM cache to /var/lib/hal0/.config/flm/models (/var/lib/hal0 is the HOME of the non-root hal0 user inside the image). Both probe and pull depend on that mount. Without it, flm list -j reports no models, and flm pull writes into the container’s throwaway writable layer, so the weights are lost when the container is recreated.
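If probes come back empty, a quick diagnostic is to check from inside the toolbox container whether the cache path is actually a mount point. The sketch below parses /proc/self/mountinfo, where the fifth whitespace-separated field is the mount point. It assumes the bind-mount target is exactly /var/lib/hal0/.config/flm/models rather than a parent directory; the helper names are hypothetical.

```python
from typing import Set

# mountinfo escapes these characters as octal sequences. "\134" (backslash)
# is undone last so an escaped backslash can't form a new escape sequence.
_ESCAPES = (("\\040", " "), ("\\011", "\t"), ("\\012", "\n"), ("\\134", "\\"))


def _unescape(field: str) -> str:
    for code, char in _ESCAPES:
        field = field.replace(code, char)
    return field


def mount_points(mountinfo_text: str) -> Set[str]:
    """Return every mount point listed in /proc/<pid>/mountinfo text."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) > 4:
            points.add(_unescape(fields[4]))  # 5th field: mount point
    return points


def flm_cache_mounted(path: str = "/var/lib/hal0/.config/flm/models",
                      mountinfo_path: str = "/proc/self/mountinfo") -> bool:
    """True if path is itself a mount point in this mount namespace."""
    with open(mountinfo_path) as f:
        return path in mount_points(f.read())
```

A False result inside the container points at a missing bind mount rather than an empty host cache.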

FLM has its own model tag namespace (model_list.json); you can’t run arbitrary GGUFs through it. Validation runs per (backend, model) pair at registry probe time.

The model root is /mnt/ai-models (read-write ZFS storage on the hal0 LXC). Pulled weights go to /mnt/ai-models/local/<file>; FLM tags land in the FLM cache directory mounted from the host. The on-disk index is /var/lib/hal0/registry/registry.toml, which survives hal0 update because only /usr/lib/hal0/current/ gets swapped. Successful pulls are verified by SHA-256 and written atomically; a failed pull leaves no partial registry entry.
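The verify-then-rename pattern behind “verified by SHA-256 and written atomically” can be sketched as follows. This is a generic illustration, not hal0’s installer. install_verified is a hypothetical name, and the expected digest is assumed to be known before the download starts.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def install_verified(chunks: Iterable[bytes], dest_path: str,
                     expected_sha256: str) -> None:
    """Write chunks to dest_path only if their SHA-256 matches expected_sha256."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Temp file in the same directory, so os.replace() is a rename
    # within one filesystem and therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    try:
        digest = hashlib.sha256()
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is on disk before the rename
        if digest.hexdigest() != expected_sha256.lower():
            raise ValueError(f"SHA-256 mismatch for {dest_path}")
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Any failure (bad hash, I/O error, Ctrl-C) removes the partial file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Only after os.replace() returns would it be safe to add the registry entry. That ordering is what lets a failed pull leave nothing behind.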

The dashboard and CLI poll GET /api/models/{id}/pull/status for:

  • Total bytes
  • Bytes received
  • Throughput (bytes / second)
  • Elapsed time
  • ETA

POST .../pull/cancel aborts an in-flight job. The slot stays in pulling until the file is fully verified; only then does it transition to warming.
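The slot lifecycle described on this page can be modelled as a small step function. This is a hypothetical sketch covering only the transitions named here: offline → pulling → warming, plus offline → warming when the model is already cached. hal0’s real slot states and transitions may include more.

```python
from enum import Enum


class SlotState(Enum):
    OFFLINE = "offline"
    PULLING = "pulling"
    WARMING = "warming"


def next_state(state: SlotState, model_cached: bool,
               pull_verified: bool) -> SlotState:
    """Hypothetical step function for a slot during a swap."""
    if state is SlotState.OFFLINE:
        # A swap to an uncached model goes through pulling first.
        return SlotState.WARMING if model_cached else SlotState.PULLING
    if state is SlotState.PULLING:
        # Stay in pulling until the SHA-256 check has passed.
        return SlotState.WARMING if pull_verified else SlotState.PULLING
    return state  # warming (and anything later) is out of scope here
```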

| Source | Pull route | State |
|---|---|---|
| FLM tag (family:size) | POST /api/models/{id}/pull (flm pull in toolbox) | Live |
| Arbitrary Hugging Face repo ref | POST /api/models/{id}/pull (HF path) | Returns 501 |
| Pre-staged file in /mnt/ai-models/local/ | Direct registry edit | Live (manual) |

Not wired up yet:

  • Streaming HF pulls for arbitrary repo refs (the 501 path).
  • Repo authentication for gated models (HF_TOKEN plumbing).
  • Multi-file pulls (sharded GGUFs).
  • Resume on interrupt.
  • Disk-space pre-flight warning.
  • Mirror configuration for self-hosted HF caches.