
Image generation

hal0 exposes OpenAI-compatible image generation at POST /v1/images/generations, served by a ComfyUI provider running inside the img slot. hal0 owns the OpenAI ↔ ComfyUI translation; the upstream is treated as a black box that speaks POST /prompt, GET /history/<id>, and GET /view.

The route is implemented in src/hal0/api/routes/v1.py and the provider in src/hal0/providers/comfyui.py. Workflow templates live under src/hal0/providers/workflows/.
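The provider code itself isn't reproduced on this page. The sketch below shows the upstream half of the translation, assuming ComfyUI's stock HTTP API shapes: `POST /prompt` accepts `{"prompt": <workflow>, "client_id": ...}` and returns a `prompt_id`, `GET /history/<id>` lists output images per node, and `GET /view` serves a file by `filename`, `subfolder` and `type`. The helper names `prompt_payload`, `image_refs` and `view_url` are hypothetical and are not taken from `src/hal0/providers/comfyui.py`.

```python
"""Hypothetical helpers for talking to a ComfyUI upstream.

Assumes ComfyUI's stock API shapes:
  POST /prompt         -> {"prompt_id": ...}
  GET  /history/<id>   -> {<id>: {"outputs": {<node_id>: {"images": [...]}}}}
  GET  /view?filename=...&subfolder=...&type=output
"""
import json
from urllib.parse import urlencode


def prompt_payload(workflow: dict, client_id: str) -> bytes:
    # Body for POST /prompt: the API-format workflow graph plus a client id.
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def image_refs(history: dict, prompt_id: str) -> list[dict]:
    # Collect image references from every output node of one prompt.
    # An empty list means the prompt is not in history yet (queued or running).
    entry = history.get(prompt_id)
    if entry is None:
        return []
    refs = []
    for node_output in entry.get("outputs", {}).values():
        refs.extend(node_output.get("images", []))
    return refs


def view_url(base: str, ref: dict) -> str:
    # GET /view locates a file by filename, subfolder and type ("output" for results).
    query = urlencode({
        "filename": ref["filename"],
        "subfolder": ref.get("subfolder", ""),
        "type": ref.get("type", "output"),
    })
    return f"{base.rstrip('/')}/view?{query}"
```

A caller would POST the payload, poll `/history/<prompt_id>` until `image_refs` returns something, then fetch each `view_url`. Keeping these steps as pure functions makes the translation testable without a running ComfyUI.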

The img slot sits on the same lifecycle as the chat and embed slots, so it shows up in the dashboard’s slot grid with the same states, controls, and metrics.

Screenshot: the hal0 /slots view, showing the embed, voice, and img capability cards with model pickers.

  • Endpoint: POST /v1/images/generations.
  • Slot: img. Part of BUILTIN_SLOTS in src/hal0/slots/manager.py, same lifecycle as primary, embed, stt, tts.
  • Backend: ComfyUI inside the ghcr.io/hal0ai/hal0-toolbox-comfyui:v1 toolbox image (pinned by sha256 in hal0/manifest.json).
  • Hardware: ROCm-capable AMD GPU. Strix Halo’s iGPU is the reference target. The 128 GB unified pool keeps an SDXL Turbo checkpoint and a primary chat model warm at the same time.

The picker UI surfaces three curated entries spanning the licensing spectrum (see src/hal0/registry/curated.py):

| Id | Family | On-disk size | Min VRAM | License |
|---|---|---|---|---|
| sdxl-turbo | SDXL distilled | ~6.5 GB | 8 GB | SAI Non-Commercial Research Community |
| sd-1.5-pruned-emaonly | SD 1.5 | ~4.3 GB | 4 GB | CreativeML Open RAIL-M |
| flux-schnell | FLUX.1 [schnell] | ~23.8 GB | 24 GB | Apache-2.0 |
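For choosing a model against a given card, the Min VRAM column can be turned into a simple lookup. This is a sketch only: `CURATED_MIN_VRAM_GB` and `models_that_fit` are hypothetical names, not hal0's registry API, and the numbers are copied from the table above.

```python
"""Hypothetical helper: which curated image models fit a VRAM budget.

Figures come from the curated table (Min VRAM, in GB). This is not
hal0's registry API, just an illustration of the sizing data.
"""

CURATED_MIN_VRAM_GB = {
    "sd-1.5-pruned-emaonly": 4,
    "sdxl-turbo": 8,
    "flux-schnell": 24,
}


def models_that_fit(vram_gb: float) -> list[str]:
    # Return curated ids whose minimum VRAM is within budget, smallest first.
    fitting = [
        (need, model_id)
        for model_id, need in CURATED_MIN_VRAM_GB.items()
        if need <= vram_gb
    ]
    return [model_id for _, model_id in sorted(fitting)]
```

On an 8 GB discrete card this yields `sd-1.5-pruned-emaonly` and `sdxl-turbo`. On Strix Halo the budget is the share of the unified pool left after the chat slot. A fit by VRAM alone does not make flux-schnell usable, because the default workflow can't drive it yet.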
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sdxl-turbo",
    "prompt": "a cat in a hat, studio lighting",
    "size": "1024x1024",
    "response_format": "url"
  }'

Honoured request body fields (a subset of OpenAI's):

  • model (required). Curated id, e.g. sdxl-turbo.
  • prompt (required).
  • n. Batch size; default 1.
  • size. WxH string, e.g. 1024x1024.
  • response_format. "url" (default) or "b64_json".

hal0-specific extensions are passed via extra_body: seed, steps, cfg, and negative_prompt.
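The same request can be built with the standard library alone. The sketch below assumes hal0 listens on `http://localhost:8080`, as in the curl example. It also assumes the extensions sit at the top level of the JSON body, which is where OpenAI SDKs place `extra_body` keys. `build_request` and `parse_size` are hypothetical helper names.

```python
"""Build a /v1/images/generations request with only the standard library.

Assumptions: hal0 at http://localhost:8080, and hal0's extensions
(seed, steps, cfg, negative_prompt) as top-level JSON keys, which is
where OpenAI SDKs merge extra_body.
"""
import json
import urllib.request

HAL0_EXTENSIONS = {"seed", "steps", "cfg", "negative_prompt"}


def parse_size(size: str) -> tuple[int, int]:
    # "1024x1024" -> (1024, 1024); reject anything that is not WxH.
    width, sep, height = size.partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError(f"size must be WxH, got {size!r}")
    return int(width), int(height)


def build_request(model, prompt, *, n=1, size="1024x1024",
                  response_format="url", base="http://localhost:8080", **extra):
    unknown = set(extra) - HAL0_EXTENSIONS
    if unknown:
        raise ValueError(f"unsupported extension(s): {sorted(unknown)}")
    if response_format not in ("url", "b64_json"):
        raise ValueError("response_format must be 'url' or 'b64_json'")
    parse_size(size)  # fail early instead of waiting for a server-side error
    body = {"model": model, "prompt": prompt, "n": n, "size": size,
            "response_format": response_format, **extra}
    return urllib.request.Request(
        f"{base}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `with urllib.request.urlopen(build_request("sdxl-turbo", "a cat in a hat", seed=42)) as resp: payload = json.load(resp)`. The client-side checks only mirror the fields listed above, and the server remains the authority on what it accepts.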

A successful response (with response_format set to url) looks like this:

{
  "created": 1716000000,
  "data": [
    { "url": "/api/images/cache/<uuid>.png" }
  ],
  "_hal0": {
    "workflow": "sdxl_turbo_simple",
    "checkpoint": "sd_xl_turbo_1.0_fp16.safetensors"
  }
}

When response_format is b64_json, each data[] entry carries b64_json (base64-encoded PNG) instead of url. The _hal0 debug field carries the workflow translator’s metadata so a misrouted prompt is easy to diagnose.
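On the client side, the two response formats need different handling. This sketch assumes the response shape shown above, where `url` entries are server-relative paths such as `/api/images/cache/<uuid>.png` and `b64_json` entries hold base64-encoded PNGs. The function names are hypothetical.

```python
"""Turn a hal0 image response into absolute URLs or PNG bytes (a sketch).

Assumes url entries are server-relative cache paths and b64_json
entries are base64-encoded PNGs, as described on this page.
"""
import base64
from urllib.parse import urljoin

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def absolute_urls(response: dict, base: str = "http://localhost:8080") -> list[str]:
    # Relative cache paths must be joined with the hal0 origin before fetching.
    return [urljoin(base, item["url"]) for item in response["data"] if "url" in item]


def decode_pngs(response: dict) -> list[bytes]:
    # Decode each b64_json entry and check it really is a PNG.
    images = []
    for item in response["data"]:
        if "b64_json" not in item:
            continue
        raw = base64.b64decode(item["b64_json"])
        if not raw.startswith(PNG_SIGNATURE):
            raise ValueError("b64_json payload is not a PNG")
        images.append(raw)
    return images


def debug_summary(response: dict) -> str:
    # The _hal0 field is the first thing to check when a prompt looks misrouted.
    meta = response.get("_hal0", {})
    return f"workflow={meta.get('workflow')} checkpoint={meta.get('checkpoint')}"
```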

The first-class target for v1 is a ROCm-capable AMD GPU, specifically the Strix Halo Ryzen AI Max+ 395 iGPU. SDXL Turbo runs comfortably alongside a small or mid-sized chat model in the 128 GB unified pool. On a discrete AMD GPU you’ll want at least 8 GB of dedicated VRAM for SDXL Turbo and 4 GB for SD 1.5.

NVIDIA discrete GPUs are not wired up yet, and the ComfyUI provider defaults to the rocm backend. See the hardware overview for the full matrix, and follow the upstream roadmap for CUDA support.

A minimal slots/img.toml is shaped like every other slot. The default backend is rocm:

/etc/hal0/slots/img.toml
enabled = true
backend = "rocm"
[model]
default = "sdxl-turbo"

Once the model’s checkpoint is in its per-id directory of the curated catalogue under /var/lib/hal0/comfyui/models/checkpoints/, start the slot:

hal0 slot load img --model sdxl-turbo

OpenAI-shaped requests to /v1/images/generations then route there automatically, because the dispatcher’s heuristics already pin /v1/images/* to the img slot.

  • First-pull is heavy. The ComfyUI toolbox image is the largest one hal0 ships. The CI build takes ~19 minutes and the layer set is sizeable. Expect a long first docker pull on a fresh box; subsequent restarts hit the local layer cache.
  • No perf claims yet. No verified seconds-per-image numbers are in the repo for ComfyUI on Strix Halo iGPU. Treat the curated models’ VRAM hints as the only published sizing data until a real measurement lands.
  • Flux workflow. flux-schnell is catalogued, but the default workflow can’t drive it. A Flux-specific workflow is the gating item before Flux is fully picker-grade.
  • License spread. The three curated entries each have a different license. The picker UI surfaces the badge so you pick consciously; check the bundled license_url field before shipping output anywhere production-facing.