Image generation

hal0 exposes OpenAI-compatible image generation at POST /v1/images/generations, served by a ComfyUI provider running inside the img slot. hal0 owns the OpenAI ↔ ComfyUI translation; the upstream is treated as a black box that speaks POST /prompt, GET /history/<id>, and GET /view.

The route is implemented in src/hal0/api/routes/v1.py and the provider in src/hal0/providers/comfyui.py. Workflow templates live under src/hal0/providers/workflows/.

The img slot sits on the same lifecycle as the chat and embed slots, so it shows up in the dashboard’s slot grid with the same states, controls, and metrics.

hal0 /slots view showing the embed, voice, and img capability cards with model pickers.

Overview

Endpoint: POST /v1/images/generations.
Slot: img. Part of BUILTIN_SLOTS in src/hal0/slots/manager.py, same lifecycle as primary, embed, stt, tts.
Backend: ComfyUI inside the ghcr.io/hal0ai/hal0-toolbox-comfyui:v1 toolbox image (pinned by sha256 in hal0/manifest.json).
Hardware: ROCm-capable AMD GPU. Strix Halo’s iGPU is the reference target. The 128 GB unified pool keeps an SDXL Turbo checkpoint and a primary chat model warm at the same time.

Curated models

The picker UI surfaces three curated entries spanning the licensing spectrum (see src/hal0/registry/curated.py):

Id	Family	On-disk	Min VRAM	License
`sdxl-turbo`	SDXL distilled	~6.5 GB	8 GB	SAI Non-Commercial Research Community
`sd-1.5-pruned-emaonly`	SD 1.5	~4.3 GB	4 GB	CreativeML Open RAIL-M
`flux-schnell`	FLUX.1 [schnell]	~23.8 GB	24 GB	Apache-2.0

curl: image generation

curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sdxl-turbo",
    "prompt": "a cat in a hat, studio lighting",
    "size": "1024x1024",
    "response_format": "url"
  }'

Honoured body fields (subset of OpenAI):

model (required). Curated id, e.g. sdxl-turbo.
prompt (required).
n. Batch size; default 1.
size. WxH string, e.g. 1024x1024.
response_format. "url" (default) or "b64_json".

hal0 extensions via extra_body: seed, steps, cfg, negative_prompt.

Response shape

{
  "created": 1716000000,
  "data": [
    { "url": "/api/images/cache/<uuid>.png" }
  ],
  "_hal0": {
    "workflow": "sdxl_turbo_simple",
    "checkpoint": "sd_xl_turbo_1.0_fp16.safetensors"
  }
}

When response_format is b64_json, each data[] entry carries b64_json (base64-encoded PNG) instead of url. The _hal0 debug field carries the workflow translator’s metadata so a misrouted prompt is easy to diagnose.

Hardware requirements

The v1 first-class target is a ROCm-capable AMD GPU, specifically the Strix Halo Ryzen AI Max+ 395 iGPU. SDXL Turbo runs comfortably alongside a small or mid chat model on a 128 GB unified pool; on a discrete AMD GPU you’ll want at least 8 GB of dedicated VRAM for SDXL Turbo, 4 GB for SD 1.5.

NVIDIA discrete GPUs are not yet wired. The ComfyUI provider defaults to the rocm backend. See the hardware overview for the full matrix and follow the upstream roadmap for CUDA support.

Slot configuration

A minimal slots/img.toml is shaped like every other slot. The default backend is rocm:

enabled = true
backend = "rocm"

[model]
default = "sdxl-turbo"

After a model lands in the curated catalogue’s per-id directory under /var/lib/hal0/comfyui/models/checkpoints/, start the slot:

hal0 slot load img --model sdxl-turbo

The OpenAI-shaped /v1/images/generations request will route there automatically; the dispatcher’s heuristics already pin /v1/images/* to the img slot.

Limitations

First-pull is heavy. The ComfyUI toolbox image is the largest one hal0 ships. The CI build takes ~19 minutes and the layer set is sizeable. Expect a long first docker pull on a fresh box; subsequent restarts hit the local layer cache.
No perf claims yet. No verified seconds-per-image numbers are in the repo for ComfyUI on Strix Halo iGPU. Treat the curated models’ VRAM hints as the only published sizing data until a real measurement lands.
Flux workflow. As above: flux-schnell is catalogued but the default workflow can’t drive it. A Flux workflow is the gating item before Flux is fully picker-grade.
License spread. The three curated entries each have a different license. The picker UI surfaces the badge so you pick consciously; check the bundled license_url field before shipping output anywhere production-facing.