Streaming

hal0 streams /v1/chat/completions and /v1/completions responses as Server-Sent Events, exactly matching the OpenAI streaming protocol. Any OpenAI SDK that handles streaming today works against hal0 unmodified.

Add "stream": true to the request body:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "primary",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count to five."}
    ]
  }'

Each chunk arrives as a data: … line carrying a JSON payload, terminated by a blank line. The stream ends with data: [DONE]. This is the same shape OpenAI ships.

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"One"}}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", two"}}]}

data: [DONE]
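
The same request with the OpenAI Python SDK:

from openai import OpenAI

# hal0 runs locally, so the api_key value is only a placeholder.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

stream = client.chat.completions.create(
    model="primary",
    stream=True,
    messages=[{"role": "user", "content": "Count to five."}],
)

for chunk in stream:
    # Skip chunks that carry no choices, and treat a missing delta as empty text.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)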
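
If you would rather not depend on an SDK, the wire format described above is simple enough to parse by hand. The following is a minimal sketch, not hal0's own client code: iter_deltas and sample are illustrative names, and the sample lines stand in for what you would read off the HTTP response body line by line.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from chat-completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; this sketch ignores non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Stand-in for the lines of a streamed response body (illustrative data).
sample = [
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"One"}}]}',
    "",
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", two"}}]}',
    "",
    "data: [DONE]",
    "",
]

print("".join(iter_deltas(sample)))
```

In a real client you would feed iter_deltas the decoded lines of the streaming response instead of the sample list.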

Streaming flows through the same dispatcher that handles non-streaming requests, so you get:

  • Single-flight prefetch. If two clients open identical streams on a cold slot, the slot fires one upstream call and fans the token stream to both.
  • Adaptive cold-boot. The first request after a slot reaches ready keeps the connection open while the model finishes warming; you don’t get a 503 on a request that’s about to work.
  • Structured errors mid-stream. If the slot transitions to error part-way through, the stream emits one final SSE event with the structured error envelope before closing.
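
On the client side, the third point means a stream can end with an error event rather than data: [DONE]. The sketch below shows one way to keep the partial output and surface the error. It rests on an assumption: the exact shape of hal0's error envelope is not shown here, so the code guesses an OpenAI-style object with a top-level "error" key. The names StreamError, iter_content, collect, and the sample error payload are all hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


class StreamError(RuntimeError):
    """Raised when the stream carries an error envelope instead of a chunk."""


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        # ASSUMPTION: the error envelope has a top-level "error" key.
        if "error" in event:
            raise StreamError(json.dumps(event["error"]))
        for choice in event.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def collect(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (text_so_far, error_or_None) so partial output is not lost."""
    parts = []
    try:
        for text in iter_content(lines):
            parts.append(text)
    except StreamError as exc:
        return "".join(parts), str(exc)
    return "".join(parts), None


# Hypothetical stream that fails after one token (illustrative payloads).
failing = [
    'data: {"choices":[{"index":0,"delta":{"content":"One"}}]}',
    "",
    'data: {"error":{"message":"slot entered error state","type":"slot_error"}}',
    "",
]

text, err = collect(failing)
print(text, err)
```

Returning the partial text alongside the error lets a UI show what was generated before the slot failed, then decide whether to retry.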

The same SSE wire format works across the LAN. Point an OpenWebUI on your laptop or an MCP server on another box at http://hal0.lan:8080/v1 and streaming behaves identically through your Traefik vhost.