Audio (STT & TTS)
hal0 ships two audio endpoints: speech-to-text on the stt slot, and
text-to-speech on the tts slot. Both speak the OpenAI Audio shape so
any client that hits OpenAI’s audio API works here.
Speech-to-text: Moonshine
The stt slot defaults to Moonshine, a small, fast ASR model
built for real-time transcription on edge devices. The toolbox image is
hal0-toolbox-moonshine.
    curl http://localhost:8080/v1/audio/transcriptions \
      -H "Content-Type: multipart/form-data" \
      -F file=@hello.wav \
      -F model=stt
Response (OpenAI-shape):
    { "text": "Hello, world." }
Alternates
The stt slot can host any ASR-compatible model the Moonshine
provider supports. For higher accuracy, use whisper-large-v3-turbo
(~1.6 GB) if you have the headroom, or Canary-Qwen-2.5B (Open ASR
Leaderboard leader, 5.63% WER) for the best available accuracy. Swap with:
    hal0 slot swap stt --model whisper-large-v3-turbo
See Recommended loadouts → Voice mode for the picks per tier.
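The transcription call shown earlier in this section can also be made from code. The following is a minimal Python sketch using only the standard library, assuming hal0 is reachable at the same host and port as the curl example. The helper names build_transcription_request and transcribe are illustrative and not part of hal0; the only response field relied on is "text", as in the sample response.

```python
import json
import os
import urllib.request
import uuid

# Assumption: same host and port as the curl example.
BASE_URL = "http://localhost:8080/v1"


def build_transcription_request(audio_bytes, filename="hello.wav", model="stt"):
    """Build a multipart/form-data POST for the transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    # Text field naming the slot that should handle the request.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    # File field header; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + b"\r\n" + closing
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path):
    """Upload a local audio file and return the "text" field of the reply."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), filename=os.path.basename(path)
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With hal0 running and hello.wav in the working directory, transcribe("hello.wav") should return the transcript as a string. Building the request separately from sending it keeps the multipart encoding easy to inspect without a live server.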
Text-to-speech: Kokoro
The tts slot defaults to Kokoro-82M v1.0, a small open TTS
model with 54 voices across 8 languages. The toolbox image is
hal0-toolbox-kokoro.
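Because the endpoint follows the OpenAI shape, a speech request is a JSON POST to /v1/audio/speech. Here is a minimal Python sketch using only the standard library, assuming the same host, port, slot name, and voice as the curl example in this section. The helper names build_speech_request and synthesize are illustrative and not part of hal0.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, voice="af_bella", model="tts"):
    """Build the JSON POST for the speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav", voice="af_bella"):
    """Request speech for `text` and write the returned audio bytes to disk."""
    request = build_speech_request(text, voice=voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With hal0 running, synthesize("Hello from hal0.") writes the returned audio to speech.wav. The equivalent curl invocation: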
    curl http://localhost:8080/v1/audio/speech \
      -H "Content-Type: application/json" \
      -d '{
        "model": "tts",
        "input": "Hello from hal0.",
        "voice": "af_bella"
      }' --output speech.wav
Alternates
For voice cloning, the Kokoro provider also supports F5-TTS. Swap
with:
    hal0 slot swap tts --model f5-tts
Status today
Moonshine and Kokoro are first-class providers as of v0.1.0-alpha.
Both have working code paths, slot lifecycle integration, and published
toolbox container images on ghcr.io/hal0ai/. The stt and tts
slots are configurable from the dashboard and start cleanly.
Coming soon
- Real-time streaming TTS (chunked PCM output).
- Speaker diarization for transcription.
- Voice cloning UX in the dashboard.
- WebSocket transport for full-duplex voice mode.