Open-source audio intelligence.
Documentation · HuggingFace (Apple · ONNX & LiteRT) · Blog
speech-swift — AI speech models for Apple Silicon. ASR, TTS, speech-to-speech, VAD, diarization, and speech enhancement — all running locally via MLX and CoreML. No cloud, no API keys.
speech-android — On-device speech SDK for Android. ASR, TTS, VAD, and noise cancellation powered by ONNX Runtime with Qualcomm NNAPI acceleration.
speech-core — On-device VAD, streaming STT, TTS, and diarization in C++17 (ONNX + LiteRT) with a voice-agent pipeline state machine. Linux, Windows, Android.
speech-studio — Open-source desktop voice-cloning studio for creators. Tauri + Qwen3-TTS on Apple Silicon.
soniqo.audio covers setup, usage, and architecture for all SDKs:
- Getting Started — Installation via Homebrew, SPM, and Gradle
- Guides — Per-model walkthroughs: Qwen3-ASR, Parakeet TDT, Qwen3-TTS, CosyVoice, Kokoro, PersonaPlex, VAD, diarization, denoising, and more
- CLI Reference — All commands and flags
- API & Protocols — Shared Swift protocols and types
- Architecture — Module structure, backends, weight formats, and memory tables
- Benchmarks — RTF, latency, WER, and memory across devices
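For readers new to the benchmark metrics, here is a minimal sketch of how RTF and WER are conventionally computed. It assumes the common definitions (RTF as processing time divided by audio duration, WER as word-level edit distance divided by reference length). The function names are illustrative and not part of any SDK listed here, and the published benchmarks may apply text normalization that this sketch omits.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under these definitions, transcribing 10 seconds of audio in 2 seconds gives an RTF of 0.2, and a hypothesis with one substituted and one dropped word against a six-word reference gives a WER of 2/6.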
Join our Discord for questions, support, model requests, and updates.
Are you integrating on-device speech into your app, looking for support, or hoping to have your model supported?