Production-grade pure Rust Text-to-Speech (TTS), Voice Recognition, and Sound framework. VITS + HiFi-GAN/DiffWave vocoders, real-time ≤0.05× RTF on GPU, streaming synthesis, SSML, 20+ languages, ONNX/Kokoro-82M support, SafeTensors checkpoints. Full integration with SciRS2/NumRS2. WASM, GPU (CUDA/Metal), Python/FFI bindings. The sovereign speech AI layer for the entire COOLJAPAN ecosystem (now 21M+ SLoC total).
The speech synthesis and voice AI foundation of the COOLJAPAN ecosystem just reached its first Release Candidate.
Today we released VoiRS 0.1.0 Release Candidate 1 — a complete, production-grade pure Rust framework for neural Text-to-Speech (TTS), Voice Recognition, and high-performance Sound processing.
No Python. No C++. No FFmpeg. No external model runtimes.
No unsafe code in hot paths. No dependency hell.
Just clean, memory-safe, real-time neural speech that compiles to a single static binary (or WASM) and runs everywhere — from laptops to browsers to edge devices to cloud GPUs.
For years, state-of-the-art speech synthesis and voice AI meant depending on heavy Python stacks (Coqui TTS, Tortoise, Piper) or proprietary cloud services.
These tools are powerful but suffer from:
VoiRS 0.1.0 RC1 ends all of that.
It delivers real-time performance while being 100% memory-safe and fully portable.
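For readers unfamiliar with the metric behind "real-time": the real-time factor (RTF) is the time spent synthesizing divided by the duration of the audio produced, so an RTF of 0.05 means one second of speech takes about 50 ms to generate. The following is a minimal sketch of that arithmetic; the function name and parameters are illustrative assumptions and are not taken from the VoiRS API.

```rust
use std::time::Duration;

/// Real-time factor: synthesis time divided by the duration of the audio produced.
/// Values below 1.0 mean synthesis runs faster than playback.
/// NOTE: hypothetical helper for illustration, not part of VoiRS.
pub fn real_time_factor(synthesis_time: Duration, num_samples: usize, sample_rate_hz: u32) -> f64 {
    // Duration of the generated audio in seconds.
    let audio_seconds = num_samples as f64 / sample_rate_hz as f64;
    synthesis_time.as_secs_f64() / audio_seconds
}
```

Under these assumptions, 10 seconds of 22,050 Hz audio (220,500 samples) generated in 500 ms gives `real_time_factor(Duration::from_millis(500), 220_500, 22_050)`, which is approximately 0.05.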
Notable results:
The architecture unifies high-performance crates from the COOLJAPAN ecosystem into a clean, end-to-end pipeline:
Core Pipeline
Text → G2P (pluggable: Phonetisaurus, OpenJTalk, Neural) → Acoustic Model (VITS / FastSpeech2) → Vocoder (HiFi-GAN + DiffWave) → Audio (WAV/OGG).
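The staged design above can be sketched as three traits composed into one pipeline. This is only an illustration of the idea of pluggable stages: every trait, struct, and method name below is hypothetical and not taken from the VoiRS crates, and the toy implementations emit silence rather than running any neural model.

```rust
// Hypothetical sketch of a staged TTS pipeline; none of these names come from VoiRS.

/// Stage 1: text to phoneme symbols (grapheme-to-phoneme).
pub trait G2p {
    fn phonemize(&self, text: &str) -> Vec<String>;
}

/// Stage 2: phonemes to mel-spectrogram frames (one Vec<f32> per frame).
pub trait AcousticModel {
    fn mel_frames(&self, phonemes: &[String]) -> Vec<Vec<f32>>;
}

/// Stage 3: mel frames to PCM samples.
pub trait Vocoder {
    fn vocode(&self, mel: &[Vec<f32>]) -> Vec<f32>;
}

/// Generic pipeline: any stage can be swapped without touching the others.
pub struct Pipeline<G, A, V> {
    pub g2p: G,
    pub acoustic: A,
    pub vocoder: V,
}

impl<G: G2p, A: AcousticModel, V: Vocoder> Pipeline<G, A, V> {
    /// Runs text through all three stages and returns PCM samples.
    pub fn synthesize(&self, text: &str) -> Vec<f32> {
        let phonemes = self.g2p.phonemize(text);
        let mel = self.acoustic.mel_frames(&phonemes);
        self.vocoder.vocode(&mel)
    }
}

/// Toy G2P: treats each alphabetic character as one "phoneme".
pub struct CharG2p;

impl G2p for CharG2p {
    fn phonemize(&self, text: &str) -> Vec<String> {
        text.chars()
            .filter(|c| c.is_alphabetic())
            .map(|c| c.to_lowercase().to_string())
            .collect()
    }
}

/// Toy acoustic model: emits a fixed number of all-zero frames per phoneme.
pub struct ConstantAcoustic {
    pub n_mels: usize,
    pub frames_per_phoneme: usize,
}

impl AcousticModel for ConstantAcoustic {
    fn mel_frames(&self, phonemes: &[String]) -> Vec<Vec<f32>> {
        vec![vec![0.0; self.n_mels]; phonemes.len() * self.frames_per_phoneme]
    }
}

/// Toy vocoder: emits `hop_length` silent samples per mel frame.
pub struct SilenceVocoder {
    pub hop_length: usize,
}

impl Vocoder for SilenceVocoder {
    fn vocode(&self, mel: &[Vec<f32>]) -> Vec<f32> {
        vec![0.0; mel.len() * self.hop_length]
    }
}
```

With `CharG2p`, `ConstantAcoustic { n_mels: 80, frames_per_phoneme: 2 }`, and `SilenceVocoder { hop_length: 256 }`, calling `synthesize("Hi!")` yields 2 phonemes, 4 frames, and 1,024 samples. Using generics rather than trait objects here is a design choice for the sketch: it keeps stage dispatch static, at the cost of fixing the stage types at compile time.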
Neural Models (0.1.0 RC1 highlights)
Advanced Features
Hardware & Interop
Key Rust advantages:
voirs-cli) for synthesis, training, and voice management
VoiRS is now the official speech synthesis and voice AI backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):
Repository: https://github.com/cool-japan/voirs
Star the repo if you want real-time, memory-safe, sovereign neural speech synthesis without Python or cloud dependencies.
The era of “just pip install TTS” with all its overhead is over.
Pure Rust neural TTS, voice recognition, and sound processing is here — fast, safe, multilingual, and sovereign.
— KitaSan at COOLJAPAN OÜ March 26, 2026