The speech synthesis and voice AI foundation of the COOLJAPAN ecosystem just reached its first Release Candidate.
Today we released VoiRS 0.1.0 Release Candidate 1 — a complete, production-grade pure Rust framework for neural Text-to-Speech (TTS), Voice Recognition, and high-performance Sound processing.
No Python. No C++. No FFmpeg. No external model runtimes.
No unsafe code in hot paths. No dependency hell.
Just clean, memory-safe, real-time neural speech that compiles to a single static binary (or WASM) and runs everywhere — from laptops to browsers to edge devices to cloud GPUs.
Why VoiRS 0.1.0 RC1 is a game changer
For years, state-of-the-art speech synthesis and voice AI meant depending on heavy Python stacks (Coqui TTS, Tortoise, Piper) or proprietary cloud services.
These tools are powerful but suffer from:
- Python interpreter overhead and slow inference
- Memory unsafety and complex C++/CUDA dependencies
- Vendor lock-in and latency issues
- Difficulty in offline, WASM, or embedded deployment
- Lack of full training pipelines in a single safe language
VoiRS 0.1.0 RC1 ends all of that.
It delivers real-time performance while being 100% memory-safe and fully portable.
Notable results:
- Real-time factor (RTF): ≤ 0.3× on consumer CPUs
- GPU (RTX 4080): ≤ 0.05× RTF (0.04× demonstrated)
- Streaming synthesis with low-latency chunked audio
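For readers new to the metric: real-time factor is the wall-clock time spent synthesizing divided by the duration of the audio produced, so anything below 1.0 is faster than real time. A minimal sketch of that calculation follows; `real_time_factor` is a hypothetical helper written for illustration, not part of the VoiRS API.

```rust
use std::time::Duration;

/// Real-time factor: synthesis time divided by the duration of the audio produced.
/// Hypothetical helper for illustration; not a VoiRS function.
fn real_time_factor(
    synthesis_time: Duration,
    num_samples: usize,
    sample_rate_hz: u32,
) -> Option<f64> {
    if num_samples == 0 || sample_rate_hz == 0 {
        // RTF is undefined when no audio was produced.
        return None;
    }
    let audio_seconds = num_samples as f64 / f64::from(sample_rate_hz);
    Some(synthesis_time.as_secs_f64() / audio_seconds)
}
```

At an RTF of 0.3×, ten seconds of audio take at most three seconds to synthesize; at 0.04×, about 0.4 seconds.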
Technical Deep Dive: How We Built a Production-Grade Neural Speech Stack in Pure Rust
The architecture unifies high-performance crates from the COOLJAPAN ecosystem into a clean, end-to-end pipeline:
Core Pipeline
Text → G2P (pluggable: Phonetisaurus, OpenJTalk, Neural) → Acoustic Model (VITS / FastSpeech2) → Vocoder (HiFi-GAN + DiffWave) → Audio (WAV/OGG).
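The three-stage flow above can be sketched with one trait per stage. Everything here is an assumption made for illustration: the trait names, method signatures, and toy implementations are hypothetical and are not taken from the VoiRS crates, whose real interfaces may look quite different.

```rust
/// Hypothetical stage traits; the real VoiRS crates may define these differently.
trait G2p {
    fn to_phonemes(&self, text: &str) -> Vec<String>;
}

trait AcousticModel {
    /// Returns mel-spectrogram frames, each frame holding one value per mel bin.
    fn to_mel(&self, phonemes: &[String]) -> Vec<Vec<f32>>;
}

trait Vocoder {
    fn to_audio(&self, mel: &[Vec<f32>]) -> Vec<f32>;
}

/// Chains the three stages: text -> phonemes -> mel frames -> samples.
struct Pipeline<G, A, V> {
    g2p: G,
    acoustic: A,
    vocoder: V,
}

impl<G: G2p, A: AcousticModel, V: Vocoder> Pipeline<G, A, V> {
    fn synthesize(&self, text: &str) -> Vec<f32> {
        let phonemes = self.g2p.to_phonemes(text);
        let mel = self.acoustic.to_mel(&phonemes);
        self.vocoder.to_audio(&mel)
    }
}

// Toy stand-ins so the sketch runs without any model weights.
struct CharG2p;
impl G2p for CharG2p {
    fn to_phonemes(&self, text: &str) -> Vec<String> {
        // One "phoneme" per alphabetic character.
        text.chars()
            .filter(|c| c.is_alphabetic())
            .map(|c| c.to_string())
            .collect()
    }
}

struct ConstAcoustic {
    mel_bins: usize,
}
impl AcousticModel for ConstAcoustic {
    fn to_mel(&self, phonemes: &[String]) -> Vec<Vec<f32>> {
        // One all-zero frame per phoneme.
        phonemes.iter().map(|_| vec![0.0; self.mel_bins]).collect()
    }
}

struct SilenceVocoder {
    hop_length: usize,
}
impl Vocoder for SilenceVocoder {
    fn to_audio(&self, mel: &[Vec<f32>]) -> Vec<f32> {
        // Emits `hop_length` silent samples per mel frame.
        vec![0.0; mel.len() * self.hop_length]
    }
}
```

Keeping each stage behind its own trait is what makes a "pluggable" G2P possible: swapping backends only requires another implementation of the first trait.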
Neural Models (0.1.0 RC1 highlights)
- Full VITS + HiFi-GAN inference
- New DiffWave vocoder training pipeline with gradient updates, SafeTensors checkpoints (370 parameters, 30 MB), and real parameter saving
- ONNX Runtime integration for Kokoro-82M (9 languages, 54 voices) — zero Python required
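For context on the SafeTensors checkpoints mentioned above: the published file layout is an 8-byte little-endian header length, a JSON header describing each tensor (dtype, shape, byte offsets), and then the raw tensor bytes. The sketch below writes one `f32` tensor in that layout using only the standard library. The helper names are hypothetical, and this is not how VoiRS itself saves checkpoints; a real project would more likely use the `safetensors` crate.

```rust
/// Serializes a single f32 tensor using the SafeTensors file layout.
/// Hypothetical helper for illustration only.
fn write_safetensors_f32(name: &str, shape: &[usize], data: &[f32]) -> Result<Vec<u8>, String> {
    // Keep the sketch simple: reject names that would need JSON escaping.
    if name.chars().any(|c| c == '"' || c == '\\' || c.is_control()) {
        return Err("tensor name needs JSON escaping".to_string());
    }
    let expected: usize = shape.iter().product();
    if expected != data.len() {
        return Err(format!(
            "shape implies {} elements, got {}",
            expected,
            data.len()
        ));
    }
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let byte_len = data.len() * 4; // f32 is 4 bytes
    let header = format!(
        "{{\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[0,{}]}}}}",
        name,
        dims.join(","),
        byte_len
    );
    let mut out = Vec::with_capacity(8 + header.len() + byte_len);
    // 1) header length as u64, little-endian
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    // 2) JSON header
    out.extend_from_slice(header.as_bytes());
    // 3) raw tensor data, little-endian
    for v in data {
        out.extend_from_slice(&v.to_le_bytes());
    }
    Ok(out)
}

/// Reads the 8-byte header length back from a serialized buffer.
fn header_len(bytes: &[u8]) -> Option<u64> {
    let mut prefix = [0u8; 8];
    prefix.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_le_bytes(prefix))
}
```

Because the header is plain JSON and the payload is raw bytes, loading a checkpoint does not require executing any code, which is the main safety argument for the format over pickle-based checkpoints.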
Advanced Features
- Streaming synthesis (chunk-based low-latency)
- SSML support
- Multilingual (20+ languages; production-ready English/Japanese, beta Spanish/French/German/Mandarin)
- Automatic IPA generation via eSpeak NG backend
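Chunk-based streaming can be illustrated with a lazy iterator that synthesizes one sentence at a time, so playback can begin before the whole text has been processed. This is a sketch under assumptions: splitting on sentence punctuation is only one possible chunking strategy, `stream_chunks` is a hypothetical name, and the `synth` closure stands in for a real model call.

```rust
/// Lazily yields one audio chunk per sentence.
/// `synth` is a stand-in for a real synthesis call; this is not the VoiRS API.
fn stream_chunks<'a, F>(text: &'a str, synth: F) -> impl Iterator<Item = Vec<f32>> + 'a
where
    F: Fn(&str) -> Vec<f32> + 'a,
{
    text.split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        // Nothing is synthesized until the consumer asks for the next chunk.
        .map(move |s| synth(s))
}
```

A consumer can loop over the iterator with `for chunk in stream_chunks(text, synth)` and hand each chunk to the audio device as it arrives. Time to first audio then depends on the length of the first sentence rather than the whole input.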
Hardware & Interop
- GPU acceleration (CUDA on Linux/Windows, Metal on macOS)
- WASM target for browser-native synthesis
- FFI bindings (C, PyO3 Python, NAPI Node.js, Unity/Unreal plugins)
Key Rust advantages:
- 100% Pure Rust core (SciRS2/NumRS2 for DSP and linear algebra)
- SIMD optimizations throughout
- SafeTensors for production model persistence
- No-unwrap policy + enforced Clippy0/fail0
- 7 crates with clean separation (voirs-g2p, voirs-acoustic, voirs-vocoder, voirs-dataset, voirs-cli, etc.)
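One common way to enforce a no-unwrap policy is through Clippy's `unwrap_used` and `expect_used` lints, with fallible operations returning `Result` instead. Whether VoiRS uses these exact lints is an assumption; the function and error type below are hypothetical examples of the style, not code from the project.

```rust
/// Hypothetical error type for the example.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingSampleRate,
    InvalidSampleRate(String),
}

/// Parses `sample_rate=22050` style input without `unwrap` or `expect`.
/// With the lints below set to `deny`, Clippy rejects any `.unwrap()` or
/// `.expect()` call inside this function.
#[deny(clippy::unwrap_used, clippy::expect_used)]
fn parse_sample_rate(line: &str) -> Result<u32, ConfigError> {
    let value = line
        .strip_prefix("sample_rate=")
        .ok_or(ConfigError::MissingSampleRate)?;
    value
        .trim()
        .parse::<u32>()
        .map_err(|_| ConfigError::InvalidSampleRate(value.to_string()))
}
```

In a real crate the same lints would usually be set once at the crate root or in the workspace lint configuration, so that malformed input in an audio path surfaces as an error value instead of a panic.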
What’s inside 0.1.0 RC1 (released March 26)
- Full DiffWave training pipeline with gradient-based learning and SafeTensors checkpoints
- Kokoro-82M ONNX multilingual TTS integration
- Streaming synthesis and SSML support stabilized
- WASM + GPU backends production-ready
- CLI tool (voirs-cli) for synthesis, training, and voice management
- Production readiness confirmed with comprehensive tests and benchmarks
This is the foundation
VoiRS is now the official speech synthesis and voice AI backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):
- SciRS2 / NumRS2 — all DSP, linear algebra, and neural operations
- OxiMedia — real-time audio/video pipelines and avatar voice sync
- OptiRS — training optimizers for custom voice models
- ToRSh / OxiRAG — conversational voice RAG and agent audio
- OxiHuman — biomechanical voice animation and lip-sync
- Future integration with OxiLean for formally verified TTS pipelines
Repository: https://github.com/cool-japan/voirs
Star the repo if you want real-time, memory-safe, sovereign neural speech synthesis without Python or cloud dependencies.
The era of “just pip install TTS” with all its overhead is over.
Pure Rust neural TTS, voice recognition, and sound processing are here: fast, safe, multilingual, and sovereign.
— KitaSan at COOLJAPAN OÜ March 26, 2026