The ONNX inference foundation of the COOLJAPAN ecosystem just went fully sovereign and production-ready.
Today we released OxiONNX 0.1.0 — a complete, high-performance ONNX inference engine written entirely in pure Rust.
No C. No C++. No ONNX Runtime binaries. No external protobuf or CUDA dependencies.
No unsafe code in hot paths. No build hell.
Just clean, memory-safe, blazing-fast ONNX model execution that compiles to a single static binary (or <1 MB WASM) and runs everywhere — from laptops to browsers to edge devices to cloud GPUs.
Why OxiONNX 0.1.0 is a game changer
For years, ONNX inference meant depending on the heavy C++/CUDA-based ONNX Runtime or vendor-specific runtimes.
These tools are powerful but suffer from:
- C/C++ memory unsafety and segfault risks
- Complex system dependencies and large binaries
- Poor WASM/embedded/no_std support
- Vendor lock-in and limited portability
- Limited options for graph optimization and model encryption
OxiONNX 0.1.0 ends all of that.
It delivers high-performance inference while being 100% memory-safe and fully auditable.
Notable results:
- Full support for 147 ONNX operators (Math, NN, Conv, Attention, RNN, Quantization, Control Flow, etc.)
- wgpu GPU acceleration for MatMul, Softmax, ReLU and more
- SIMD (AVX2/NEON) for element-wise ops
- Streaming inference for autoregressive models
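For readers who want the semantics of two of the operators named above, here is a minimal scalar sketch of ReLU and a numerically stable Softmax over plain slices. It is not OxiONNX code: the function names `relu_in_place` and `softmax` are hypothetical, and the loops are a reference for what the ops compute, not the SIMD or wgpu kernels the release describes.

```rust
/// Element-wise ReLU in place: each value becomes max(x, 0).
/// A simple loop like this is the kind of code compilers can often auto-vectorize.
/// (Hypothetical helper, not part of the OxiONNX API.)
pub fn relu_in_place(xs: &mut [f32]) {
    for x in xs.iter_mut() {
        *x = x.max(0.0);
    }
}

/// Numerically stable softmax over one row.
/// Subtracting the row maximum before exponentiating avoids overflow
/// for large inputs without changing the result.
/// (Hypothetical helper, not part of the OxiONNX API.)
pub fn softmax(xs: &[f32]) -> Vec<f32> {
    if xs.is_empty() {
        return Vec::new();
    }
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}
```

The max-subtraction step is the design choice worth noting: without it, inputs around 1000.0 would overflow `exp` in f32 and produce NaN.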
Technical Deep Dive: How We Built a Production-Grade ONNX Runtime in Pure Rust
The architecture uses a clean multi-crate design, radically optimized for modern Rust:
- Core Layer (oxionnx-core): Tensor, DType, Graph, Operator trait, arena allocator, buffer pooling, strided views.
- Operators (oxionnx-ops): 147 fully implemented operators with automatic type promotion and mixed precision (f16/f32).
- Proto & Graph (oxionnx-proto): Pure-Rust ONNX protobuf parser + graph optimizer (constant folding, operator fusion, CSE, dead-code elimination).
- GPU Backend (oxionnx-gpu): wgpu compute shaders (optional, zero-overhead fallback to CPU).
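To make two of the graph-optimizer passes listed above concrete, here is a minimal sketch of constant folding and dead-code elimination on a toy expression graph. Everything in it is an assumption for illustration: the `Node` enum, `fold_constants`, and `eliminate_dead` are hypothetical names, not the oxionnx-proto API, and operator fusion and CSE are not shown.

```rust
/// Hypothetical toy IR: each node refers to earlier nodes by index,
/// so the vector is in topological order.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Input(String),
    Const(f32),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Constant folding: an Add/Mul whose operands are both constants
/// is replaced by the precomputed constant.
pub fn fold_constants(nodes: &mut [Node]) {
    for i in 0..nodes.len() {
        let folded = match &nodes[i] {
            Node::Add(a, b) => match (&nodes[*a], &nodes[*b]) {
                (Node::Const(x), Node::Const(y)) => Some(x + y),
                _ => None,
            },
            Node::Mul(a, b) => match (&nodes[*a], &nodes[*b]) {
                (Node::Const(x), Node::Const(y)) => Some(x * y),
                _ => None,
            },
            _ => None,
        };
        if let Some(v) = folded {
            nodes[i] = Node::Const(v);
        }
    }
}

/// Dead-code elimination: keep only nodes reachable from `outputs`.
/// Returns the compacted graph and the remapped output indices.
pub fn eliminate_dead(nodes: &[Node], outputs: &[usize]) -> (Vec<Node>, Vec<usize>) {
    // Mark phase: walk backwards from the outputs.
    let mut live = vec![false; nodes.len()];
    let mut stack: Vec<usize> = outputs.to_vec();
    while let Some(i) = stack.pop() {
        if !live[i] {
            live[i] = true;
            if let Node::Add(a, b) | Node::Mul(a, b) = &nodes[i] {
                stack.push(*a);
                stack.push(*b);
            }
        }
    }
    // Sweep phase: copy live nodes and rewrite operand indices.
    let mut remap = vec![usize::MAX; nodes.len()];
    let mut kept = Vec::new();
    for (i, node) in nodes.iter().enumerate() {
        if !live[i] {
            continue;
        }
        remap[i] = kept.len();
        kept.push(match node {
            Node::Add(a, b) => Node::Add(remap[*a], remap[*b]),
            Node::Mul(a, b) => Node::Mul(remap[*a], remap[*b]),
            other => other.clone(),
        });
    }
    (kept, outputs.iter().map(|&o| remap[o]).collect())
}
```

Running folding before elimination matters in this sketch: once `Mul(Const, Const)` collapses to a single constant, its former operands become unreachable and the sweep removes them.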
Key Rust advantages:
- Zero C/C++/Fortran dependencies — fully self-contained
- SIMD + GPU acceleration with zero-cost abstractions
- Full no_std + alloc support
- Model encryption (AES-GCM) and session caching
- Async execution + streaming token-by-token inference
- WASM via wasm-bindgen for browser-native AI
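As a rough illustration of what streaming token-by-token inference can look like from the caller's side, here is a minimal sketch that wraps a greedy decoding loop in a standard Rust `Iterator`. It assumes nothing about the real OxiONNX API: `TokenStream`, `argmax`, and the `step` callback (a stand-in for one forward pass that maps the current context to logits) are hypothetical names chosen for this example.

```rust
/// Hypothetical streaming decoder: yields one new token per `next()` call.
pub struct TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    step: F,            // stand-in for one model forward pass: context -> logits
    context: Vec<u32>,  // prompt plus all tokens generated so far
    eos: u32,           // end-of-sequence token id
    remaining: usize,   // budget of tokens still allowed
}

impl<F> TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    pub fn new(step: F, prompt: &[u32], eos: u32, max_new_tokens: usize) -> Self {
        Self { step, context: prompt.to_vec(), eos, remaining: max_new_tokens }
    }
}

/// Index of the largest logit, or None for an empty slice.
pub fn argmax(xs: &[f32]) -> Option<usize> {
    xs.iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

impl<F> Iterator for TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        let logits = (self.step)(&self.context);
        // Greedy decoding: pick the highest-scoring token.
        let token = argmax(&logits)? as u32;
        self.remaining -= 1;
        if token == self.eos {
            // Stop the stream without emitting the end-of-sequence token.
            self.remaining = 0;
            return None;
        }
        self.context.push(token);
        Some(token)
    }
}
```

Because the stream is an ordinary iterator, a caller can write `for token in stream { ... }` and display each token as soon as it is produced, rather than waiting for the full sequence.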
What’s inside 0.1.0 (released March 26)
- Initial production release with full 147-operator coverage
- Graph optimizer, async execution, and model encryption stabilized
- wgpu GPU backend and SIMD acceleration
- WASM + no_std core ready
- Production readiness confirmed through 590+ passing tests
- 30,000+ lines of pure Rust across 5 crates — zero Clippy warnings, fail0 enforced
This is the foundation
OxiONNX is now the official ONNX inference backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):
- SciRS2 / NumRS2 — all tensor operations and AI pipelines
- OptiRS — training + inference loop integration
- OxiMedia — vision model deployment
- VoiRS — multimodal speech+vision models
- OxiHuman / Spintronics — physics-informed and avatar AI inference
- ToRSh / OxiRAG — high-throughput RAG and agent inference
- Future integration with OxiLean for formally verified ONNX graphs
Repository: https://github.com/cool-japan/oxionnx
Star the repo if you want fast, safe, sovereign ONNX inference without C/C++ or vendor lock-in.
The era of “just use onnxruntime” with all its native dependencies is over.
Pure Rust ONNX inference is here — fast, portable, memory-safe, and sovereign.
— KitaSan at COOLJAPAN OÜ March 26, 2026