OxiONNX 0.1.0 Released — Pure Rust ONNX Inference Engine with 147 Operators

The ONNX inference foundation of the COOLJAPAN ecosystem just went fully sovereign and production-ready.

Today we released OxiONNX 0.1.0 — a complete, high-performance ONNX inference engine written entirely in pure Rust.

No C. No C++. No ONNX Runtime binaries. No external protobuf or CUDA dependencies.
No unsafe code in hot paths. No build hell.
Just clean, memory-safe, fast ONNX model execution that compiles to a single static binary (or a WASM module under 1 MB) and runs everywhere: laptops, browsers, edge devices, and cloud GPUs.

Why OxiONNX 0.1.0 is a game changer

For years, ONNX inference meant depending on the heavy C++/CUDA-based ONNX Runtime or vendor-specific runtimes.

These tools are powerful but suffer from:

C/C++ memory unsafety and segfault risks
Complex system dependencies and large binaries
Poor WASM/embedded/no_std support
Vendor lock-in and limited portability
Limited options for graph optimization and model encryption

OxiONNX 0.1.0 ends all of that.

It delivers high-performance inference while being 100% memory-safe and fully auditable.
Notable features:

Full support for 147 ONNX operators (Math, NN, Conv, Attention, RNN, Quantization, Control Flow, etc.)
wgpu GPU acceleration for MatMul, Softmax, ReLU and more
SIMD (AVX2/NEON) for element-wise ops
Streaming inference for autoregressive models
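
To make the operator list a little more concrete, here is a minimal reference sketch of two of the ops named above, ReLU and Softmax, over plain f32 slices. This is not OxiONNX's implementation or API; the function names relu_in_place and softmax are made up for illustration, and the code is ordinary scalar Rust with no explicit SIMD or GPU path.

```rust
/// Element-wise ReLU: replaces every element x with max(x, 0).
/// (Illustrative helper, not part of any OxiONNX crate.)
pub fn relu_in_place(data: &mut [f32]) {
    for x in data.iter_mut() {
        *x = x.max(0.0);
    }
}

/// Numerically stable softmax over a single row.
/// Subtracting the row maximum before exponentiating keeps exp() from
/// overflowing on large logits without changing the result.
pub fn softmax(row: &[f32]) -> Vec<f32> {
    if row.is_empty() {
        return Vec::new();
    }
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}
```

Calling softmax(&[1000.0, 1000.0]) returns two equal probabilities instead of NaN, which is the point of the max-subtraction step.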

Technical Deep Dive: How We Built a Production-Grade ONNX Runtime in Pure Rust

The architecture uses a clean multi-crate design, with each crate owning one layer of the stack:

Core Layer (oxionnx-core)
Tensor, DType, Graph, Operator trait, arena allocator, buffer pooling, strided views.
Operators (oxionnx-ops)
147 fully implemented operators with automatic type promotion and mixed precision (f16/f32).
Proto & Graph (oxionnx-proto)
Pure-Rust ONNX protobuf parser + graph optimizer (constant folding, operator fusion, CSE, dead-code elimination).
GPU Backend (oxionnx-gpu)
wgpu compute shaders (optional, zero-overhead fallback to CPU).
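
For readers unfamiliar with the optimizer passes named above, the sketch below shows constant folding and dead-code elimination on a toy expression graph. It is an illustration under simplifying assumptions, not the oxionnx-proto code: the Node enum, fold_constants, and eliminate_dead_nodes are hypothetical names, nodes hold scalars instead of tensors, and the node list is assumed to be in topological order (operands always come before their users).

```rust
/// A node in a toy expression graph. Operands are indices of earlier nodes.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Const(f32),
    Input(String),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Constant folding: an Add or Mul whose operands are both constants is
/// replaced by a single Const. Because nodes are in topological order,
/// one forward pass also folds chains of constants.
pub fn fold_constants(nodes: &mut [Node]) {
    for i in 0..nodes.len() {
        let folded = match &nodes[i] {
            Node::Add(a, b) => match (&nodes[*a], &nodes[*b]) {
                (Node::Const(x), Node::Const(y)) => Some(x + y),
                _ => None,
            },
            Node::Mul(a, b) => match (&nodes[*a], &nodes[*b]) {
                (Node::Const(x), Node::Const(y)) => Some(x * y),
                _ => None,
            },
            _ => None,
        };
        if let Some(value) = folded {
            nodes[i] = Node::Const(value);
        }
    }
}

/// Dead-code elimination: keep only nodes reachable from `output`,
/// compact them, and remap operand indices. Returns the new node list
/// and the new index of the output node.
pub fn eliminate_dead_nodes(nodes: &[Node], output: usize) -> (Vec<Node>, usize) {
    // Mark phase: walk backwards from the output.
    let mut live = vec![false; nodes.len()];
    let mut stack = vec![output];
    while let Some(i) = stack.pop() {
        if live[i] {
            continue;
        }
        live[i] = true;
        if let Node::Add(a, b) | Node::Mul(a, b) = &nodes[i] {
            stack.push(*a);
            stack.push(*b);
        }
    }
    // Sweep phase: copy live nodes in order, rewriting operand indices.
    let mut remap = vec![usize::MAX; nodes.len()];
    let mut kept = Vec::new();
    for (i, node) in nodes.iter().enumerate() {
        if !live[i] {
            continue;
        }
        let rewritten = match node {
            Node::Add(a, b) => Node::Add(remap[*a], remap[*b]),
            Node::Mul(a, b) => Node::Mul(remap[*a], remap[*b]),
            other => other.clone(),
        };
        remap[i] = kept.len();
        kept.push(rewritten);
    }
    (kept, remap[output])
}
```

Running folding first matters: once 2 * 3 has become a single constant, the original constant nodes are no longer referenced and the elimination pass can drop them.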

Key Rust advantages:

Zero C/C++/Fortran dependencies — fully self-contained
SIMD + GPU acceleration with zero-cost abstractions
Full no_std + alloc support
Model encryption (AES-GCM) and session caching
Async execution + streaming token-by-token inference
WASM via wasm-bindgen for browser-native AI
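
As a rough picture of what token-by-token streaming means for an autoregressive model, here is a small synchronous sketch built on a standard Rust iterator. Everything in it is an assumption made for illustration: TokenStream is a hypothetical type, the step closure stands in for a real model forward pass (token history in, logits out), and decoding is plain greedy argmax. It does not show OxiONNX's actual session or async API.

```rust
/// Iterator that yields one generated token per call to `next`.
/// (Hypothetical type for illustration only.)
pub struct TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    step: F,           // stand-in for a model forward pass: history -> logits
    tokens: Vec<u32>,  // prompt plus everything generated so far
    eos: u32,          // end-of-sequence token id
    remaining: usize,  // how many more tokens may be generated
}

impl<F> TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    pub fn new(step: F, prompt: Vec<u32>, eos: u32, max_new_tokens: usize) -> Self {
        TokenStream { step, tokens: prompt, eos, remaining: max_new_tokens }
    }
}

impl<F> Iterator for TokenStream<F>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        let logits = (self.step)(&self.tokens);
        // Greedy decoding: pick the index of the largest logit.
        // Empty logits end the stream via `?`.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)?;
        if next == self.eos {
            self.remaining = 0;
            return None;
        }
        self.remaining -= 1;
        self.tokens.push(next);
        Some(next)
    }
}
```

Because the stream is an ordinary iterator, a caller can print or forward each token as soon as it is produced with a plain for loop, rather than waiting for the full sequence.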

What’s inside 0.1.0 (released March 26)

Initial production release with full 147-operator coverage
Graph optimizer, async execution, and model encryption stabilized
wgpu GPU backend and SIMD acceleration
WASM + no_std core ready
Production readiness confirmed through 590+ passing tests
30,000+ lines of pure Rust across 5 crates — zero Clippy warnings, fail0 enforced

This is the foundation

OxiONNX is now the official ONNX inference backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):

SciRS2 / NumRS2 — all tensor operations and AI pipelines
OptiRS — training + inference loop integration
OxiMedia — vision model deployment
VoiRS — multimodal speech+vision models
OxiHuman / Spintronics — physics-informed and avatar AI inference
ToRSh / OxiRAG — high-throughput RAG and agent inference
Future integration with OxiLean for formally verified ONNX graphs

Repository: https://github.com/cool-japan/oxionnx

Star the repo if you want fast, safe, sovereign ONNX inference without C/C++ or vendor lock-in.

The era of “just use onnxruntime” with all its native dependencies is over.

Pure Rust ONNX inference is here — fast, portable, memory-safe, and sovereign.

KitaSan at COOLJAPAN OÜ, March 26, 2026