COOLJAPAN Blog

The GPU computing foundation of the COOLJAPAN ecosystem just became fully sovereign.

Today we released OxiCUDA 0.1.2 — a complete, production-grade pure Rust replacement for the entire NVIDIA CUDA Toolkit.

No CUDA SDK. No nvcc. No C/C++ toolchain. No build-time dependencies.
The only runtime dependency is the official NVIDIA driver (libcuda.so / nvcuda.dll).
Just clean, type-safe, memory-safe GPU code that compiles to a single static binary (or WASM) and runs on Turing through Blackwell GPUs — and across multiple vendor backends.

Why OxiCUDA 0.1.2 is a game changer

For decades, high-performance GPU computing meant depending on the massive NVIDIA CUDA Toolkit (with its complex SDK, C++ headers, and vendor lock-in).

The toolkit is powerful, but it suffers from:

C/C++ memory unsafety and segfault risks
Heavy build-time dependencies and nvcc requirements
Vendor lock-in (NVIDIA-only)
Poor portability to Metal, Vulkan, WebGPU, ROCm, or embedded
Difficulty integrating with safe Rust scientific/AI stacks

OxiCUDA 0.1.2 ends all of that.

It targets at least 90–95% of native CUDA performance while remaining 100% memory-safe and auditable.
Performance targets on supported architectures:

SGEMM (FP32): ≥95% of cuBLAS
HGEMM (FP16): ≥95% of cuBLAS (Tensor-Core)
FlashAttention: ≥90% of FA2
FFT (power-of-2): ≥90% of cuFFT
Convolution (FP16): ≥90% of cuDNN
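For readers who want to check a "percent of cuBLAS" figure themselves, here is a minimal sketch of the arithmetic. The function names are hypothetical and not part of OxiCUDA; the sketch assumes both libraries are timed on the same problem size, so the throughput ratio reduces to an inverted ratio of run times.

```rust
/// Floating-point operations in one GEMM: C (m x n) = A (m x k) * B (k x n).
/// Each output element needs k multiplies and k adds, hence 2 * m * n * k.
pub fn gemm_flops(m: u64, n: u64, k: u64) -> f64 {
    2.0 * m as f64 * n as f64 * k as f64
}

/// Achieved throughput in TFLOP/s for a run that took `seconds`.
pub fn tflops(flops: f64, seconds: f64) -> f64 {
    flops / seconds / 1e12
}

/// Candidate throughput as a fraction of the baseline's throughput.
/// Same problem size means same FLOP count, so only the times matter.
pub fn relative_throughput(candidate_seconds: f64, baseline_seconds: f64) -> f64 {
    baseline_seconds / candidate_seconds
}

/// True when the candidate reaches at least `target` (e.g. 0.95) of the baseline.
pub fn meets_target(candidate_seconds: f64, baseline_seconds: f64, target: f64) -> bool {
    relative_throughput(candidate_seconds, baseline_seconds) >= target
}
```

As an illustration with made-up timings (not measurements): a 4096 x 4096 x 4096 SGEMM is 2 * 4096^3 = 137,438,953,472 FLOPs. If the baseline took 10.0 ms and the candidate 10.5 ms, the ratio would be about 0.952, which just clears a 95% target.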

Technical Deep Dive: How We Rebuilt the CUDA Toolkit in Pure Rust

OxiCUDA is organized into 10 volumes + 7 backends (28 crates total), mirroring the CUDA ecosystem while being radically simplified and optimized for Rust:

1. Foundation (4 crates)
Dynamic driver loading, type-safe DeviceBuffer<T>, RAII memory management, launch! macro.
2. PTX Codegen + Autotuner (2 crates)
Pure-Rust PTX IR DSL (SM 7.5–10.0), Tensor-Core WMMA/MMA/WGMMA, 3-tier runtime autotuner with disk cache.
3. Linear Algebra (1 crate)
Full cuBLAS equivalent (GEMM, batched, reductions, elementwise).
4. Deep Learning (1 crate)
cuDNN replacement: Conv (Winograd/direct/fused), FlashAttention, Norm layers, FP8/INT8/INT4 quant.
5. Scientific Computing (4 crates)
cuFFT, cuSPARSE, cuSOLVER, cuRAND (Stockham FFT, SpMV/SpGEMM, LU/QR/SVD/Cholesky, Philox RNG).

6–10. Signal Processing, Computation Graph, GPU Training, Inference Engine, Reinforcement Learning
MFCC/STFT, CUDA Graphs, fused optimizers (Adam/AdamW), paged KV-cache, PPO/DQN/SAC, etc.
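
To make the Foundation volume's "type-safe buffer plus RAII" idea concrete, here is a minimal host-side sketch. It is not OxiCUDA's DeviceBuffer<T>: `MockDevice` and `TypedBuffer` are hypothetical names, and the mock only counts live bytes where a real implementation would call the driver's allocate and free entry points.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical stand-in for a GPU allocator: it only tracks live bytes.
#[derive(Default)]
pub struct MockDevice {
    live_bytes: Cell<usize>,
}

impl MockDevice {
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.get()
    }
}

/// Typed, owning handle to `len` elements of `T` on the mock device.
pub struct TypedBuffer<T: Copy> {
    device: Rc<MockDevice>,
    len: usize,
    _elem: PhantomData<T>,
}

impl<T: Copy> TypedBuffer<T> {
    /// Records an allocation of `len * size_of::<T>()` bytes.
    pub fn alloc(device: &Rc<MockDevice>, len: usize) -> Self {
        let bytes = len * size_of::<T>();
        device.live_bytes.set(device.live_bytes.get() + bytes);
        Self { device: Rc::clone(device), len, _elem: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn size_in_bytes(&self) -> usize {
        self.len * size_of::<T>()
    }
}

impl<T: Copy> Drop for TypedBuffer<T> {
    /// RAII: the allocation is released when the handle goes out of scope.
    fn drop(&mut self) {
        let bytes = self.size_in_bytes();
        self.device.live_bytes.set(self.device.live_bytes.get() - bytes);
    }
}

/// Accepts only f32 buffers; passing a TypedBuffer<u8> is a compile error.
pub fn total_elements(a: &TypedBuffer<f32>, b: &TypedBuffer<f32>) -> usize {
    a.len() + b.len()
}
```

The element type lives in the handle's type parameter, so mixing up an f32 buffer and a u8 buffer is caught by the compiler, and there is no free call to forget because `Drop` does the release.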

Backends (7 crates)
Native NVIDIA (via libcuda), Metal, Vulkan, WebGPU, ROCm, LevelZero + generic trait.
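
The "generic trait" approach to backends can be sketched as follows. This is an assumption about the general shape, not OxiCUDA's actual trait: `Backend`, `CpuReference`, and `scaled_add` are hypothetical names, and a host CPU implementation stands in for the GPU backends.

```rust
/// Hypothetical backend abstraction; names are illustrative only.
pub trait Backend {
    fn name(&self) -> &'static str;
    /// Computes y[i] = alpha * x[i] + y[i] for every element.
    fn axpy(&self, alpha: f32, x: &[f32], y: &mut [f32]);
}

/// Reference backend that runs on the host CPU.
pub struct CpuReference;

impl Backend for CpuReference {
    fn name(&self) -> &'static str {
        "cpu-reference"
    }

    fn axpy(&self, alpha: f32, x: &[f32], y: &mut [f32]) {
        assert_eq!(x.len(), y.len(), "axpy: length mismatch");
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += alpha * xi;
        }
    }
}

/// Library code written once against the trait, usable with any backend.
pub fn scaled_add<B: Backend>(backend: &B, alpha: f32, x: &[f32], y: &mut [f32]) {
    backend.axpy(alpha, x, y);
}
```

Calling `scaled_add(&CpuReference, 2.0, &x, &mut y)` runs the host version; a GPU backend would implement the same trait and be passed in the same way, so callers do not change when the hardware does.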

Key Rust advantages:

100% Pure Rust (zero C/C++/CUDA SDK at build time)
Type-safe kernels and memory management
Built-in autotuner selects optimal kernel per GPU
Full integration with SciRS2, OxiBLAS, OxiONNX, OxiLLaMa
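
As a rough illustration of what "selects optimal kernel per GPU" means, here is a minimal autotuner sketch. It is a simplification under stated assumptions: one in-memory cache stands in for the 3-tier design and disk cache, and `TuneKey`, `Autotuner`, and the `cost` callback are hypothetical names rather than OxiCUDA's API.

```rust
use std::collections::HashMap;

/// Cache key: which operation, on which GPU, at which problem size.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct TuneKey {
    pub op: String,
    pub device: String,
    pub shape: (usize, usize, usize),
}

/// Hypothetical autotuner that remembers the best variant index per key.
#[derive(Default)]
pub struct Autotuner {
    cache: HashMap<TuneKey, usize>,
    /// Number of variant measurements performed so far.
    pub benchmarks_run: usize,
}

impl Autotuner {
    /// Returns the index of the cheapest of `variants` kernel variants.
    /// `cost(i)` measures variant `i` (for example, its run time in seconds).
    /// Returns `None` when there are no variants to choose from.
    pub fn select<F>(&mut self, key: &TuneKey, variants: usize, mut cost: F) -> Option<usize>
    where
        F: FnMut(usize) -> f64,
    {
        // Cache hit: skip all measurements.
        if let Some(&best) = self.cache.get(key) {
            return Some(best);
        }
        // Cache miss: measure every variant and keep the cheapest.
        let mut best: Option<(usize, f64)> = None;
        for v in 0..variants {
            let c = cost(v);
            self.benchmarks_run += 1;
            if best.map_or(true, |(_, best_cost)| c < best_cost) {
                best = Some((v, c));
            }
        }
        let (idx, _) = best?;
        self.cache.insert(key.clone(), idx);
        Some(idx)
    }
}
```

The first call for a given key pays for the measurements; later calls with the same key return the cached choice. Persisting the map to disk would extend that saving across process restarts.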

What’s inside 0.1.2 (released April 15)

Stability improvements and CI hardening
Expanded backends and autotuner refinements
Full production readiness of Volumes 1–10, confirmed by 7,263 passing tests across 28 crates
253,125 lines of pure Rust, with zero compiler warnings, zero Clippy warnings, and zero test failures enforced

This is the foundation

OxiCUDA is now the official GPU computing backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):

SciRS2 / NumRS2 — all GPU-accelerated scientific computing
OxiBLAS / OxiFFT — linear algebra and signal processing
OptiRS / OxiLLaMa — training and LLM inference
OxiMedia / VoiRS — real-time media and speech pipelines
OxiGDAL / Spintronics / OxiHuman — geospatial, physics, and biomechanical GPU simulations
ToRSh / OxiRAG — high-throughput tensor and RAG operations

Repository: https://github.com/cool-japan/oxicuda

Star the repo if you want GPU computing without the CUDA Toolkit or C++ toolchain.

The era of “just install the CUDA SDK” is over.

Pure Rust GPU computing is here — fast, safe, portable, and sovereign.

— KitaSan at COOLJAPAN OÜ April 15, 2026

OxiCUDA 0.1.2 Released — Pure Rust CUDA Toolkit Replacement
