Complete, type-safe, memory-safe rewrite of the entire NVIDIA CUDA Toolkit in pure Rust. cuBLAS/cuDNN/cuFFT/cuSPARSE/cuSOLVER/cuRAND and more — all in 253k SLoC across 28 crates. Only runtime dependency is the NVIDIA driver. PTX codegen + autotuner, 7 GPU backends (Metal/Vulkan/WebGPU/ROCm/LevelZero). ≥90–95% of native CUDA performance. The sovereign GPU computing layer for SciRS2 and the entire COOLJAPAN ecosystem (now 21M+ SLoC total).
The GPU computing foundation of the COOLJAPAN ecosystem just became fully sovereign.
Today we released OxiCUDA 0.1.2 — a complete, production-grade pure Rust replacement for the entire NVIDIA CUDA Toolkit.
No CUDA SDK. No nvcc. No C/C++ toolchain. No build-time dependencies.
Only runtime dependency is the official NVIDIA driver (libcuda.so / nvcuda.dll).
Just clean, type-safe, memory-safe GPU code that compiles to a single static binary (or WASM) and runs on Turing through Blackwell GPUs — and across multiple vendor backends.
For decades, high-performance GPU computing meant depending on the massive NVIDIA CUDA Toolkit (with its complex SDK, C++ headers, and vendor lock-in).
These tools are powerful but suffer from:
OxiCUDA 0.1.2 ends all of that.
It delivers ≥90–95% of native CUDA performance while being 100% memory-safe and auditable.
Notable results (target on supported architectures):
OxiCUDA is organized into 10 volumes + 7 backends (28 crates total), mirroring the CUDA ecosystem while being radically simplified and optimized for Rust:
Foundation (4 crates)
Dynamic driver loading, type-safe DeviceBuffer<T>, RAII memory management, launch! macro.
PTX Codegen + Autotuner (2 crates)
Pure-Rust PTX IR DSL (SM 7.5–10.0), Tensor-Core WMMA/MMA/WGMMA, 3-tier runtime autotuner with disk cache.
Linear Algebra (1 crate)
Full cuBLAS equivalent (GEMM, batched, reductions, elementwise).
Deep Learning (1 crate)
cuDNN replacement: Conv (Winograd/direct/fused), FlashAttention, Norm layers, FP8/INT8/INT4 quant.
Scientific Computing (4 crates)
cuFFT, cuSPARSE, cuSOLVER, cuRAND (Stockham FFT, SpMV/SpGEMM, LU/QR/SVD/Cholesky, Philox RNG).
6–10. Signal Processing, Computation Graph, GPU Training, Inference Engine, Reinforcement Learning
MFCC/STFT, CUDA Graphs, fused optimizers (Adam/AdamW), paged KV-cache, PPO/DQN/SAC, etc.
Backends (7 crates)
Native NVIDIA (via libcuda), Metal, Vulkan, WebGPU, ROCm, LevelZero + generic trait.
Key Rust advantages:
OxiCUDA is now the official GPU computing backend for the entire COOLJAPAN stack (total ecosystem: 21M+ SLoC Rust, 597 crates, 40+ production-grade libraries):
Repository: https://github.com/cool-japan/oxicuda
Star the repo if you want GPU computing without the CUDA Toolkit or C++ toolchain.
The era of “just install the CUDA SDK” is over.
Pure Rust GPU computing is here — fast, safe, portable, and sovereign.
— KitaSan at COOLJAPAN OÜ April 15, 2026