15 posts
OxiCUDA 0.2.0, the pure-Rust replacement for the NVIDIA CUDA Toolkit, lands the 'Wave AAA+64' expansion: adaptive RK45 with Richardson extrapolation, Extended Persistence and Discrete Morse theory (oxicuda-tda), Parametric UMAP, and Fisher Information estimation — plus a workspace-wide zero-unwrap reliability pass and 32,320 passing tests. ~783K lines across 73 crates. No CUDA SDK, no nvcc.
Pure-Rust legal statute engine. 0.1.6 adds real NVIDIA CUDA GPU acceleration for population-scale simulation (optional, with transparent CPU fallback), a hardened security/governance API layer, an autonomous and post-quantum-safe compliance audit subsystem, legal analytics with risk heatmaps, French civil/company law, and a fully C-free storage backend via OxiSQL. 18,398 tests passing.
OxiBonsai 0.2.2 adds `oxibonsai repl`: a resident ImageSession that loads the DiT, VAE, and text encoder once and iterates on prompts without re-paying the load/dequant cost — with images shown inline in Ghostty via a pure-Rust kitty graphics protocol, a `:fast`/`:hq` preview→finalize loop, and documented per-platform GPU flags. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
OxiBonsai 0.2.0 opens the 0.2 series: a concurrent engine pool that shares one 1.16 GB embedding table across replicas, a CPU↔Metal byte-identical parity guard, a parity-first CUDA imagen backend (~3.2× to ~31.7s on A4000), --seed byte-exact reproducible images, and a stable-toolchain build — sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
Pure-Rust CUDA Toolkit replacement: a maintenance release with numerical-stability refinements in the HMC variational sampler, stream-ordered allocator tuning, and TriMap reduction polish — 23,535 passing tests. No CUDA SDK, no nvcc.
Pure-Rust replacement for the entire NVIDIA CUDA Toolkit. 0.1.7 adds a SYR2K Tensor Core kernel (fused A×Bᵀ + B×Aᵀ rank-2k update) to oxicuda-blas, cross-subsystem CUDA kernel enhancements, and Multi-Operation Scheduling improvements. No CUDA SDK, no nvcc, no C/C++ toolchain.
Pure-Rust replacement for the NVIDIA CUDA Toolkit. OxiCUDA 0.1.6 adds a Tensor Core fast path for SYRK in oxicuda-blas and sixteen new ML crates (adversarial, SSL, continual, multimodal, 3D geometry, PINN, ANN, anomaly, causal, meta, MoE, NeRF, quantum, recsys, RLHF, tabular). No CUDA SDK, no nvcc.
OxiBonsai 0.1.3 makes sub-2-bit serving smarter: a prefix-cache-aware engine that reuses KV-cache across requests with byte-identical cold/warm parity, runtime tokenizer auto-detection, and a GPU weight cache that uploads once. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
The pure-Rust NVIDIA CUDA Toolkit replacement adds nine new GPU deep-learning crates — generative diffusion, graph neural nets, Mamba SSMs, vision transformers, audio/speech, time-series, Bayesian DL, federated learning, and NAS — growing to ~320K lines across 37 crates with 9,568 passing tests. No CUDA SDK, no nvcc.
Five days after its 1-bit debut, OxiBonsai grows GPUs: a native CUDA NVRTC backend (~21.9 tok/s on Ternary-Bonsai-1.7B, RTX 3060) and a fused Metal full-forward path (~50 tok/s, ~13x speedup) — plus the new ternary TQ2_0_g128 quant family, with NEON/AVX2/AVX-512 GEMV so it flies on CPU too. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem, still with no llama.cpp, no BLAS, no C/Fortran.
A small maintenance release for OxiCUDA, the pure-Rust replacement for the NVIDIA CUDA Toolkit. Workspace-wide documentation and quality improvements, with all 28 crates aligned to 0.1.4 so the stack ships in lockstep. The only runtime dependency is the NVIDIA driver.
A quality-and-docs maintenance release for the pure-Rust NVIDIA CUDA Toolkit replacement — workspace-wide polish, internal version alignment to 0.1.3, and continued growth to ~260K lines of safe Rust across 28 crates. The only runtime dependency is still the NVIDIA driver.
Complete, type-safe, memory-safe rewrite of the entire NVIDIA CUDA Toolkit in pure Rust. cuBLAS/cuDNN/cuFFT/cuSPARSE/cuSOLVER/cuRAND and more — all in 253k SLoC across 28 crates. Only runtime dependency is the NVIDIA driver. PTX codegen + autotuner, 7 GPU backends (Metal/Vulkan/WebGPU/ROCm/LevelZero). ≥90–95% of native CUDA performance. The sovereign GPU computing layer for SciRS2 and the entire COOLJAPAN ecosystem (now 21M+ SLoC total).
First patch on the pure-Rust NVIDIA CUDA Toolkit replacement: six new oxicuda-blas elementwise activations (HardSigmoid, HardSwish, Softplus, LeakyRelu, Ceil, Floor) plus substantial ROCm/Vulkan/WebGPU backend growth. ~248K lines across 28 crates.
OxiCUDA 0.1.0 is a pure-Rust, type-safe, memory-safe replacement for the entire NVIDIA CUDA Toolkit software stack — cuBLAS, cuDNN, cuFFT, cuSPARSE, cuSOLVER, cuRAND and more in ~239K lines across 28 crates. The only runtime dependency is the NVIDIA driver. PTX code generation plus a built-in autotuner, all from safe Rust.