COOLJAPAN

Posts tagged #cuda

15 posts

Jun 17, 2026 · 8 min

OxiCUDA 0.2.0 Released — Adaptive RK45, Topological Data Analysis, and a Zero-Unwrap Workspace

OxiCUDA 0.2.0, the pure-Rust replacement for the NVIDIA CUDA Toolkit, lands the 'Wave AAA+64' expansion: adaptive RK45 with Richardson extrapolation, Extended Persistence and Discrete Morse theory (oxicuda-tda), Parametric UMAP, and Fisher Information estimation — plus a workspace-wide zero-unwrap reliability pass and 32,320 passing tests. ~783K lines across 73 crates. No CUDA SDK, no nvcc.

release · oxicuda · cuda
Jun 16, 2026 · 9 min

Legalis-RS 0.1.6 Released — GPU-Accelerated Legal Simulation, Quantum-Safe Audit, and a C-Free Storage Layer

Pure-Rust legal statute engine. 0.1.6 adds real NVIDIA CUDA GPU acceleration for population-scale simulation (optional, with transparent CPU fallback), a hardened security/governance API layer, an autonomous and post-quantum-safe compliance audit subsystem, legal analytics with risk heatmaps, French civil/company law, and a fully C-free storage backend via OxiSQL. 18,398 tests passing.

release · legalis · legal-tech
Jun 8, 2026 · 8 min

OxiBonsai 0.2.2 Released — An Interactive Image REPL with Inline Terminal Rendering

OxiBonsai 0.2.2 adds `oxibonsai repl`: a resident ImageSession that loads the DiT, VAE, and text encoder once and iterates on prompts without re-paying the load/dequant cost — with images shown inline in Ghostty via a pure-Rust kitty graphics protocol, a `:fast`/`:hq` preview→finalize loop, and documented per-platform GPU flags. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

release · oxibonsai · llm
Jun 3, 2026 · 7 min

OxiBonsai 0.2.0 Released — Concurrent /serve, Byte-Identical CPU↔Metal, and Reproducible Images

OxiBonsai 0.2.0 opens the 0.2 series: a concurrent engine pool that shares one 1.16 GB embedding table across replicas, a CPU↔Metal byte-identical parity guard, a parity-first CUDA imagen backend (~3.2× faster, down to ~31.7 s on A4000), --seed byte-exact reproducible images, and a stable-toolchain build. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

release · oxibonsai · llm
May 21, 2026 · 7 min

OxiCUDA 0.1.8 Released — Numerical-Stability and Allocator Tuning Polish

Pure-Rust CUDA Toolkit replacement: a maintenance release with numerical-stability refinements in the HMC variational sampler, stream-ordered allocator tuning, and TriMap reduction polish — 23,535 passing tests. No CUDA SDK, no nvcc.

release · oxicuda · cuda
May 16, 2026 · 8 min

OxiCUDA 0.1.7 Released — Tensor Core SYR2K Completes the Symmetric Rank-Update Family

Pure-Rust replacement for the entire NVIDIA CUDA Toolkit. 0.1.7 adds a SYR2K Tensor Core kernel (fused A×Bᵀ + B×Aᵀ rank-2k update) to oxicuda-blas, cross-subsystem CUDA kernel enhancements, and Multi-Operation Scheduling improvements. No CUDA SDK, no nvcc, no C/C++ toolchain.

release · oxicuda · cuda
May 9, 2026 · 8 min

OxiCUDA 0.1.6 Released — Tensor Core SYRK Fast Path and Sixteen New ML Crates

Pure-Rust replacement for the NVIDIA CUDA Toolkit. OxiCUDA 0.1.6 adds a Tensor Core fast path for SYRK in oxicuda-blas and sixteen new ML crates (adversarial, SSL, continual, multimodal, 3D geometry, PINN, ANN, anomaly, causal, meta, MoE, NeRF, quantum, recsys, RLHF, tabular). No CUDA SDK, no nvcc.

release · oxicuda · cuda
May 3, 2026 · 8 min

OxiBonsai 0.1.3 Released — Prefix-Cache-Aware Serving with Byte-Identical Warm Paths

OxiBonsai 0.1.3 makes sub-2-bit serving smarter: a prefix-cache-aware engine that reuses KV-cache across requests with byte-identical cold/warm parity, runtime tokenizer auto-detection, and a GPU weight cache that uploads once. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

release · oxibonsai · llm
May 3, 2026 · 8 min

OxiCUDA 0.1.5 Released — Nine New GPU Deep-Learning Crates (GenAI, GNN, Mamba, ViT, Audio, Time-Series, Bayesian, Federated, NAS)

The pure-Rust NVIDIA CUDA Toolkit replacement adds nine new GPU deep-learning crates — generative diffusion, graph neural nets, Mamba SSMs, vision transformers, audio/speech, time-series, Bayesian DL, federated learning, and NAS — growing to ~320K lines across 37 crates with 9,568 passing tests. No CUDA SDK, no nvcc.

release · oxicuda · cuda
Apr 18, 2026 · 7 min

OxiBonsai 0.1.1 Released — Sub-2-Bit Inference Goes GPU, and the Ternary Line Lands

Five days after its 1-bit debut, OxiBonsai grows GPUs: a native CUDA NVRTC backend (~21.9 tok/s on Ternary-Bonsai-1.7B, RTX 3060) and a fused Metal full-forward path (~50 tok/s, ~13x speedup) — plus the new ternary TQ2_0_g128 quant family, with NEON/AVX2/AVX-512 GEMV so it flies on CPU too. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem, still with no llama.cpp, no BLAS, no C/Fortran.

release · oxibonsai · llm
Apr 18, 2026 · 5 min

OxiCUDA 0.1.4 Released — Continued Quality and Documentation Polish

A small maintenance release for OxiCUDA, the pure-Rust replacement for the NVIDIA CUDA Toolkit. Workspace-wide documentation and quality improvements, with all 28 crates aligned to 0.1.4 so the stack ships in lockstep. The only runtime dependency is the NVIDIA driver.

release · oxicuda · cuda
Apr 17, 2026 · 5 min

OxiCUDA 0.1.3 Released — Documentation and Quality Hardening Across All Crates

A quality-and-docs maintenance release for the pure-Rust NVIDIA CUDA Toolkit replacement — workspace-wide polish, internal version alignment to 0.1.3, and continued growth to ~260K lines of safe Rust across 28 crates. The only runtime dependency is still the NVIDIA driver.

release · oxicuda · cuda
Apr 15, 2026 · 3 min

OxiCUDA 0.1.2 Released — Pure Rust CUDA Toolkit Replacement

A complete, type-safe, memory-safe rewrite of the entire NVIDIA CUDA Toolkit in pure Rust: cuBLAS/cuDNN/cuFFT/cuSPARSE/cuSOLVER/cuRAND and more, all in 253k SLoC across 28 crates. The only runtime dependency is the NVIDIA driver. PTX codegen + autotuner, 7 GPU backends (including Metal/Vulkan/WebGPU/ROCm/LevelZero). ≥90–95% of native CUDA performance. The sovereign GPU computing layer for SciRS2 and the entire COOLJAPAN ecosystem (now 21M+ SLoC total).

release · oxicuda · cuda
Apr 14, 2026 · 6 min

OxiCUDA 0.1.1 Released — New BLAS Activations and Hardened GPU Backends

First patch on the pure-Rust NVIDIA CUDA Toolkit replacement: six new oxicuda-blas elementwise activations (HardSigmoid, HardSwish, Softplus, LeakyRelu, Ceil, Floor) plus substantial ROCm/Vulkan/WebGPU backend growth. ~248K lines across 28 crates.

release · oxicuda · cuda
Apr 13, 2026 · 8 min

OxiCUDA 0.1.0 Released — A Pure Rust Replacement for the NVIDIA CUDA Toolkit

OxiCUDA 0.1.0 is a pure-Rust, type-safe, memory-safe replacement for the entire NVIDIA CUDA Toolkit software stack — cuBLAS, cuDNN, cuFFT, cuSPARSE, cuSOLVER, cuRAND and more in ~239K lines across 28 crates. The only runtime dependency is the NVIDIA driver. PTX code generation plus a built-in autotuner, all from safe Rust.

release · oxicuda · cuda