COOLJAPAN

Posts tagged #blas

3 posts

May 16, 2026 · 8 min

OxiCUDA 0.1.7 Released — Tensor Core SYR2K Completes the Symmetric Rank-Update Family

Pure-Rust replacement for the entire NVIDIA CUDA Toolkit. 0.1.7 adds a SYR2K Tensor Core kernel (fused A×Bᵀ + B×Aᵀ rank-2k update) to oxicuda-blas, cross-subsystem CUDA kernel enhancements, and Multi-Operation Scheduling improvements. No CUDA SDK, no nvcc, no C/C++ toolchain.

release oxicuda cuda

Apr 14, 2026 · 6 min

OxiCUDA 0.1.1 Released — New BLAS Activations and Hardened GPU Backends

First patch on the pure-Rust NVIDIA CUDA Toolkit replacement: six new oxicuda-blas elementwise activations (HardSigmoid, HardSwish, Softplus, LeakyRelu, Ceil, Floor) plus substantial ROCm/Vulkan/WebGPU backend growth. ~248K lines across 28 crates.

release oxicuda cuda

Mar 16, 2026 · 3 min

OxiBLAS 0.2.1 Released — Pure Rust BLAS/LAPACK That Outperforms OpenBLAS

Production-grade BLAS and LAPACK entirely in Rust. Up to 172% of OpenBLAS performance on Apple M3, full sparse solvers, f128 precision, RuntimeAutoTuner, no_std support. The sovereign mathematical foundation for SciRS2 and the entire COOLJAPAN scientific computing stack.

release oxiblas blas