
OxiBLAS 0.2.0 Released — Recursive Factorizations, Batched BLAS, and no_std

OxiBLAS 0.2.0 is a major step up: cache-oblivious recursive and parallel factorizations, batched BLAS, runtime auto-tuning, multifrontal sparse solvers, mixed-precision refinement, NUMA-aware allocation, and no_std support — with the Fortran FFI retired in favor of a fully pure Rust workspace.

Tags: release, oxiblas, blas, lapack, scirs2, pure-rust, simd, scientific-computing, linear-algebra, no-std

The pure Rust BLAS/LAPACK foundation just grew up.

Today we released OxiBLAS 0.2.0 — the largest update yet to our pure Rust implementation of BLAS and LAPACK. Recursive and parallel factorizations, batched BLAS, runtime auto-tuning, multifrontal sparse solvers, mixed-precision refinement, NUMA-aware memory, and no_std support all land in one release.

No C. No Fortran. No external shared libraries. No FFI overhead. No build hell. Just clean, memory-safe linear algebra that compiles to a single static binary (or WASM) and runs everywhere — now down to no_std + alloc targets.

Why OxiBLAS 0.2.0 is a game changer

The 0.1.x line proved a pure Rust BLAS could match OpenBLAS on dense GEMM. 0.2.0 is about everything that surrounds GEMM in real numerical workloads: factorizations that adapt to cache, solvers that scale across cores, batched kernels for small-matrix workloads, and accuracy that doesn’t force you to pay full double-precision cost.

A few of the headline wins, grounded in this release:

Technical Deep Dive: what changed under the hood

0.2.0 deepens every layer of the workspace:

  1. Core (oxiblas-core): New runtime SIMD dispatch infrastructure: SimdCapabilities, SimdDispatcher, KernelSelector, and a simd_dispatch! macro for function multi-versioning. NUMA-aware allocation arrives via NumaVec<T> and MatNuma<T> with real Linux topology detection, plus customizable thread pools (set_global_thread_pool, OxiblasThreadConfig). And crucially, oxiblas-core and oxiblas-matrix now support #![no_std] with alloc.

  2. BLAS (oxiblas-blas): Batched operations (gemm_batched, gemm_strided_batched, axpy_batched, gemv_batched), each with parallel variants. Runtime auto-tuning via RuntimeAutoTuner and the gemm_auto_tuned() convenience function. New SSE4.2 intermediate GEMM micro-kernels (F64x2Sse, F32x4Sse, 4×4 tiles) fill the gap between scalar and AVX2 on older x86_64.

  3. LAPACK (oxiblas-lapack): Recursive and parallel factorizations, complex bidiagonal reduction (ComplexBidiagFactors), and the full mixed-precision refinement family. A new tests/lapack_compat.rs integration suite adds 61 tests across LU, Cholesky, QR, SVD, EVD, and solve.

  4. Sparse (oxiblas-sparse): Multifrontal factorizations (MultifrontalCholesky and MultifrontalLU, with elimination-tree construction and supernodal aggregation), plus advanced sparse LU pivoting (SparseLuThreshold, SuperLU-style SparseLuStaticPivot, and Bunch-Kaufman SparseLdlt) and standard test-matrix generators (laplacian_2d/3d, tridiagonal, arrow_matrix, random_spd, poisson_1d).

  5. Tooling & ndarray: A performance regression framework (PerfBaseline, RegressionChecker, JSON baselines) with a regress CLI (capture / check / report / list) for CI throughput tracking, parallel matmul_par for ndarray, and sparse interop (array2_to_csr, spmv_ndarray, sparse_solve_ndarray).
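
To make the mixed-precision refinement idea from the LAPACK item more concrete, here is a minimal std-only sketch. It does not use the OxiBLAS API, and the names solve_f32 and refine_mixed are hypothetical, introduced only for illustration. The expensive solve runs in f32, while the residual b - A x is accumulated in f64 and fed back as a correction. A production implementation would presumably factor the matrix once and reuse the factors; this sketch simply re-runs elimination on every pass to stay short.

```rust
/// Solve A x = b in single precision with Gaussian elimination and
/// partial pivoting. Assumes A is square and non-singular.
/// (Hypothetical helper, not part of OxiBLAS.)
fn solve_f32(a: &[Vec<f32>], b: &[f32]) -> Vec<f32> {
    let n = b.len();
    let mut m = a.to_vec();
    let mut x = b.to_vec();
    for k in 0..n {
        // Pick the largest pivot in column k to limit rounding error.
        let mut p = k;
        for i in (k + 1)..n {
            if m[i][k].abs() > m[p][k].abs() {
                p = i;
            }
        }
        m.swap(k, p);
        x.swap(k, p);
        let pivot_row = m[k].clone();
        let xk = x[k];
        // Eliminate column k from every row below the pivot.
        for i in (k + 1)..n {
            let f = m[i][k] / pivot_row[k];
            for j in k..n {
                m[i][j] -= f * pivot_row[j];
            }
            x[i] -= f * xk;
        }
    }
    // Back substitution on the resulting upper-triangular system.
    for i in (0..n).rev() {
        let mut s = x[i];
        for j in (i + 1)..n {
            s -= m[i][j] * x[j];
        }
        x[i] = s / m[i][i];
    }
    x
}

/// Mixed-precision iterative refinement: solve in f32, measure the
/// residual in f64, and correct the f64 solution `passes` times.
/// (Hypothetical helper, not part of OxiBLAS.)
fn refine_mixed(a: &[Vec<f64>], b: &[f64], passes: usize) -> Vec<f64> {
    let n = b.len();
    // Low-precision copies used for the cheap solves.
    let a32: Vec<Vec<f32>> = a
        .iter()
        .map(|row| row.iter().map(|&v| v as f32).collect())
        .collect();
    let b32: Vec<f32> = b.iter().map(|&v| v as f32).collect();
    let mut x: Vec<f64> = solve_f32(&a32, &b32).iter().map(|&v| v as f64).collect();
    for _ in 0..passes {
        // Residual r = b - A x, accumulated in f64, then rounded to f32.
        let r: Vec<f32> = (0..n)
            .map(|i| {
                let ax: f64 = (0..n).map(|j| a[i][j] * x[j]).sum();
                (b[i] - ax) as f32
            })
            .collect();
        // Solve A d = r cheaply in f32 and apply the correction in f64.
        let d = solve_f32(&a32, &r);
        for i in 0..n {
            x[i] += d[i] as f64;
        }
    }
    x
}
```

For a well-conditioned matrix, a handful of passes is typically enough to bring the residual down to double-precision levels, even though every solve ran in single precision.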

This release also marks a milestone for the Pure Rust ecosystem policy: oxiblas-ffi has been retired from the workspace (the directory remains as a deprecated archive). OxiBLAS is now an end-to-end pure Rust stack with zero unwrap() calls in production code and every source file under the 2,000-line limit.

Getting Started

cargo add oxiblas

Recursive, cache-oblivious Cholesky straight from the prelude:

use oxiblas::prelude::*;

// A symmetric positive-definite matrix
let a = Mat::from_rows(&[
    &[4.0, 1.0, 1.0],
    &[1.0, 3.0, 0.0],
    &[1.0, 0.0, 2.0],
]);

// Divide-and-conquer factorization that adapts to the cache hierarchy
let chol = Cholesky::compute_recursive(a.as_ref()).expect("not positive definite");
let l = chol.l_factor(); // lower-triangular factor, A = L * Lᵀ
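
For readers curious what a divide-and-conquer Cholesky looks like, the following is a minimal std-only sketch. It is not the OxiBLAS implementation, and chol_rec and cholesky_recursive are hypothetical names used only here. The matrix is split into a 2×2 block form, the top-left block is factored recursively, the bottom-left block is obtained by a triangular solve, and the bottom-right block is updated and factored recursively. A fully cache-oblivious version would also recurse inside the triangular solve and the symmetric update; this sketch uses plain loops for those two steps.

```rust
/// In-place recursive Cholesky on the n x n diagonal block that starts at
/// (off, off) of a row-major matrix with row stride `ld`. Only the lower
/// triangle is read and written. (Hypothetical helper, not the OxiBLAS API.)
fn chol_rec(a: &mut [f64], ld: usize, off: usize, n: usize) -> Result<(), &'static str> {
    if n == 0 {
        return Ok(());
    }
    if n == 1 {
        let d = a[off * ld + off];
        // Also rejects NaN, which fails the `d > 0.0` comparison.
        if !(d > 0.0) {
            return Err("matrix is not positive definite");
        }
        a[off * ld + off] = d.sqrt();
        return Ok(());
    }
    let n1 = n / 2;
    let n2 = n - n1;
    let mid = off + n1;

    // Step 1: factor the top-left block, A11 = L11 * L11^T.
    chol_rec(a, ld, off, n1)?;

    // Step 2: L21 = A21 * L11^{-T}, one forward substitution per row of A21.
    for i in mid..mid + n2 {
        for j in off..mid {
            let mut s = a[i * ld + j];
            for k in off..j {
                s -= a[i * ld + k] * a[j * ld + k];
            }
            a[i * ld + j] = s / a[j * ld + j];
        }
    }

    // Step 3: update the lower triangle of A22 with A22 -= L21 * L21^T.
    for i in mid..mid + n2 {
        for j in mid..=i {
            let mut s = 0.0;
            for k in off..mid {
                s += a[i * ld + k] * a[j * ld + k];
            }
            a[i * ld + j] -= s;
        }
    }

    // Step 4: factor the updated bottom-right block.
    chol_rec(a, ld, mid, n2)
}

/// Returns the lower-triangular factor L (row-major, n x n) with A = L * L^T.
fn cholesky_recursive(a: &[f64], n: usize) -> Result<Vec<f64>, &'static str> {
    assert_eq!(a.len(), n * n, "expected a row-major n x n matrix");
    let mut l = a.to_vec();
    chol_rec(&mut l, n, 0, n)?;
    // Zero the strictly upper triangle so the result is exactly L.
    for i in 0..n {
        for j in (i + 1)..n {
            l[i * n + j] = 0.0;
        }
    }
    Ok(l)
}
```

Halving the problem at each level means that, at some depth, the working block fits in each level of the cache hierarchy without the code ever needing to know the cache sizes, which is the intuition behind the "cache-oblivious" label.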

Track throughput regressions in CI with the bundled regress binary:

# Capture a baseline, then fail the build if performance drops > 5%
cargo run -p oxiblas-benchmarks --bin regress -- capture --output baseline.json
cargo run -p oxiblas-benchmarks --bin regress -- check --baseline baseline.json --threshold 5.0
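
The kind of comparison a threshold check performs can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the regress tool's actual logic: the post does not describe the baseline JSON schema, so the sketch assumes each benchmark maps to a single throughput number where higher is better, and find_regressions is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Compare current throughput against a baseline and return every benchmark
/// whose throughput dropped by more than `threshold_pct` percent, together
/// with the size of the drop. Assumes higher throughput is better.
/// (Hypothetical helper; the real baseline format may differ.)
fn find_regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    threshold_pct: f64,
) -> Vec<(String, f64)> {
    let mut regressions = Vec::new();
    for (name, &base) in baseline {
        // Benchmarks missing from the current run are skipped in this sketch.
        if let Some(&now) = current.get(name) {
            // Guard against division by zero for empty baselines.
            if base > 0.0 {
                let drop_pct = (base - now) / base * 100.0;
                if drop_pct > threshold_pct {
                    regressions.push((name.clone(), drop_pct));
                }
            }
        }
    }
    regressions
}
```

A CI job built on this idea would exit with a non-zero status whenever the returned list is non-empty, which is what makes the build fail.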

What’s New in 0.2.0

The release ships with roughly 169,900 lines of Rust across 359 files, with 2,835 passing tests plus 195 doctests.

Tips

This is the foundation

By March 2026 the COOLJAPAN scientific stack is in full bloom, and OxiBLAS is its mathematical bedrock:

OxiBLAS 0.2.0 makes that foundation faster, broader, and — with the FFI retired — completely pure Rust, top to bottom.

Repository: https://github.com/cool-japan/oxiblas

Star the repo if you want high-performance scientific computing without the traditional toolchain headaches.

The era of “just link OpenBLAS” is ending.

Pure Rust numerical linear algebra is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ, March 7, 2026
