OxiBLAS 0.1.0 Released — Pure Rust BLAS/LAPACK, the Foundation for SciRS2

The pure Rust linear algebra foundation has arrived.

Today we released OxiBLAS 0.1.0 — the first public release of a complete, pure Rust implementation of BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage).

No C. No Fortran. No external shared libraries. No FFI overhead. No build hell. Just clean, memory-safe linear algebra that compiles to a single static binary and runs everywhere — including WASM and no_std targets.

Why 0.1.0 matters

For decades, high-performance numerical computing in any language has meant linking against battle-tested but heavy C/Fortran libraries: OpenBLAS, Intel MKL, Apple Accelerate, Reference LAPACK. They are fast, but they bring real costs:

Complex build systems and dependency hell
Large native codebases that are hard to audit
Platform-specific binaries and vendor lock-in
Awkward or impossible use in WASM, embedded, and no_std environments

OxiBLAS exists to give the Rust ecosystem — and the imminent SciRS2 scientific computing stack — a sovereign mathematical foundation that needs none of that. It is the linear algebra backend SciRS2 is being built on top of, readied so that the rest of the ecosystem can stand on safe, portable, all-Rust numerics.

And as a first release, it is already competitive. On large matrices, OxiBLAS DGEMM matches OpenBLAS:

Linux x86_64 (AVX2/FMA): f64 DGEMM reaches 80–112% of OpenBLAS throughput depending on matrix size, including 102% at 1024×1024 (213 vs 208 GFLOPS) and a peak of 220 GFLOPS at 256×256. f32 SGEMM reaches 112% of OpenBLAS at 64×64.
Apple M3 (NEON): f64 GEMM matches or exceeds OpenBLAS, peaking at 427 GFLOPS (1024×1024).

For a 0.1.0 written entirely in Rust, with SIMD kernels built on core::arch intrinsics, that is a strong starting line.
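To put those GFLOPS figures in wall-clock terms, here is a small sketch. It assumes the usual convention of counting a GEMM as 2·m·n·k floating-point operations; the announcement does not say which convention the benchmark suite uses, and the function names are illustrative, not part of OxiBLAS.

```rust
/// Floating-point operations in C = A * B, with A of size m×k and B of size k×n,
/// counting one multiply and one add per inner-product term (assumed convention).
pub fn gemm_flops(m: u64, n: u64, k: u64) -> u64 {
    2 * m * n * k
}

/// Throughput in GFLOPS for a GEMM of the given shape that took `seconds`.
pub fn gflops(m: u64, n: u64, k: u64, seconds: f64) -> f64 {
    gemm_flops(m, n, k) as f64 / seconds / 1e9
}

/// Wall-clock seconds implied by running that GEMM at `rate_gflops`.
pub fn seconds_at_rate(m: u64, n: u64, k: u64, rate_gflops: f64) -> f64 {
    gemm_flops(m, n, k) as f64 / (rate_gflops * 1e9)
}
```

Under that assumption, a 1024×1024×1024 DGEMM is about 2.15 billion operations, so 213 GFLOPS corresponds to roughly 10.1 ms per multiply and 208 GFLOPS to roughly 10.3 ms.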

Technical Deep Dive: How BLAS/LAPACK is rebuilt in pure Rust

OxiBLAS ships as a Cargo workspace of focused crates, re-exported through the unified oxiblas crate:

Core (oxiblas-core) A custom SIMD abstraction over core::arch intrinsics with runtime feature detection: AVX-512F → AVX2/FMA → SSE4.2 on x86_64, NEON on AArch64, and a scalar fallback everywhere else. Plus additional scalar types (f16 half precision, f128 quad precision), complex types (Complex32/Complex64), arena allocation, 64-byte aligned vectors, and optional rayon parallelism.
Matrix (oxiblas-matrix) The Mat / MatRef / MatMut types with column-major storage for BLAS/LAPACK compatibility, plus views, diagonals, and a lazy expression layer for operation fusion.
BLAS (oxiblas-blas) Complete Level 1 (11 ops: dot, axpy, nrm2, scal, swap, copy, rot, iamax, asum…), Level 2 (15 ops: gemv, ger, symv, trsv, banded and packed variants…), and Level 3 (gemm, syrk, trsm, symm, hemm, herk…). GEMM uses BLIS-style MC×KC×NC blocking with SIMD micro-kernels and prefetching. Tensor extras include Einstein summation across 24 patterns.
LAPACK (oxiblas-lapack) LU (with partial and full pivoting), Cholesky, LDL^T, QR (with column pivoting), SVD, symmetric/general eigenvalue decomposition, Schur and Hessenberg, triangular/general/tridiagonal solvers, least squares, condition estimation, and matrix inversion.
Sparse (oxiblas-sparse) 9 sparse formats (CSR, CSC, COO, ELL, DIA, BSR, BSC, HYB, SELL-C-σ), 10 iterative solvers (CG/PCG, BiCGStab, GMRES, MINRES, IDR(s), TFQMR, QMR, Block-CG, Block-GMRES), Lanczos/Arnoldi/IRAM eigensolvers, and a deep preconditioner suite (Jacobi, ILU0/ILUT/ILUTP, IC0/ICT, AMG, SPAI, AINV, Schwarz).
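To make the column-major layout and the MC×KC×NC blocking from the oxiblas-matrix and oxiblas-blas entries concrete, here is a minimal, dependency-free sketch. It is not the OxiBLAS kernel: there is no packing, SIMD, or prefetching, and the names (BlockSizes, gemm_blocked) and signature are made up for illustration. It only shows the loop structure that this style of blocking imposes on a column-major GEMM.

```rust
/// Hypothetical block sizes; real values are tuned to the cache hierarchy.
pub struct BlockSizes {
    pub mc: usize,
    pub kc: usize,
    pub nc: usize,
}

/// C = alpha * A * B + beta * C on column-major slices.
/// A is m×k, B is k×n, C is m×n; element (i, j) of a matrix with
/// `rows` rows lives at index i + j * rows.
pub fn gemm_blocked(
    m: usize, n: usize, k: usize,
    alpha: f64, a: &[f64], b: &[f64],
    beta: f64, c: &mut [f64],
    bs: &BlockSizes,
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    assert!(bs.mc > 0 && bs.kc > 0 && bs.nc > 0);

    // Apply beta once up front. With beta == 0 the old contents of C are
    // ignored entirely, so stale NaNs do not leak into the result.
    if beta == 0.0 {
        c.fill(0.0);
    } else {
        for x in c.iter_mut() {
            *x *= beta;
        }
    }

    // Loop order jc -> pc -> ic: NC-wide column panels of B and C,
    // then KC-deep slices of the inner dimension, then MC-tall blocks of A.
    for jc in (0..n).step_by(bs.nc) {
        let j_end = (jc + bs.nc).min(n);
        for pc in (0..k).step_by(bs.kc) {
            let p_end = (pc + bs.kc).min(k);
            for ic in (0..m).step_by(bs.mc) {
                let i_end = (ic + bs.mc).min(m);
                // Stand-in for the micro-kernel: plain scalar loops with the
                // innermost index walking down a column (unit stride).
                for j in jc..j_end {
                    for p in pc..p_end {
                        let bpj = alpha * b[p + j * k];
                        for i in ic..i_end {
                            c[i + j * m] += a[i + p * m] * bpj;
                        }
                    }
                }
            }
        }
    }
}
```

For A = [[1, 2, 3], [4, 5, 6]] stored column-major as [1, 4, 2, 5, 3, 6] and B = [[7, 8], [9, 10], [11, 12]] stored as [7, 9, 11, 8, 10, 12], calling gemm_blocked with alpha = 1, beta = 0, and 2×2×2 blocks fills C with [58, 139, 64, 154], which is [[58, 64], [139, 154]] read column by column.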

Optional oxiblas-ndarray provides ndarray interop, and oxiblas-ffi exposes a C-ABI drop-in for existing BLAS/LAPACK call sites.

Getting Started

cargo add oxiblas

A first matrix multiply, straight from the prelude:

use oxiblas::prelude::*;

// C = A * B
let a = Mat::from_rows(&[
    &[1.0, 2.0, 3.0],
    &[4.0, 5.0, 6.0],
]);
let b = Mat::from_rows(&[
    &[7.0,  8.0],
    &[9.0, 10.0],
    &[11.0, 12.0],
]);
let mut c = Mat::zeros(2, 2);

gemm(1.0, a.as_ref(), b.as_ref(), 0.0, c.as_mut());
assert!((c[(0, 0)] - 58.0).abs() < 1e-10); // [[58, 64], [139, 154]]

Level 1 vector ops are just as direct:

use oxiblas_blas::level1::{axpy, dot};

let x = vec![1.0, 2.0, 3.0, 4.0];
let y = vec![5.0, 6.0, 7.0, 8.0];
assert_eq!(dot(&x, &y), 70.0); // 1*5 + 2*6 + 3*7 + 4*8

let mut y = vec![1.0, 2.0, 3.0, 4.0];
axpy(2.5, &[10.0, 20.0, 30.0, 40.0], &mut y); // y = 2.5*x + y
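For reference, the arithmetic behind axpy can be written out in plain Rust. This is a standalone sketch for checking results by hand, not the OxiBLAS implementation; axpy_ref is a hypothetical name.

```rust
/// Reference semantics of axpy: y[i] = alpha * x[i] + y[i] for every i.
pub fn axpy_ref(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}
```

With the inputs above (alpha = 2.5, x = [10, 20, 30, 40], y = [1, 2, 3, 4]), y ends up as [26, 52, 78, 104].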

What’s inside

Complete BLAS — Level 1 (11 ops), Level 2 (15 ops), Level 3 (11 ops), including full packed and banded variants and Hermitian/symmetric operations
Extensive LAPACK — LU, full-pivot LU, banded LU, Cholesky, LDL^T, QR with pivoting, SVD, symmetric and general EVD, Schur, Hessenberg, triangular/general/tridiagonal solvers, least squares
Sparse linear algebra — 9 formats, 10 iterative solvers, Lanczos/Arnoldi/IRAM eigensolvers, truncated and randomized SVD, sparse LU/QR/Cholesky, and a large preconditioner library
Extended precision — f16 and f128 (quad, ~31 digits), plus Kahan/pairwise/superaccurate compensated summation
SIMD everywhere — AVX2/FMA, AVX-512F, and NEON intrinsics with automatic runtime dispatch and scalar fallback
Tensor operations — Einstein summation across 24 patterns and batched matmul
Interop — oxiblas-ndarray for ndarray, oxiblas-ffi for a C BLAS/LAPACK drop-in, optional nalgebra conversions
Benchmarks — a criterion-based suite with direct OpenBLAS comparison
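The runtime dispatch with scalar fallback mentioned in the list above can be sketched with the standard library alone. This is an illustration of the general pattern, not OxiBLAS code: the function names are hypothetical, only an AVX2/FMA path is shown, and it relies on the std-only is_x86_feature_detected! macro, so a no_std build would need a different detection mechanism.

```rust
/// Portable scalar fallback, used on any CPU.
pub fn dot_scalar(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// AVX2/FMA kernel: four f64 lanes per fused multiply-add.
/// Must only be called after the CPU features have been verified.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(x: &[f64], y: &[f64]) -> f64 {
    use std::arch::x86_64::*;
    let n = x.len().min(y.len());
    let chunks = n / 4;
    let mut lanes = [0.0f64; 4];
    // SAFETY: every load reads 4 elements starting at 4 * i with
    // 4 * i + 3 < n, and `lanes` has room for the 4 stored values.
    unsafe {
        let mut acc = _mm256_setzero_pd();
        for i in 0..chunks {
            let a = _mm256_loadu_pd(x.as_ptr().add(4 * i));
            let b = _mm256_loadu_pd(y.as_ptr().add(4 * i));
            acc = _mm256_fmadd_pd(a, b, acc); // acc += a * b, lane by lane
        }
        _mm256_storeu_pd(lanes.as_mut_ptr(), acc);
    }
    // Reduce the four lanes, then handle the tail that did not fill a vector.
    let mut sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for i in 4 * chunks..n {
        sum += x[i] * y[i];
    }
    sum
}

/// Picks the best available kernel at runtime.
pub fn dot(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were just detected.
            return unsafe { dot_avx2_fma(x, y) };
        }
    }
    dot_scalar(x, y)
}
```

The same binary then runs on CPUs with and without AVX2: the feature check happens at runtime, and targets other than x86_64 compile only the scalar path.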

This first release lands with 469+ passing library tests and roughly 154,600 lines of Rust across 314 files.

Tips

Sparse is on by default. The sparse feature is enabled out of the box, so CsrMatrix and the iterative solvers are available with no extra flags. Add the parallel feature for rayon-backed kernels on multi-core machines.
Reach for the prelude. use oxiblas::prelude::*; brings in Mat, gemm, the LAPACK decompositions (Lu, Qr, Svd, Cholesky), and the MatBuilder helpers in one import.
Build matrices ergonomically. MatBuilder::<f64>::identity(n), ::zeros(m, n), and ::hilbert(n) are handy for tests and quick experiments.
Parallel GEMM has its own entry point. Enable the parallel feature and call gemm_with_par(1.0, a.as_ref(), b.as_ref(), 0.0, c.as_mut(), Par::Rayon) for n >= 256.
Coming from NumPy? The mapping is direct: np.linalg.svd → Svd::compute, np.linalg.solve → solve, np.linalg.qr → Qr::compute, np.linalg.cholesky → Cholesky::compute.
Need maximum precision? Turn on f128 for quad-precision accumulation, or use KahanSum / pairwise_sum for compensated summation in ill-conditioned reductions.
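To see why compensated summation matters, here is a textbook Kahan summation next to a naive sum. It is a standalone sketch: kahan_sum and naive_sum are illustrative free functions, not the KahanSum or pairwise_sum items from OxiBLAS, whose exact API is not shown in this post.

```rust
/// Plain left-to-right summation.
pub fn naive_sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// Kahan (compensated) summation: carries the rounding error of each
/// addition forward in `c` and feeds it back into the next term.
pub fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in xs {
        let y = x - c; // re-inject the error from the previous step
        let t = sum + y; // low-order bits of y may be lost here
        c = (t - sum) - y; // recover what was lost (negated)
        sum = t;
    }
    sum
}
```

Summing 1.0 followed by ten thousand copies of 1e-16 shows the difference: each 1e-16 is below half the spacing between 1.0 and the next f64, so the naive sum stays at exactly 1.0, while the Kahan sum lands at about 1.000000000001, matching the true total of 1 + 1e-12.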

This is the foundation

OxiBLAS 0.1.0 is the linear algebra bedrock of the COOLJAPAN ecosystem. It is purpose-built as the BLAS/LAPACK engine for SciRS2, whose own first release is imminent. OxiBLAS ships first so that SciRS2 and everything above it can rest on pure Rust numerics from day one. It joins early COOLJAPAN siblings already in the wild (VoiRS for audio, TenRSo and TensorLogic for tensors, Spintronics, and Oxicode) as part of a sovereign, C/C++/Fortran-free stack.

Repository: https://github.com/cool-japan/oxiblas

Star the repo if you want high-performance scientific computing without the traditional toolchain headaches.

The era of “just link OpenBLAS” is starting to end.

Pure Rust numerical linear algebra is here — and it’s already fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ December 28, 2025