The pure Rust linear algebra foundation has arrived.
Today we released OxiBLAS 0.1.0 — the first public release of a complete, pure Rust implementation of BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage).
No C. No Fortran. No external shared libraries.
No FFI overhead. No build hell.
Just clean, memory-safe linear algebra that compiles to a single static binary and runs everywhere — including WASM and no_std targets.
Why 0.1.0 matters
For decades, high-performance numerical computing in any language has meant linking against battle-tested but heavy C/Fortran libraries: OpenBLAS, Intel MKL, Apple Accelerate, Reference LAPACK. They are fast, but they bring real costs:
- Complex build systems and dependency hell
- Large native codebases that are hard to audit
- Platform-specific binaries and vendor lock-in
- Awkward or impossible use in WASM, embedded, and no_std environments
OxiBLAS exists to give the Rust ecosystem — and the imminent SciRS2 scientific computing stack — a sovereign mathematical foundation that needs none of that. It is the linear algebra backend SciRS2 is being built on top of, readied so that the rest of the ecosystem can stand on safe, portable, all-Rust numerics.
And for a first release, it is already competitive. On large matrices, OxiBLAS DGEMM matches OpenBLAS:
- Linux x86_64 (AVX2/FMA): DGEMM f64 reaches 80–112% of OpenBLAS depending on matrix size: 102% at 1024×1024 (213 vs 208 GFLOPS), with a peak of 220 GFLOPS at 256×256. SGEMM f32 hits 112% at 64×64.
- Apple M3 (NEON): f64 GEMM matches or exceeds OpenBLAS, peaking at 427 GFLOPS (1024×1024).
For a 0.1.0 written entirely in Rust, with core::arch intrinsics as its only SIMD layer, that is a strong starting line.
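For anyone who wants to sanity-check figures like these, GEMM throughput is conventionally derived from the 2·m·n·k floating-point operations of an (m×k)·(k×n) product. The sketch below assumes that convention; the post does not say how its numbers were measured, and `gemm_gflops` and `percent_of` are illustrative helper names, not part of OxiBLAS.

```rust
/// Throughput in GFLOPS for an (m x k) * (k x n) matrix product,
/// assuming the conventional count of 2*m*n*k floating-point operations.
fn gemm_gflops(m: usize, n: usize, k: usize, seconds: f64) -> f64 {
    let flops = 2.0 * m as f64 * n as f64 * k as f64;
    flops / seconds / 1e9
}

/// Relative performance as a percentage of a baseline figure.
fn percent_of(ours: f64, baseline: f64) -> f64 {
    100.0 * ours / baseline
}

fn main() {
    // A 1024^3 product performs 2 * 1024^3 = 2_147_483_648 flops.
    // Finishing it in a hypothetical 10 ms corresponds to about 214.7 GFLOPS.
    let g = gemm_gflops(1024, 1024, 1024, 0.010);
    println!("{g:.1} GFLOPS");

    // The quoted 213 vs 208 GFLOPS works out to roughly 102%.
    println!("{:.1}%", percent_of(213.0, 208.0));
}
```

The 10 ms timing is made up for the arithmetic; only the 213 and 208 GFLOPS figures come from the benchmarks quoted above.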
Technical Deep Dive: How BLAS/LAPACK is rebuilt in pure Rust
OxiBLAS ships as a Cargo workspace of focused crates, re-exported through the unified oxiblas crate:
- Core (oxiblas-core): A custom SIMD abstraction over core::arch intrinsics with runtime feature detection: AVX-512F → AVX2/FMA → SSE4.2 on x86_64, NEON on AArch64, scalar fallback everywhere else. Plus extended-precision scalars (f16, f128 quad precision), complex types (Complex32/Complex64), arena allocation, 64-byte aligned vectors, and optional rayon parallelism.
- Matrix (oxiblas-matrix): The Mat/MatRef/MatMut types with column-major storage for BLAS/LAPACK compatibility, plus views, diagonals, and a lazy expression layer for operation fusion.
- BLAS (oxiblas-blas): Complete Level 1 (11 ops: dot, axpy, nrm2, scal, swap, copy, rot, iamax, asum…), Level 2 (15 ops: gemv, ger, symv, trsv, banded and packed variants…), and Level 3 (gemm, syrk, trsm, symm, hemm, herk…). GEMM uses BLIS-style MC×KC×NC blocking with SIMD micro-kernels and prefetching. Tensor extras include Einstein summation across 24 patterns.
- LAPACK (oxiblas-lapack): LU (with partial and full pivoting), Cholesky, LDL^T, QR (with column pivoting), SVD, symmetric/general eigenvalue decomposition, Schur and Hessenberg, triangular/general/tridiagonal solvers, least squares, condition estimation, and matrix inversion.
- Sparse (oxiblas-sparse): 9 sparse formats (CSR, CSC, COO, ELL, DIA, BSR, BSC, HYB, SELL-C-σ), 10 iterative solvers (CG/PCG, BiCGStab, GMRES, MINRES, IDR(s), TFQMR, QMR, Block-CG, Block-GMRES), Lanczos/Arnoldi/IRAM eigensolvers, and a deep preconditioner suite (Jacobi, ILU0/ILUT/ILUTP, IC0/ICT, AMG, SPAI, AINV, Schwarz).
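To make the column-major storage and MC×KC×NC blocking mentioned above more concrete, here is a minimal standalone sketch. It is not OxiBLAS's kernel: the block sizes are arbitrary, the innermost loops stand in for a SIMD micro-kernel, there is no packing or prefetching, and `idx` and `gemm_blocked` are hypothetical names.

```rust
/// Index of element (i, j) in a column-major matrix with leading dimension `ld`.
fn idx(i: usize, j: usize, ld: usize) -> usize {
    i + j * ld
}

/// C = alpha * A * B + beta * C for column-major A (m x k), B (k x n), C (m x n).
/// The loops are tiled into MC x KC x NC blocks; block sizes here are arbitrary.
fn gemm_blocked(
    m: usize, n: usize, k: usize,
    alpha: f64, a: &[f64], b: &[f64],
    beta: f64, c: &mut [f64],
) {
    const MC: usize = 64;
    const KC: usize = 64;
    const NC: usize = 64;

    // Apply beta once up front so the blocked loops only accumulate.
    // With beta == 0, C is overwritten (BLAS convention), even if it held NaN.
    for x in c.iter_mut() {
        *x = if beta == 0.0 { 0.0 } else { *x * beta };
    }

    for jc in (0..n).step_by(NC) {
        let j_end = (jc + NC).min(n);
        for pc in (0..k).step_by(KC) {
            let p_end = (pc + KC).min(k);
            for ic in (0..m).step_by(MC) {
                let i_end = (ic + MC).min(m);
                // Scalar stand-in for the micro-kernel on one block.
                for j in jc..j_end {
                    for p in pc..p_end {
                        let bpj = alpha * b[idx(p, j, k)];
                        for i in ic..i_end {
                            c[idx(i, j, m)] += a[idx(i, p, m)] * bpj;
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    // A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]],
    // both stored column by column.
    let a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let b = [7.0, 9.0, 11.0, 8.0, 10.0, 12.0];
    let mut c = [0.0; 4];
    gemm_blocked(2, 2, 3, 1.0, &a, &b, 0.0, &mut c);
    // Column-major C = [[58, 64], [139, 154]]
    assert_eq!(c, [58.0, 139.0, 64.0, 154.0]);
}
```

The innermost loop walks down a column of A and C, which is the contiguous direction in column-major storage. That contiguity is what a real implementation would exploit for vector loads.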
Optional oxiblas-ndarray provides ndarray interop, and oxiblas-ffi exposes a C-ABI drop-in for existing BLAS/LAPACK call sites.
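The runtime feature detection described for oxiblas-core can be approximated with the standard library alone. The sketch below mirrors the tier order listed above (AVX-512F, then AVX2/FMA, then SSE4.2, with NEON on AArch64 and a scalar fallback). The `Kernel` enum, `select_kernel`, and `kernel` are hypothetical names for illustration and say nothing about how OxiBLAS structures its own dispatch.

```rust
use std::sync::OnceLock;

/// Kernel tiers, widest first. Illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kernel {
    Avx512,
    Avx2Fma,
    Sse42,
    Neon,
    Scalar,
}

/// Probe the CPU at runtime and pick the best available tier.
fn select_kernel() -> Kernel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return Kernel::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma") {
            return Kernel::Avx2Fma;
        }
        if std::is_x86_feature_detected!("sse4.2") {
            return Kernel::Sse42;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Kernel::Neon;
        }
    }
    Kernel::Scalar
}

/// Cache the decision so the probe runs once per process.
fn kernel() -> Kernel {
    static KERNEL: OnceLock<Kernel> = OnceLock::new();
    *KERNEL.get_or_init(select_kernel)
}

fn main() {
    println!("selected kernel tier: {:?}", kernel());
}
```

Caching the result keeps the detection cost off the hot path. A real library would more likely store a function pointer to the chosen kernel than an enum.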
Getting Started
cargo add oxiblas
A first matrix multiply, straight from the prelude:
use oxiblas::prelude::*;

// C = A * B
let a = Mat::from_rows(&[
    &[1.0, 2.0, 3.0],
    &[4.0, 5.0, 6.0],
]);
let b = Mat::from_rows(&[
    &[7.0, 8.0],
    &[9.0, 10.0],
    &[11.0, 12.0],
]);
let mut c = Mat::zeros(2, 2);
gemm(1.0, a.as_ref(), b.as_ref(), 0.0, c.as_mut());
assert!((c[(0, 0)] - 58.0).abs() < 1e-10); // [[58, 64], [139, 154]]
Level 1 vector ops are just as direct:
use oxiblas_blas::level1::{axpy, dot};
let x = vec![1.0, 2.0, 3.0, 4.0];
let y = vec![5.0, 6.0, 7.0, 8.0];
assert_eq!(dot(&x, &y), 70.0); // 1*5 + 2*6 + 3*7 + 4*8
let mut y = vec![1.0, 2.0, 3.0, 4.0];
axpy(2.5, &[10.0, 20.0, 30.0, 40.0], &mut y); // y = 2.5*x + y
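For reference, the semantics of those two Level 1 operations fit in a few lines of plain Rust. This is a standalone sketch of what dot and axpy compute, with hypothetical names `dot_ref` and `axpy_ref`; it does not call OxiBLAS and says nothing about how the library implements them.

```rust
/// Reference dot product: sum of x[i] * y[i].
fn dot_ref(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Reference axpy: y[i] = alpha * x[i] + y[i], updating y in place.
fn axpy_ref(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len());
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [5.0, 6.0, 7.0, 8.0];
    assert_eq!(dot_ref(&x, &y), 70.0); // 1*5 + 2*6 + 3*7 + 4*8

    let mut y = [1.0, 2.0, 3.0, 4.0];
    axpy_ref(2.5, &[10.0, 20.0, 30.0, 40.0], &mut y);
    assert_eq!(y, [26.0, 52.0, 78.0, 104.0]); // 2.5*x + y, element by element
}
```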
What’s inside
- Complete BLAS — Level 1 (11 ops), Level 2 (15 ops), Level 3 (11 ops), including full packed and banded variants and Hermitian/symmetric operations
- Extensive LAPACK — LU, full-pivot LU, banded LU, Cholesky, LDL^T, QR with pivoting, SVD, symmetric and general EVD, Schur, Hessenberg, triangular/general/tridiagonal solvers, least squares
- Sparse linear algebra — 9 formats, 10 iterative solvers, Lanczos/Arnoldi/IRAM eigensolvers, truncated and randomized SVD, sparse LU/QR/Cholesky, and a large preconditioner library
- Extended precision — f16 and f128 (quad, ~31 digits), plus Kahan/pairwise/superaccurate compensated summation
- SIMD everywhere — AVX2/FMA, AVX-512F, and NEON intrinsics with automatic runtime dispatch and scalar fallback
- Tensor operations — Einstein summation across 24 patterns and batched matmul
- Interop — oxiblas-ndarray for ndarray, oxiblas-ffi for a C BLAS/LAPACK drop-in, optional nalgebra conversions
- Benchmarks — a criterion-based suite with direct OpenBLAS comparison
This first release lands with 469+ passing library tests and roughly 154,600 lines of Rust across 314 files.
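For readers new to the sparse side of that list, the sketch below shows what the CSR format and an unpreconditioned conjugate gradient (CG) solve look like in plain Rust. It is a textbook illustration, not OxiBLAS's CsrMatrix or solver API; the `Csr` struct, its field names, and the `cg` signature are all assumptions made for this example.

```rust
/// Minimal CSR (compressed sparse row) matrix. Field names are illustrative.
struct Csr {
    n: usize,
    row_ptr: Vec<usize>, // row i occupies values[row_ptr[i]..row_ptr[i + 1]]
    col_idx: Vec<usize>, // column index of each stored value
    values: Vec<f64>,
}

impl Csr {
    /// Sparse matrix-vector product: y = A * x.
    fn spmv(&self, x: &[f64], y: &mut [f64]) {
        for i in 0..self.n {
            let mut sum = 0.0;
            for p in self.row_ptr[i]..self.row_ptr[i + 1] {
                sum += self.values[p] * x[self.col_idx[p]];
            }
            y[i] = sum;
        }
    }
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Textbook conjugate gradient for a symmetric positive definite matrix,
/// starting from x = 0 and stopping when the residual norm drops below `tol`.
fn cg(a: &Csr, b: &[f64], tol: f64, max_iter: usize) -> Vec<f64> {
    let n = a.n;
    let mut x = vec![0.0; n];
    let mut r = b.to_vec(); // residual b - A*x with x = 0
    let mut p = r.clone(); // search direction
    let mut ap = vec![0.0; n];
    let mut rs_old = dot(&r, &r);
    for _ in 0..max_iter {
        if rs_old.sqrt() <= tol {
            break;
        }
        a.spmv(&p, &mut ap);
        let alpha = rs_old / dot(&p, &ap);
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rs_new = dot(&r, &r);
        let beta = rs_new / rs_old;
        for i in 0..n {
            p[i] = r[i] + beta * p[i];
        }
        rs_old = rs_new;
    }
    x
}

fn main() {
    // A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]], symmetric positive definite.
    let a = Csr {
        n: 3,
        row_ptr: vec![0, 2, 5, 7],
        col_idx: vec![0, 1, 0, 1, 2, 1, 2],
        values: vec![4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0],
    };
    // b = A * [1, 2, 3] = [6, 10, 8]
    let x = cg(&a, &[6.0, 10.0, 8.0], 1e-12, 50);
    println!("{x:?}"); // close to [1, 2, 3]
}
```

Only the seven nonzeros are stored. The preconditioned variants in the list above follow the same loop with an extra preconditioner solve applied to the residual.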
Tips
- Sparse is on by default. The sparse feature is enabled out of the box, so CsrMatrix and the iterative solvers are available with no extra flags. Add the parallel feature for rayon-backed kernels on multi-core machines.
- Reach for the prelude. use oxiblas::prelude::*; brings in Mat, gemm, the LAPACK decompositions (Lu, Qr, Svd, Cholesky), and the MatBuilder helpers in one import.
- Build matrices ergonomically. MatBuilder::<f64>::identity(n), ::zeros(m, n), and ::hilbert(n) are handy for tests and quick experiments.
- Parallel GEMM has its own entry point. Enable the parallel feature and call gemm_with_par(1.0, a.as_ref(), b.as_ref(), 0.0, c.as_mut(), Par::Rayon) for matrices with n >= 256.
- Coming from NumPy? The mapping is direct: np.linalg.svd → Svd::compute, np.linalg.solve → solve, np.linalg.qr → Qr::compute, np.linalg.cholesky → Cholesky::compute.
- Need maximum precision? Turn on f128 for quad-precision accumulation, or use KahanSum/pairwise_sum for compensated summation in ill-conditioned reductions.
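To see why the last tip matters, here is a small standalone sketch of Kahan and pairwise summation next to a naive sum. The functions are textbook versions written for this post's example; they are not OxiBLAS's KahanSum or pairwise_sum, and the base-case size of 8 is an arbitrary choice.

```rust
/// Kahan (compensated) summation: carries the rounding error of each
/// addition forward in `c` and feeds it back into the next term.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in xs {
        let y = x - c;
        let t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    sum
}

/// Pairwise summation: split the slice in half and add the two partial sums,
/// falling back to a plain loop for short slices.
fn pairwise_sum(xs: &[f64]) -> f64 {
    const BASE: usize = 8; // arbitrary cutoff for this sketch
    if xs.len() <= BASE {
        return xs.iter().sum();
    }
    let (lo, hi) = xs.split_at(xs.len() / 2);
    pairwise_sum(lo) + pairwise_sum(hi)
}

fn main() {
    // One large term followed by ten terms that are each too small
    // to change 1.0 when added individually in f64.
    let mut xs = vec![1.0_f64];
    xs.extend(std::iter::repeat(1e-16).take(10));

    let naive: f64 = xs.iter().sum();
    let kahan = kahan_sum(&xs);

    println!("naive = {naive:.17}");
    println!("kahan = {kahan:.17}");
    println!("pairwise = {:.17}", pairwise_sum(&xs));

    assert_eq!(naive, 1.0); // every small term was rounded away
    assert!(kahan > 1.0); // the compensation recovered them
}
```

The naive loop silently drops all ten small terms, while the compensated version lands within rounding error of the true sum 1 + 1e-15.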
This is the foundation
OxiBLAS 0.1.0 is the linear algebra bedrock of the COOLJAPAN ecosystem. It is purpose-built as the BLAS/LAPACK engine for SciRS2, whose own first release is imminent — the goal of shipping OxiBLAS first is so that SciRS2 and everything above it can rest on pure Rust numerics from day one. It joins early COOLJAPAN siblings already in the wild — VoiRS for audio, TenRSo and TensorLogic for tensors, Spintronics, and Oxicode — as part of a sovereign, C/C++/Fortran-free stack.
Repository: https://github.com/cool-japan/oxiblas
Star the repo if you want high-performance scientific computing without the traditional toolchain headaches.
The era of “just link OpenBLAS” is starting to end.
Pure Rust numerical linear algebra is here — and it’s already fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ December 28, 2025