The N-dimensional array, reborn in Rust — no C, no Fortran, no GIL.
Today we released NumRS2 0.1.0 — the first stable release of a NumPy-inspired numerical computing library for Rust, and the N-dimensional array core that powers the COOLJAPAN scientific stack.
This has been a long time coming. NumRS2 began as an alpha experiment back in spring 2025, and after months of hardening — array semantics, broadcasting rules, decomposition numerics, SIMD dispatch — it finally graduates to a stable 0.1.0.
No C. No Fortran. No system BLAS/LAPACK. No Python interpreter overhead. No FFI. Just clean, blazing-fast N-dimensional arrays — broadcasting, fancy indexing, SVD, FFT, autodiff and all — that compile to a single static binary (or WASM) and run everywhere: from laptops to browsers to edge devices to cloud clusters.
Why NumRS2 0.1.0 matters
NumPy is the bedrock of scientific Python, but its foundation shows its age. It is built on a tangle of C and Fortran, leans on system BLAS/LAPACK that you have to find and link at install time, fights the GIL the moment you want real parallelism, pays Python-loop overhead for anything the C kernels do not already cover, and ships poorly to WASM or embedded targets. Reproducibility becomes a research project of its own.
NumRS2 answers each of those with something concrete:
- 86 AVX2-vectorized functions + 42 ARM NEON operations, with automatic threshold-based dispatch between SIMD and scalar paths — so small arrays stay branch-cheap while large ones go wide, with 4-way loop unrolling and FMA where it counts.
- Pure-Rust linear algebra via OxiBLAS — matmul, decompositions, and solvers with no system BLAS to hunt down and no Fortran in the build.
- 1,111+ unit tests passing, zero compilation warnings, zero clippy errors across ~155,000 lines of production Rust.
- A single static binary (or WASM module) — the same numerical code runs on your laptop, in a browser, on the edge, and in the cloud, bit-for-bit.
- SharedArray<T> with O(1) reference-counted cloning plus Common Subexpression Elimination, so expression-heavy pipelines stop re-allocating and stop recomputing.
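To make the threshold-based dispatch idea concrete, here is a minimal sketch in plain Rust with no NumRS2 dependency. The names `WIDE_THRESHOLD`, `sum_scalar`, `sum_unrolled`, and `sum` are hypothetical, and the cutoff of 64 is an arbitrary placeholder, not NumRS2's actual value. The 4-way unrolled loop only stands in for a real AVX2 or NEON kernel.

```rust
/// Hypothetical cutoff; a real library would tune this per kernel and CPU.
const WIDE_THRESHOLD: usize = 64;

/// Plain scalar loop: the cheapest option for short slices.
fn sum_scalar(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// 4-way unrolled loop with independent accumulators, which gives the
/// compiler room to vectorize. It stands in for a real SIMD kernel.
fn sum_unrolled(xs: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let mut chunks = xs.chunks_exact(4);
    for c in &mut chunks {
        acc[0] += c[0];
        acc[1] += c[1];
        acc[2] += c[2];
        acc[3] += c[3];
    }
    // Elements left over when the length is not a multiple of 4.
    let tail: f64 = chunks.remainder().iter().sum();
    acc[0] + acc[1] + acc[2] + acc[3] + tail
}

/// Threshold-based dispatch: choose the path from the input length.
fn sum(xs: &[f64]) -> f64 {
    if xs.len() < WIDE_THRESHOLD {
        sum_scalar(xs)
    } else {
        sum_unrolled(xs)
    }
}

fn main() {
    let small: Vec<f64> = (1..=10).map(f64::from).collect();
    let large: Vec<f64> = (1..=1000).map(f64::from).collect();
    println!("small = {}, large = {}", sum(&small), sum(&large));
}
```

Both paths return the same value for the same input; only the work done per element differs, which is why the caller never has to choose between them.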
Technical Deep Dive: Four Layers of a Pure-Rust Array Stack
Layer 1 — Core arrays + expression templates. At the bottom sits the N-dimensional Array<T> with NumPy-style broadcasting, advanced indexing (fancy indexing, boolean masking, multi-dimensional slicing), and zero-copy views. On top of it, SharedArray<T> brings expression templates: reference-counted O(1) cloning, operator overloading, Common Subexpression Elimination (CSE), and cache-aware access patterns — so a * b + a * c lifts the shared a out instead of touching it twice.
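The O(1) clone described above can be illustrated with nothing more than `std::rc::Rc`. This is a sketch under assumptions, not NumRS2's implementation: the `Shared` type and the `mul_add_shared` helper are hypothetical names, and the single-pass loop only approximates what an expression-template engine might generate for a * b + a * c.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a shared array: the buffer sits behind an `Rc`,
/// so cloning copies a pointer and bumps a counter instead of copying data.
#[derive(Clone)]
struct Shared {
    data: Rc<Vec<f64>>,
}

impl Shared {
    fn new(values: Vec<f64>) -> Self {
        Shared { data: Rc::new(values) }
    }

    /// True when both handles point at the same allocation.
    fn shares_buffer_with(&self, other: &Shared) -> bool {
        Rc::ptr_eq(&self.data, &other.data)
    }

    /// Number of live handles to the underlying buffer.
    fn handles(&self) -> usize {
        Rc::strong_count(&self.data)
    }
}

/// Computes a * b + a * c element-wise in one pass, loading each a[i] once.
fn mul_add_shared(a: &Shared, b: &Shared, c: &Shared) -> Vec<f64> {
    a.data
        .iter()
        .zip(b.data.iter())
        .zip(c.data.iter())
        .map(|((&x, &y), &z)| x * y + x * z)
        .collect()
}

fn main() {
    let a = Shared::new(vec![1.0, 2.0, 3.0]);
    let a2 = a.clone(); // O(1): no element is copied
    let b = Shared::new(vec![4.0, 5.0, 6.0]);
    let c = Shared::new(vec![7.0, 8.0, 9.0]);
    println!("same buffer: {}, handles: {}", a.shares_buffer_with(&a2), a.handles());
    println!("{:?}", mul_add_shared(&a, &b, &c));
}
```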
Layer 2 — Linear algebra & sparse, on OxiBLAS. Matrix multiply, transpose, inverse, and determinant; the full decomposition set — SVD, QR, LU, Cholesky, and eigenvalue; iterative solvers including CG, GMRES, and BiCGSTAB; randomized algorithms for large problems; and sparse matrices in COO, CSR, CSC, and DIA formats. All of it routes through OxiBLAS, COOLJAPAN’s pure-Rust BLAS/LAPACK.
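For readers new to the sparse formats named above, here is a small standalone sketch of CSR storage and a matrix-vector product. The `Csr` struct and its methods are hypothetical and written against the standard library only; they do not reflect NumRS2's or OxiBLAS's actual types.

```rust
/// Compressed Sparse Row storage: only non-zero entries are kept.
struct Csr {
    indptr: Vec<usize>,  // row i owns entries indptr[i]..indptr[i + 1]
    indices: Vec<usize>, // column index of each stored entry
    values: Vec<f64>,    // the stored non-zero values
}

impl Csr {
    /// Builds CSR storage from a dense row-major matrix, skipping zeros.
    fn from_dense(dense: &[Vec<f64>]) -> Self {
        let mut indptr = vec![0];
        let mut indices = Vec::new();
        let mut values = Vec::new();
        for row in dense {
            for (col, &v) in row.iter().enumerate() {
                if v != 0.0 {
                    indices.push(col);
                    values.push(v);
                }
            }
            indptr.push(values.len());
        }
        Csr { indptr, indices, values }
    }

    /// Computes y = A * x, touching only the stored entries.
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        let rows = self.indptr.len() - 1;
        (0..rows)
            .map(|i| {
                (self.indptr[i]..self.indptr[i + 1])
                    .map(|k| self.values[k] * x[self.indices[k]])
                    .sum::<f64>()
            })
            .collect()
    }
}

fn main() {
    let dense = vec![
        vec![2.0, 0.0, 0.0],
        vec![0.0, 0.0, 3.0],
        vec![1.0, 0.0, 4.0],
    ];
    let a = Csr::from_dense(&dense);
    println!("{:?}", a.matvec(&[1.0, 2.0, 3.0]));
}
```

The cost of `matvec` scales with the number of stored entries instead of rows times columns, which is the reason iterative solvers such as CG pair naturally with this layout.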
Layer 3 — Advanced numerics. Mathematical functions span trigonometric, hyperbolic, exponential, and logarithmic families, plus special functions (gamma, beta, error, Bessel), polynomial operations, and cubic spline interpolation. Statistics covers mean/median/variance/std, distributions, hypothesis testing, and correlation. Signal processing adds FFT/IFFT, convolution, correlation, and filtering. Automatic differentiation is here too: forward mode via dual numbers, reverse tape-based mode, and higher-order Hessian/Taylor expansion.
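Forward-mode differentiation via dual numbers is compact enough to show in full. The sketch below is a generic illustration and assumes nothing about NumRS2's autodiff API: the `Dual` type and the example function `f` are hypothetical names. Each value carries its derivative alongside it, and the arithmetic operators apply the sum and product rules.

```rust
use std::ops::{Add, Mul};

/// A dual number: a value together with its derivative.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    val: f64,
    der: f64,
}

impl Dual {
    /// The independent variable: d/dx of x is 1.
    fn var(x: f64) -> Self {
        Dual { val: x, der: 1.0 }
    }

    /// A constant: its derivative is 0.
    fn constant(c: f64) -> Self {
        Dual { val: c, der: 0.0 }
    }
}

impl Add for Dual {
    type Output = Dual;
    /// Sum rule: (u + v)' = u' + v'.
    fn add(self, rhs: Dual) -> Dual {
        Dual { val: self.val + rhs.val, der: self.der + rhs.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    /// Product rule: (u * v)' = u' * v + u * v'.
    fn mul(self, rhs: Dual) -> Dual {
        Dual {
            val: self.val * rhs.val,
            der: self.der * rhs.val + self.val * rhs.der,
        }
    }
}

/// f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2.
fn f(x: Dual) -> Dual {
    x * x * x + Dual::constant(2.0) * x
}

fn main() {
    let y = f(Dual::var(2.0));
    println!("f(2) = {}, f'(2) = {}", y.val, y.der);
}
```

One evaluation of `f` yields both the value and the exact derivative, with no finite-difference step size to tune.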
Layer 4 — Hardware acceleration & interop. SIMD is unified through SciRS2-Core’s SimdUnifiedOps trait, with an AutoOptimizer selecting algorithms adaptively from runtime platform detection (AVX2/AVX512 on x86_64, NEON on ARM). Optional GPU acceleration arrives through wgpu (Vulkan/Metal/DX12/WebGPU). Interop reads and writes NumPy .npy/.npz, exposes Apache Arrow zero-copy buffers, handles CSV and binary serialization (the latter via OxiCode), and supports memory-mapped I/O. Optional Python bindings ship via PyO3.
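Runtime platform detection of the kind described above can be done with the standard library's feature-detection macros. The sketch below is an assumption about how such a check might look, not SciRS2-Core's code: `detect_simd` and `lanes_f64` are hypothetical helpers, and the string labels are made up for illustration.

```rust
/// Reports the widest SIMD level found at runtime, falling back to "scalar".
fn detect_simd() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    "scalar"
}

/// Number of f64 values that fit in one vector register at each level.
fn lanes_f64(level: &str) -> usize {
    match level {
        "avx512" => 8, // 512-bit registers
        "avx2" => 4,   // 256-bit registers
        "neon" => 2,   // 128-bit registers
        _ => 1,
    }
}

fn main() {
    let level = detect_simd();
    println!("{} ({} f64 lanes)", level, lanes_f64(level));
}
```

Because the check runs at startup instead of compile time, a single binary can pick the best available path on whatever machine it lands on.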
Getting Started
```shell
cargo add numrs2
```

```rust
use numrs2::prelude::*;

fn main() -> Result<()> {
    // N-dimensional arrays with NumPy-style broadcasting
    let a = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0]).reshape(&[2, 2]);
    let b = Array::from_vec(vec![5.0, 6.0, 7.0, 8.0]).reshape(&[2, 2]);
    let c = a.add(&b); // element-wise
    let e = a.matmul(&b)?; // matrix multiply
    println!("a @ b = {}", e);

    // Linear algebra: SVD, symmetric eigendecomposition
    let (u, s, vt) = a.svd_compute()?;
    let symmetric = Array::from_vec(vec![2.0, 1.0, 1.0, 2.0]).reshape(&[2, 2]);
    let (eigenvalues, _eigenvectors) = symmetric.eigh("lower")?;

    // Descriptive statistics
    let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("mean = {}, std = {}", data.mean()?, data.std()?);

    Ok(())
}
```
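The example mentions NumPy-style broadcasting without spelling out the rule. As a standalone sketch, the shape computation looks roughly like this: align the two shapes from the right, treat missing leading axes as 1, and require each pair of sizes to be equal or to contain a 1. The function `broadcast_shape` is a hypothetical helper and is not part of NumRS2's API.

```rust
/// Returns the broadcast shape of `a` and `b`, or `None` if they are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let mut out = vec![0; n];
    for i in 0..n {
        // Read dimensions from the right; missing leading axes count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[n - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y, // axis of size 1 stretches to match
            (x, 1) => x,
            _ => return None, // sizes differ and neither is 1
        };
    }
    Some(out)
}

fn main() {
    println!("{:?}", broadcast_shape(&[2, 3], &[3]));
    println!("{:?}", broadcast_shape(&[4, 1], &[1, 5]));
    println!("{:?}", broadcast_shape(&[2, 3], &[4]));
}
```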
What’s inside
- Core N-dimensional array with broadcasting, fancy indexing, boolean masking, multi-dimensional slicing, and zero-copy views.
- Expression templates — SharedArray<T> with O(1) clones, operator overloading, CSE, and cache-aware access.
- Linear algebra — matmul, transpose, inverse, determinant; SVD / QR / LU / Cholesky / Eigenvalue decompositions; CG / GMRES / BiCGSTAB solvers; randomized algorithms; sparse COO / CSR / CSC / DIA.
- Mathematical functions — trig, hyperbolic, exp/log, special functions (gamma, beta, error, Bessel), polynomials, cubic spline interpolation.
- Statistics — mean, median, variance, std, distributions, hypothesis testing, correlation.
- Signal processing — FFT/IFFT, convolution, correlation, filtering.
- Automatic differentiation — forward (dual numbers), reverse (tape-based), higher-order (Hessian, Taylor).
- Interop — NumPy .npy/.npz, Apache Arrow zero-copy, CSV + binary serialization, memory-mapped I/O.
- Financial computing — options pricing, bond valuation, time-value-of-money.
- SciRS2 integration — advanced distributions including noncentral Chi-square, noncentral F, Von Mises, Maxwell-Boltzmann, truncated normal, and multivariate normal with rotation.
- Optional acceleration — GPU via wgpu (Vulkan/Metal/DX12/WebGPU) and Python bindings via PyO3.
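Of the items in this list, the financial computing entry is the least self-explanatory, so here is a rough sketch of what time-value-of-money and bond valuation reduce to. The functions `future_value`, `present_value`, and `bond_price` are hypothetical and use the textbook compound-interest formulas; NumRS2's own function names and signatures may differ.

```rust
/// Value after `periods` compounding periods: pv * (1 + rate)^periods.
fn future_value(pv: f64, rate: f64, periods: u32) -> f64 {
    pv * (1.0 + rate).powi(periods as i32)
}

/// Discounts a future amount back to today: fv / (1 + rate)^periods.
fn present_value(fv: f64, rate: f64, periods: u32) -> f64 {
    fv / (1.0 + rate).powi(periods as i32)
}

/// Price of a fixed-coupon bond: discounted coupons plus discounted face value.
fn bond_price(face: f64, coupon_rate: f64, yield_rate: f64, periods: u32) -> f64 {
    let coupon = face * coupon_rate;
    let coupons: f64 = (1..=periods)
        .map(|t| present_value(coupon, yield_rate, t))
        .sum();
    coupons + present_value(face, yield_rate, periods)
}

fn main() {
    println!("FV   = {:.2}", future_value(1000.0, 0.05, 2));
    println!("PV   = {:.2}", present_value(1102.5, 0.05, 2));
    println!("bond = {:.2}", bond_price(1000.0, 0.05, 0.05, 10));
}
```

A bond whose coupon rate equals its yield prices at face value, which makes a handy sanity check for any implementation.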
Tips
- Turn features on as you need them. The defaults are ["matrix_decomp", "scirs"]. Reach for the rest explicitly:

  ```toml
  [dependencies]
  numrs2 = { version = "0.1.0", features = ["arrow", "python", "gpu"] }
  ```

- Use SharedArray in pipelines. When the same intermediate appears more than once, SharedArray<T> clones in O(1) and CSE eliminates the redundant work, which is far cheaper than copying buffers around.
- Trust the SIMD auto-dispatch. You do not pick AVX2 vs NEON vs scalar by hand. The threshold-based dispatcher keeps small arrays out of vectorized overhead and sends large ones through the 86 AVX2 / 42 NEON kernels automatically.
- Reach for SciRS2’s advanced distributions. Beyond the usual suspects, the scirs integration unlocks Von Mises, truncated normal, Maxwell-Boltzmann, and multivariate normal with rotation, directly from NumRS2.
- Stay NumPy-compatible on disk. Read and write .npy/.npz to move arrays in and out of existing NumPy workflows with no conversion glue, and lean on memory-mapped I/O for datasets larger than RAM.
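To show why the last tip needs no conversion glue, here is a minimal sketch of the .npy version 1.0 layout: a magic string, a version, a little-endian header length, a Python-dict header padded to a 64-byte boundary, then the raw data. The function `to_npy_v1` is a hypothetical helper built on the standard library only. It handles just little-endian f64 in C order and is not NumRS2's writer.

```rust
/// Serializes `data` with the given `shape` as .npy version 1.0 bytes.
fn to_npy_v1(data: &[f64], shape: &[usize]) -> Vec<u8> {
    assert_eq!(data.len(), shape.iter().product::<usize>());

    // Python tuple syntax: a 1-D shape needs the trailing comma, e.g. "(3,)".
    let dims = match shape.len() {
        1 => format!("({},)", shape[0]),
        _ => {
            let parts: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
            format!("({})", parts.join(", "))
        }
    };
    let mut header = format!(
        "{{'descr': '<f8', 'fortran_order': False, 'shape': {}, }}",
        dims
    );

    // Preamble is 10 bytes: magic (6) + version (2) + header length (2).
    // Pad with spaces so preamble + header + newline is a multiple of 64.
    let unpadded = 10 + header.len() + 1;
    let pad = (64 - unpadded % 64) % 64;
    header.push_str(&" ".repeat(pad));
    header.push('\n');

    let mut out = Vec::new();
    out.extend_from_slice(b"\x93NUMPY");
    out.extend_from_slice(&[1, 0]); // format version 1.0
    out.extend_from_slice(&(header.len() as u16).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    for v in data {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    let bytes = to_npy_v1(&[1.0, 2.0, 3.0, 4.0], &[2, 2]);
    println!("{} bytes, magic ok: {}", bytes.len(), bytes.starts_with(b"\x93NUMPY"));
}
```

Writing the returned bytes to a file with `std::fs::write` should produce something `numpy.load` can open, though a production writer would also cover other dtypes, Fortran order, and the later format versions.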
This is the foundation
NumRS2 is the NumPy-class N-dimensional array core of the COOLJAPAN scientific stack. It launches the same day as its two siblings — OptiRS (optimization) and PandRS (dataframes) — and slots directly beneath SciRS2, which shipped just yesterday (SciRS2-Core v0.1.1) and provides NumRS2’s unified SIMD, adaptive AutoOptimizer, and platform detection. Underneath, linear algebra rides on OxiBLAS and serialization on OxiCode, both fresh from December 28.
That is the whole point of a stable 0.1.0: a dependable array layer that the rest of the stack — and your own code — can build on with confidence.
Repository: https://github.com/cool-japan/numrs
Star the repo if you want a NumPy-class numerical foundation without a single line of C or Fortran in the build.
Pure Rust numerical computing is here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ, December 30, 2025