NumRS2 0.4.0 Released — Autodiff, Distributed Foundations, In-Browser WASM, and a Quantum-to-FEM Applied-Math Explosion

NumRS2 0.4.0 is the biggest release in the series: forward/reverse automatic differentiation with Hessians and hyperdual numbers, a distributed data/model-parallel framework, WebAssembly bindings that run numerics in the browser, plotters-based visualization, and a vast applied-math expansion — reinforcement learning, quantum computing, computer vision, computational geometry, FEM, wavelets, graphs, information theory, and control systems — all on pure-Rust SciRS2 0.5.0 with 128+ SIMD functions and 3,921+ passing tests.

The N-dimensional array core just grew gradients, a cluster framework, a browser runtime, and an applied-math library the size of a small university curriculum — all in pure Rust.

Today we released NumRS2 0.4.0 — by a wide margin the biggest release in the series, adding forward and reverse automatic differentiation, a distributed data/model-parallel foundation, WebAssembly bindings, plotters-based visualization, and an enormous applied-math expansion spanning reinforcement learning, quantum computing, computer vision, computational geometry, FEM, wavelets, graphs, information theory, and control systems — on top of a full lift to pure-Rust SciRS2 0.5.0.

No C. No Fortran. No system BLAS/LAPACK. No Python interpreter overhead. No FFI. Just clean, fast N-dimensional arrays with broadcasting, fancy indexing, SVD, and FFT, now joined by autodiff, quantum circuits, FEM solvers, and wavelets, that compile to a single static binary (or WASM) and run everywhere: from laptops to browsers to edge devices to cloud clusters.

This release crosses a line. The earlier 0.x line was about being a faithful, sovereign NumPy-class array core. 0.4.0 keeps that core — and then pushes decisively beyond NumPy into territory NumPy itself never owned: differentiable programming, distributed scaffolding, in-browser execution, and a deep bench of applied numerical methods.

Why NumRS2 0.4.0 is a game changer

NumPy is the bedrock of scientific Python, and it remains a remarkable piece of engineering. But its ceiling is real, and you hit it the moment your work outgrows “fast arrays on one machine.” NumPy is built on C and Fortran, dependent on a BLAS/LAPACK implementation that must be bundled or linked, constrained by the GIL, hard to ship into the browser or onto an embedded device, and quietly hostile to bit-for-bit reproducibility once different native BLAS builds and platform math libraries get involved. And the moment you need a gradient, NumPy has nothing to offer: you reach for a separate framework, accept finite differences, or hand-derive Jacobians.

NumRS2 0.4.0 answers all of that at once, and the leap is broad.

And the scale numbers tell the story of a release that earns the word major. NumRS2 0.4.0 ships 128+ SIMD-vectorized functions (AVX2, AVX512, and ARM NEON), 3,921+ tests passing, 225,975+ lines of production Rust code, 5,813+ public API items, zero stubs in the shipped surface, all built on pure Rust SciRS2 v0.5.0.

Technical Deep Dive: From Differentiable Arrays to Quantum Circuits to the Browser

NumRS2 0.4.0 is large enough that it is best read in layers. Each one is grounded in real module paths you can open today.

Layer 1 — The differentiable & optimization core. The headline capability is automatic differentiation in src/autodiff/. NumRS2 now supports both forward mode (efficient for few-input, many-output functions, via dual numbers) and reverse mode (efficient for many-input, few-output functions, the backprop direction), plus higher-order differentiation: Hessians, Jacobians, hyperdual numbers for exact second derivatives, and Taylor mode for higher-order series. That turns the array core into a differentiable substrate: gradients of array computations are exact to floating-point rounding, with no finite-difference truncation error. Around autodiff, the optimization surface deepened too: a full CMA-ES optimizer (src/optimize/cma_es/) with IPOP restarts, step-size adaptation, and both rank-μ and rank-one covariance updates; Bayesian optimization (src/optimize/bayesian_opt.rs) with a Gaussian-process surrogate, EI/PI/UCB acquisition functions, and Matérn/RBF kernels; and on the Python side, py_minimize now supports a "bfgs" method (numerical gradient via central differences) alongside "nelder-mead". Black-box objectives, gradient-based optimization, and evolutionary search are now all in-house.
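
To make the hyperdual idea concrete, here is a minimal standalone sketch in plain Rust. It does not use the numrs2::autodiff API; the HyperDual type and the derivatives helper are hypothetical names introduced only for illustration. A hyperdual number carries two nilpotent parts e1 and e2 (each squares to zero, while their product e1*e2 does not), so evaluating f at x + e1 + e2 yields f(x), f'(x), and f''(x) in a single pass with no step-size error.

```rust
use std::ops::{Add, Mul};

/// Hypothetical illustration type (not the NumRS2 API):
/// re + e1*ε1 + e2*ε2 + e12*ε1ε2, with ε1² = ε2² = 0 and ε1ε2 ≠ 0.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct HyperDual {
    pub re: f64,
    pub e1: f64,
    pub e2: f64,
    pub e12: f64,
}

impl HyperDual {
    /// Seed an independent variable: both first-order parts set to 1.
    pub fn variable(x: f64) -> Self {
        HyperDual { re: x, e1: 1.0, e2: 1.0, e12: 0.0 }
    }

    /// sin lifted to hyperdual numbers via its Taylor expansion.
    pub fn sin(self) -> Self {
        let (s, c) = self.re.sin_cos();
        HyperDual {
            re: s,
            e1: c * self.e1,
            e2: c * self.e2,
            e12: c * self.e12 - s * self.e1 * self.e2,
        }
    }
}

impl Add for HyperDual {
    type Output = Self;
    fn add(self, o: Self) -> Self {
        HyperDual {
            re: self.re + o.re,
            e1: self.e1 + o.e1,
            e2: self.e2 + o.e2,
            e12: self.e12 + o.e12,
        }
    }
}

impl Mul for HyperDual {
    type Output = Self;
    fn mul(self, o: Self) -> Self {
        HyperDual {
            re: self.re * o.re,
            e1: self.re * o.e1 + self.e1 * o.re,
            e2: self.re * o.e2 + self.e2 * o.re,
            // The cross term is what carries the second derivative.
            e12: self.re * o.e12 + self.e1 * o.e2 + self.e2 * o.e1 + self.e12 * o.re,
        }
    }
}

/// Returns (f(x), f'(x), f''(x)) for a scalar function written over HyperDual.
pub fn derivatives<F: Fn(HyperDual) -> HyperDual>(f: F, x: f64) -> (f64, f64, f64) {
    let y = f(HyperDual::variable(x));
    (y.re, y.e1, y.e12)
}
```

For example, derivatives(|x| x * x * x, 2.0) returns (8.0, 12.0, 12.0), matching x^3, 3x^2, and 6x at x = 2. A plain dual number (forward mode) is the same structure with only the re and e1 fields.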

Layer 2 — Numerically stable linear algebra. This is where 0.4.0 quietly fixes things that used to be wrong on real-world sizes. In linalg_stable.rs, svd_bidiagonal now runs a full Golub–Kahan bidiagonalization followed by Jacobi SVD instead of falling back to the n≤3 path, so the SVD of a large matrix is actually computed, not approximated by a small-matrix shortcut. Its companion, symmetric_eigen_tridiagonal, now runs cyclic Jacobi sweeps (the README banner phrases the eigenpath as “real eigendecomposition via QR iteration with Wilkinson shifts”) instead of the n≤3 fallback. The Schur decomposition replaces a single Rayleigh shift with Francis implicit double-shift QR plus deflation, producing a correct real Schur form even when complex-conjugate eigenpairs force 2×2 blocks. And the FEM-driven matrix_determinant / matrix_inverse now handle arbitrary n×n via LU with partial pivoting. They previously hard-errored for n>3, which had blocked 3D and higher-order elements outright. These are correctness wins, not just speed: large-matrix decompositions now give the right answer.
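
The determinant path mentioned above can be sketched in a few lines of standalone Rust. This is an illustrative version of LU elimination with partial pivoting on a Vec<Vec<f64>>, not the matrix_determinant implementation from NumRS2; the function name determinant_lu and its Option return type are assumptions made for the example.

```rust
/// Determinant of an n×n matrix via Gaussian elimination (LU) with partial
/// pivoting. Hypothetical helper for illustration; returns None if the input
/// is not square.
pub fn determinant_lu(mut a: Vec<Vec<f64>>) -> Option<f64> {
    let n = a.len();
    if a.iter().any(|row| row.len() != n) {
        return None;
    }
    let mut det = 1.0;
    for k in 0..n {
        // Partial pivoting: pick the row with the largest |entry| in column k.
        let mut p = k;
        for i in (k + 1)..n {
            if a[i][k].abs() > a[p][k].abs() {
                p = i;
            }
        }
        // An all-zero pivot column means the matrix is singular.
        if a[p][k] == 0.0 {
            return Some(0.0);
        }
        // Each row swap flips the sign of the determinant.
        if p != k {
            a.swap(p, k);
            det = -det;
        }
        let pivot = a[k][k];
        det *= pivot;
        // Eliminate the entries below the pivot.
        for i in (k + 1)..n {
            let factor = a[i][k] / pivot;
            for j in k..n {
                let upper = a[k][j];
                a[i][j] -= factor * upper;
            }
        }
    }
    Some(det)
}
```

Because the loop runs over any n, the same routine covers the 4×4 and larger element matrices that a hard-coded n≤3 formula cannot. Choosing the largest pivot keeps the elimination factors at or below 1 in magnitude, which is what makes the method usable on badly scaled inputs.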

Layer 3 — Applied-math breadth. This is the layer that makes 0.4.0 feel like a different library. A non-exhaustive tour:

On the econometrics side, VECM now uses full Johansen (1988) cointegration / error-correction estimation in var.rs, replacing a placeholder.

Layer 4 — Reach everywhere. Several additions extend where NumRS2 can run and what it can produce. WebAssembly bindings (src/wasm/) expose array, linalg, stats, and utility surfaces, with a JS/Vite demo in examples/wasm/, so NumRS2 now runs real numerical code in the browser. (For a clean wasm32-unknown-unknown build you disable the gpu and distributed features, which pull in tokio; with those off, the numerics compile and run in a tab.) The distributed framework (src/distributed/) lands the scaffolding for cluster-scale work: data and model parallelism, a coordinator, and distributed optimizers. To be precise about its maturity: this is a foundation in 0.4.0. The collective ops (scatter/gather/all-reduce) are still stubbed pending a real network transport, and distributed/linalg.rs returns NotImplemented. The data/model-parallel structure and the distributed optimizers are implemented in this release; the wire transport is the clearly scoped next milestone. Rounding out reach: plotters-based visualization (src/viz/) for 2D/3D, matrix, statistics, and performance plots (pure Rust, no matplotlib) and ONNX-compatible model I/O and serving (src/new_modules/model_io/, src/new_modules/serving/) with an inference engine, a model registry, and real-time monitoring.
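
Since the collective ops are still stubbed, it may help to pin down what a sum all-reduce is expected to do once a transport exists. The sketch below is a single-process stand-in in plain Rust, with each Vec<f64> playing the role of one worker's gradient buffer; all_reduce_sum and average_gradients are hypothetical names, not part of src/distributed/.

```rust
/// Single-process stand-in for a sum all-reduce (illustration only):
/// afterwards every worker buffer holds the element-wise sum of all buffers.
pub fn all_reduce_sum(workers: &mut [Vec<f64>]) {
    if workers.is_empty() {
        return;
    }
    let len = workers[0].len();
    assert!(
        workers.iter().all(|w| w.len() == len),
        "all worker buffers must have the same length"
    );
    // Reduce: accumulate every worker's contribution.
    let mut total = vec![0.0; len];
    for w in workers.iter() {
        for (t, x) in total.iter_mut().zip(w) {
            *t += *x;
        }
    }
    // Broadcast: hand the reduced result back to every worker.
    for w in workers.iter_mut() {
        w.copy_from_slice(&total);
    }
}

/// Data-parallel gradient averaging built on the all-reduce above.
pub fn average_gradients(workers: &mut [Vec<f64>]) {
    let n = workers.len() as f64;
    all_reduce_sum(workers);
    for w in workers.iter_mut() {
        for g in w.iter_mut() {
            *g /= n;
        }
    }
}
```

With three workers holding [1, 2], [3, 4], and [5, 6], average_gradients leaves [3, 4] on each of them. A real implementation would replace the in-memory reduce and broadcast steps with network communication, but the contract seen by a data-parallel optimizer stays the same.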

Layer 5 — The pure-Rust substrate. None of this leaves the COOLJAPAN sovereignty story. Every scirs2-* dependency — core, stats, linalg, ndimage, spatial, special, fft, and numpy — moves to v0.5.0 in one coordinated step. Linear algebra rides on OxiBLAS (pure-Rust BLAS/LAPACK), serialization on OxiCode (now 0.2.4), and compression on OxiARC (oxiarc-archive and oxiarc-lz4 now 0.3.2) — the pure-Rust archive layer that powers .npz storage and distributed transfer alike. Top to bottom, there is not a line of C or Fortran in the default build.

Getting Started

cargo add numrs2

use numrs2::prelude::*;

fn main() -> Result<()> {
    // N-dimensional arrays with NumPy-style broadcasting
    let a = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0]).reshape(&[2, 2]);
    let b = Array::from_vec(vec![5.0, 6.0, 7.0, 8.0]).reshape(&[2, 2]);

    // Element-wise and matrix operations
    let c = a.add(&b);                 // element-wise add
    let e = a.matmul(&b)?;             // matrix multiply on OxiBLAS
    println!("a + b = {}", c);
    println!("a @ b = {}", e);

    // Full Golub–Kahan SVD — now computed for large matrices, not approximated
    let (_u, s, _vt) = a.svd_compute()?;
    println!("singular values = {}", s);

    // Symmetric eigendecomposition via Wilkinson-shift QR iteration
    let sym = Array::from_vec(vec![2.0, 1.0, 1.0, 2.0]).reshape(&[2, 2]);
    let (eigvals, _) = sym.eigh("lower")?;
    println!("eigenvalues = {}", eigvals);

    // Descriptive statistics
    let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("mean = {}, std = {}", data.mean()?, data.std()?);

    Ok(())
}

The array, linear-algebra, and statistics calls above are the safe, copy-pasteable core. The new 0.4.0 capabilities — autodiff, quantum circuits, FEM, wavelets, the WASM bindings — are reached through their respective modules (numrs2::autodiff, src/new_modules/*); see the Tips below and the per-module examples in the repository.

What’s New in 0.4.0

New modules and capabilities

New stats / neural-network functions

Python and econometrics

Stability and correctness fixes

Dependencies

Tips

This is the foundation

NumRS2 is the mature, NumPy-class N-dimensional array core at the base of the COOLJAPAN scientific stack, and 0.4.0 establishes it as far more than an array library — it is now a differentiable, distributable, browser-ready numerical platform. It sits directly beneath SciRS2, depending on the full scirs2-* family at v0.5.0 — core, stats, linalg, ndimage, spatial, special, fft, and numpy. Linear algebra rides on OxiBLAS, serialization on OxiCode 0.2.4, and compression on OxiARC 0.3.2 — all pure-Rust COOLJAPAN crates.

By June 2026 the ecosystem around it is broad and deep, and NumRS2 is the array layer the rest build on: OptiRS for optimization and PandRS for dataframes sit alongside it; the ML and applied stack spans ToRSh, sklears, and trustformers; and the systems and acceleration tier reaches from OxiCUDA for GPU compute to OxiMedia and OxiGDAL for media and geospatial. From gradients to clusters to the browser, this release pulls a NumPy-class core into the places NumPy could never reach — without surrendering a single dependency to C or Fortran.

Repository: https://github.com/cool-japan/numrs

Star the repo if you want a NumPy-class numerical foundation that does automatic differentiation, runs in the browser, scales toward the cluster, and ships quantum circuits, FEM solvers, and wavelets — with not a line of C or Fortran in the build.

Pure Rust numerical computing is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ June 5, 2026
