The N-dimensional array core just grew gradients, a cluster framework, a browser runtime, and an applied-math library the size of a small university curriculum — all in pure Rust.
Today we released NumRS2 0.4.0 — by a wide margin the biggest release in the series, adding forward and reverse automatic differentiation, a distributed data/model-parallel foundation, WebAssembly bindings, plotters-based visualization, and an enormous applied-math expansion spanning reinforcement learning, quantum computing, computer vision, computational geometry, FEM, wavelets, graphs, information theory, and control systems — on top of a full lift to pure-Rust SciRS2 0.5.0.
No C. No Fortran. No system BLAS/LAPACK. No Python interpreter overhead. No FFI. Just clean, blazing-fast N-dimensional arrays — broadcasting, fancy indexing, SVD, FFT, autodiff, and now gradients, quantum circuits, FEM solvers, and wavelets — that compile to a single static binary (or WASM) and run everywhere: from laptops to browsers to edge devices to cloud clusters.
This release crosses a line. The earlier 0.x line was about being a faithful, sovereign NumPy-class array core. 0.4.0 keeps that core — and then pushes decisively beyond NumPy into territory NumPy itself never owned: differentiable programming, distributed scaffolding, in-browser execution, and a deep bench of applied numerical methods.
Why NumRS2 0.4.0 is a game changer
NumPy is the bedrock of scientific Python, and it remains a remarkable piece of engineering. But its ceiling is real, and you hit it the moment your work outgrows “fast arrays on one machine.” NumPy is built on C and Fortran, dependent on a system BLAS/LAPACK you have to link at install time, throttled by the GIL, brutal to ship into the browser or onto an embedded device, and quietly hostile to bit-for-bit reproducibility once native codecs and platform math get involved. And the moment you need a gradient, NumPy has nothing to offer — you reach for a separate framework, accept finite differences, or hand-derive Jacobians.
NumRS2 0.4.0 answers all of that at once, and the leap is broad:
- Automatic differentiation lands in the array core. Forward and reverse mode, with higher-order support — Hessians, Jacobians, hyperdual numbers, and Taylor mode — live in
src/autodiff/. You can compute exact gradients of array computations directly, no separate framework and no finite-difference error. - A distributed foundation arrives. A data/model-parallel framework with collectives, distributed optimizers, and a coordinator (roughly 8,000 lines) lands in
src/distributed/— the scaffolding for scaling NumRS2 across a cluster. - Numerics now run in the browser. WebAssembly bindings for arrays, linear algebra, stats, and utilities ship in
src/wasm/, with a working JS/Vite demo inexamples/wasm/. The same code that runs on your laptop runs in a tab. - An applied-math explosion. Reinforcement learning, quantum computing, computer vision, computational geometry, finite-element analysis, wavelets, graph algorithms, information theory, control systems, and CODATA physical constants all arrive as first-class modules.
- The linear-algebra core got genuinely more correct on large matrices. Real eigendecomposition via QR iteration with Wilkinson shifts; full Golub–Kahan bidiagonal SVD; Francis implicit double-shift Schur — the old “fall back to the n≤3 path” shortcuts are gone.
And the scale numbers tell the story of a release that earns the word major. NumRS2 0.4.0 ships 128+ SIMD-vectorized functions (AVX2, AVX512, and ARM NEON), 3,921+ tests passing, 225,975+ lines of production Rust code, 5,813+ public API items, zero stubs in the shipped surface, all built on pure Rust SciRS2 v0.5.0.
Technical Deep Dive: From Differentiable Arrays to Quantum Circuits to the Browser
NumRS2 0.4.0 is large enough that it is best read in layers. Each one is grounded in real module paths you can open today.
Layer 1 — The differentiable & optimization core. The headline capability is automatic differentiation in src/autodiff/. NumRS2 now supports both forward mode (efficient for many-output, few-input functions, via dual numbers) and reverse mode (efficient for few-output, many-input functions — the backprop direction), plus genuine higher-order differentiation: Hessians, Jacobians, hyperdual numbers for exact second derivatives, and Taylor mode for higher-order series. That turns the array core into a differentiable substrate — gradients of array computations come out exact, not approximated. Around autodiff, the optimization surface deepened too: a full CMA-ES optimizer (src/optimize/cma_es/) with IPOP restarts, step-size adaptation, and both rank-μ and rank-one covariance updates; Bayesian optimization (src/optimize/bayesian_opt.rs) with a Gaussian-process surrogate, EI/PI/UCB acquisition functions, and Matérn/RBF kernels; and on the Python side, py_minimize now supports a "bfgs" method (numerical gradient via central differences) alongside "nelder-mead". Black-box objectives, gradient-based optimization, and evolutionary search are now all in-house.
Layer 2 — Numerically-stable linear algebra. This is where 0.4.0 quietly fixes things that used to be wrong on real-world sizes. In linalg_stable.rs, svd_bidiagonal now runs a full Golub–Kahan bidiagonalization followed by Jacobi SVD instead of falling back to the n≤3 path — so SVD of large matrices is actually computed, not approximated by a small-matrix shortcut. Companion to it, symmetric_eigen_tridiagonal now runs cyclic Jacobi sweeps (the README banner phrases the eigenpath as “real eigendecomposition via QR iteration with Wilkinson shifts”) instead of the n≤3 fallback. The Schur decomposition replaced a single Rayleigh-shift with Francis implicit double-shift QR plus deflation, producing a correct real Schur form even when complex-conjugate eigenpairs force 2×2 blocks. And the FEM-driven matrix_determinant / matrix_inverse now handle arbitrary n×n via LU with partial pivoting — they previously hard-errored for n>3, which had blocked 3D and higher-order elements outright. These are correctness wins, not just speed: large-matrix decompositions now give the right answer.
Layer 3 — Applied-math breadth. This is the layer that makes 0.4.0 feel like a different library. A non-exhaustive tour:
- Computer vision (
src/new_modules/cv/): Gaussian, bilateral, and median filters; Canny edge detection; Harris and FAST corner detection; morphological operations. - Computational geometry (
src/new_modules/geometry/): convex hull (Graham scan), Delaunay triangulation, Voronoi diagrams, and polygon operations. - FEM solver (
src/new_modules/fem/): 1D and 2D finite elements with Dirichlet and Neumann boundary conditions, Gaussian quadrature, direct and iterative solvers, and 2D point evaluation via element shape functions. - Wavelets (
src/new_modules/wavelets/): DWT and CWT across the Haar, Daubechies (db2–db10), Symlet, and Coiflet families. - Graph algorithms (
src/new_modules/graph/): BFS/DFS, Dijkstra/A*/Floyd-Warshall, MST via Kruskal and Prim, and max-flow via Dinic. - Quantum computing (
src/new_modules/quantum/): gates, circuits, state-vector simulation, measurements, and the canonical algorithms — Deutsch-Jozsa, Grover, and the QFT. - Reinforcement learning (
src/new_modules/rl/): DQN, Actor-Critic, and PPO agents, prioritized replay, and environment wrappers. - Information theory (
src/new_modules/information_theory/): Shannon entropy, mutual information, KL and JS divergence, and channel capacity. - Control systems (
src/new_modules/control/): transfer functions, state-space models, the Routh-Hurwitz criterion, Bode and Nyquist analysis, and PID / LQR / pole-placement design. - Physical constants (
src/new_modules/constants/): CODATA 2018 and 2022 constants with uncertainties, plus unit conversions.
On the econometrics side, VECM now uses full Johansen (1988) cointegration / error-correction estimation in var.rs, replacing a placeholder.
Layer 4 — Reach everywhere. Three additions extend where NumRS2 can run and what it can produce. WebAssembly bindings (src/wasm/) expose array, linalg, stats, and utility surfaces, with a JS/Vite demo in examples/wasm/ — NumRS2 now runs real numerical code in the browser. (For a clean wasm32-unknown-unknown build you disable the gpu and distributed features, which pull in tokio; with those off, the numerics compile and run in a tab.) The distributed framework (src/distributed/) lands the scaffolding for cluster-scale work: data and model parallelism, a coordinator, and distributed optimizers. Be precise about its maturity, because we are: this is a foundation in 0.4.0 — the collective ops (scatter/gather/all-reduce) are still stubbed pending a real network transport, and distributed/linalg.rs returns NotImplemented. The data/model-parallel structure and the distributed optimizers are real and landing now; the wire transport is the clearly-scoped next milestone. Rounding out reach: plotters-based visualization (src/viz/) for 2D/3D, matrix, statistics, and performance plots — pure Rust, no matplotlib — and ONNX-compatible model I/O and serving (src/new_modules/model_io/, src/new_modules/serving/) with an inference engine, a model registry, and real-time monitoring.
Layer 5 — The pure-Rust substrate. None of this leaves the COOLJAPAN sovereignty story. Every scirs2-* dependency — core, stats, linalg, ndimage, spatial, special, fft, and numpy — moves to v0.5.0 in one coordinated step. Linear algebra rides on OxiBLAS (pure-Rust BLAS/LAPACK), serialization on OxiCode (now 0.2.4), and compression on OxiARC (oxiarc-archive and oxiarc-lz4 now 0.3.2) — the pure-Rust archive layer that powers .npz storage and distributed transfer alike. Top to bottom, there is not a line of C or Fortran in the default build.
Getting Started
cargo add numrs2
use numrs2::prelude::*;
fn main() -> Result<()> {
// N-dimensional arrays with NumPy-style broadcasting
let a = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0]).reshape(&[2, 2]);
let b = Array::from_vec(vec![5.0, 6.0, 7.0, 8.0]).reshape(&[2, 2]);
// Element-wise and matrix operations
let c = a.add(&b); // element-wise add
let e = a.matmul(&b)?; // matrix multiply on OxiBLAS
println!("a + b = {}", c);
println!("a @ b = {}", e);
// Full Golub–Kahan SVD — now computed for large matrices, not approximated
let (u, s, vt) = a.svd_compute()?;
println!("singular values = {}", s);
let _ = (u, vt);
// Symmetric eigendecomposition via Wilkinson-shift QR iteration
let sym = Array::from_vec(vec![2.0, 1.0, 1.0, 2.0]).reshape(&[2, 2]);
let (eigvals, _) = sym.eigh("lower")?;
println!("eigenvalues = {}", eigvals);
// Descriptive statistics
let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
println!("mean = {}, std = {}", data.mean()?, data.std()?);
Ok(())
}
The array, linear-algebra, and statistics calls above are the safe, copy-pasteable core. The new 0.4.0 capabilities — autodiff, quantum circuits, FEM, wavelets, the WASM bindings — are reached through their respective modules (numrs2::autodiff, src/new_modules/*); see the Tips below and the per-module examples in the repository.
What’s New in 0.4.0
New modules and capabilities
- Autodiff — forward and reverse mode automatic differentiation with higher-order support: Hessian, Jacobian, hyperdual numbers, and Taylor mode (
src/autodiff/). - Distributed computing — a data/model-parallel framework with collectives, distributed optimizers, and a coordinator (~8,000 lines) (
src/distributed/). Foundation landing now; network transport is next. - WebAssembly bindings — array, linalg, stats, and utility bindings, plus a JS/Vite demo (
src/wasm/,examples/wasm/). - Visualization — plotters-based 2D/3D, matrix, stats, and performance plots, pure Rust (
src/viz/). - Reinforcement learning — DQN, Actor-Critic, PPO, prioritized replay, environment wrappers.
- Quantum computing — gates, circuits, state-vector simulation, measurements, Deutsch-Jozsa, Grover, QFT.
- Model I/O & serving — ONNX-compatible serialization, inference engine, model registry, real-time monitoring.
- CMA-ES optimizer — IPOP-CMA-ES with step-size adaptation and rank-μ / rank-one covariance updates.
- Bayesian optimization — GP surrogate with EI/PI/UCB acquisition and Matérn/RBF kernels.
- Computer vision — Gaussian/bilateral/median filters, Canny edges, Harris/FAST corners, morphology.
- Computational geometry — convex hull (Graham scan), Delaunay triangulation, Voronoi diagrams, polygon ops.
- FEM solver — 1D/2D FEM with Dirichlet/Neumann BCs, Gaussian quadrature, direct/iterative solvers.
- Wavelets — DWT/CWT with Haar, Daubechies (db2–db10), Symlet, and Coiflet families.
- Graph algorithms — BFS/DFS, Dijkstra/A*/Floyd-Warshall, MST (Kruskal/Prim), max-flow (Dinic).
- Information theory — Shannon entropy, mutual information, KL/JS divergence, channel capacity.
- Control systems — transfer functions, state space, Routh-Hurwitz, Bode/Nyquist, PID/LQR/pole placement.
- Physical constants — CODATA 2018/2022 with uncertainties and unit conversions.
New stats / neural-network functions
average_with_weights— NumPyreturned=Truesemantics, returning(weighted_avg, sum_of_weights).skew()andkurtosis()— Fisher-Pearson skewness and excess kurtosis, optional axis, re-exported from the prelude.f_dist()— sample from an F distribution.instance_norm()— instance normalization for 2-D tensors.
Python and econometrics
py_minimizenow supportsmethod="bfgs"(numerical gradient via central differences) in addition to"nelder-mead".- VECM now uses full Johansen (1988) cointegration / error-correction estimation, replacing a placeholder.
Stability and correctness fixes
- Full Golub–Kahan SVD for large matrices (
svd_bidiagonal), no longer falling back to the n≤3 path. - Real symmetric eigendecomposition for large matrices (
symmetric_eigen_tridiagonal) via cyclic Jacobi sweeps / Wilkinson-shift QR. - Schur decomposition rebuilt on Francis implicit double-shift QR with deflation for correct real Schur form.
- Quantum partial trace fixed with correct bit-interleaving index reconstruction (multi-qubit traces now give the right density matrices).
- FEM matrix ops for n>3 —
matrix_determinant/matrix_inversenow use LU with partial pivoting instead of hard-erroring. gamma_lnaccuracy — now usesscirs2_special::loggamma(Lanczos g=7, ~15-digit) with reflection for x<0.5.- Polynomial domain mapping (
polyscale) now does real polynomial composition under the affine map. - Boltzmann exploration now computes a temperature-scaled softmax over Q-values instead of falling back to greedy.
Dependencies
- scirs2-{core,stats,linalg,ndimage,spatial,special,fft,numpy} updated 0.4.2 → 0.5.0.
- oxiarc-archive and oxiarc-lz4 updated 0.2.6 → 0.3.2; oxicode 0.2 → 0.2.4.
Tips
skew()andkurtosis()are in the prelude now. Justuse numrs2::prelude::*;and they are in scope — along with the rest of the array, linalg, and stats surface. No deep module path needed for the common statistics.- Build for the browser with the WASM demo. The
examples/wasm/Vite app shows the bindings in a real browser. When you compile forwasm32-unknown-unknown(viawasm-pack), disable thegpuanddistributedfeatures — they pull in tokio, which the bare-browser target can’t link. With those off, arrays, linalg, and stats run in a tab. - Reach for autodiff instead of finite differences. For gradients and Hessians, use
src/autodiff(forward or reverse mode, plus hyperdual and Taylor for higher order) rather than central differences. You get exact derivatives, not approximations — which also means no step-size tuning and no truncation error. - Use CMA-ES or Bayesian optimization for black-box objectives. When you can’t differentiate the objective, IPOP-CMA-ES (
src/optimize/cma_es/) handles rugged, multimodal landscapes, while Bayesian optimization (src/optimize/bayesian_opt.rs) with EI/PI/UCB is the sample-efficient choice when each evaluation is expensive. - Your
.npzstorage rides on OxiARC 0.3.2. Compressed array I/O now goes through pure-Rust OxiARC (oxiarc-archive + oxiarc-lz4 at 0.3.2) — smaller files on disk, no zlib or native codec linked into your binary. - The Python optimizer accepts
method="bfgs". If you callpy_minimizefrom Python, you now have a quasi-Newton option (numerical gradient via central differences) in addition to"nelder-mead"— better convergence on smooth objectives.
This is the foundation
NumRS2 is the mature, NumPy-class N-dimensional array core at the base of the COOLJAPAN scientific stack, and 0.4.0 establishes it as far more than an array library — it is now a differentiable, distributable, browser-ready numerical platform. It sits directly beneath SciRS2, depending on the full scirs2-* family at v0.5.0 — core, stats, linalg, ndimage, spatial, special, fft, and numpy. Linear algebra rides on OxiBLAS, serialization on OxiCode 0.2.4, and compression on OxiARC 0.3.2 — all pure-Rust COOLJAPAN crates.
By June 2026 the ecosystem around it is broad and deep, and NumRS2 is the array layer the rest build on: OptiRS for optimization and PandRS for dataframes sit alongside it; the ML and applied stack spans ToRSh, sklears, and trustformers; and the systems and acceleration tier reaches from OxiCUDA for GPU compute to OxiMedia and OxiGDAL for media and geospatial. From gradients to clusters to the browser, this release pulls a NumPy-class core into the places NumPy could never reach — without surrendering a single dependency to C or Fortran.
Repository: https://github.com/cool-japan/numrs
Star the repo if you want a NumPy-class numerical foundation that does automatic differentiation, runs in the browser, scales toward the cluster, and ships quantum circuits, FEM solvers, and wavelets — with not a line of C or Fortran in the build.
Pure Rust numerical computing is here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ June 5, 2026