COOLJAPAN

SciRS2 0.5.0 Released — Pure-Rust GPU Acceleration Goes Real (wgpu) Across the Stack

SciRS2 0.5.0 brings real pure-Rust wgpu/WebGPU acceleration — GpuNdarray, GPU graph algorithms, GPU optimizers (L-BFGS/CG/Newton), GPU RBF interpolation — plus correct Pantelides DAE index reduction, high-order SDE Lévy-area, Hilbert curves, and a maturing symbolic CAS. The NumPy/SciPy/scikit-learn replacement, in pure Rust.

Tags: release, scirs2, gpu, webgpu, wgpu, scientific-computing, pure-rust, symbolic-math

GPU-accelerated scientific computing, written entirely in Rust — no CUDA C, no vendor toolchain, the same code running native on your laptop and inside a browser tab.

Today we released SciRS2 0.5.0 — GPU acceleration goes real across the workspace via pure-Rust wgpu, alongside serious advanced numerics and a maturing computer algebra system.

No C. No Fortran. No CUDA toolchain. No NumPy/SciPy system dependencies. The headline of 0.5.0 is that the GPU story is now real — and it is real on pure-Rust WebGPU (wgpu), not on NVIDIA’s CUDA C. That distinction matters: because the compute kernels are WGSL running through wgpu, the very same SciRS2 code path runs natively on Linux/macOS/Windows and compiles to WebAssembly to run in the browser via WebGPU. Everything still compiles down to a single static binary (or a WASM module), with graceful CPU fallback when no GPU adapter is present — so it stays pure Rust by default.

This is a confident minor milestone (0.4.x → 0.5.0): 36,082 tests passing across 29 workspace crates, nearly 4 million lines of Rust, 80,800+ public API items, zero warnings (clippy + rustdoc + fmt clean), Apache-2.0.

Why SciRS2 0.5.0 is a game changer

The pain is familiar. NumPy and SciPy are CPU-bound and Python-slow, and the moment you reach for the GPU you inherit the CUDA C toolchain, a driver/version matrix, and vendor lock-in. You write your science in Python, then rewrite the hot loops in C/C++/CUDA, then babysit the build. SciRS2 0.5.0 takes a different path: GPU acceleration that is pure Rust, portable, and browser-ready.

Concrete 0.5.0 wins:

The test counts back these claims: 13 Pantelides tests, 10 Lévy-area tests, and 8 GpuNdarray tests, all green inside that 36,082-test sweep.

Technical Deep Dive: Pure-Rust GPU via wgpu

Layer 1 — GpuNdarray<f32> in scirs2-core. The foundation lives in array_protocol/gpu_ndarray.rs. A single WebGPUContext is initialized lazily through a OnceLock singleton and shared across the workspace. On top of it sit 7 WGSL compute kernels: elementwise add/sub/mul/scalar, a naive matmul, a two-pass parallel sum, and a 16×16 tiled transpose. The 0.5.0 release also adds concat_axis.wgsl (uniform-stride gather for axis > 0) and reduce_sum_axis.wgsl (per-output axis reduction for rank ≥ 3), and fills in 13 WGSL optimizer/integrator kernel slots — Adam/SGD/RMSprop/Adagrad/LAMB, memcpy/fill, reduce_sum/reduce_max, and the RK4 stages. The whole layer is gated behind the array_protocol_wgpu feature, and a public GpuNdarray::matmul() wrapper exposes the matmul kernel to downstream crates.
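
To make the two-pass parallel sum concrete, here is a minimal CPU sketch of the same idea in plain Rust. It is not the crate's WGSL kernel: the workgroup size of 256 and the function names are assumptions made for illustration, since the post does not state them.

```rust
/// Hypothetical workgroup size; the post does not state the real one.
const WORKGROUP_SIZE: usize = 256;

/// Pass 1: each "workgroup" reduces its own chunk to one partial sum.
fn partial_sums(data: &[f32]) -> Vec<f32> {
    data.chunks(WORKGROUP_SIZE)
        .map(|chunk| chunk.iter().sum::<f32>())
        .collect()
}

/// Pass 2: a single group reduces the partial sums to the final scalar.
fn two_pass_sum(data: &[f32]) -> f32 {
    partial_sums(data).iter().sum()
}

fn main() {
    let data: Vec<f32> = (1..=1000).map(|i| i as f32).collect();
    let partials = partial_sums(&data);
    println!(
        "{} partial sums, total = {}",
        partials.len(),
        two_pass_sum(&data)
    );
}
```

On a GPU the first pass runs once per workgroup in parallel, and the second pass only has to touch one value per workgroup, which is why the reduction is split in two.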

Layer 2 — GPU across the crates.

Layer 3 — advanced numerics. Beyond the GPU, 0.5.0 lands genuinely hard algorithms.

scirs2-integrate replaces the heuristic find_singular_subsets with the full Pantelides machinery (Hopcroft-Karp O(E√V) bipartite matching plus an iterative Tarjan SCC), so DAE index reduction is correct, not approximate. It also adds the Wiktorsson (2001) truncated-series Lévy-area approximation in sde/levy_area.rs.

scirs2-spatial ships 2D and 3D Hilbert curve sorting, including a 24-state Butz/Hamilton lookup table for the 3D case (hilbert_d2/hilbert_d3, inverse, f64, and hilbert_sort_2d/hilbert_sort_3d; 8 tests).

scirs2-core adds NUMA-locality par_map_chunks (Linux pthread affinity pinning, with a rayon fallback on Darwin/WASM).

The scirs2-cluster 60× speedup was a real algorithmic fix. The LRSC/SSC timeouts came from a full eigendecomposition inside ADMM, now replaced by a sign-aware early-exit power iteration with min-eigenvalue / min-σ² thresholding that skips sub-threshold SVT modes. LRSC went from 120s to 2s (the 60× figure) and SSC from 120s to 33s, with all 18 subspace tests green.
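
The matching step at the heart of Pantelides can be illustrated with a small sketch. To stay short it uses a plain augmenting-path matching (Kuhn's algorithm) rather than Hopcroft-Karp, and all names here are hypothetical, not the scirs2-integrate API. Equations are matched to their highest-derivative variables; any equation left unmatched marks a structurally singular subset that index reduction would differentiate.

```rust
/// Try to find an augmenting path starting from equation `eq`.
/// `adj[eq]` lists the highest-derivative variables appearing in `eq`.
fn try_augment(
    eq: usize,
    adj: &[Vec<usize>],
    visited: &mut [bool],
    var_owner: &mut [Option<usize>],
) -> bool {
    for &var in &adj[eq] {
        if visited[var] {
            continue;
        }
        visited[var] = true;
        let current = var_owner[var];
        // Take a free variable, or evict its owner if the owner can move.
        let can_take = match current {
            None => true,
            Some(other_eq) => try_augment(other_eq, adj, visited, var_owner),
        };
        if can_take {
            var_owner[var] = Some(eq);
            return true;
        }
    }
    false
}

/// Return the equations that cannot be matched to any variable.
fn unmatched_equations(adj: &[Vec<usize>], n_vars: usize) -> Vec<usize> {
    let mut var_owner: Vec<Option<usize>> = vec![None; n_vars];
    let mut unmatched = Vec::new();
    for eq in 0..adj.len() {
        let mut visited = vec![false; n_vars];
        if !try_augment(eq, adj, &mut visited, &mut var_owner) {
            unmatched.push(eq);
        }
    }
    unmatched
}

fn main() {
    // Cartesian pendulum. Variables: 0 = x', 1 = y', 2 = u', 3 = v', 4 = lambda.
    // Equations: x' = u, y' = v, u' = lambda*x, v' = lambda*y - g,
    // and the constraint x^2 + y^2 = L^2, which has no highest-derivative variable.
    let adj = vec![vec![0], vec![1], vec![2, 4], vec![3, 4], vec![]];
    println!("unmatched equations: {:?}", unmatched_equations(&adj, 5));
}
```

In the pendulum example the length constraint is the equation left without a match, which is the textbook reason this system needs index reduction before it can be integrated.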

Layer 4 — the maturing CAS. The computer algebra system introduced in 0.4.4 keeps growing.

scirs2-symbolic adds ALiBi symbolic positional bias (attention/symbolic_alibi.rs: alibi_slope, alibi_bias_expr, alibi_bias_matrix_symbolic, and verify_symbolic_vs_numerical, which confirms max_diff < 1e-14 against the scirs2-neural baseline).

The differential-geometry layer now computes the Riemann tensor R^μ_{νρσ} (4-term formula via symbolic gradients of the Christoffel symbols), the Ricci trace, and a full-n Weyl decomposition, with 10 integration tests spanning Schwarzschild, Minkowski, Bianchi, and the Kretschmann scalar.

neural_priors.rs adds discover_series_prior (sliding-window symbolic regression) and series_prior_regularization, with NUMA wire-up for parallel predict. scirs2-neural exposes a SymbolicPriorLoss, and scirs2-autograd lands a correctness repair (a ScalarMulOp added to gradient name-dispatch) plus a published jit_fusion module that extends fusion to matmul epilogues and batched-matmul→reduction.
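
For readers unfamiliar with ALiBi, the numerical quantity that a symbolic version would be checked against can be sketched in a few lines. This is a hedged illustration only: the function names and signatures below are hypothetical (they are not the scirs2-symbolic or scirs2-neural API), the head count is assumed to be a power of two, and the symmetric |i - j| distance is an assumption about the variant in use.

```rust
/// Slope for 0-based head `head` out of `n_heads`, using the geometric
/// sequence from the ALiBi paper: 2^(-8 * (head + 1) / n_heads).
/// Assumes `n_heads` is a power of two.
fn alibi_slope_sketch(head: usize, n_heads: usize) -> f64 {
    let exponent = -8.0 * (head as f64 + 1.0) / n_heads as f64;
    2.0_f64.powf(exponent)
}

/// Bias matrix for one head: entry (i, j) is -slope * |i - j|,
/// so more distant positions are penalized linearly.
fn alibi_bias_matrix_sketch(seq_len: usize, slope: f64) -> Vec<Vec<f64>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| -slope * (i as f64 - j as f64).abs())
                .collect()
        })
        .collect()
}

fn main() {
    let slope = alibi_slope_sketch(0, 8);
    let bias = alibi_bias_matrix_sketch(4, slope);
    println!("slope = {slope}");
    for row in &bias {
        println!("{row:?}");
    }
}
```

A symbolic-versus-numerical check then amounts to evaluating the symbolic expression at each (i, j) and comparing against a matrix like this one.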

Throughout, every GPU path is feature-gated with graceful CPU fallback (GpuNotAvailable / NoAdapter) — so SciRS2 stays pure Rust by default and you opt into the GPU only when you want it.
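
The fallback pattern described above can be sketched as follows. The error variants reuse the two names from the post, but the enum itself, the `adapter_present` flag, and every function here are hypothetical stand-ins so the example runs without a GPU; the real SciRS2 types may look different.

```rust
/// Hypothetical error type mirroring the two variants named in the post.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum GpuError {
    GpuNotAvailable,
    NoAdapter,
}

/// Stand-in for a GPU dispatch. `adapter_present` is a hypothetical flag
/// that lets the sketch simulate both outcomes without real hardware.
fn gpu_add(a: &[f32], b: &[f32], adapter_present: bool) -> Result<Vec<f32>, GpuError> {
    if !adapter_present {
        return Err(GpuError::NoAdapter);
    }
    // A real implementation would upload buffers and run a WGSL kernel here.
    Ok(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

/// Plain CPU path used when the GPU is unavailable.
fn cpu_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Try the GPU first and fall back to the CPU on either error.
/// Returns the result plus a label naming the backend that produced it.
fn add_with_fallback(a: &[f32], b: &[f32], adapter_present: bool) -> (Vec<f32>, &'static str) {
    match gpu_add(a, b, adapter_present) {
        Ok(out) => (out, "gpu"),
        Err(GpuError::GpuNotAvailable) | Err(GpuError::NoAdapter) => (cpu_add(a, b), "cpu"),
    }
}

fn main() {
    let (out, backend) = add_with_fallback(&[1.0, 2.0], &[10.0, 20.0], false);
    println!("{out:?} computed on {backend}");
}
```

The caller sees the same result either way; only the backend label changes, which is what makes it reasonable to enable a GPU feature on machines that may not have an adapter.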

Getting Started

Add the crate:

cargo add scirs2

A minimal GPU example — build a GpuNdarray<f32>, add elementwise on the GPU, and read the result back. It transparently falls back to CPU when no adapter is present:

use scirs2_core::array_protocol::gpu_ndarray::GpuNdarray;

fn main() {
    // Two small f32 arrays uploaded to the GPU (pure-Rust wgpu).
    let a = GpuNdarray::<f32>::from_vec(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
    let b = GpuNdarray::<f32>::from_vec(vec![10.0, 20.0, 30.0, 40.0], vec![2, 2]);

    // Elementwise add runs the WGSL kernel on the GPU,
    // or falls back to CPU gracefully if no adapter is found.
    let sum = a.add(&b);

    // Matmul exercises the naive matmul kernel.
    let prod = a.matmul(&b);

    println!("sum  = {:?}", sum.to_vec());
    println!("prod = {:?}", prod.to_vec());
}
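
To sanity-check the output of the example above, a plain-Rust CPU reference for the same two operations is handy. This sketch assumes row-major storage and that the result reads back as a flat vector; the helper names are hypothetical and independent of SciRS2.

```rust
/// Elementwise add over two flat buffers of equal length.
fn add_ref(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Naive row-major matmul: (m x k) * (k x n) -> (m x n).
fn matmul_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                out[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    out
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0, 30.0, 40.0];
    println!("sum  = {:?}", add_ref(&a, &b)); // [11.0, 22.0, 33.0, 44.0]
    println!("prod = {:?}", matmul_ref(&a, &b, 2, 2, 2)); // [70.0, 100.0, 150.0, 220.0]
}
```

If the GPU path and this reference disagree beyond floating-point rounding, the layout assumption (row-major) is the first thing to check.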

To actually dispatch to the GPU, enable the feature in Cargo.toml:

[dependencies]
scirs2 = { version = "0.5.0", features = ["array_protocol_wgpu"] }

Prefer to stay on the CPU and exercise the new spatial work instead? Sort 3D points along a Hilbert curve for better locality:

use scirs2_spatial::hilbert_sort_3d;

fn main() {
    let mut points = vec![
        [0.10_f64, 0.90, 0.40],
        [0.80, 0.20, 0.95],
        [0.50, 0.50, 0.50],
        [0.05, 0.05, 0.05],
    ];

    // Reorders points by their position along a 3D Hilbert curve
    // (24-state Butz/Hamilton lookup) so spatially-near points
    // end up near each other in memory.
    hilbert_sort_3d(&mut points);

    println!("{:?}", points);
}
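
To show what Hilbert sorting does under the hood, here is a minimal 2D sketch using the classic bit-manipulation mapping from grid cell to curve distance. It is not the scirs2-spatial implementation (which the post says uses a lookup table for 3D); the function names, the grid resolution parameter, and the assumption that points lie in the unit square are all illustrative.

```rust
/// Map cell (x, y) on an n x n grid (n a power of two) to its distance
/// along the Hilbert curve.
fn hilbert_index_2d(n: u32, mut x: u32, mut y: u32) -> u32 {
    let mut d = 0;
    let mut s = n / 2;
    while s > 0 {
        let rx = u32::from((x & s) > 0);
        let ry = u32::from((y & s) > 0);
        d += s * s * ((3 * rx) ^ ry);
        // Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0 {
            if rx == 1 {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::mem::swap(&mut x, &mut y);
        }
        s /= 2;
    }
    d
}

/// Sort points in the unit square by Hilbert index on an n x n grid.
fn hilbert_sort_2d_sketch(points: &mut [[f64; 2]], n: u32) {
    // Quantize a coordinate in [0, 1] to a grid cell in 0..n.
    let cell = |v: f64| ((v.clamp(0.0, 1.0) * n as f64) as u32).min(n - 1);
    points.sort_by_key(|p| hilbert_index_2d(n, cell(p[0]), cell(p[1])));
}

fn main() {
    let mut points = vec![[0.75, 0.25], [0.25, 0.75], [0.25, 0.25], [0.75, 0.75]];
    hilbert_sort_2d_sketch(&mut points, 2);
    println!("{:?}", points);
}
```

Consecutive indices always belong to adjacent cells, which is where the memory-locality benefit of sorting along the curve comes from.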

What’s New in 0.5.0

GPU / wgpu (pure-Rust WebGPU):

Advanced numerics:

CAS maturation:

Tips

  1. Opt into the GPU per feature. Enable array_protocol_wgpu for the core GpuNdarray, and the per-crate features wgpu_rbf (interpolate) and wgpu_kernels (special). Everything falls back to CPU gracefully (GpuNotAvailable / NoAdapter), so it is safe to ship with the feature enabled even where no GPU exists.
  2. GPU only wins above a threshold. WGSL kernels pay a dispatch + upload cost, so small problems are faster on the CPU. Graph algorithms dispatch to the GPU at n_edges ≥ 4096; the optimizers expose gpu_threshold_override so you can tune the crossover for your hardware.
  3. Sort before you search. Run hilbert_sort_2d / hilbert_sort_3d over your points before building a k-d tree or doing nearest-neighbor queries — the improved spatial locality pays off in cache behavior.
  4. Use the new DAE path for high-index systems. For DAEs where the old heuristic mis-detected singular subsets, the new Pantelides index reduction (Hopcroft-Karp + Tarjan) is correct, so reach for it whenever index reduction matters.
  5. For SDEs, use the Lévy-area path. The Wiktorsson Lévy-area approximation gives you strong order 1.5 in the general SRK solver, a real accuracy upgrade over lower-order schemes.
  6. Ship to the browser. Because the GPU layer is wgpu/WebGPU, the same code compiles to wasm32 and runs on WebGPU in the browser — no separate kernel rewrite.
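
The threshold rule from tip 2 boils down to a small decision function. The sketch below is an illustration under assumptions: the 4096 default is the graph-algorithm edge threshold quoted above, `gpu_threshold_override` is modeled as an optional value, and the `Backend` enum and `choose_backend` function are hypothetical names, not SciRS2 API.

```rust
/// Default crossover quoted in the post for graph algorithms.
const DEFAULT_GPU_EDGE_THRESHOLD: usize = 4096;

/// Hypothetical backend selector for illustration.
#[derive(Debug, PartialEq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Pick the GPU only when the problem is large enough to amortize
/// the dispatch and upload cost; an override replaces the default.
fn choose_backend(n_edges: usize, gpu_threshold_override: Option<usize>) -> Backend {
    let threshold = gpu_threshold_override.unwrap_or(DEFAULT_GPU_EDGE_THRESHOLD);
    if n_edges >= threshold {
        Backend::Gpu
    } else {
        Backend::Cpu
    }
}

fn main() {
    println!("{:?}", choose_backend(1_000, None));
    println!("{:?}", choose_backend(10_000, None));
    println!("{:?}", choose_backend(1_000, Some(512)));
}
```

Lowering the override makes sense on hardware with cheap dispatch (for example an integrated GPU sharing memory with the CPU), while raising it protects against slow uploads over a discrete bus.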

Part of the COOLJAPAN ecosystem

SciRS2 is pure-Rust scientific computing, and 0.5.0 makes its place in the ecosystem clearer than ever.

Repository: https://github.com/cool-japan/scirs

Star the repo if you want NumPy/SciPy/scikit-learn-grade scientific computing without the C, Fortran, or CUDA toolchain.

Pure Rust scientific computing — now GPU-accelerated, browser-ready, and sovereign.

KitaSan at COOLJAPAN OÜ, June 3, 2026
