SciRS2 0.4.2 Released — Neural Architecture Search, CMA-ES, Mamba SSM, Async GPU & Apache Iceberg: The Biggest 0.4.x Drop Yet

AutoML, GPU memory, and data-lake IO — all in pure Rust, no CUDA toolkit, no JVM, no NumPy system stack.

Today we released SciRS2 0.4.2 — the largest 0.4.x feature drop yet, landing Neural Architecture Search, CMA-ES, Mamba state-space models, async/unified GPU memory, H-matrix compression, streaming FFT, DLPack zero-copy interop, and Apache Iceberg/DataFusion IO across the workspace.

No C. No Fortran. No NumPy system dependencies. SciRS2 is a pure-Rust scientific computing and AI library that replaces the entire C/Fortran NumPy/SciPy/scikit-learn stack — OpenBLAS, LAPACK, MKL, and friends — with OxiBLAS, OxiFFT, OxiCode, and the oxiarc-* compression family underneath. The result compiles to a single static binary (or a WASM module) and runs everywhere: laptops, servers, browsers, and GPUs.

Why SciRS2 0.4.2 is a game changer

The incumbent story for modern numerical AI is painful. Python AutoML and Neural Architecture Search frameworks are heavy and CUDA-bound; GPU memory management is fragile and full of footguns; and the moment you want data-lake IO, you are dragged into the JVM and a Spark cluster. SciRS2 0.4.2 dismantles all three problems in one release:

Neural Architecture Search — GDAS, SNAS, and predictor-based NAS in scirs2-optimize, plus NAS repair in scirs2-neural (74 tests). Search architectures without ever leaving Rust or touching a CUDA toolkit.
CMA-ES — a full Covariance Matrix Adaptation Evolution Strategy optimizer (10 tests) for noisy, black-box, and derivative-free objectives.
Mamba state-space model (SSM) — verified Mamba SSM in scirs2-neural, the modern selective-state-space alternative to attention.
Async GPU + unified memory — scirs2-core async GPU memory transfer, a unified memory manager, a stream allocator, NUMA bandwidth optimization, and Metal GPU batch-dispatch fixes that remove every .expect() from the GPU backends.
H-matrix compression + GPU eigensolvers + auto-precision dispatch — scirs2-linalg H-matrix hierarchical compression, GPU eigensolvers, and a mixed CPU/GPU linear solver that picks precision automatically.
Apache Iceberg + DataFusion + object store — scirs2-io gains the Iceberg table format, a DataFusion query provider, vectorized expression evaluation, and an object-store abstraction over S3 / GCS / Azure — no JVM in sight.
DLPack zero-copy + masked arrays — scirs2-numpy now speaks the DLPack protocol for zero-copy exchange with NumPy and PyTorch, plus masked arrays and structured dtypes.
Streaming / out-of-core FFT — scirs2-fft ring-buffer streaming STFT, cache-oblivious FFT, and out-of-core transforms for data that does not fit in memory.
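To make the streaming idea concrete, here is a minimal sketch of a ring-buffer STFT in plain Rust. It is not the scirs2-fft API. The `StreamingStft` type and its methods are hypothetical, and each frame uses a naive O(N²) DFT for readability where a real implementation would call an FFT. Samples arrive one at a time and only the last `frame_len` samples are kept in memory. Once the buffer is full, a Hann-windowed magnitude spectrum is emitted every `hop` samples.

```rust
use std::f64::consts::PI;

/// Hypothetical streaming STFT: keeps only the most recent `frame_len` samples
/// in a ring buffer and emits one magnitude spectrum every `hop` samples.
struct StreamingStft {
    frame_len: usize,
    hop: usize,
    ring: Vec<f64>,
    pos: usize,        // next write index; also the index of the oldest sample
    filled: usize,     // samples received until the first frame is complete
    since_last: usize, // samples received since the last emitted frame
    window: Vec<f64>,  // periodic Hann window
}

impl StreamingStft {
    fn new(frame_len: usize, hop: usize) -> Self {
        assert!(frame_len > 0 && hop > 0, "frame_len and hop must be positive");
        let window = (0..frame_len)
            .map(|n| 0.5 - 0.5 * (2.0 * PI * n as f64 / frame_len as f64).cos())
            .collect();
        Self { frame_len, hop, ring: vec![0.0; frame_len], pos: 0, filled: 0, since_last: 0, window }
    }

    /// Push one sample; returns bins 0..=frame_len/2 whenever a frame completes.
    fn push(&mut self, x: f64) -> Option<Vec<f64>> {
        self.ring[self.pos] = x;
        self.pos = (self.pos + 1) % self.frame_len;
        if self.filled < self.frame_len {
            self.filled += 1;
            if self.filled < self.frame_len {
                return None; // still warming up
            }
        } else {
            self.since_last += 1;
            if self.since_last < self.hop {
                return None;
            }
        }
        self.since_last = 0;
        Some(self.spectrum())
    }

    /// Window the buffered frame in chronological order and take a naive DFT.
    fn spectrum(&self) -> Vec<f64> {
        let n = self.frame_len;
        let frame: Vec<f64> = (0..n)
            .map(|i| self.ring[(self.pos + i) % n] * self.window[i])
            .collect();
        (0..=n / 2)
            .map(|k| {
                let (mut re, mut im) = (0.0, 0.0);
                for (t, &v) in frame.iter().enumerate() {
                    let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                    re += v * angle.cos();
                    im += v * angle.sin();
                }
                (re * re + im * im).sqrt()
            })
            .collect()
    }
}

fn main() {
    // A pure tone sitting exactly on bin 4 of a 32-sample frame.
    let mut stft = StreamingStft::new(32, 8);
    for t in 0..128 {
        let x = (2.0 * PI * 4.0 * t as f64 / 32.0).sin();
        if let Some(spec) = stft.push(x) {
            println!("frame ending at sample {t}: |X[4]| = {:.3}", spec[4]);
        }
    }
}
```

Memory stays at one frame regardless of stream length, which is the property that lets streaming transforms handle signals larger than RAM.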

This sits on real scale: 2.94M SLoC, 27,632 tests across 29 workspace crates, 80,800+ public API items, only 19 stubs remaining, and a clean build with 0 errors, 0 compiler warnings, 0 clippy warnings, and 0 rustdoc warnings.

Technical Deep Dive

SciRS2 is organized in layers, and 0.4.2 advances every one of them.

Core & GPU (scirs2-core). This release adds async GPU memory transfer and a unified memory manager, a stream allocator with memory defragmentation, NUMA bandwidth optimization, an RRB-tree persistent data structure, and Tracy profiler integration. The Metal backend was hardened by removing all .expect() calls and fixing batch dispatch — GPU code is now expect()-free end to end.
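As a rough illustration of what a pooled device allocator does, here is a minimal sketch of an offset-based pool with first-fit allocation and coalescing of adjacent free blocks. It assumes one large pre-allocated buffer addressed by byte offsets. `PoolAllocator` and its methods are hypothetical and are not scirs2-core's stream allocator. Coalescing is only one ingredient of defragmentation: full defragmentation also relocates live allocations, which this sketch does not attempt.

```rust
/// Hypothetical offset-based pool over one large buffer (for example, a GPU allocation).
struct PoolAllocator {
    free_list: Vec<(usize, usize)>, // (offset, size) of free blocks, sorted by offset
}

impl PoolAllocator {
    fn new(capacity: usize) -> Self {
        Self { free_list: vec![(0, capacity)] }
    }

    /// First-fit allocation; returns the offset of the block, or None if nothing fits.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let idx = self.free_list.iter().position(|&(_, s)| s >= size)?;
        let (off, s) = self.free_list[idx];
        if s == size {
            self.free_list.remove(idx); // exact fit consumes the whole hole
        } else {
            self.free_list[idx] = (off + size, s - size); // carve from the front
        }
        Some(off)
    }

    /// Return a block and merge it with any directly adjacent free neighbours.
    fn release(&mut self, off: usize, size: usize) {
        let idx = self.free_list.partition_point(|&(o, _)| o < off);
        self.free_list.insert(idx, (off, size));
        // Merge with the following block if they touch.
        if idx + 1 < self.free_list.len() && off + size == self.free_list[idx + 1].0 {
            self.free_list[idx].1 += self.free_list[idx + 1].1;
            self.free_list.remove(idx + 1);
        }
        // Merge with the preceding block if they touch.
        if idx > 0 {
            let (prev_off, prev_size) = self.free_list[idx - 1];
            if prev_off + prev_size == self.free_list[idx].0 {
                self.free_list[idx - 1].1 += self.free_list[idx].1;
                self.free_list.remove(idx);
            }
        }
    }

    fn largest_free(&self) -> usize {
        self.free_list.iter().map(|&(_, s)| s).max().unwrap_or(0)
    }

    fn free_blocks(&self) -> usize {
        self.free_list.len()
    }
}

fn main() {
    let mut pool = PoolAllocator::new(1024);
    let a = pool.alloc(256);
    let b = pool.alloc(256);
    let c = pool.alloc(512);
    println!("offsets: {a:?} {b:?} {c:?}");
    if let (Some(a), Some(b)) = (a, b) {
        pool.release(b, 256);
        pool.release(a, 256); // merges with b's hole into one 512-byte block
    }
    println!("largest free block: {}, free blocks: {}", pool.largest_free(), pool.free_blocks());
}
```

Without the merge step, the two released 256-byte holes would stay separate and a later 512-byte request would fail even though 512 bytes are free.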

Numerics. scirs2-linalg gains H-matrix hierarchical compression, GPU eigensolvers, and auto-precision dispatch for mixed CPU/GPU linear solves. scirs2-special adds f16 mixed-precision paths, GPU auto-dispatch, Hecke and elliptic L-functions, ball arithmetic, connection formulas, Clebsch-Gordan coefficients for SU(2)/SU(3)/SO(5), Hall polynomials, and spheroidal + Mathieu-Hill solvers. scirs2-fft adds streaming STFT, cache-oblivious FFT, and out-of-core transforms. scirs2-sparse adds ILU(0) mixed CPU/GPU preconditioning.
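The core trick behind H-matrix compression is that a block coupling two well-separated point clusters is numerically low-rank. The sketch below uses cross approximation with full pivoting on a small dense block of the kernel 1/|x − y>|, with sources in [0, 1] and targets in [3, 4]. It repeatedly subtracts a rank-one cross through the largest remaining entry until the residual falls below a tolerance. This is a plain-Rust illustration under those assumptions, not scirs2-linalg's implementation. Production H-matrix codes typically use partially pivoted ACA so they never form the full block. `cross_approx`, `kernel_block`, and the other helpers are hypothetical.

```rust
/// Largest absolute entry of a dense matrix.
fn max_abs(m: &[Vec<f64>]) -> f64 {
    m.iter().flatten().fold(0.0_f64, |acc, &v| acc.max(v.abs()))
}

/// Full-pivoting cross approximation: A ≈ Σ_k u_k v_kᵀ, stopping when the
/// largest residual entry drops below `tol` times the largest entry of A.
fn cross_approx(a: &[Vec<f64>], tol: f64, max_rank: usize) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let m = a.len();
    let n = a.first().map_or(0, Vec::len);
    let mut r: Vec<Vec<f64>> = a.to_vec(); // residual, updated in place
    let scale = max_abs(a);
    let (mut us, mut vs) = (Vec::new(), Vec::new());
    for _ in 0..max_rank {
        // Pivot on the largest remaining residual entry.
        let (mut pi, mut pj, mut best) = (0, 0, 0.0_f64);
        for i in 0..m {
            for j in 0..n {
                if r[i][j].abs() > best {
                    best = r[i][j].abs();
                    pi = i;
                    pj = j;
                }
            }
        }
        if best <= tol * scale {
            break;
        }
        let p = r[pi][pj];
        let u: Vec<f64> = (0..m).map(|i| r[i][pj] / p).collect(); // pivot column, scaled
        let v: Vec<f64> = r[pi].clone(); // pivot row
        for i in 0..m {
            for j in 0..n {
                r[i][j] -= u[i] * v[j]; // rank-one (Gaussian elimination) update
            }
        }
        us.push(u);
        vs.push(v);
    }
    (us, vs)
}

/// Dense block of 1/(y - x) with sources x in [0, 1] and targets y in [3, 4].
fn kernel_block(m: usize, n: usize) -> Vec<Vec<f64>> {
    (0..m)
        .map(|i| {
            let x = i as f64 / (m - 1) as f64;
            (0..n).map(|j| 1.0 / (3.0 + j as f64 / (n - 1) as f64 - x)).collect()
        })
        .collect()
}

/// Max-entry error of the factored approximation.
fn reconstruction_error(a: &[Vec<f64>], us: &[Vec<f64>], vs: &[Vec<f64>]) -> f64 {
    let mut err = 0.0_f64;
    for (i, row) in a.iter().enumerate() {
        for (j, &aij) in row.iter().enumerate() {
            let approx: f64 = us.iter().zip(vs).map(|(u, v)| u[i] * v[j]).sum();
            err = err.max((aij - approx).abs());
        }
    }
    err
}

fn main() {
    let a = kernel_block(40, 40);
    let (us, vs) = cross_approx(&a, 1e-8, 40);
    let k = us.len();
    println!("rank {k}: {} stored numbers instead of {}", k * (40 + 40), 40 * 40);
    println!("max error: {:e}", reconstruction_error(&a, &us, &vs));
}
```

A rank-k factorization of an m×n block stores k(m + n) numbers instead of mn. An H-matrix applies this block by block across a hierarchy of cluster pairs.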

ML & AutoML. scirs2-optimize lands GDAS/SNAS/predictor-based NAS, the CMA-ES optimizer, and subspace embedding (Johnson-Lindenstrauss / Gaussian / sparse, with sketched least-squares). scirs2-neural adds NAS repair and a verified Mamba SSM. scirs2-text ships the Universal Sentence Encoder (USE), SimCSE contrastive embeddings, an HDP topic model, a Unicode tokenizer, and an enhanced BPE tokenizer with chat templates.
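"Selective" state space is the part of Mamba that differs from earlier SSMs. The step size Δ and the matrices B and C are computed from the current input, so the model decides per token how much state to keep or overwrite. The sketch below shows that recurrence for a single input channel with a diagonal state matrix A. It uses zero-order-hold discretization for A and the simplified Euler step Δ·B for B, as in the Mamba paper. The `SelectiveSsm` type and its fields are hypothetical and do not reflect scirs2-neural's API. The real model also adds input/output projections, a convolution, gating, and a parallel scan.

```rust
/// Numerically stable softplus, used to keep the step size Δ positive.
fn softplus(z: f64) -> f64 {
    if z > 20.0 { z } else { z.exp().ln_1p() }
}

/// Hypothetical single-channel selective SSM (Mamba-style) with diagonal A.
struct SelectiveSsm {
    a: Vec<f64>,   // diagonal of A; negative entries keep the recurrence stable
    w_b: Vec<f64>, // input-dependent B_t = w_b * x_t (one weight per state)
    w_c: Vec<f64>, // input-dependent C_t = w_c * x_t
    w_delta: f64,  // Δ_t = softplus(w_delta * x_t + b_delta)
    b_delta: f64,
}

impl SelectiveSsm {
    /// Sequential (recurrent) scan: O(L) time and constant state per step.
    fn scan(&self, xs: &[f64]) -> Vec<f64> {
        let mut h = vec![0.0; self.a.len()];
        let mut ys = Vec::with_capacity(xs.len());
        for &x in xs {
            let delta = softplus(self.w_delta * x + self.b_delta);
            let mut y = 0.0;
            for i in 0..h.len() {
                let a_bar = (delta * self.a[i]).exp(); // zero-order-hold discretization of A
                let b_t = self.w_b[i] * x;
                let c_t = self.w_c[i] * x;
                h[i] = a_bar * h[i] + delta * b_t * x; // simplified (Euler) discretization of B
                y += c_t * h[i];
            }
            ys.push(y);
        }
        ys
    }
}

fn main() {
    let ssm = SelectiveSsm {
        a: vec![-1.0, -0.5],
        w_b: vec![0.8, -0.3],
        w_c: vec![0.5, 1.2],
        w_delta: 1.0,
        b_delta: 0.0,
    };
    let ys = ssm.scan(&[1.0, 0.5, -0.2, 0.0]);
    println!("outputs: {ys:?}");
}
```

Because the recurrence is causal and carries a fixed-size state, the cost grows linearly with sequence length. Attention, by contrast, compares every token with every other token.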

Data & Interop. scirs2-io adds the Apache Iceberg table format, a DataFusion query provider, vectorized expression eval and join support, plus an object-store abstraction with S3 multipart upload, GCS, Azure SAS tokens, adaptive compression, and exactly-once delivery. scirs2-numpy adds the DLPack protocol for zero-copy exchange, masked arrays, structured dtypes, and PyUntypedArray. scirs2-datasets becomes HuggingFace-compatible with sharding and ndarray generators (493 lib tests).
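DLPack is zero-copy because a tensor is described by a small C struct that points at an existing buffer rather than owning a copy of it. The sketch below mirrors the `DLDevice`, `DLDataType`, and `DLTensor` layouts from `dlpack.h` with `#[repr(C)]` and wraps a Rust `f32` buffer without copying it. It is an illustration only and does not reproduce scirs2-numpy's internal types. A real hand-off to NumPy or PyTorch also wraps the tensor in a `DLManagedTensor` with a deleter, exposed through a `"dltensor"` PyCapsule, so the consumer can release the buffer when it is done. That part is omitted here. `borrow_as_dltensor` is a hypothetical helper.

```rust
use std::ffi::c_void;

/// Mirrors `DLDevice` from dlpack.h.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
struct DLDevice {
    device_type: i32, // kDLCPU = 1
    device_id: i32,
}

/// Mirrors `DLDataType` from dlpack.h.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
struct DLDataType {
    code: u8, // kDLFloat = 2
    bits: u8,
    lanes: u16,
}

/// Mirrors `DLTensor` from dlpack.h: a non-owning view of someone else's buffer.
#[repr(C)]
#[allow(dead_code)]
struct DLTensor {
    data: *mut c_void,
    device: DLDevice,
    ndim: i32,
    dtype: DLDataType,
    shape: *mut i64,
    strides: *mut i64, // strides in elements; null means compact row-major
    byte_offset: u64,
}

const KDL_CPU: i32 = 1;
const KDL_FLOAT: u8 = 2;

/// Describe a row-major f32 buffer as a DLTensor without copying it.
/// The caller must keep `data` and `shape` alive while the tensor is in use.
fn borrow_as_dltensor(data: &mut [f32], shape: &mut [i64]) -> DLTensor {
    DLTensor {
        data: data.as_mut_ptr() as *mut c_void,
        device: DLDevice { device_type: KDL_CPU, device_id: 0 },
        ndim: shape.len() as i32,
        dtype: DLDataType { code: KDL_FLOAT, bits: 32, lanes: 1 },
        shape: shape.as_mut_ptr(),
        strides: std::ptr::null_mut(),
        byte_offset: 0,
    }
}

fn main() {
    let mut buf = vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let mut shape = [2i64, 3];
    let tensor = borrow_as_dltensor(&mut buf, &mut shape);
    // Element [1][2] of the 2x3 row-major view lives at flat index 5 of the shared buffer.
    let v = unsafe { *(tensor.data as *const f32).add(5) };
    println!("element [1][2] = {v}, same buffer: {}", tensor.data as *const f32 == buf.as_ptr());
}
```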

Getting Started

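Add SciRS2 to your project. The example below also imports ndarray directly, so add it as a dependency too (at the version SciRS2 itself uses):

cargo add scirs2 ndarray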

use scirs2::prelude::*;
use ndarray::Array2;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let a = Array2::from_shape_vec((3, 3), vec![
        1.0, 2.0, 3.0,
        4.0, 5.0, 6.0,
        7.0, 8.0, 9.0,
    ])?;

    // Singular value decomposition via scirs2-linalg (pure Rust, OxiBLAS underneath)
    let (_u, s, _vt) = scirs2::linalg::decomposition::svd(&a)?;
    println!("Singular values: {:.4?}", s);
    Ok(())
}

Working from Python? The bindings ship too:

pip install scirs2

scirs2-numpy now supports the DLPack protocol for zero-copy array exchange with NumPy and PyTorch, plus masked arrays and structured dtypes — hand tensors back and forth without a single copy.

What’s New in 0.4.2

Grouped by area, across development Waves 40–45:

AutoML & optimization — GDAS/SNAS/predictor-based Neural Architecture Search, NAS repair (74 tests), CMA-ES optimizer, subspace embedding with sketched least-squares.
GPU & memory — async GPU memory transfer, unified memory manager, stream allocator + defragmentation, NUMA bandwidth optimization, Metal GPU fixes (no .expect()), GPU-accelerated spectrograms, GPU Lattice-Boltzmann, RRB-tree, Tracy profiling.
Numerics — H-matrix hierarchical compression, GPU eigensolvers, auto-precision dispatch, ILU(0) mixed CPU/GPU preconditioning, streaming/cache-oblivious/out-of-core FFT, f16 mixed-precision special functions, Hecke/elliptic L-functions, Clebsch-Gordan SU(2)/SU(3)/SO(5).
IO & data — Apache Iceberg table format, DataFusion query provider, vectorized expression eval + joins, object store over S3/GCS/Azure with multipart upload and exactly-once delivery, HuggingFace-compatible datasets.
NLP & embeddings — Universal Sentence Encoder, SimCSE contrastive embeddings, HDP topic model, Unicode tokenizer, enhanced BPE with chat templates, multilingual sentence embeddings.
Interop — DLPack zero-copy array exchange, masked arrays, structured dtypes, PyUntypedArray, expanded Python bindings for special/interpolate/integrate.
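Sketched least-squares replaces a tall n×d problem min ‖Ax − b‖ with a much smaller s×d problem min ‖SAx − Sb‖. Here S is a random s×n subspace embedding. The sketch below uses a Gaussian S with N(0, 1/s) entries, generated on the fly while streaming over the rows of A once, and then solves the small 2-column problem via its normal equations. It is a plain-Rust illustration under those assumptions. `gaussian_sketch`, `lstsq2`, and the xorshift `Rng` are hypothetical and are not scirs2-optimize's API. For a consistent system (b exactly in the range of A), the sketched solution recovers the exact one. For noisy b, standard subspace-embedding results give a residual within a (1 + ε) factor of optimal with high probability once s is large enough relative to d.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates (seed must be non-zero).
struct Rng(u64);

impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }

    /// Standard normal sample via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let u1 = self.next_f64().max(f64::MIN_POSITIVE); // avoid ln(0)
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Gaussian sketch: returns (S·A, S·b) where S has i.i.d. N(0, 1/s) entries.
/// Each column of S is drawn once and applied to row i of A and entry i of b.
fn gaussian_sketch(a: &[[f64; 2]], b: &[f64], s: usize, rng: &mut Rng) -> (Vec<[f64; 2]>, Vec<f64>) {
    let scale = 1.0 / (s as f64).sqrt();
    let mut sa = vec![[0.0; 2]; s];
    let mut sb = vec![0.0; s];
    for (row, &bi) in a.iter().zip(b) {
        for k in 0..s {
            let g = rng.normal() * scale; // entry S[k][i]
            sa[k][0] += g * row[0];
            sa[k][1] += g * row[1];
            sb[k] += g * bi;
        }
    }
    (sa, sb)
}

/// Solve min ||M x - y|| for a 2-column M via the 2x2 normal equations (Cramer's rule).
fn lstsq2(m: &[[f64; 2]], y: &[f64]) -> Option<[f64; 2]> {
    let (mut g00, mut g01, mut g11, mut r0, mut r1) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for (row, &yi) in m.iter().zip(y) {
        g00 += row[0] * row[0];
        g01 += row[0] * row[1];
        g11 += row[1] * row[1];
        r0 += row[0] * yi;
        r1 += row[1] * yi;
    }
    let det = g00 * g11 - g01 * g01;
    if det.abs() < 1e-12 {
        return None; // rank-deficient sketch
    }
    Some([(r0 * g11 - g01 * r1) / det, (g00 * r1 - g01 * r0) / det])
}

fn main() {
    // Tall problem: n = 1000 rows, d = 2 columns (intercept + slope).
    let n = 1000;
    let a: Vec<[f64; 2]> = (0..n).map(|i| [1.0, i as f64 / n as f64]).collect();
    let b: Vec<f64> = a.iter().map(|r| 2.0 * r[0] - 3.0 * r[1]).collect();
    let mut rng = Rng(0x9E37_79B9_7F4A_7C15);
    let (sa, sb) = gaussian_sketch(&a, &b, 20, &mut rng);
    match lstsq2(&sa, &sb) {
        Some(x) => println!("sketched solution from a 20x2 problem: {x:?}"),
        None => println!("sketch was rank-deficient; try a larger sketch"),
    }
}
```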

Quality Gate: cargo check --workspace --all-features passes with 0 errors and 0 warnings; cargo nextest reports 27,139 passed / 195 skipped (excluding python/datasets), scirs2-datasets --lib adds 493 passed — 27,632 tests passing total; no-unwrap policy PASS. Dependencies bumped: oxifft 0.1.4, sha2 0.11, egui/eframe 0.34.

Tips

Search architectures without leaving Rust. Reach for scirs2-optimize NAS (GDAS / SNAS / predictor) when you want AutoML without a Python framework or a CUDA toolkit; pair it with scirs2-neural NAS repair to fix up invalid candidates.
Use CMA-ES for the hard objectives. When gradients are unavailable or your objective is noisy and black-box, CMA-ES is the right tool — it shines exactly where line-search optimizers stall.
Turn on GPU features for async + unified memory. Enable the GPU features for async transfer and the unified memory manager. On Apple silicon, the Metal backend is now expect()-free, so GPU paths fail gracefully instead of panicking.
Query Iceberg tables in process. Point the scirs2-io DataFusion query provider at an Iceberg table and run SQL directly — no Spark cluster, no JVM.
Hand tensors to NumPy/PyTorch for free. Use scirs2-numpy DLPack for zero-copy hand-off across the Python boundary instead of serializing arrays.
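Pick H-matrix compression for big structured systems. For large linear systems whose off-diagonal blocks are numerically low-rank (for example, discretized integral operators with smooth kernels), scirs2-linalg H-matrix compression stores those blocks in factored form. That cuts both memory and solve time versus dense methods: storage drops from O(N²) to roughly O(k·N log N) for block rank k.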
Drop-in upgrade. scirs2 = "0.4.2" is a drop-in over 0.4.1 — bump the version and rebuild.
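To show why evolution strategies need no gradients, here is a minimal (1+1)-ES with the one-fifth success rule. It is the simplest member of the family that CMA-ES generalizes, and it is deliberately not CMA-ES itself. It samples around the current point with an isotropic Gaussian of width σ and keeps the candidate only if it is no worse. It then grows σ after a success and shrinks it after a failure, so that about one in five samples succeeds at equilibrium. CMA-ES goes further by sampling a population and adapting a full covariance matrix, which lets it handle ill-conditioned and rotated objectives. The function and `Rng` names are hypothetical and are not the scirs2-optimize API.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates (seed must be non-zero).
struct Rng(u64);

impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }

    /// Standard normal sample via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let u1 = self.next_f64().max(f64::MIN_POSITIVE); // avoid ln(0)
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// (1+1)-ES with the one-fifth success rule: only f's values are used, never its gradient.
fn one_plus_one_es<F>(f: F, x0: &[f64], sigma0: f64, iters: usize, seed: u64) -> (Vec<f64>, f64)
where
    F: Fn(&[f64]) -> f64,
{
    let mut rng = Rng(seed);
    let mut x = x0.to_vec();
    let mut fx = f(&x);
    let mut sigma = sigma0;
    for _ in 0..iters {
        // Mutate every coordinate with N(0, sigma^2) noise.
        let y: Vec<f64> = x.iter().map(|&xi| xi + sigma * rng.normal()).collect();
        let fy = f(&y);
        if fy <= fx {
            x = y;
            fx = fy;
            sigma *= (1.0_f64 / 3.0).exp(); // success: widen the search
        } else {
            sigma *= (-1.0_f64 / 12.0).exp(); // failure: narrow it (balances at a 1/5 success rate)
        }
    }
    (x, fx)
}

fn main() {
    // Shifted sphere with its minimum at (1, -2).
    let f = |v: &[f64]| -> f64 { (v[0] - 1.0).powi(2) + (v[1] + 2.0).powi(2) };
    let (x, fx) = one_plus_one_es(f, &[0.0, 0.0], 1.0, 2000, 42);
    println!("x = {x:?}, f(x) = {fx:e}");
}
```

The success and failure factors exp(1/3) and exp(−1/12) cancel in log space when exactly one sample in five succeeds, which is how the rule holds the success rate near 1/5.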

This is the foundation

As of April 2026, SciRS2 is the numeric, scientific, and AutoML bedrock of the COOLJAPAN ecosystem. SkleaRS leans on it for classical machine learning; TenfloweRS, ToRSh, and TrustformeRS for deep learning and transformers; OxiONNX for model interchange; and OxiWhisper for speech. All of them stand on SciRS2’s pure-Rust linear algebra, FFT, optimization, and special functions. It also underpins the freshly shipped OxiPhysics. Every layer above gets the same guarantee: no C, no Fortran, no system BLAS.

Repository: https://github.com/cool-japan/scirs

Pure-Rust scientific computing and AI is here. Star the repo if you want a future where scientific computing and AI are fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ April 12, 2026