COOLJAPAN

OptiRS 0.1.0 Released — The First Stable Pure Rust ML Optimizer Suite, Built Exclusively on SciRS2

OptiRS is the Pure Rust ML optimization layer for the COOLJAPAN stack: a torch.optim / optax replacement built exclusively on SciRS2-Core. The first stable release, 0.1.0, ships 19 optimizers, SIMD acceleration (2-4x), parallel parameter groups (4-8x), 1,134 passing tests, and zero clippy warnings.


The optimization layer of the COOLJAPAN ML stack just reached its first stable release — and it stands on its own, in Pure Rust.

Today we released OptiRS 0.1.0 — a comprehensive, Pure Rust ML optimization library that delivers 19 production-ready optimizers built exclusively as an extension of the SciRS2 scientific computing ecosystem.

No Python. No PyTorch optimizers. No third-party crates outside SciRS2. OptiRS is the torch.optim / optax-class optimizer layer for the COOLJAPAN ML stack, and it follows a strict design rule: it depends only on scirs2-core. No direct ndarray. No direct rand. No direct rayon. No direct wide. Arrays, randomness, numerics, SIMD, parallelism, GPU, and metrics all flow through scirs2_core abstractions. The result compiles to a single static binary (or to WASM) and runs on everything from laptops and browsers to edge GPUs and the cloud.

This is a humble first release. It is early, but it is solid: real optimizers, real benchmarks, real tests, and a strict architectural foundation we intend to build on for years.

Why OptiRS 0.1.0 matters

If you train models today, you know the pain points of the incumbent Python stack. Python optimizers carry interpreter overhead and GIL contention. Custom CUDA kernels written in C++ are a memory-safety minefield. Vendor lock-in makes portability painful. WASM and embedded targets are barely supported, if at all. And reproducibility is a constant fight against hidden nondeterminism.

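OptiRS takes a different path, and 0.1.0 already shows concrete early payoffs: 2-4x speedups from SIMD acceleration, 4-8x from parallel parameter groups, 1,134 passing tests, and zero clippy warnings.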

Memory safety comes from Rust. Portability comes from the single-binary / WASM story. Reproducibility comes from routing all randomness through scirs2_core::random. None of this is bolted on — it falls out of the design.

Technical Deep Dive: the OptiRS architecture

OptiRS 0.1.0 is organized as a workspace of focused crates layered on SciRS2-Core.

Layer 1 — Core optimizers (optirs-core, Production Ready). The heart of the library: 19 optimizers spanning first-order and second-order methods. Those referenced in this post include SGD, SimdSGD, Adam, AdamW, Lion, SAM, and L-BFGS.

Layer 2 — Schedules and performance. Five learning-rate schedulers ship in 0.1.0: ExponentialDecay, StepDecay, CosineAnnealing, LinearWarmup, and OneCycle. SIMD vectorization activates automatically once an array exceeds a size threshold (16 elements for f32, 8 for f64). The ParallelOptimizer wrapper distributes parameter groups across cores. Memory-efficient features include gradient accumulation for micro-batch training, chunked parameter processing for billion-parameter models, and memory-usage estimation.
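To make the LinearWarmup-into-CosineAnnealing combination concrete, here is a minimal, dependency-free sketch of the learning-rate curve it produces. The function name `warmup_cosine_lr` and its parameters are hypothetical illustrations, not the OptiRS scheduler API; the formula is the standard linear-warmup-then-cosine-decay schedule.

```rust
use std::f64::consts::PI;

/// Hypothetical helper (NOT the OptiRS scheduler API): linear warmup up to
/// `peak_lr` over `warmup_steps`, then cosine annealing from `peak_lr` down
/// to `min_lr` by `total_steps`, holding `min_lr` afterwards.
fn warmup_cosine_lr(
    step: usize,
    warmup_steps: usize,
    total_steps: usize,
    peak_lr: f64,
    min_lr: f64,
) -> f64 {
    if step < warmup_steps {
        // Linear ramp: step 0 gets peak_lr / warmup_steps, the last warmup step gets peak_lr.
        return peak_lr * (step + 1) as f64 / warmup_steps as f64;
    }
    // Fraction of the decay phase completed, clamped to [0, 1].
    let decay_steps = total_steps.saturating_sub(warmup_steps).max(1);
    let progress = ((step - warmup_steps) as f64 / decay_steps as f64).min(1.0);
    // Cosine factor: 1.0 at progress 0, 0.0 at progress 1.
    min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + (PI * progress).cos())
}

fn main() {
    for step in [0, 50, 99, 100, 550, 1_000, 1_200] {
        let lr = warmup_cosine_lr(step, 100, 1_000, 1e-3, 1e-5);
        println!("step {step:>5}: lr = {lr:.6}");
    }
}
```

The first warmup step uses `peak_lr / warmup_steps` rather than zero, so no step is spent with a learning rate of exactly zero.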

Layer 3 — Production tooling (optirs-bench, Production Ready). Real-time performance tracking, gradient statistics (mean / std / norm / sparsity), parameter statistics, and convergence detection with moving averages — all exportable to JSON and CSV.
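The statistics listed above are simple to state precisely. The sketch below computes mean, standard deviation, L2 norm, and sparsity for a flattened gradient, and flags convergence when a moving average of the loss stops changing. `GradStats`, `grad_stats`, and `ConvergenceDetector` are hypothetical names for illustration, not the optirs-bench API, and the convergence rule (compare the rolling mean with its value one step earlier against a tolerance) is only one simple choice among many.

```rust
use std::collections::VecDeque;

/// Per-tensor gradient summary (hypothetical type, not the optirs-bench API).
struct GradStats {
    mean: f64,
    std: f64,
    l2_norm: f64,
    /// Fraction of entries that are exactly zero.
    sparsity: f64,
}

/// Computes population statistics over a flattened gradient.
fn grad_stats(grad: &[f64]) -> GradStats {
    // Guard against division by zero for an empty slice.
    let n = grad.len().max(1) as f64;
    let mean = grad.iter().sum::<f64>() / n;
    let variance = grad.iter().map(|g| (g - mean).powi(2)).sum::<f64>() / n;
    let l2_norm = grad.iter().map(|g| g * g).sum::<f64>().sqrt();
    let zeros = grad.iter().filter(|&&g| g == 0.0).count() as f64;
    GradStats { mean, std: variance.sqrt(), l2_norm, sparsity: zeros / n }
}

/// Flags convergence when the moving average of the last `window` losses
/// changes by less than `tol` from one step to the next.
struct ConvergenceDetector {
    window: usize,
    tol: f64,
    recent: VecDeque<f64>,
    prev_avg: Option<f64>,
}

impl ConvergenceDetector {
    fn new(window: usize, tol: f64) -> Self {
        assert!(window > 0, "window must be positive");
        Self { window, tol, recent: VecDeque::with_capacity(window), prev_avg: None }
    }

    /// Records one loss value; returns true once the moving average has settled.
    fn update(&mut self, loss: f64) -> bool {
        self.recent.push_back(loss);
        if self.recent.len() > self.window {
            self.recent.pop_front();
        }
        if self.recent.len() < self.window {
            return false; // not enough history yet
        }
        let avg = self.recent.iter().sum::<f64>() / self.window as f64;
        let converged = matches!(self.prev_avg, Some(prev) if (prev - avg).abs() < self.tol);
        self.prev_avg = Some(avg);
        converged
    }
}

fn main() {
    let s = grad_stats(&[0.0, 3.0, 0.0, 4.0]);
    println!(
        "mean={:.4} std={:.4} l2_norm={:.4} sparsity={:.2}",
        s.mean, s.std, s.l2_norm, s.sparsity
    );

    let mut detector = ConvergenceDetector::new(5, 1e-4);
    for step in 0..1_000 {
        // Synthetic loss curve that flattens out over time.
        let loss = 1.0 / (step as f64 + 1.0);
        if detector.update(loss) {
            println!("converged at step {step}");
            break;
        }
    }
}
```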

Layer 4 — The GPU foundation and beyond. The remaining crates lay out the long-term roadmap. optirs-gpu (In Development) is a multi-backend GPU framework covering CUDA, Metal, OpenCL, and WebGPU, with context management and tensor-core / mixed-precision support; in 0.1.0 it is a foundation, not yet a production engine. Alongside it sit optirs-tpu (Framework Ready), optirs-learned (Research Phase), and optirs-nas (Research Phase). That makes six crates in total, each with an honest status label.

Throughout, the SciRS2 integration is strict: arrays via scirs2_core::ndarray, randomness via scirs2_core::random, numerics via scirs2_core::numeric, SIMD via scirs2_core::simd_ops, parallelism via scirs2_core::parallel_ops, GPU via scirs2_core::gpu, and metrics via scirs2_core::metrics. Per COOLJAPAN policy, linear algebra runs on oxiblas through scirs2-core (no openblas), and serialization uses oxicode.

Getting Started

Add the optimizer crate and its required foundation:

cargo add optirs-core scirs2-core

Or in Cargo.toml:

[dependencies]
optirs-core = "0.1.0"
scirs2-core = "0.1.1"  # required foundation

A minimal Adam step looks like this:

use optirs_core::optimizers::{Adam, Optimizer};
// ALWAYS use scirs2_core for arrays — NEVER direct ndarray!
use scirs2_core::ndarray::Array1;

fn main() -> Result<(), Box<dyn std::error::Error>> {
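    // Current parameter values and their gradients (element type f64, inferred from the literals).
    let params = Array1::from_vec(vec![1.0, 2.0, 3.0, 4.0]);
    let gradients = Array1::from_vec(vec![0.1, 0.2, 0.15, 0.08]);

    // Adam with a learning rate of 0.001.
    let mut optimizer = Adam::new(0.001);
    // One optimization step: returns the updated parameters.
    let updated_params = optimizer.step(&params, &gradients)?;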

    println!("Updated parameters: {:?}", updated_params);
    Ok(())
}

That is the whole contract: build an optimizer, call step with parameters and gradients, get updated parameters back. The same Optimizer trait is shared by all 19 implementations, so swapping Adam for Lion or L-BFGS is a one-line change.
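To show what a shared-trait contract buys you, here is a dependency-free sketch that defines its own minimal `Optimizer` trait and implements plain SGD and textbook Adam (bias-corrected, with the common defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8) against it. This trait is a hypothetical stand-in that only mirrors the parameters-and-gradients-in, parameters-out shape; it is not `optirs_core::optimizers::Optimizer`, and the Adam in OptiRS may differ in defaults and details.

```rust
/// Hypothetical stand-in for a shared optimizer trait (NOT the OptiRS trait):
/// parameters and gradients in, updated parameters out.
trait Optimizer {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64>;
}

/// Plain SGD: p <- p - lr * g.
struct Sgd {
    lr: f64,
}

impl Optimizer for Sgd {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64> {
        params.iter().zip(grads).map(|(p, g)| p - self.lr * g).collect()
    }
}

/// Textbook Adam with bias-corrected first and second moment estimates.
struct Adam {
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    t: i32,
    m: Vec<f64>,
    v: Vec<f64>,
}

impl Adam {
    fn new(lr: f64) -> Self {
        Adam { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, t: 0, m: Vec::new(), v: Vec::new() }
    }
}

impl Optimizer for Adam {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64> {
        if self.m.len() != params.len() {
            // Size the moment buffers on first use.
            self.m = vec![0.0; params.len()];
            self.v = vec![0.0; params.len()];
        }
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        let mut out = Vec::with_capacity(params.len());
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            out.push(params[i] - self.lr * m_hat / (v_hat.sqrt() + self.eps));
        }
        out
    }
}

/// Minimizes f(x) = sum(x_i^2), whose gradient is 2x, with any optimizer.
fn minimize(opt: &mut dyn Optimizer, mut params: Vec<f64>, steps: usize) -> Vec<f64> {
    for _ in 0..steps {
        let grads: Vec<f64> = params.iter().map(|x| 2.0 * x).collect();
        params = opt.step(&params, &grads);
    }
    params
}

fn main() {
    let init = vec![1.0, 2.0, 3.0, 4.0];
    // Same training loop for both; only the constructor differs.
    let mut sgd = Sgd { lr: 0.1 };
    let mut adam = Adam::new(0.1);
    println!("SGD:  {:?}", minimize(&mut sgd, init.clone(), 100));
    println!("Adam: {:?}", minimize(&mut adam, init, 100));
}
```

Because `minimize` only sees `&mut dyn Optimizer`, switching optimizers touches the constructor and nothing else.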

What’s inside

This is the first release, so here is the plain-language inventory of what 0.1.0 actually contains:

Tips

A few concrete ways to get the most out of 0.1.0:

  1. Reach for SimdSGD on big tensors. For large parameter arrays, SimdSGD gives a 2-4x speedup over plain SGD with no code changes beyond the constructor. SIMD kicks in automatically once arrays cross the threshold (16 elements for f32, 8 for f64), so small layers pay nothing.

  2. Wrap heavy models in ParallelOptimizer. When you have many parameter groups, the parallel wrapper spreads updates across cores for a 4-8x win on multi-core machines:

    use optirs_core::optimizers::{Adam, ParallelOptimizer};
    let optimizer = ParallelOptimizer::new(Adam::new(0.001));
    
  3. Use gradient accumulation for micro-batches. If a full batch will not fit in memory, accumulate gradients across micro-batches and step once — combined with chunked parameter processing, this is how you reach billion-parameter scale without exhausting RAM.

  4. Pair a warmup scheduler with AdamW. LinearWarmup followed by CosineAnnealing (or OneCycle, which builds in its own warmup phase) on top of AdamW’s decoupled weight decay is a strong, well-behaved default for transformer-style training.

  5. Export metrics to watch convergence. Turn on gradient and parameter statistics from optirs-bench, enable convergence detection with moving averages, and dump the results to JSON or CSV to plot training behavior outside the process.

  6. Try sharpness-aware training when you can afford it. SAM (sharpness-aware minimization) trades extra compute, roughly two gradient evaluations per step, for flatter minima and better generalization. It is a good experiment once your baseline is stable.
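Tip 3 describes gradient accumulation in words; the sketch below shows the mechanics. `GradAccumulator` is a hypothetical helper, not an OptiRS type: it sums gradients from a fixed number of micro-batches, returns their mean, and resets, so the optimizer (plain SGD here) steps once per full virtual batch.

```rust
/// Hypothetical gradient accumulator (NOT the OptiRS API).
struct GradAccumulator {
    sum: Vec<f64>,
    count: usize,
    micro_batches: usize,
}

impl GradAccumulator {
    fn new(num_params: usize, micro_batches: usize) -> Self {
        assert!(micro_batches > 0, "need at least one micro-batch");
        Self { sum: vec![0.0; num_params], count: 0, micro_batches }
    }

    /// Adds one micro-batch gradient. Returns Some(mean gradient) once
    /// `micro_batches` gradients have been collected, then resets the buffer.
    fn add(&mut self, grad: &[f64]) -> Option<Vec<f64>> {
        for (s, g) in self.sum.iter_mut().zip(grad) {
            *s += g;
        }
        self.count += 1;
        if self.count < self.micro_batches {
            return None;
        }
        let n = self.micro_batches as f64;
        let avg: Vec<f64> = self.sum.iter().map(|s| s / n).collect();
        self.sum.iter_mut().for_each(|s| *s = 0.0);
        self.count = 0;
        Some(avg)
    }
}

fn main() {
    let mut params = vec![1.0_f64, -2.0];
    let lr = 0.1;
    let mut acc = GradAccumulator::new(params.len(), 4);
    // Pretend each micro-batch yields a slightly different gradient.
    let micro_grads = [[0.4, -0.8], [0.6, -1.2], [0.5, -1.0], [0.5, -1.0]];
    for g in micro_grads.iter() {
        if let Some(avg) = acc.add(g) {
            // One plain SGD step on the averaged gradient.
            for (p, a) in params.iter_mut().zip(&avg) {
                *p -= lr * a;
            }
        }
    }
    println!("{params:?}");
}
```

Averaging rather than summing keeps the update equal to the gradient of the mean loss over the full batch when micro-batches are equal-sized, so the learning rate does not need rescaling when you change the number of micro-batches.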

This is the foundation

OptiRS is the optimization layer of the COOLJAPAN ML stack, and 0.1.0 launches into a fast-growing family of libraries that have already shipped.

OptiRS launches alongside NumRS2 and PandRS, all three resting on the SciRS2 foundation that shipped on December 29. The optimizer layer is now in place.

Repository: https://github.com/cool-japan/optirs

Star the repo if you want Pure Rust ML optimization that you can actually deploy anywhere — and follow along as we grow from this first stable release.

Pure Rust ML optimization — built on SciRS2 — is here: fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ December 30, 2025
