The optimization layer of the COOLJAPAN ML stack just reached its first stable release — and it stands on its own, in Pure Rust.
Today we released OptiRS 0.1.0 — a comprehensive, Pure Rust ML optimization library that delivers 19 production-ready optimizers built exclusively as an extension of the SciRS2 scientific computing ecosystem.
No Python. No PyTorch optimizers. No external crates. OptiRS is the torch.optim / optax-class optimizer layer for the COOLJAPAN ML stack, and it follows a strict design rule: it uses only scirs2-core. No direct ndarray. No direct rand. No direct rayon. No direct wide. Arrays, randomness, numerics, SIMD, parallelism, GPU, and metrics all flow through scirs2_core abstractions. The result compiles to a single static binary (or WASM) and runs everywhere — laptops to browsers to edge GPUs to cloud.
This is a humble first release. It is early, but it is solid: real optimizers, real benchmarks, real tests, and a strict architectural foundation we intend to build on for years.
Why OptiRS 0.1.0 matters
If you train models today, you know the incumbent’s pain. The Python optimizer stack carries interpreter overhead and GIL contention. Custom CUDA kernels in C++ are a memory-safety minefield. Vendor lock-in makes portability painful. WASM and embedded targets are barely supported, if at all. And reproducibility is a constant fight against hidden nondeterminism.
OptiRS takes a different path, and 0.1.0 shows the early payoff with concrete numbers:
- 19 optimizers, from SGD to L-BFGS, all production-ready and all on one foundation.
- SIMD acceleration: 2-4x speedup on large parameter arrays, via scirs2_core::simd_ops.
- Parallel parameter groups: 4-8x speedup across multiple cores, via scirs2_core::parallel_ops.
- Tight benchmarks (Criterion.rs): SGD under 10ns per parameter update, Adam under 50ns, optimizer state under 2x parameter memory.
- 1,134 tests passing (1,061 unit + 73 doc), zero clippy warnings, and 100% public API documentation.
Memory safety comes from Rust. Portability comes from the single-binary / WASM story. Reproducibility comes from routing all randomness through scirs2_core::random. None of this is bolted on — it falls out of the design.
Technical Deep Dive: the OptiRS architecture
OptiRS 0.1.0 is organized as a workspace of focused crates layered on SciRS2-Core.
Layer 1 — Core optimizers (optirs-core, Production Ready). The heart of the library. 19 optimizers split across first-order and second-order methods:
- First-order (17): SGD (momentum + Nesterov), SimdSGD (SIMD-accelerated, 2-4x faster on large arrays), Adam, AdamW (decoupled weight decay), AdaDelta, AdaBound (dynamic bounds transitioning Adam→SGD), RMSprop, Adagrad, LAMB, LARS, Lion (evolved sign momentum), Lookahead (k steps forward, 1 back), RAdam (rectified), Ranger (RAdam+Lookahead), SAM (sharpness-aware minimization), SparseAdam, and GroupedAdam.
- Second-order (2): L-BFGS and Newton-CG (with trust region).
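To make one of these update rules concrete, here is a minimal sketch of the sign-momentum step that Lion is known for, written against plain slices. The function name `lion_step` and its signature are assumptions made for illustration; this is not the OptiRS implementation, which operates on scirs2_core arrays.

```rust
/// One Lion-style step on flat parameter slices.
/// Hypothetical helper for illustration, not part of the OptiRS API.
fn lion_step(
    params: &mut [f64],
    grads: &[f64],
    momentum: &mut [f64],
    lr: f64,
    beta1: f64,
    beta2: f64,
    weight_decay: f64,
) {
    assert_eq!(params.len(), grads.len());
    assert_eq!(params.len(), momentum.len());
    for i in 0..params.len() {
        // Blend the stored momentum with the fresh gradient, then keep only the sign.
        let blended = beta1 * momentum[i] + (1.0 - beta1) * grads[i];
        let direction = if blended > 0.0 {
            1.0
        } else if blended < 0.0 {
            -1.0
        } else {
            0.0
        };
        // Every coordinate moves by the same magnitude; weight decay is decoupled.
        params[i] -= lr * (direction + weight_decay * params[i]);
        // The stored momentum is refreshed with a second interpolation factor.
        momentum[i] = beta2 * momentum[i] + (1.0 - beta2) * grads[i];
    }
}
```

Because only the sign of the blended term is used, the step size per coordinate is set by the learning rate alone, which is why Lion is usually run with a smaller learning rate than Adam.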
Layer 2 — Schedules and performance. Five learning-rate schedulers ship in 0.1.0: ExponentialDecay, StepDecay, CosineAnnealing, LinearWarmup, and OneCycle. SIMD vectorization activates automatically above a threshold (16 elements for f32, 8 for f64) for both f32 and f64. The ParallelOptimizer wrapper distributes parameter groups across cores. Memory-efficient features include gradient accumulation for micro-batch training, chunked parameter processing for billion-parameter models, and memory-usage estimation.
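As a rough picture of what a warmup-then-decay schedule computes, the sketch below combines a linear warmup with cosine annealing in one standalone function. The name `warmup_cosine_lr` and its parameters are assumptions for illustration; the OptiRS scheduler types named above may expose a different interface.

```rust
use std::f64::consts::PI;

/// Learning rate at `step` for linear warmup followed by cosine annealing.
/// Hypothetical helper for illustration, not an OptiRS function.
fn warmup_cosine_lr(
    step: usize,
    warmup_steps: usize,
    total_steps: usize,
    base_lr: f64,
    min_lr: f64,
) -> f64 {
    if step < warmup_steps {
        // Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) as f64 / warmup_steps as f64;
    }
    // Fraction of the decay phase completed, clamped to [0, 1].
    let decay_steps = total_steps.saturating_sub(warmup_steps).max(1);
    let progress = ((step - warmup_steps) as f64 / decay_steps as f64).min(1.0);
    // Half cosine from base_lr down to min_lr.
    min_lr + 0.5 * (base_lr - min_lr) * (1.0 + (PI * progress).cos())
}
```

Calling this once per training step and feeding the result to the optimizer reproduces the usual transformer-style schedule: a short ramp, then a smooth decay to `min_lr`.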
Layer 3 — Production tooling (optirs-bench, Production Ready). Real-time performance tracking, gradient statistics (mean / std / norm / sparsity), parameter statistics, and convergence detection with moving averages — all exportable to JSON and CSV.
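The statistics listed here are simple to state precisely. The following sketch computes mean, standard deviation, L2 norm, and sparsity for one gradient vector, plus a moving-average convergence check. `GradStats`, `grad_stats`, and `has_converged` are hypothetical names, and the choice of population standard deviation and exact-zero sparsity is an assumption; optirs-bench may define these differently.

```rust
/// Summary statistics for one gradient vector.
/// Illustrative struct, not the optirs-bench type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GradStats {
    mean: f64,
    std: f64,
    norm: f64,
    sparsity: f64,
}

fn grad_stats(grads: &[f64]) -> GradStats {
    if grads.is_empty() {
        return GradStats { mean: 0.0, std: 0.0, norm: 0.0, sparsity: 0.0 };
    }
    let n = grads.len() as f64;
    let mean = grads.iter().sum::<f64>() / n;
    // Population variance (divides by n, not n - 1).
    let variance = grads.iter().map(|g| (g - mean).powi(2)).sum::<f64>() / n;
    // Euclidean (L2) norm of the gradient.
    let norm = grads.iter().map(|g| g * g).sum::<f64>().sqrt();
    // Sparsity is the fraction of entries that are exactly zero.
    let zeros = grads.iter().filter(|g| **g == 0.0).count() as f64;
    GradStats { mean, std: variance.sqrt(), norm, sparsity: zeros / n }
}

/// Reports convergence when the mean loss over the last `window` steps differs
/// from the mean over the preceding `window` steps by less than `tol`.
fn has_converged(losses: &[f64], window: usize, tol: f64) -> bool {
    if window == 0 || losses.len() < 2 * window {
        return false;
    }
    let n = losses.len();
    let recent = losses[n - window..].iter().sum::<f64>() / window as f64;
    let previous = losses[n - 2 * window..n - window].iter().sum::<f64>() / window as f64;
    (previous - recent).abs() < tol
}
```

Comparing two adjacent windows rather than two single losses smooths out batch-to-batch noise, at the cost of needing `2 * window` recorded steps before the check can fire.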
Layer 4 — The GPU foundation and beyond. The remaining crates establish the long-term surface: optirs-gpu (In Development) is a multi-backend GPU framework — CUDA, Metal, OpenCL, and WebGPU, with context management and tensor-core / mixed-precision support. In 0.1.0 this is a foundation, not yet a production engine. Alongside it sit optirs-tpu (Framework Ready), optirs-learned (Research Phase), and optirs-nas (Research Phase). Six crates in total, with a clear honest status for each.
Throughout, the SciRS2 integration is strict: arrays via scirs2_core::ndarray, randomness via scirs2_core::random, numerics via scirs2_core::numeric, SIMD via scirs2_core::simd_ops, parallelism via scirs2_core::parallel_ops, GPU via scirs2_core::gpu, and metrics via scirs2_core::metrics. Per COOLJAPAN policy, linear algebra runs on oxiblas through scirs2-core (no openblas), and serialization uses oxicode.
Getting Started
Add the optimizer crate and its required foundation:
cargo add optirs-core scirs2-core
Or in Cargo.toml:
[dependencies]
optirs-core = "0.1.0"
scirs2-core = "0.1.1" # required foundation
A minimal Adam step looks like this:
use optirs_core::optimizers::{Adam, Optimizer};
// ALWAYS use scirs2_core for arrays, NEVER direct ndarray!
use scirs2_core::ndarray::Array1;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Current parameters and the gradients computed for them.
    let params = Array1::from_vec(vec![1.0, 2.0, 3.0, 4.0]);
    let gradients = Array1::from_vec(vec![0.1, 0.2, 0.15, 0.08]);
    // Adam with a learning rate of 0.001.
    let mut optimizer = Adam::new(0.001);
    // One optimization step returns the updated parameters.
    let updated_params = optimizer.step(&params, &gradients)?;
    println!("Updated parameters: {:?}", updated_params);
    Ok(())
}
That is the whole contract: build an optimizer, call step with parameters and gradients, get updated parameters back. The same Optimizer trait is shared by all 19 implementations, so swapping Adam for Lion or L-BFGS is a one-line change.
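The value of a shared trait is easiest to see in a stripped-down form. The sketch below defines a hypothetical stand-in trait over plain slices with two toy implementations and a training loop that only sees the trait. The names `Sgd`, `MomentumSgd`, and `minimize_quadratic` are assumptions for illustration; the real OptiRS trait works on scirs2_core arrays and returns a `Result`, as in the example above.

```rust
/// Hypothetical stand-in for a shared optimizer trait (not the OptiRS definition).
trait Optimizer {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64>;
}

struct Sgd {
    lr: f64,
}

impl Optimizer for Sgd {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64> {
        params.iter().zip(grads).map(|(p, g)| p - self.lr * g).collect()
    }
}

struct MomentumSgd {
    lr: f64,
    momentum: f64,
    velocity: Vec<f64>,
}

impl Optimizer for MomentumSgd {
    fn step(&mut self, params: &[f64], grads: &[f64]) -> Vec<f64> {
        // Allocate the velocity buffer lazily on the first call.
        if self.velocity.len() != grads.len() {
            self.velocity = vec![0.0; grads.len()];
        }
        for (v, g) in self.velocity.iter_mut().zip(grads) {
            *v = self.momentum * *v + g;
        }
        params.iter().zip(&self.velocity).map(|(p, v)| p - self.lr * v).collect()
    }
}

/// Minimizes f(x) = sum(x_i^2) with whichever optimizer is passed in.
fn minimize_quadratic(opt: &mut dyn Optimizer, start: &[f64], steps: usize) -> Vec<f64> {
    let mut params = start.to_vec();
    for _ in 0..steps {
        // The gradient of x^2 is 2x.
        let grads: Vec<f64> = params.iter().map(|p| 2.0 * p).collect();
        params = opt.step(&params, &grads);
    }
    params
}
```

The training loop never names a concrete optimizer, so switching from `Sgd` to `MomentumSgd` only changes the line that constructs it.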
What’s inside
This is the first release, so here is the plain-language inventory of what 0.1.0 actually contains:
- 19 production-ready optimizers spanning the modern first-order family (SGD through Lion, RAdam, Ranger, SAM) and two second-order methods (L-BFGS, Newton-CG).
- 5 learning-rate schedulers for warmup, decay, and cyclical training.
- SIMD acceleration with automatic f32/f64 vectorization and a threshold-based activation knob.
- Parallel parameter-group processing through the ParallelOptimizer wrapper.
- Memory-efficient training via gradient accumulation, chunked processing, and memory estimation.
- A multi-backend GPU framework foundation (CUDA, Metal, OpenCL, WebGPU).
- Metrics and monitoring with gradient/parameter statistics, convergence detection, and JSON/CSV export.
- Full SciRS2 integration: every numerical primitive routed through scirs2_core.
- 1,134 passing tests, zero clippy warnings, 100% public API docs.
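Parallel parameter-group processing, listed above, works because each group's update is independent of the others. The sketch below shows that idea with standard-library scoped threads and a plain SGD update. It is only an illustration under stated assumptions: OptiRS routes parallelism through scirs2_core::parallel_ops rather than raw threads, and `parallel_sgd_step` is a hypothetical name.

```rust
use std::thread;

/// Applies a plain SGD update to every parameter group, one scoped thread per group.
/// Illustrative only: a production wrapper would use a thread pool instead of
/// spawning a thread for each group.
fn parallel_sgd_step(groups: &mut [Vec<f64>], grads: &[Vec<f64>], lr: f64) {
    assert_eq!(groups.len(), grads.len());
    thread::scope(|scope| {
        for (params, group_grads) in groups.iter_mut().zip(grads) {
            assert_eq!(params.len(), group_grads.len());
            // Each thread owns a disjoint mutable borrow, so no locking is needed.
            scope.spawn(move || {
                for (p, g) in params.iter_mut().zip(group_grads) {
                    *p -= lr * g;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

The borrow checker enforces that no two threads touch the same group, which is the property that makes this kind of update safe to parallelize.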
Tips
A few concrete ways to get the most out of 0.1.0:
- Reach for SimdSGD on big tensors. For large parameter arrays, SimdSGD gives a 2-4x speedup over plain SGD with no code changes beyond the constructor. SIMD kicks in automatically once arrays cross the threshold (16 elements for f32, 8 for f64), so small layers pay nothing.
- Wrap heavy models in ParallelOptimizer. When you have many parameter groups, the parallel wrapper spreads updates across cores for a 4-8x win on multi-core machines:
    use optirs_core::optimizers::{Adam, ParallelOptimizer};
    let optimizer = ParallelOptimizer::new(Adam::new(0.001));
- Use gradient accumulation for micro-batches. If a full batch will not fit in memory, accumulate gradients across micro-batches and step once. Combined with chunked parameter processing, this is how you reach billion-parameter scale without exhausting RAM.
- Pair a warmup scheduler with AdamW. LinearWarmup into CosineAnnealing (or OneCycle) on top of AdamW’s decoupled weight decay is a strong, well-behaved default for transformer-style training.
- Export metrics to watch convergence. Turn on gradient and parameter statistics from optirs-bench, enable convergence detection with moving averages, and dump the results to JSON or CSV to plot training behavior outside the process.
- Try sharpness-aware training when you can afford it. SAM trades extra compute for flatter minima and better generalization, which makes it a good experiment once your baseline is stable.
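The gradient-accumulation tip can be sketched in a few lines: sum the micro-batch gradients, and release their average only once a full batch has been seen. `GradAccumulator` is a hypothetical type written for illustration and is not the OptiRS accumulation API.

```rust
/// Accumulates micro-batch gradients and releases their average every
/// `accumulation_steps` calls. Hypothetical type, for illustration only.
struct GradAccumulator {
    sum: Vec<f64>,
    count: usize,
    accumulation_steps: usize,
}

impl GradAccumulator {
    fn new(num_params: usize, accumulation_steps: usize) -> Self {
        assert!(accumulation_steps > 0);
        Self { sum: vec![0.0; num_params], count: 0, accumulation_steps }
    }

    /// Adds one micro-batch gradient. Returns `Some(average)` once a full
    /// batch has been accumulated, and `None` otherwise.
    fn push(&mut self, grads: &[f64]) -> Option<Vec<f64>> {
        assert_eq!(grads.len(), self.sum.len());
        for (s, g) in self.sum.iter_mut().zip(grads) {
            *s += g;
        }
        self.count += 1;
        if self.count < self.accumulation_steps {
            return None;
        }
        let n = self.count as f64;
        let averaged: Vec<f64> = self.sum.iter().map(|s| s / n).collect();
        // Reset so the next batch starts from zero.
        self.sum.iter_mut().for_each(|s| *s = 0.0);
        self.count = 0;
        Some(averaged)
    }
}
```

In a training loop, the optimizer's step would run only when `push` returns `Some`, so memory holds one gradient buffer regardless of how many micro-batches make up the effective batch.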
This is the foundation
OptiRS is the optimization layer of the COOLJAPAN ML stack, and 0.1.0 launches into a fast-growing family. As of today, its already-shipped siblings are:
- SciRS2 — the scientific computing foundation OptiRS is built on, which shipped the day before (12-29).
- NumRS2 (0.1.0, shipped the same day, 12-30) — Pure Rust numerical arrays.
- PandRS (0.1.0, shipped the same day, 12-30) — Pure Rust dataframes.
- OxiBLAS (12-28) — accelerated linear algebra under SciRS2.
- Oxicode (12-28) — serialization.
- Spintronics, VoiRS, Tenrso, and TensorLogic — the wider COOLJAPAN family.
OptiRS launches alongside NumRS2 and PandRS, all three resting on the SciRS2 foundation that shipped 12-29. The optimizer layer is now in place.
Repository: https://github.com/cool-japan/optirs
Star the repo if you want Pure Rust ML optimization that you can actually deploy anywhere — and follow along as we grow from this first stable release.
Pure Rust ML optimization — built on SciRS2 — is here: fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ December 30, 2025