ToRSh 0.1.1 Released — A Stabilized Pure-Rust PyTorch, Now With a Model Converter

A deep-learning framework you can actually pin, build on, and ship — in pure Rust.

Today we released ToRSh 0.1.1, a stabilization and dependency-refresh release on top of last month’s 0.1.0 debut. It locks the entire 33-crate workspace onto consistent, published crates.io dependency versions and ships the brand-new torsh-convert model-converter command-line tool.

ToRSh (Tensor Operations in Rust with Sharding) is a PyTorch-compatible deep-learning framework built entirely in pure Rust, with native tensor sharding. No C. No C++. No Fortran. No Python runtime. Where a typical PyTorch install drags in libtorch and the ATen C++ core, a Python interpreter, and often a CUDA stack just to multiply two tensors, ToRSh compiles to a single static binary you can drop onto a machine and run. That is the sovereignty angle: your training stack is yours, top to bottom, with nothing foreign in the default build.

Why 0.1.1 matters

A debut release tells you a project exists. A stabilization release tells you that you can build a roadmap on it. That is what 0.1.1 is for.

Everything is pinned and published. Every one of the 33 workspace crates now resolves to a consistent, published crates.io dependency version — no local path patches, no “works on my machine” graph. You can add ToRSh to a real project and get a reproducible build.
A clean, warning-free build across the whole workspace. The 0.1.0 base reported 9,000+ tests passing across the workspace with zero warnings and zero errors; 0.1.1 keeps that bar and hardens it.
A rock-solid foundation, not a moving target. Nothing in the public API was churned to chase this release — the value is consistency, so the layers underneath you stop shifting.
A new tool in the box: torsh-convert. The one genuinely new user-facing artifact in 0.1.1 is a model-converter CLI that moves trained ToRSh models between serialization formats, with batch processing, async I/O, and progress reporting (more below).

If you were waiting for a version you could depend on before investing, this is it.

Technical Deep Dive: the layers underneath

ToRSh is a layered workspace, and 0.1.1 stabilizes every layer at once. From the bottom up:

(a) The tensor + autograd core. torsh-tensor provides roughly 400 PyTorch-compatible tensor operations — arithmetic, matmul, reductions, advanced indexing, broadcasting, FFT, complex numbers, sorting, histograms, the complete PyTorch scatter family, and in-place ops with autograd safety. torsh-autograd sits on top with complete reverse-mode automatic differentiation: computation-graph tracking, higher-order derivatives, and gradient checkpointing. This is the part everything else is written against.
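To make "computation-graph tracking" concrete, here is a minimal sketch of tape-based reverse-mode differentiation on scalars, using only the standard library. It is not ToRSh's implementation and calls no ToRSh API; the `Tape` and `Node` types and their methods are hypothetical names invented for this illustration.

```rust
// Illustrative sketch only: a tiny scalar tape, not ToRSh's autograd.

/// One recorded operation: the indices of its two parents and the local
/// partial derivative of this node with respect to each parent.
#[derive(Clone, Copy)]
struct Node {
    parents: [usize; 2],
    grads: [f64; 2],
}

/// Nodes are stored in creation order, which is a valid topological order.
struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: [usize; 2], grads: [f64; 2]) -> usize {
        self.nodes.push(Node { parents, grads });
        self.values.push(value);
        self.nodes.len() - 1
    }

    /// A leaf has no real parents: it points at itself with zero local gradient.
    fn leaf(&mut self, value: f64) -> usize {
        let id = self.nodes.len();
        self.push(value, [id, id], [0.0, 0.0])
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.push(v, [a, b], [1.0, 1.0])
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        let (va, vb) = (self.values[a], self.values[b]);
        self.push(va * vb, [a, b], [vb, va])
    }

    /// Reverse sweep: walk the tape backwards and accumulate adjoints.
    fn backward(&self, output: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.nodes.len()];
        adj[output] = 1.0;
        for i in (0..=output).rev() {
            let node = self.nodes[i];
            let upstream = adj[i];
            for k in 0..2 {
                adj[node.parents[k]] += node.grads[k] * upstream;
            }
        }
        adj
    }
}

fn main() {
    let mut t = Tape::new();
    let x = t.leaf(3.0);
    let y = t.leaf(4.0);
    // f(x, y) = x * y + x
    let xy = t.mul(x, y);
    let f = t.add(xy, x);
    let grads = t.backward(f);
    println!("f = {}, df/dx = {}, df/dy = {}", t.values[f], grads[x], grads[y]);
}
```

For f(x, y) = x * y + x at x = 3, y = 4, the sweep yields df/dx = y + 1 = 5 and df/dy = x = 3. A tensor framework applies the same idea with tensor-valued nodes and per-operation backward rules.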

(b) The nn / optim / data training stack. torsh-nn carries the full neural-network layer set: Linear, Conv1d/2d/3d, ConvTranspose, BatchNorm/LayerNorm/GroupNorm/InstanceNorm, RNN/LSTM/GRU, Transformer, MultiheadAttention, and the usual activations and pooling. torsh-optim brings 70+ optimizers (among them SGD, Adam, AdamW, AdaGrad, RMSprop, and L-BFGS, plus advanced optimizers pulled from OptiRS) along with LR schedulers such as CosineAnnealing and OneCycle. torsh-data rounds it out with a multi-worker parallel DataLoader, Dataset abstractions, and sampling strategies.
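As a rough picture of how an optimizer and a scheduler fit together, the following std-only sketch pairs a plain SGD update with the standard cosine-annealing formula. It does not use torsh-optim; the function names `cosine_annealing_lr` and `sgd_step` and the toy objective are assumptions made for this example.

```rust
use std::f64::consts::PI;

/// Cosine-annealed learning rate at `step` out of `t_max` steps.
/// Assumes `t_max > 0`; steps past `t_max` stay at `lr_min`.
fn cosine_annealing_lr(lr_max: f64, lr_min: f64, step: usize, t_max: usize) -> f64 {
    let progress = step.min(t_max) as f64 / t_max as f64;
    lr_min + 0.5 * (lr_max - lr_min) * (1.0 + (PI * progress).cos())
}

/// One plain SGD update: w <- w - lr * grad.
fn sgd_step(weights: &mut [f64], grads: &[f64], lr: f64) {
    for (w, g) in weights.iter_mut().zip(grads) {
        *w -= lr * *g;
    }
}

fn main() {
    // Toy objective: f(w) = sum(w_i^2), whose gradient is 2 * w.
    let mut w: Vec<f64> = vec![1.0, -2.0];
    let t_max = 50;
    for step in 0..t_max {
        let lr = cosine_annealing_lr(0.1, 0.001, step, t_max);
        let grads: Vec<f64> = w.iter().map(|&wi| 2.0 * wi).collect();
        sgd_step(&mut w, &grads, lr);
    }
    println!("final weights = {:?}", w);
}
```

The schedule starts at `lr_max`, decays along a half cosine, and reaches `lr_min` at `t_max`. Keeping the scheduler separate from the update rule is what lets any scheduler be paired with any optimizer.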

(c) The SciRS2 scientific layer. ToRSh is not just a clone of the PyTorch surface — it reaches into the SciRS2 scientific-computing platform for things PyTorch leaves to third parties. torsh-graph adds graph neural networks (GCN, GAT, GraphSAGE); torsh-series adds time-series analysis (STL decomposition, SSA, Kalman filters); torsh-vision adds computer-vision spatial ops; torsh-sparse and torsh-special add sparse tensors and special functions. Numerical work routes through scirs2-core rather than raw ndarray, and linear algebra runs on the pure-Rust OxiBLAS backend through scirs2-linalg.

(d) Backends and tooling. torsh-backend provides CPU, CUDA, and Metal backends, and torsh-jit does JIT compilation via Cranelift with kernel fusion. INT8 quantization and quantization-aware training (QAT) live alongside a model hub and distributed training (DDP/FSDP, collective ops). New in 0.1.1, tools/torsh-convert joins the tooling tier as a standalone converter for trained model files.
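For readers new to INT8 quantization, here is a minimal sketch of the common symmetric per-tensor scheme, written against the standard library only. It is one textbook variant and is not claimed to match how ToRSh quantizes; the `quantize` and `dequantize` helpers are hypothetical names for this illustration.

```rust
/// Symmetric per-tensor INT8 quantization:
/// q = round(x / scale), clamped to [-127, 127], with scale = max|x| / 127.
fn quantize(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Avoid a zero scale when the tensor is all zeros.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q: Vec<i8> = values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Map the integers back to approximate floating-point values.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&qi| qi as f32 * scale).collect()
}

fn main() {
    let weights = [0.02_f32, -1.27, 0.5, 0.0];
    let (q, scale) = quantize(&weights);
    let restored = dequantize(&q, scale);
    println!("scale = {}, q = {:?}, restored = {:?}", scale, q, restored);
}
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for a 4x smaller representation than f32.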

Getting Started

Add ToRSh to your project:

```
cargo add torsh
```

A tiny tensor-and-autograd snippet using the prelude:

```
use torsh::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Two small matrices; track gradients on the input.
    let x = tensor![[1.0, 2.0], [3.0, 4.0]].requires_grad();
    let y = tensor![[0.5, 0.0], [0.0, 0.5]];

    // Forward pass: matmul + ReLU, then a scalar loss.
    let z = x.matmul(&y)?.relu()?;
    let loss = z.sum()?;

    // Reverse-mode AD fills in the gradients.
    loss.backward()?;
    println!("grad = {:?}", x.grad());

    Ok(())
}
```
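As a sanity check on the numbers the snippet above should yield, the gradient can be worked out by hand. The sketch below uses no ToRSh calls: it recomputes the same loss, sum(relu(x · y)), on plain arrays and compares the analytic gradient with central finite differences. The helper names (`loss`, `analytic_grad`, `numeric_grad`) are hypothetical, and it assumes matmul, ReLU, and sum carry their usual meanings.

```rust
type Mat = [[f64; 2]; 2];

/// loss(x) = sum(relu(x · y)) for 2x2 matrices.
fn loss(x: &Mat, y: &Mat) -> f64 {
    let mut total = 0.0;
    for i in 0..2 {
        for j in 0..2 {
            let z: f64 = (0..2).map(|k| x[i][k] * y[k][j]).sum();
            total += z.max(0.0); // ReLU
        }
    }
    total
}

/// Analytic gradient: dL/dx[i][k] = sum over j of [z[i][j] > 0] * y[k][j].
fn analytic_grad(x: &Mat, y: &Mat) -> Mat {
    let mut g = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            let z: f64 = (0..2).map(|k| x[i][k] * y[k][j]).sum();
            if z > 0.0 {
                for k in 0..2 {
                    g[i][k] += y[k][j];
                }
            }
        }
    }
    g
}

/// Central finite differences as an independent check.
fn numeric_grad(x: &Mat, y: &Mat, eps: f64) -> Mat {
    let mut g = [[0.0; 2]; 2];
    for i in 0..2 {
        for k in 0..2 {
            let (mut hi, mut lo) = (*x, *x);
            hi[i][k] += eps;
            lo[i][k] -= eps;
            g[i][k] = (loss(&hi, y) - loss(&lo, y)) / (2.0 * eps);
        }
    }
    g
}

fn main() {
    let x: Mat = [[1.0, 2.0], [3.0, 4.0]];
    let y: Mat = [[0.5, 0.0], [0.0, 0.5]];
    println!("loss     = {}", loss(&x, &y));
    println!("analytic = {:?}", analytic_grad(&x, &y));
    println!("numeric  = {:?}", numeric_grad(&x, &y, 1e-6));
}
```

With these inputs, x · y is [[0.5, 1.0], [1.5, 2.0]], every entry passes through ReLU unchanged, the loss is 5.0, and every gradient entry is 0.5. How ToRSh formats the printed tensor is not specified here; only the values are expected to agree.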

And the new converter — install it and migrate a trained model to JSON:

```
cargo install torsh-convert
torsh-convert model.torsh --output model.json
```

torsh-convert reads and writes JSON/TOML model metadata, processes whole directories via glob patterns, streams large model files asynchronously on Tokio, and shows visual progress as it goes.

What’s New in 0.1.1

Version bump to 0.1.1 across the entire workspace.
Every workspace crate moved to consistent, published crates.io dependency versions — no local path patches.
New torsh-convert model-converter CLI: batch glob conversion, JSON/TOML metadata read/write, Tokio async I/O for large files, progress indicators, and structured logging via env_logger.
Dependency refresh across the whole tree.

Tips

Pin all torsh-* crates to 0.1.1 together. The whole point of this release is a coherent graph; mixing versions defeats it. Keep them in lockstep for a reproducible build.
Enable only the feature flags you need on the torsh umbrella crate (nn / optim / data / vision / jit and friends) to cut compile time — you rarely need every layer at once.
Batch-migrate model formats in one shot with the new converter:
```
torsh-convert ./models/*.torsh --output-dir ./converted/
```
Reach for torsh-graph or torsh-series when your problem needs GNNs or time-series analysis alongside a conventional model — they live in the same workspace and speak the same tensors.
No system BLAS to install. Linear algebra runs on the pure-Rust OxiBLAS backend through scirs2-linalg, so there is nothing to apt install and nothing to link against.

Part of the COOLJAPAN ecosystem

ToRSh does not stand alone. It is built on top of SciRS2 (the scientific-computing platform that supplies scirs2-core and roughly twenty scirs2-* crates), OxiBLAS (the optimized pure-Rust BLAS/LAPACK backend, pulled in through scirs2-core’s oxiblas features), OxiCode (modern binary serialization, the bincode successor), and OptiRS (advanced ML optimizers). Every one of these shipped before this release, and 0.1.1 stabilizes ToRSh squarely on top of them.

Repository: https://github.com/cool-japan/torsh

Star the repo if a pure-Rust, dependency-pinned PyTorch alternative is something you want to build on — it genuinely helps others find the project.

Pure Rust deep learning is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ, March 20, 2026