Drop-in PyTorch replacement in pure Rust. Full SciRS2 integration (18 crates), SIMD CPU backend, autograd, and native sharding support. 2—3× faster inference, 50% less memory, single-binary deployment — no Python, no CUDA required.
The moment the Rust community has been waiting for.
On February 23 we finally released ToRSh 0.1.0 — a complete PyTorch-compatible deep learning framework built entirely in Rust.
This is not another “Rust ML experiment”.
This is the real thing: a production-ready, memory-safe, high-performance alternative that lets you write almost identical PyTorch code… but run it faster, safer, and without any Python runtime.
For years, Rust ML meant choosing between:
ToRSh eliminates both problems.
1. Tensor & Autograd Layer
Write code that looks almost exactly like PyTorch:
let x = tensor![[1.0, 2.0], [3.0, 4.0]].requires_grad();
let y = tensor![[5.0, 6.0], [7.0, 8.0]];
let z = x.matmul(&y)?;
let loss = z.pow(2).sum();
loss.backward()?;
println!("Gradient: {:?}", x.grad());
2. Backend System
3. SciRS2 Integration
ToRSh natively calls SciRS2’s 18 crates, giving you:
4. Sharding (the reason for the name)
Data-parallel and model-parallel sharding were designed in from the beginning. You can scale to thousands of GPUs with pure Rust code.
ToRSh is now the official deep learning engine of the entire COOLJAPAN stack:
Repository: https://github.com/cool-japan/torsh
Star the repo if you want PyTorch-level productivity with Rust-level performance and safety.
The Python monopoly on deep learning is cracking.
ToRSh is the first serious Rust contender — and it’s already faster.
— KitaSan at COOLJAPAN OU
February 23, 2026