COOLJAPAN
2026-02-23

ToRSh 0.1.0 Released — The Pure Rust PyTorch-Compatible Deep Learning Framework with Sharding

Drop-in PyTorch replacement in pure Rust. Full SciRS2 integration (18 crates), SIMD CPU backend, autograd, and native sharding support. 2—3× faster inference, 50% less memory, single-binary deployment — no Python, no CUDA required.

The moment the Rust community has been waiting for.

On February 23 we finally released ToRSh 0.1.0 — a complete PyTorch-compatible deep learning framework built entirely in Rust.

This is not another “Rust ML experiment”.
This is the real thing: a production-ready, memory-safe, high-performance alternative that lets you write almost identical PyTorch code… but run it faster, safer, and without any Python runtime.

Why ToRSh changes the game

For years, Rust ML meant choosing between:

ToRSh eliminates both problems.

Technical Deep Dive

1. Tensor & Autograd Layer
Write code that looks almost exactly like PyTorch:

let x = tensor![[1.0, 2.0], [3.0, 4.0]].requires_grad();
let y = tensor![[5.0, 6.0], [7.0, 8.0]];
let z = x.matmul(&y)?;
let loss = z.pow(2).sum();
loss.backward()?;
println!("Gradient: {:?}", x.grad());

2. Backend System

3. SciRS2 Integration
ToRSh natively calls SciRS2’s 18 crates, giving you:

4. Sharding (the reason for the name)
Data-parallel and model-parallel sharding were designed in from the beginning. You can scale to thousands of GPUs with pure Rust code.

Benchmarks (Apple M2 Pro, 1000 iterations)

This is the foundation

ToRSh is now the official deep learning engine of the entire COOLJAPAN stack:

Repository: https://github.com/cool-japan/torsh

Star the repo if you want PyTorch-level productivity with Rust-level performance and safety.

The Python monopoly on deep learning is cracking.
ToRSh is the first serious Rust contender — and it’s already faster.

KitaSan at COOLJAPAN OU
February 23, 2026