#deep-learning | COOLJAPAN Blog

May 3, 2026 · 8 min

OxiCUDA 0.1.5 Released — Nine New GPU Deep-Learning Crates (GenAI, GNN, Mamba, ViT, Audio, Time-Series, Bayesian, Federated, NAS)

The pure-Rust NVIDIA CUDA Toolkit replacement adds nine new GPU deep-learning crates — generative diffusion, graph neural nets, Mamba SSMs, vision transformers, audio/speech, time-series, Bayesian DL, federated learning, and NAS — growing to ~320K lines across 37 crates with 9,568 passing tests. No CUDA SDK, no nvcc.

release · oxicuda · cuda

Apr 27, 2026 · 9 min

ToRSh 0.1.2 Released — Real AVX2/NEON SIMD and a Zero-Copy Tensor Memory Pool

ToRSh is a pure-Rust, PyTorch-compatible deep-learning framework with native tensor sharding. 0.1.2 lands real AVX2/NEON SIMD for f32 ops and activations, a true zero-copy buffer pool (100% heap-block reduction on hot loops), and SIMD + parallel enabled by default.

release · torsh · deep-learning

Mar 20, 2026 · 6 min

ToRSh 0.1.1 Released — A Stabilized Pure-Rust PyTorch, Now With a Model Converter

ToRSh is a PyTorch-compatible deep-learning framework in pure Rust with native tensor sharding. The 0.1.1 release hardens the 33-crate workspace onto consistent, published crates.io dependencies and adds the new torsh-convert model-converter CLI.

release · torsh · deep-learning

Mar 5, 2026 · 8 min

SciRS2 0.3.0 Released — The Largest Feature Expansion Yet: Modern Deep Learning, Advanced Statistics, and Signal Processing in Pure Rust

SciRS2 is a pure-Rust SciPy/NumPy/scikit-learn replacement, and 0.3.0 is the biggest release yet — transformers, GNNs, diffusion, MoE/RLHF, Gaussian processes, MCMC, survival analysis, radar/compressed sensing, LOBPCG/AMG, plus new Julia and Python bindings. 19,644 tests, ~2.59M lines of Rust. No C, no Fortran.

release · scirs2 · scientific-computing

Feb 23, 2026 · 2 min

ToRSh 0.1.0 Released — The Pure Rust PyTorch-Compatible Deep Learning Framework with Sharding

Drop-in PyTorch replacement in pure Rust. Full SciRS2 integration (18 crates), SIMD CPU backend, autograd, and native sharding support. 2–3× faster inference, 50% less memory, and single-binary deployment, with no Python and no CUDA required.

release · torsh · pytorch