TrustformeRS 0.1.4 Released — Pure-Rust CUDA Replaces cudarc, Verified on Real NVIDIA Hardware

TrustformeRS 0.1.4 migrates CUDA and Metal from cudarc/scirs2-MPS to the Pure-Rust oxicuda stack, passes 12/12 CPU↔CUDA parity tests on a real RTX A4000, ships real PyRwkvModel/PyMambaModel classes on PyO3 0.28, and goes unwrap()-free workspace-wide. The sovereign transformer layer for the COOLJAPAN ecosystem.

release trustformers rust transformers cuda gpu pure-rust metal machine-learning

Every previous CUDA build trusted cudarc’s FFI layer to talk to the driver correctly. As of 0.1.4, TrustformeRS proves its own GPU math instead — on real silicon.

On July 2 we released TrustformeRS 0.1.4. The release migrates the CUDA backend from cudarc to the Pure-Rust oxicuda stack, moves Metal compute onto oxicuda-metal, and verifies both against real hardware with golden-parity tests instead of CPU-only approximations.

No C. No Fortran. No cudarc FFI surface sitting between your model and the GPU driver, and no scirs2-core MPS dependency behind Metal anymore either. TrustformeRS’s GPU path now runs on oxicuda-blas, oxicuda-dnn, oxicuda-memory, and oxicuda-driver for CUDA, and oxicuda-metal/oxicuda-backend for Apple Silicon — both Pure Rust, both checked against real devices. TrustformeRS compiles to a single static binary — or to WASM, or onto mobile — and runs anywhere Rust runs.
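
A golden-parity check of the kind described above boils down to comparing a device result against a CPU reference within a tolerance. The following is a minimal sketch under stated assumptions: the function names, the NumPy-style tolerance rule, and the tolerance values are illustrative and are not taken from the TrustformeRS test suite.

```rust
/// Largest element-wise absolute difference between two buffers.
/// Returns `None` when the lengths differ.
/// Note: `f32::max` skips NaN, so use `allclose` for pass/fail decisions.
pub fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> Option<f32> {
    if reference.len() != candidate.len() {
        return None;
    }
    Some(
        reference
            .iter()
            .zip(candidate)
            .map(|(r, c)| (r - c).abs())
            .fold(0.0_f32, f32::max),
    )
}

/// True when every element satisfies |r - c| <= atol + rtol * |r|.
/// A NaN on either side makes the comparison false, so NaN never passes.
pub fn allclose(reference: &[f32], candidate: &[f32], rtol: f32, atol: f32) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(r, c)| (r - c).abs() <= atol + rtol * r.abs())
}
```

In a parity test, `reference` would hold the CPU kernel's output and `candidate` the values copied back from the GPU; reporting `max_abs_diff` alongside the boolean makes failures easier to diagnose.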

Why TrustformeRS 0.1.4 is a game changer

The incumbent path to GPU-accelerated transformers looks like this:

TrustformeRS 0.1.4 ends all of that:

Technical Deep Dive

1. trustformers-core — the GPU backend layer. The cuda feature now pulls oxicuda-blas/-dnn/-memory/-driver; cuda-oxicuda remains only as a deprecated alias. The metal feature pulls oxicuda-metal/oxicuda-backend alongside objc2. Both are optional, both are Pure Rust, and the CPU reference kernel they’re checked against — kernels/rope.rs, GPT-NeoX half-split convention — is now the only RoPE implementation in the tree: 0.1.4 also deleted an orphaned, never-mounted ~1,693-line rope/mod.rs that used an inconsistent convention.
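
To make the convention difference concrete, here is a minimal sketch of rotary position embedding applied to a single head vector, once with the half-split (GPT-NeoX style) pairing and once with the interleaved pairing for contrast. It is an illustration only, not the code in kernels/rope.rs; the function names and the `base` parameter are assumptions.

```rust
/// Rotation angle for pair index `i` of a head with dimension `d`.
fn rope_angle(position: usize, i: usize, d: usize, base: f32) -> f32 {
    let inv_freq = base.powf(-(2.0 * i as f32) / d as f32);
    position as f32 * inv_freq
}

/// Half-split pairing: element `i` rotates together with element `i + d/2`.
pub fn rope_half_split(x: &mut [f32], position: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    let half = d / 2;
    for i in 0..half {
        let (sin, cos) = rope_angle(position, i, d, base).sin_cos();
        let (a, b) = (x[i], x[i + half]);
        x[i] = a * cos - b * sin;
        x[i + half] = b * cos + a * sin;
    }
}

/// Interleaved pairing: element `2i` rotates together with element `2i + 1`.
pub fn rope_interleaved(x: &mut [f32], position: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    for i in 0..d / 2 {
        let (sin, cos) = rope_angle(position, i, d, base).sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = b * cos + a * sin;
    }
}
```

Both functions apply the same rotations but to different element pairs, so weights trained under one layout produce wrong attention scores under the other. That is why a single reference convention in the tree matters.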

2. trustformers-models — 49+ architectures, real GPU wiring on two of them. GPU-resident forward passes are wired end-to-end for GPT-2 and RetNet today; the rest still run CPU f32 while broader coverage lands. Enabling cuda or metal on trustformers-core now propagates through trustformers-models and the trustformers umbrella crate.

3. trustformers-wasm — WebGPU gets a real device. The WebAssembly compute backend now performs real navigator.gpu → adapter → device initialization, falling back to CPU when no adapter is present, instead of stopping short of an actual device handle.
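
The adapter → device chain with a CPU fallback can be sketched in plain Rust by modelling each step as a closure. This is a hypothetical outline, not the trustformers-wasm API: `Compute`, `init_compute`, and both closure parameters are invented for illustration, and falling back to CPU when the device request fails is an assumption (the release notes only mention the missing-adapter case).

```rust
/// Outcome of backend initialization: a live GPU device handle or the CPU path.
#[derive(Debug, PartialEq)]
pub enum Compute<D> {
    Gpu(D),
    Cpu,
}

/// Walks the adapter -> device chain and falls back to CPU at either step.
///
/// `request_adapter` stands in for asking the browser's GPU entry point for an
/// adapter; `request_device` stands in for opening a device on that adapter.
pub fn init_compute<A, D, E>(
    request_adapter: impl FnOnce() -> Option<A>,
    request_device: impl FnOnce(&A) -> Result<D, E>,
) -> Compute<D> {
    match request_adapter() {
        // No adapter present: run on the CPU.
        None => Compute::Cpu,
        Some(adapter) => match request_device(&adapter) {
            Ok(device) => Compute::Gpu(device),
            // Assumption: a failed device request also falls back to CPU.
            Err(_) => Compute::Cpu,
        },
    }
}
```

Returning the device handle inside the enum means callers cannot reach the GPU path without a handle that was actually obtained.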

4. trustformers-serve and trustformers-py — the edges. gRPC serving is back (tonic 0.14’s split tonic-build/tonic-prost-build API), and trustformers-serve remains the one crate keeping non-Pure-Rust TLS (rustls/aws-lc-rs) as an accepted exception — its lambda and swagger-ui adapters are opt-in features, off by default. On the Python side, bindings were re-modernized onto PyO3 0.28, and PyRwkvModel/PyMambaModel are real classes now, not stand-ins.

Getting Started

cargo add trustformers

use trustformers::prelude::*;
use trustformers::{AutoModel, AutoTokenizer, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model and tokenizer
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;

    // Tokenize input
    let tokenized = tokenizer.encode("Hello, Rust world!")?;

    // AutoModel's `Model` impl is Tensor-in/Tensor-out, so wrap the token IDs
    // as a Tensor before running inference.
    let ids: Vec<f32> = tokenized.input_ids.iter().map(|&id| id as f32).collect();
    let len = ids.len();
    let inputs = Tensor::from_vec(ids, &[len])?;

    let outputs = model.forward(inputs)?;
    println!("Output shape: {:?}", outputs.shape());
    Ok(())
}

To run that same forward pass on a GPU today, reach for a model with real device wiring (GPT-2 or RetNet) and move its weights over explicitly:

// Cargo.toml: trustformers-core = { version = "0.1", features = ["metal"] }  // or "cuda"
use trustformers_core::Device;
use trustformers_models::gpt2::{Gpt2Config, Gpt2Model};

let device = Device::Metal(0); // or Device::CUDA(0)
let mut model = Gpt2Model::new_with_device(Gpt2Config::default(), device)?;
model.weights_to_gpu(&device)?;        // Metal (use `weights_to_gpu_cuda` on CUDA)
let outputs = model.forward(inputs)?;  // attention + linear run on-device, oxicuda-backed

What’s New in 0.1.4

Added

Changed

Removed

Fixed

Tips

This is the foundation

TrustformeRS 0.1.4 landed a day after OxiCUDA 0.4.0 and SciRS2 0.6.0 — and picks up both immediately: oxicuda-blas/-dnn/-memory/-driver/-metal/-backend for GPU compute, scirs2-core/scirs2-linalg 0.6.0 for numerics and linear algebra. Underneath that, OxiBLAS provides Pure-Rust BLAS/LAPACK, OxiCode handles serialization, OxiARC (oxiarc-archive/-deflate/-lz4/-zstd) backs compression, and oxisql-sqlite-compat provides the Pure-Rust SQLite-compatible export backend. It sits beside ToRSh, SkleaRS, and the rest of the COOLJAPAN model-training and serving stack.

Repository: https://github.com/cool-japan/trustformers

Star the repo if you want GPU-accelerated transformers whose CUDA and Metal paths you can actually read, all the way down to the kernel.

The era of trusting an opaque CUDA FFI wrapper is over. Pure Rust GPU-accelerated transformers are here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ July 2, 2026
