Less boilerplate, more model — TenfloweRS 0.1.1 sands down the rough edges of the first stable release and keeps pace with the wgpu 29 GPU stack.
Today we released TenfloweRS 0.1.1 — an ergonomics and compatibility patch that adds a declarative tensor! macro, a much wider prelude, ndarray interop, and full wgpu v29 compatibility on top of the 0.1.0 foundation.
TenfloweRS is the pure-Rust answer to TensorFlow. No C runtime, no CUDA-C kernels, no Python interpreter wedged into your deployment story — the eager and graph execution engines, autodiff, and GPU backends are all Rust, all the way down. 0.1.1 does not change that contract; it simply makes the front door easier to walk through.
Why 0.1.1 matters
The 0.1.0 release was the first stable cut, and shipping it surfaced the friction that only real use exposes. Constructing tensors meant verbose, explicit calls where a literal would have been clearer. The prelude covered the basics but left transformer and recurrent building blocks out of easy reach, so model code reached deep into module paths. The Python FFI flattened rich error types into generic exceptions, throwing away the information a Python user most needs. And wgpu kept evolving underneath us, leaving API drift to paper over.
0.1.1 closes those gaps with concrete wins drawn straight from the changelog: a tensor![] macro for shape-inferred construction, an expanded prelude that surfaces MultiHeadAttention, TransformerEncoder/TransformerDecoder, RMSNorm, GRU, LSTM, and RNN directly, interop::ndarray conversions to bridge existing array code, typed Python exceptions via into_py_err(), and a thorough wgpu v29 migration touching 57 call sites across tenflowers-core and tenflowers-dataset.
What’s New in 0.1.1
Meta-crate ergonomics (tenflowers)
- tensor![] declarative macro for shape-inferred tensor creation: 1-D, 2-D, nested, or with an explicit dtype.
- Type aliases for readable signatures: Tensor1D<T>, Tensor2D<T>, Tensor3D<T>, Tensor4D<T>, plus Vector, Matrix, and BatchTensor.
- Expanded prelude exposing MultiHeadAttention, RMSNorm, TransformerEncoder, TransformerDecoder, GRU, LSTM, RNN, Optimizer, RandomSampler, and DType.
- tenflowers::interop::ndarray with from_ndarray / to_ndarray conversion utilities.
- tenflowers::io with save_tensor / load_tensor convenience wrappers.
- tenflowers::onnx ONNX re-exports behind the onnx feature gate.
- deprecated_use! macro for structured deprecation notices.
- Feature presets: experimental, minimal, and standard.
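To make "shape-inferred" concrete, here is a minimal sketch of how a macro can derive a shape from the nesting of a literal. It is not the tensor! implementation: the Literal struct and the literal! macro are hypothetical stand-ins, limited to f32 and to 1-D and 2-D input.

```rust
/// Hypothetical stand-in for a tensor: row-major data plus the inferred shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Literal {
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Illustrative macro (not the real `tensor!`): infers the shape from the
/// nesting of the literal and flattens the values in row-major order.
macro_rules! literal {
    // 2-D form: `literal![[a, b], [c, d]]`. Listed first so a bracketed row
    // is not parsed as a single array expression by the 1-D arm.
    ( $( [ $( $x:expr ),+ $(,)? ] ),+ $(,)? ) => {{
        let rows: Vec<Vec<f32>> = vec![ $( vec![ $( $x ),+ ] ),+ ];
        let cols = rows[0].len();
        // Ragged input has no well-defined shape, so reject it at runtime.
        assert!(
            rows.iter().all(|r| r.len() == cols),
            "all rows must have the same length"
        );
        Literal {
            shape: vec![rows.len(), cols],
            data: rows.into_iter().flatten().collect(),
        }
    }};
    // 1-D form: `literal![a, b, c]`.
    ( $( $x:expr ),+ $(,)? ) => {{
        let data: Vec<f32> = vec![ $( $x ),+ ];
        Literal { shape: vec![data.len()], data }
    }};
}
```

With this sketch, literal![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] yields shape [2, 3] and six row-major values, while literal![1.0, 2.0] yields shape [2]. The arm order matters: the bracketed pattern has to be tried before the flat one.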
Dataset crate (tenflowers-dataset)
- PidAdaptiveController: a PID-controlled prefetch depth driven by the cache hit-rate.
- Drift metrics: PSI, the KS two-sample statistic, and Jensen-Shannon divergence, surfaced through a DriftReport.
- PipelineInspector with per-step latency and shape-in/shape-out tracking, reported via PipelineInspectionReport.
- SchemaValidator::validate_full with per-field FieldDiff variants (TypeMismatch, Widening, MissingRequired, UnexpectedExtra).
- A Criterion throughput benchmark harness.
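For readers who have not met PID control in a data-loading context, the sketch below shows the general idea behind a hit-rate-driven prefetch depth. It assumes nothing about PidAdaptiveController itself: the PrefetchPid type, its gains, and its method names are all hypothetical, and the PID output is simply applied as an increment to the current depth.

```rust
/// Hypothetical PID controller that nudges a prefetch depth toward a target
/// cache hit-rate. Names and gains are illustrative, not the crate's API.
pub struct PrefetchPid {
    kp: f64,
    ki: f64,
    kd: f64,
    target_hit_rate: f64,
    integral: f64,
    prev_error: f64,
    depth: f64,
    min_depth: f64,
    max_depth: f64,
}

impl PrefetchPid {
    pub fn new(
        target_hit_rate: f64,
        initial_depth: usize,
        min_depth: usize,
        max_depth: usize,
    ) -> Self {
        Self {
            // Arbitrary example gains; a real controller would be tuned.
            kp: 8.0,
            ki: 1.0,
            kd: 2.0,
            target_hit_rate,
            integral: 0.0,
            prev_error: 0.0,
            depth: initial_depth as f64,
            min_depth: min_depth as f64,
            max_depth: max_depth as f64,
        }
    }

    /// Feed one hit-rate observation in 0.0..=1.0 and get the new depth.
    pub fn update(&mut self, observed_hit_rate: f64) -> usize {
        // Positive error means too many misses, so prefetch deeper.
        let error = self.target_hit_rate - observed_hit_rate;
        self.integral += error;
        let derivative = error - self.prev_error;
        self.prev_error = error;

        let adjustment =
            self.kp * error + self.ki * self.integral + self.kd * derivative;
        // Apply the PID output as an increment, bounded to the allowed range.
        self.depth = (self.depth + adjustment).clamp(self.min_depth, self.max_depth);
        self.depth.round() as usize
    }
}
```

A low observed hit-rate pushes the depth up and a sustained surplus pulls it back toward the minimum. This sketch omits anti-windup on the integral term, which a production controller would likely want.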
FFI crate (tenflowers-ffi)
- Structured TensorError → TenflowersError mapping covering all 23 core variants in an exhaustive match.
- into_py_err() mapping to typed Python exceptions: ValueError, RuntimeError, IndexError, MemoryError, and NotImplementedError.
- PyDevice / PyDeviceKind with Device.cpu(), Device.gpu(id), and Device.rocm(id).
- PyTensor.__repr__ dtype fix (it previously hardcoded float32), plus __len__, .ndim, and .numel().
- Opt-in cbindgen header regeneration via build.rs, pytest fixtures in tests/conftest.py, and docs/FFI_ERROR_MAPPING.md.
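The value of an exhaustive match is easiest to see in miniature. The sketch below is an assumption-laden illustration rather than the FFI crate's code: SketchError and its five variants are invented, PyExceptionKind is a plain enum standing in for the Python exception classes, and no pyo3 types are involved.

```rust
/// Hypothetical, trimmed-down error type. Per the release notes the real
/// enum has 23 variants; these five are invented for illustration.
#[derive(Debug)]
pub enum SketchError {
    ShapeMismatch { expected: Vec<usize>, got: Vec<usize> },
    IndexOutOfBounds { index: usize, len: usize },
    OutOfMemory { requested_bytes: usize },
    Unsupported(String),
    Backend(String),
}

/// The five Python exception classes named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PyExceptionKind {
    ValueError,
    RuntimeError,
    IndexError,
    MemoryError,
    NotImplementedError,
}

impl SketchError {
    /// Exhaustive match with no wildcard arm: adding a variant to
    /// `SketchError` fails to compile until it is mapped here.
    pub fn py_exception_kind(&self) -> PyExceptionKind {
        match self {
            SketchError::ShapeMismatch { .. } => PyExceptionKind::ValueError,
            SketchError::IndexOutOfBounds { .. } => PyExceptionKind::IndexError,
            SketchError::OutOfMemory { .. } => PyExceptionKind::MemoryError,
            SketchError::Unsupported(_) => PyExceptionKind::NotImplementedError,
            SketchError::Backend(_) => PyExceptionKind::RuntimeError,
        }
    }

    /// Message that would travel with the exception, keeping the details
    /// a generic error would have thrown away.
    pub fn message(&self) -> String {
        match self {
            SketchError::ShapeMismatch { expected, got } => {
                format!("shape mismatch: expected {:?}, got {:?}", expected, got)
            }
            SketchError::IndexOutOfBounds { index, len } => {
                format!("index {} out of bounds for length {}", index, len)
            }
            SketchError::OutOfMemory { requested_bytes } => {
                format!("allocation of {} bytes failed", requested_bytes)
            }
            SketchError::Unsupported(op) => format!("operation not supported: {}", op),
            SketchError::Backend(detail) => format!("backend error: {}", detail),
        }
    }
}
```

Leaving out the wildcard arm is the design choice that matters here: the compiler, not a reviewer, guarantees that every new error variant gets an explicit Python-side mapping.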
wgpu v29 compatibility (fix)
- 57 sites updated across tenflowers-core and tenflowers-dataset: InstanceDescriptor::default() → InstanceDescriptor::new_without_display_handle(), and bind_group_layouts: &[&layout] → &[Some(&layout)].
- Fixed 9 &str / String type mismatches in the drift-metric constructors in data_quality.rs.
Documentation and tooling
- A Getting Started tutorial and a 35-row PyTorch↔TenfloweRS API mapping table in the README, plus a Mermaid crate-dependency architecture diagram.
- docs/QUICK_REFERENCE.md (a 10-section cheat sheet), MIGRATION_FROM_TENSORFLOW.md (five side-by-side TF→Rust scenarios), TROUBLESHOOTING.md (ten symptom-fix triples), and PRELUDE_STABILITY.md (the semver stability policy for the prelude).
- Release tooling: scripts/publish_meta.sh, scripts/bump_version.sh, docs/RELEASE_CHECKLIST.md, scripts/run_miri.sh, and docs/MEMORY_SAFETY.md.
- Expanded autograd docs (~180 lines on mixed precision, checkpointing, higher-order grads, and custom ops), finalized neural docs, and three new autograd examples: mixed_precision.rs, gradient_checkpointing.rs, and higher_order_grads.rs.
Under the hood, dependencies moved forward too: rayon 1.11→1.12, the scirs2-core/-autograd/-neural/-linalg/-numpy crates bumped to 0.4.2, wgpu pinned to v29, pyo3 to 0.28, and optirs to 0.3.
Getting Started
Add the meta-crate:
    cargo add tenflowers
A quick taste of the 0.1.1 ergonomics — the new tensor! macro, type aliases, an expanded prelude, and ndarray interop, all in a few lines:
    use tenflowers::prelude::*;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // New in 0.1.1: declarative, shape-inferred tensor creation
        let x: Tensor2D<f32> = tensor![[1.0, 2.0, 3.0],
                                       [4.0, 5.0, 6.0]];
        println!("x shape: {:?}", x.shape());

        // Expanded prelude now exposes transformer + recurrent layers directly
        let encoder = TransformerEncoder::new(/* d_model */ 512, /* heads */ 8, /* layers */ 6);
        let norm = RMSNorm::new(512);
        let _ = (&encoder, &norm); // wired into a model in real use

        // ndarray interop (new in 0.1.1)
        let nd = x.to_ndarray()?;
        let back = Tensor::<f32>::from_ndarray(&nd)?;
        println!("round-trip shape: {:?}", back.shape());

        Ok(())
    }
Tips
- Reach for tensor![] and the type aliases. A literal like tensor![[1.0, 2.0], [3.0, 4.0]] annotated as Tensor2D<f32> reads far better than an explicit shape-plus-data constructor, and the aliases (Vector, Matrix, BatchTensor, Tensor3D, Tensor4D) keep function signatures honest about rank.
- Pull layers straight from the prelude. MultiHeadAttention, TransformerEncoder / TransformerDecoder, RMSNorm, GRU, LSTM, and RNN are now top-level prelude imports, so there is no more digging through module paths to assemble a model.
- Bridge existing array code with interop::ndarray. If you already have ndarray data, Tensor::from_ndarray(&nd) and tensor.to_ndarray()? move you in and out without a manual copy loop.
- Turn on ONNX when you need it. Enable the onnx feature to unlock the tenflowers::onnx re-exports, then pair them with OxiONNX for import/export.
- Choose a feature preset. Start with minimal for lean builds, standard for the everyday surface, or experimental when you want the bleeding edge. There is no need to hand-curate feature lists.
- Debug pipelines with the new dataset tooling. Wrap a pipeline in PipelineInspector to see per-step latency and shapes, and reach for the PSI / KS / Jensen-Shannon drift metrics (via DriftReport) when training and serving distributions start to diverge.
- Map errors cleanly into Python. From the FFI layer, into_py_err() turns a TenflowersError into the right typed exception (ValueError, IndexError, MemoryError, and friends), so Python callers get actionable errors instead of a generic blob.
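If the three drift metrics named in the tips are unfamiliar, the sketch below computes each one from first principles using only the standard library. These free functions are hypothetical and say nothing about how DriftReport is built or called. PSI and Jensen-Shannon here take pre-binned proportions that sum to 1, and the KS statistic takes raw samples.

```rust
/// Population Stability Index between two binned distributions
/// (each slice holds per-bin proportions in the same bin order).
pub fn psi(expected: &[f64], actual: &[f64]) -> f64 {
    const EPS: f64 = 1e-6; // floor for empty bins, avoids ln(0)
    expected
        .iter()
        .zip(actual)
        .map(|(&e, &a)| {
            let (e, a) = (e.max(EPS), a.max(EPS));
            (a - e) * (a / e).ln()
        })
        .sum()
}

/// Kullback-Leibler divergence KL(a || b), skipping bins where a is zero.
fn kl(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .filter(|pair| *pair.0 > 0.0)
        .map(|(x, y)| x * (x / y).ln())
        .sum()
}

/// Jensen-Shannon divergence with natural log, so the result is in [0, ln 2].
pub fn js_divergence(p: &[f64], q: &[f64]) -> f64 {
    // Mixture distribution m = (p + q) / 2.
    let m: Vec<f64> = p.iter().zip(q).map(|(&x, &y)| 0.5 * (x + y)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

/// Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
/// empirical CDFs of the two samples.
pub fn ks_statistic(a: &[f64], b: &[f64]) -> f64 {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort_by(|x, y| x.total_cmp(y));
    b.sort_by(|x, y| x.total_cmp(y));

    let (mut i, mut j, mut max_gap) = (0usize, 0usize, 0.0f64);
    while i < a.len() && j < b.len() {
        // Step to the next distinct value and consume it from both samples.
        let x = a[i].min(b[j]);
        while i < a.len() && a[i] <= x {
            i += 1;
        }
        while j < b.len() && b[j] <= x {
            j += 1;
        }
        let gap = (i as f64 / a.len() as f64 - j as f64 / b.len() as f64).abs();
        max_gap = max_gap.max(gap);
    }
    max_gap
}
```

All three return zero for identical inputs and grow as the distributions separate. Jensen-Shannon is the only one with a fixed upper bound (ln 2 for fully disjoint distributions), which makes it convenient for thresholds that should carry over between features.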
A growing foundation
TenfloweRS sits inside the COOLJAPAN ecosystem and leans on it directly: NumRS2 for n-dimensional arrays, SciRS2 (now at 0.4.2) for the scientific core, OptiRS for optimizers, and Oxicode for serialization. The new ONNX re-exports pair naturally with OxiONNX for interchange, and TenfloweRS stands alongside ToRSh, TrustformeRS, and sklears as the COOLJAPAN machine-learning stack — a deep-learning framework next to a tensor library, a transformers library, and classical ML.
Repository: https://github.com/cool-japan/tenflowers
Star the repo if a pure-Rust TensorFlow alternative with a clean, ergonomic prelude is something you want to see grow. Feedback, issues, and pull requests are all welcome — 0.1.1 is shaped by exactly that kind of real-world use.
— KitaSan at COOLJAPAN OÜ April 24, 2026