TenfloweRS 0.1.1 Released — Ergonomic Prelude, tensor! Macro, and wgpu 29

Less boilerplate, more model — TenfloweRS 0.1.1 sands down the rough edges of the first stable release and keeps pace with the wgpu 29 GPU stack.

Today we released TenfloweRS 0.1.1 — an ergonomics and compatibility patch that adds a declarative tensor! macro, a much wider prelude, ndarray interop, and full wgpu v29 compatibility on top of the 0.1.0 foundation.

TenfloweRS is the pure-Rust answer to TensorFlow. No C runtime, no CUDA-C kernels, no Python interpreter wedged into your deployment story — the eager and graph execution engines, autodiff, and GPU backends are all Rust, all the way down. 0.1.1 does not change that contract; it simply makes the front door easier to walk through.

Why 0.1.1 matters

The 0.1.0 release was the first stable cut, and shipping it surfaced the friction that only real use exposes. Constructing tensors meant verbose, explicit calls where a literal would have been clearer. The prelude covered the basics but left transformer and recurrent building blocks out of easy reach, so model code reached deep into module paths. The Python FFI flattened rich error types into generic exceptions, throwing away the information a Python user most needs. And wgpu kept evolving underneath us, leaving API drift that had to be papered over.

0.1.1 closes those gaps with concrete wins drawn straight from the changelog: a tensor![] macro for shape-inferred construction, an expanded prelude that surfaces MultiHeadAttention, TransformerEncoder/TransformerDecoder, RMSNorm, GRU, LSTM, and RNN directly, interop::ndarray conversions to bridge existing array code, typed Python exceptions via into_py_err(), and a thorough wgpu v29 migration touching 57 call sites across tenflowers-core and tenflowers-dataset.

What’s New in 0.1.1

Meta-crate ergonomics (tenflowers)

tensor![] declarative macro for shape-inferred tensor creation — 1-D, 2-D, nested, or with an explicit dtype.
Type aliases for readable signatures: Tensor1D<T>, Tensor2D<T>, Tensor3D<T>, Tensor4D<T>, plus Vector, Matrix, and BatchTensor.
Expanded prelude exposing MultiHeadAttention, RMSNorm, TransformerEncoder, TransformerDecoder, GRU, LSTM, RNN, Optimizer, RandomSampler, and DType.
tenflowers::interop::ndarray with from_ndarray / to_ndarray conversion utilities.
tenflowers::io with save_tensor / load_tensor convenience wrappers.
tenflowers::onnx ONNX re-exports behind the onnx feature gate.
deprecated_use! macro for structured deprecation notices.
Feature presets: experimental, minimal, and standard.
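
To make the idea of shape-inferred construction concrete, here is a minimal sketch of how a macro of this kind can infer rank and shape from a literal. It is not the TenfloweRS implementation: ToyTensor, ToyVector, ToyMatrix, and toy_tensor! are hypothetical names invented for this illustration, and the real tensor! macro and Tensor2D<T> alias may work quite differently.

```rust
// Toy stand-in for a dense tensor whose rank is part of the type.
// Hypothetical: this is NOT the TenfloweRS `Tensor` type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyTensor<T, const RANK: usize> {
    pub shape: [usize; RANK],
    pub data: Vec<T>, // row-major storage
}

// Rank-encoding aliases in the spirit of Vector / Matrix (hypothetical names).
pub type ToyVector<T> = ToyTensor<T, 1>;
pub type ToyMatrix<T> = ToyTensor<T, 2>;

impl<T> ToyTensor<T, 1> {
    pub fn from_vec(data: Vec<T>) -> Self {
        ToyTensor { shape: [data.len()], data }
    }
}

impl<T> ToyTensor<T, 2> {
    pub fn from_rows(rows: Vec<Vec<T>>) -> Self {
        let cols = rows.first().map_or(0, |r| r.len());
        // A nested literal only has a well-defined shape if every row has equal length.
        assert!(rows.iter().all(|r| r.len() == cols), "ragged rows in literal");
        let shape = [rows.len(), cols];
        let data = rows.into_iter().flatten().collect();
        ToyTensor { shape, data }
    }
}

// The 2-D arm must come first: a bracketed row would otherwise parse
// as an ordinary array expression in the 1-D arm.
macro_rules! toy_tensor {
    ($([$($x:expr),+ $(,)?]),+ $(,)?) => {
        ToyMatrix::from_rows(vec![$(vec![$($x),+]),+])
    };
    ($($x:expr),+ $(,)?) => {
        ToyVector::from_vec(vec![$($x),+])
    };
}

fn main() {
    let v: ToyVector<f32> = toy_tensor![1.0, 2.0, 3.0];
    let m: ToyMatrix<f32> = toy_tensor![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
    println!("vector shape: {:?}, matrix shape: {:?}", v.shape, m.shape);
}
```

In this sketch the rank lives in a const generic parameter, so a function that takes ToyMatrix<f32> cannot be handed a vector by accident. That is one plausible way for aliases to keep signatures honest about rank; whether TenfloweRS does it this way is not stated in the changelog.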

Dataset crate (tenflowers-dataset)

PidAdaptiveController — a PID-controlled prefetch depth driven by the cache hit-rate.
Drift metrics: PSI, the KS two-sample statistic, and Jensen-Shannon divergence, surfaced through a DriftReport.
PipelineInspector with per-step latency and shape-in/shape-out tracking, reported via PipelineInspectionReport.
SchemaValidator::validate_full with per-field FieldDiff variants (TypeMismatch, Widening, MissingRequired, UnexpectedExtra).
A Criterion throughput benchmark harness.
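
For readers unfamiliar with the idea behind a PID-controlled prefetch depth, the sketch below shows the general technique: treat the gap between a target cache hit-rate and the observed one as the error signal, and let a PID loop nudge the depth up or down. Everything here is an assumption made for illustration, including the PrefetchPid name, the gains, the integral clamp, and the method signatures; it does not describe the actual PidAdaptiveController API.

```rust
/// Hypothetical PID loop that steers a prefetch depth toward a target hit-rate.
/// Illustration only; NOT the tenflowers-dataset PidAdaptiveController.
pub struct PrefetchPid {
    pub kp: f64,
    pub ki: f64,
    pub kd: f64,
    pub target_hit_rate: f64,
    pub min_depth: usize,
    pub max_depth: usize,
    depth: f64,
    integral: f64,
    prev_error: f64,
}

impl PrefetchPid {
    pub fn new(target_hit_rate: f64, initial_depth: usize, min_depth: usize, max_depth: usize) -> Self {
        PrefetchPid {
            // Gains are arbitrary values chosen for the demo.
            kp: 10.0,
            ki: 1.0,
            kd: 0.0,
            target_hit_rate,
            min_depth,
            max_depth,
            depth: initial_depth as f64,
            integral: 0.0,
            prev_error: 0.0,
        }
    }

    /// Feed one hit-rate observation (0.0..=1.0) and get the new depth.
    pub fn update(&mut self, observed_hit_rate: f64) -> usize {
        // Positive error means too many cache misses, so prefetch deeper.
        let error = self.target_hit_rate - observed_hit_rate;
        // Clamp the integral term so it cannot wind up without bound.
        self.integral = (self.integral + error).clamp(-5.0, 5.0);
        let derivative = error - self.prev_error;
        self.prev_error = error;

        let adjustment = self.kp * error + self.ki * self.integral + self.kd * derivative;
        self.depth = (self.depth + adjustment).clamp(self.min_depth as f64, self.max_depth as f64);
        self.depth()
    }

    pub fn depth(&self) -> usize {
        self.depth.round() as usize
    }
}

fn main() {
    let mut pid = PrefetchPid::new(0.9, 4, 1, 64);
    for hit_rate in [0.5, 0.7, 0.85, 0.95, 1.0] {
        let depth = pid.update(hit_rate);
        println!("hit-rate {hit_rate:.2} -> prefetch depth {depth}");
    }
}
```

The appeal of this design is that the depth responds to how far off the hit-rate is and for how long, rather than to a fixed threshold, which tends to avoid oscillating between two hard-coded depths.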

FFI crate (tenflowers-ffi)

Structured TensorError → TenflowersError mapping covering all 23 core variants in an exhaustive match.
into_py_err() mapping to typed Python exceptions: ValueError, RuntimeError, IndexError, MemoryError, and NotImplementedError.
PyDevice / PyDeviceKind with Device.cpu(), Device.gpu(id), and Device.rocm(id).
PyTensor.__repr__ dtype fix (it previously hardcoded float32), plus __len__, .ndim, and .numel().
Opt-in cbindgen header regeneration via build.rs, pytest fixtures in tests/conftest.py, and docs/FFI_ERROR_MAPPING.md.
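
The value of an exhaustive match in the error mapping is easiest to see in a small example. The sketch below maps a toy error enum onto the five Python exception classes named above, with no wildcard arm, so adding a variant fails to compile until the mapping is updated. ToyError, PyExcKind, and the choice of which variant maps to which exception are all hypothetical; the real TensorError has 23 variants and its mapping is documented in docs/FFI_ERROR_MAPPING.md.

```rust
/// Hypothetical, heavily reduced error enum. NOT the real TensorError.
#[derive(Debug)]
pub enum ToyError {
    ShapeMismatch { expected: Vec<usize>, got: Vec<usize> },
    IndexOutOfBounds { index: usize, len: usize },
    OutOfMemory { requested_bytes: usize },
    Unsupported(String),
    Backend(String),
}

/// The Python exception classes listed in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PyExcKind {
    ValueError,
    IndexError,
    MemoryError,
    NotImplementedError,
    RuntimeError,
}

impl ToyError {
    /// Assumed mapping, for illustration only.
    pub fn py_exception_kind(&self) -> PyExcKind {
        // No `_ =>` arm on purpose: a new variant becomes a compile error here.
        match self {
            ToyError::ShapeMismatch { .. } => PyExcKind::ValueError,
            ToyError::IndexOutOfBounds { .. } => PyExcKind::IndexError,
            ToyError::OutOfMemory { .. } => PyExcKind::MemoryError,
            ToyError::Unsupported(_) => PyExcKind::NotImplementedError,
            ToyError::Backend(_) => PyExcKind::RuntimeError,
        }
    }

    /// Human-readable message that would travel with the exception.
    pub fn message(&self) -> String {
        match self {
            ToyError::ShapeMismatch { expected, got } => {
                format!("shape mismatch: expected {expected:?}, got {got:?}")
            }
            ToyError::IndexOutOfBounds { index, len } => {
                format!("index {index} out of bounds for length {len}")
            }
            ToyError::OutOfMemory { requested_bytes } => {
                format!("allocation of {requested_bytes} bytes failed")
            }
            ToyError::Unsupported(what) => format!("not implemented: {what}"),
            ToyError::Backend(detail) => format!("backend error: {detail}"),
        }
    }
}

fn main() {
    let err = ToyError::IndexOutOfBounds { index: 7, len: 3 };
    println!("{:?}: {}", err.py_exception_kind(), err.message());
}
```

In a real pyo3 binding the final step would construct the matching exception type from this kind and message; that step is omitted here to keep the sketch free of external crates.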

wgpu v29 compatibility (fix)

57 sites updated across tenflowers-core and tenflowers-dataset: InstanceDescriptor::default() → InstanceDescriptor::new_without_display_handle(), and bind_group_layouts: &[&layout] → &[Some(&layout)].
Fixed 9 &str/String type mismatches in the drift-metric constructors in data_quality.rs.

Documentation and tooling

A Getting Started tutorial and a 35-row PyTorch↔TenfloweRS API mapping table in the README, plus a Mermaid crate-dependency architecture diagram.
docs/QUICK_REFERENCE.md (a 10-section cheat sheet), MIGRATION_FROM_TENSORFLOW.md (five side-by-side TF→Rust scenarios), TROUBLESHOOTING.md (ten symptom-fix entries), and PRELUDE_STABILITY.md (the semver stability policy for the prelude).
Release tooling: scripts/publish_meta.sh, scripts/bump_version.sh, docs/RELEASE_CHECKLIST.md, scripts/run_miri.sh, and docs/MEMORY_SAFETY.md.
Expanded autograd docs (~180 lines on mixed precision, checkpointing, higher-order grads, and custom ops), finalized neural docs, and three new autograd examples: mixed_precision.rs, gradient_checkpointing.rs, and higher_order_grads.rs.

Under the hood, dependencies moved forward too: rayon 1.11→1.12, the scirs2-core/-autograd/-neural/-linalg/-numpy crates bumped to 0.4.2, wgpu pinned to v29, pyo3 to 0.28, and optirs to 0.3.

Getting Started

Add the meta-crate:

cargo add tenflowers

A quick taste of the 0.1.1 ergonomics — the new tensor! macro, type aliases, an expanded prelude, and ndarray interop, all in a few lines:

use tenflowers::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // New in 0.1.1: declarative, shape-inferred tensor creation
    let x: Tensor2D<f32> = tensor![[1.0, 2.0, 3.0],
                                   [4.0, 5.0, 6.0]];
    println!("x shape: {:?}", x.shape());

    // Expanded prelude now exposes transformer + recurrent layers directly
    let encoder = TransformerEncoder::new(/* d_model */ 512, /* heads */ 8, /* layers */ 6);
    let norm = RMSNorm::new(512);
    let _ = (&encoder, &norm); // wired into a model in real use

    // ndarray interop (new in 0.1.1)
    let nd = x.to_ndarray()?;
    let back = Tensor::<f32>::from_ndarray(&nd)?;
    println!("round-trip shape: {:?}", back.shape());
    Ok(())
}

Tips

Reach for tensor![] and the type aliases. A literal like tensor![[1.0, 2.0], [3.0, 4.0]] annotated as Tensor2D<f32> reads far better than an explicit shape-plus-data constructor, and the aliases (Vector, Matrix, BatchTensor, Tensor3D, Tensor4D) keep function signatures honest about rank.
Pull layers straight from the prelude. MultiHeadAttention, TransformerEncoder/TransformerDecoder, RMSNorm, GRU, LSTM, and RNN are now top-level prelude imports — no more digging through module paths to assemble a model.
Bridge existing array code with interop::ndarray. If you already have ndarray data, Tensor::from_ndarray(&nd)? and tensor.to_ndarray()? move you in and out without a manual copy loop; both return a Result, so propagate or handle the error.
Turn on ONNX when you need it. Enable the onnx feature to unlock the tenflowers::onnx re-exports, then pair them with OxiONNX for import/export.
Choose a feature preset. Start with minimal for lean builds, standard for the everyday surface, or experimental when you want the bleeding edge — no need to hand-curate feature lists.
Debug pipelines with the new dataset tooling. Wrap a pipeline in PipelineInspector to see per-step latency and shapes, and reach for the PSI / KS / Jensen-Shannon drift metrics (via DriftReport) when training and serving distributions start to diverge.
Map errors cleanly into Python. From the FFI layer, into_py_err() turns a TenflowersError into the right typed exception (ValueError, IndexError, MemoryError, and friends), so Python callers get actionable errors instead of a generic blob.
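
As background for the drift-metrics tip above, the three statistics have compact textbook definitions. The standalone sketch below implements them with the standard library only: PSI as the sum of (a - e) * ln(a / e) over bins, Jensen-Shannon divergence as the average KL divergence to the midpoint distribution, and the two-sample KS statistic as the largest gap between empirical CDFs. The function names, the epsilon floor, and the assumption of pre-binned proportions are choices made for this illustration and say nothing about how DriftReport computes or exposes its values.

```rust
/// Population Stability Index over pre-binned proportions.
/// Empty bins are floored at a small epsilon to keep the log finite.
pub fn psi(expected: &[f64], actual: &[f64]) -> f64 {
    const EPS: f64 = 1e-6;
    expected
        .iter()
        .zip(actual)
        .map(|(&e, &a)| {
            let (e, a) = (e.max(EPS), a.max(EPS));
            (a - e) * (a / e).ln()
        })
        .sum()
}

/// Jensen-Shannon divergence (natural log) between two discrete distributions.
pub fn js_divergence(p: &[f64], q: &[f64]) -> f64 {
    // KL(a || b), skipping zero-mass bins of `a` (their contribution is 0).
    fn kl(a: &[f64], b: &[f64]) -> f64 {
        a.iter()
            .zip(b)
            .map(|(&x, &y)| if x > 0.0 { x * (x / y).ln() } else { 0.0 })
            .sum()
    }
    let m: Vec<f64> = p.iter().zip(q).map(|(&x, &y)| 0.5 * (x + y)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

/// Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.
pub fn ks_statistic(a: &[f64], b: &[f64]) -> f64 {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort_by(f64::total_cmp);
    b.sort_by(f64::total_cmp);

    let (mut i, mut j, mut d) = (0usize, 0usize, 0.0f64);
    while i < a.len() && j < b.len() {
        // Advance both empirical CDFs past the next smallest value.
        let x = a[i].min(b[j]);
        while i < a.len() && a[i] <= x {
            i += 1;
        }
        while j < b.len() && b[j] <= x {
            j += 1;
        }
        let gap = (i as f64 / a.len() as f64 - j as f64 / b.len() as f64).abs();
        d = d.max(gap);
    }
    d
}

fn main() {
    let train = [0.25, 0.25, 0.25, 0.25];
    let serve = [0.10, 0.20, 0.30, 0.40];
    println!("PSI = {:.4}", psi(&train, &serve));
    println!("JS  = {:.4}", js_divergence(&train, &serve));
    println!("KS  = {:.4}", ks_statistic(&[1.0, 2.0, 3.0], &[2.5, 3.5, 4.5]));
}
```

PSI and Jensen-Shannon work on binned proportions, while KS works on raw samples, so the first two need a binning step that this sketch leaves to the caller.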

A growing foundation

TenfloweRS sits inside the COOLJAPAN ecosystem and leans on it directly: NumRS2 for n-dimensional arrays, SciRS2 (now at 0.4.2) for the scientific core, OptiRS for optimizers, and Oxicode for serialization. The new ONNX re-exports pair naturally with OxiONNX for interchange, and TenfloweRS stands alongside ToRSh, TrustformeRS, and sklears as the COOLJAPAN machine-learning stack — a deep-learning framework next to a tensor library, a transformers library, and classical ML.

Repository: https://github.com/cool-japan/tenflowers

Star the repo if a pure-Rust TensorFlow alternative with a clean, ergonomic prelude is something you want to see grow. Feedback, issues, and pull requests are all welcome — 0.1.1 is shaped by exactly that kind of real-world use.

— KitaSan at COOLJAPAN OÜ, April 24, 2026