TrustformeRS 0.1.2 Released — Production Serving, Kernel Fusion, and Honest Capabilities

Your transformer models now ship with their own deployment artifacts: auto-generated Kubernetes, Azure Container Instances (ACI), and OpenShift manifests; real kernel-fusion and graph-debug analytics; broader GGUF quantization; and a stub backend deleted rather than dressed up.

Today we released TrustformeRS 0.1.2 — a patch that turns production deployment into real artifact generation, makes kernel-fusion and graph debugging actually detect what they advertise, widens GGUF quantization down to Q2_K/Q3_K, and removes a stub-only TPU backend so the capability list stays honest.

TrustformeRS is a Pure Rust counterpart to Hugging Face Transformers: transformer and LLM loading plus inference, tokenizers, and model-hub access, with no Python and no PyTorch underneath.

No PyTorch. No Python. No CUDA-C. No libtorch shared object to chase across container images. TrustformeRS compiles to a single static binary or a WASM module, and the serving stack ships its own deployment manifests instead of leaning on a Python web framework. This release also leans into a quieter form of honesty: rather than ship a TPU backend whose FFI binding bodies were all unimplemented, we deleted it. A capability you cannot run is not a feature — it is a liability, and we would rather show you exactly what the framework does today.

Why 0.1.2 matters

Three things in 0.1.2 cross the line from “stub” to “real,” and one thing crosses out.

Production deployment becomes real. DeploymentManifest, IngressManifest, ServiceManifest, and NetworkPolicyManifest previously emitted minimal or incorrect YAML stubs. They now generate correct manifests that honor user-supplied labels, annotations, and selectors. Azure Container Instances (ARM template + CLI script) and OpenShift (BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script) join as first-class targets.
Observability becomes real. EnhancedProfiler now exports to Flamegraph folded stacks, OpenTelemetry OTLP JSON spans, and Jaeger trace JSON — three formats your existing tooling already understands.
Inference gets smarter. KernelFusionEngine::find_attention_patterns used to always return an empty vector. It now detects scaled-dot-product attention chains — MatMul to element-wise to Softmax to MatMul — with configurable flags, so fusion passes have something to fuse.
The project audits its own claims. The TPU stub is gone. The capability surface is now a thing you can trust.

Maturity backs this up: 18,008 tests pass via cargo nextest run --all-features (119 skipped), across a 100% Pure Rust codebase.

Technical Deep Dive

(a) Serving and deployment — trustformers-serve. The headline of 0.1.2. The four core Kubernetes manifest generators — DeploymentManifest, IngressManifest, ServiceManifest, NetworkPolicyManifest — were stubs that produced minimal or incorrect YAML; they now emit correct, configurable manifests honoring caller-supplied labels, annotations, and selectors. generate_aci_artifacts produces an Azure Container Instances ARM template plus a CLI deploy script. generate_openshift_artifacts produces a BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script. The Hub UI gained full repository CRUD — update_repository, delete_repository, update_version, delete_version — with HTTP handlers that previously returned NOT_IMPLEMENTED and now delegate to working state methods. MultiCloudOrchestrator improved its instance-selection logic and picked up regression tests for model integration and error handling.
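To make "configurable manifests honoring caller-supplied labels and selectors" concrete, here is a minimal, dependency-free sketch of the idea. It is not the trustformers-serve API: DeploymentSpec, render_map, render_deployment, example_spec, and the registry.example.com image name are all hypothetical names invented for illustration. The point it shows is that one label map should feed the metadata, the selector, and the pod template, since a Deployment whose selector does not match its template labels is rejected by Kubernetes.

```rust
use std::collections::BTreeMap;

/// Hypothetical input for this sketch; not the trustformers-serve API.
struct DeploymentSpec {
    name: String,
    image: String,
    replicas: u32,
    port: u16,
    /// Assumed non-empty: a Deployment needs a selector that matches its pods.
    labels: BTreeMap<String, String>,
}

/// Render `key: "value"` lines at the given indentation. Values are quoted
/// so "true" or "1.0" stay strings; no escaping is done in this sketch.
fn render_map(map: &BTreeMap<String, String>, indent: usize) -> String {
    let pad = " ".repeat(indent);
    map.iter()
        .map(|(k, v)| format!("{pad}{k}: \"{v}\"\n"))
        .collect()
}

/// Render a minimal apps/v1 Deployment. The same labels are written to the
/// metadata, the selector, and the pod template so the three always agree.
fn render_deployment(spec: &DeploymentSpec) -> String {
    let mut out = String::new();
    out.push_str("apiVersion: apps/v1\nkind: Deployment\nmetadata:\n");
    out.push_str(&format!("  name: {}\n  labels:\n", spec.name));
    out.push_str(&render_map(&spec.labels, 4));
    out.push_str(&format!("spec:\n  replicas: {}\n", spec.replicas));
    out.push_str("  selector:\n    matchLabels:\n");
    out.push_str(&render_map(&spec.labels, 6));
    out.push_str("  template:\n    metadata:\n      labels:\n");
    out.push_str(&render_map(&spec.labels, 8));
    out.push_str("    spec:\n      containers:\n");
    out.push_str(&format!("        - name: {}\n", spec.name));
    out.push_str(&format!("          image: {}\n", spec.image));
    out.push_str("          ports:\n");
    out.push_str(&format!("            - containerPort: {}\n", spec.port));
    out
}

/// Example values only; the image reference is a placeholder.
fn example_spec() -> DeploymentSpec {
    let labels = BTreeMap::from([
        ("app".to_string(), "trustformers-serve".to_string()),
        ("tier".to_string(), "inference".to_string()),
    ]);
    DeploymentSpec {
        name: "trustformers-serve".to_string(),
        image: "registry.example.com/trustformers-serve:0.1.2".to_string(),
        replicas: 2,
        port: 8080,
        labels,
    }
}

fn main() {
    print!("{}", render_deployment(&example_spec()));
}
```

A BTreeMap keeps label order deterministic, which makes generated YAML diff-friendly in version control. A production generator would also escape values and validate label syntax; this sketch does neither.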

(b) Performance and graph tooling. Two long-standing “always returns empty” bugs are fixed. KernelFusionEngine::find_attention_patterns now walks the graph for the SDPA chain (MatMul to element-wise to Softmax to MatMul) instead of returning nothing. GraphDebugger::find_disconnected_nodes now performs disconnected-node detection with edge cross-validation, rather than returning an empty vector unconditionally. EnhancedProfiler gained the Flamegraph/OTLP/Jaeger export formats described above. And DynamicArchitectureManager::compute_entropy, compute_variance, and compute_sparsity now have real tensor-based implementations — previously they returned hardcoded 0.5, 0.3, and 0.2.
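For readers who have not written a fusion pass, the SDPA chain described above can be sketched on a toy graph. This is an illustration under stated assumptions, not the KernelFusionEngine implementation: the Op enum, Node, AttentionPattern, find_sdpa_chains, and the index-based graph layout are all hypothetical, and "element-wise" is assumed here to mean a Mul, Div, or Add used for scaling or masking. The sketch walks backward from each MatMul through Softmax and the element-wise op to the first MatMul.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Input,
    MatMul,
    Mul,
    Div,
    Add,
    Softmax,
    Other,
}

/// A node lists the indices of the nodes that produce its inputs.
/// Indices are assumed to be valid positions in the node slice.
struct Node {
    op: Op,
    inputs: Vec<usize>,
}

/// Node indices of one MatMul -> element-wise -> Softmax -> MatMul chain.
#[derive(Debug, PartialEq)]
struct AttentionPattern {
    qk_matmul: usize,
    elementwise: usize,
    softmax: usize,
    pv_matmul: usize,
}

fn node(op: Op, inputs: &[usize]) -> Node {
    Node { op, inputs: inputs.to_vec() }
}

fn is_elementwise(op: Op) -> bool {
    matches!(op, Op::Mul | Op::Div | Op::Add)
}

/// Start at every MatMul and follow producers backward through the chain.
fn find_sdpa_chains(nodes: &[Node]) -> Vec<AttentionPattern> {
    let mut found = Vec::new();
    for (pv, n) in nodes.iter().enumerate() {
        if n.op != Op::MatMul {
            continue;
        }
        for &sm in &n.inputs {
            if nodes[sm].op != Op::Softmax {
                continue;
            }
            for &ew in &nodes[sm].inputs {
                if !is_elementwise(nodes[ew].op) {
                    continue;
                }
                for &qk in &nodes[ew].inputs {
                    if nodes[qk].op == Op::MatMul {
                        found.push(AttentionPattern {
                            qk_matmul: qk,
                            elementwise: ew,
                            softmax: sm,
                            pv_matmul: pv,
                        });
                    }
                }
            }
        }
    }
    found
}

/// softmax((Q x K^T) / scale) x V, followed by an unrelated consumer.
fn example_graph() -> Vec<Node> {
    vec![
        node(Op::Input, &[]),      // 0: Q
        node(Op::Input, &[]),      // 1: K^T
        node(Op::Input, &[]),      // 2: V
        node(Op::MatMul, &[0, 1]), // 3: Q x K^T
        node(Op::Input, &[]),      // 4: scale constant
        node(Op::Div, &[3, 4]),    // 5: scaled scores
        node(Op::Softmax, &[5]),   // 6: attention weights
        node(Op::MatMul, &[6, 2]), // 7: weights x V
        node(Op::Other, &[7]),     // 8: downstream op
    ]
}

fn main() {
    for p in find_sdpa_chains(&example_graph()) {
        println!("{p:?}");
    }
}
```

Matching backward from the final MatMul keeps the search local: only nodes that already end in a MatMul fed by a Softmax are explored further. A real pass would also check shapes and that intermediate results have no other consumers before fusing.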

(c) Ops and pipelines. Tensor::softmax_entropy_normalized() computes normalized softmax entropy bounded in [0, 1]. A new ActivationType enum lands in the common module of the trustformers-models crate, with apply() and from_config_str_or() helpers, and the Phi-3 model exposes RotaryEmbedding::half_dim(). On the inference side, a token-classification pipeline arrives, and ObjectDetectionPipeline gains Non-Maximum Suppression to drop overlapping duplicate detections.
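A quick sketch of what "normalized softmax entropy bounded in [0, 1]" usually means may help. The function below is a standalone illustration on a plain slice, not the Tensor method itself, and it assumes the common definition: the Shannon entropy of the softmax distribution divided by ln(n), the maximum entropy for n classes. Whether the library uses exactly this normalization is an assumption; the name normalized_softmax_entropy is hypothetical.

```rust
/// Entropy of softmax(logits), divided by ln(n) so the result lies in [0, 1].
/// Returns 0.0 for fewer than two classes, where entropy is always zero.
fn normalized_softmax_entropy(logits: &[f32]) -> f32 {
    let n = logits.len();
    if n < 2 {
        return 0.0;
    }
    // Subtract the maximum before exponentiating for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let entropy: f32 = exps
        .iter()
        .map(|&e| {
            let p = e / sum;
            // Treat 0 * ln(0) as 0, its limiting value.
            if p > 0.0 { -p * p.ln() } else { 0.0 }
        })
        .sum();
    (entropy / (n as f32).ln()).clamp(0.0, 1.0)
}

fn main() {
    // Uniform logits give maximum uncertainty; one dominant logit gives almost none.
    println!("uniform: {:.3}", normalized_softmax_entropy(&[1.0, 1.0, 1.0, 1.0]));
    println!("peaked:  {:.3}", normalized_softmax_entropy(&[100.0, 0.0, 0.0, 0.0]));
}
```

Dividing by ln(n) is what makes the value comparable across heads with different class counts: 1.0 always means "as uncertain as possible" and 0.0 means all probability mass sits on one class.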

(d) Quantization and mobile. GGUF gains Q2_K and Q3_K block quantization methods, each covered by round-trip dequantize tests. LargeModelVisualizer can render PNG heatmaps for sampled layers. An Android backend module arrives with NNAPI bindings plus OpenGL ES and Vulkan GPU backends, and a federated-learning v2 module brings differential privacy, aggregation, secure communication, and crypto submodules.
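The idea behind a round-trip dequantize test can be shown with a deliberately simplified 2-bit block quantizer. This is not the GGUF Q2_K layout, which uses a more elaborate super-block structure; Block2Bit, quantize_block, dequantize_block, example_block, and the 16-value block size are hypothetical choices made for this sketch. Each block stores a minimum, a step size, and one 2-bit code per value, and the round-trip error per value is bounded by half a step.

```rust
const BLOCK: usize = 16;

/// Simplified 2-bit block: NOT the GGUF Q2_K on-disk layout.
/// 16 values cost 4 bytes of codes plus two f32 parameters.
struct Block2Bit {
    min: f32,
    scale: f32,
    packed: [u8; BLOCK / 4],
}

fn quantize_block(values: &[f32; BLOCK]) -> Block2Bit {
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Two bits give four levels (codes 0..=3), so three steps span [min, max].
    let scale = if max > min { (max - min) / 3.0 } else { 1.0 };
    let mut packed = [0u8; BLOCK / 4];
    for (i, &v) in values.iter().enumerate() {
        let code = (((v - min) / scale).round() as u8).min(3);
        // Four codes per byte, lowest bits first.
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    Block2Bit { min, scale, packed }
}

fn dequantize_block(block: &Block2Bit) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (i, slot) in out.iter_mut().enumerate() {
        let code = (block.packed[i / 4] >> ((i % 4) * 2)) & 0b11;
        *slot = block.min + block.scale * f32::from(code);
    }
    out
}

/// A ramp from 0.0 to 1.5 in steps of 0.1.
fn example_block() -> [f32; BLOCK] {
    let mut values = [0.0f32; BLOCK];
    for (i, v) in values.iter_mut().enumerate() {
        *v = i as f32 * 0.1;
    }
    values
}

fn main() {
    let original = example_block();
    let block = quantize_block(&original);
    let restored = dequantize_block(&block);
    let max_err = original
        .iter()
        .zip(restored.iter())
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("step = {}, max round-trip error = {}", block.scale, max_err);
}
```

A round-trip test in this style asserts that the maximum error stays within half a step, which gives a concrete accuracy baseline before trading footprint for precision on a real model.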

Under the hood, this release rides the mid-2026 COOLJAPAN stack: scirs2-core and scirs2-linalg moved 0.4.2 to 0.5.0, the OxiARC family (oxiarc-zstd, oxiarc-deflate, oxiarc-lz4, oxiarc-archive) moved 0.2.7 to 0.3.3, and oxicode moved to 0.2.4. OxiBLAS (0.2.1) and OxiONNX (Pure Rust ONNX) round out the foundation. A small but real build fix: the hardware-acceleration benchmark moved criterion_main! to the crate root to resolve E0601 under #[cfg(not(feature = "cuda"))].

Getting Started

cargo add trustformers

The core flow is unchanged (load with from_pretrained, encode, forward), and 0.1.2 adds new surfaces on top. Here we run the model, then gauge how confident the prediction is by calling the new softmax_entropy_normalized() on the logits:

use trustformers::{AutoModel, AutoTokenizer};

fn main() -> anyhow::Result<()> {
    // Same loading flow as always — no Python, no PyTorch.
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;

    let inputs = tokenizer.encode("TrustformeRS runs entirely in Rust.", None)?;
    let logits = model.forward(&inputs)?;

    // New in 0.1.2: normalized softmax entropy in [0, 1].
    // Low entropy => confident prediction; high entropy => uncertain.
    let entropy = logits.softmax_entropy_normalized()?;
    println!("normalized entropy: {entropy:.3}");

    Ok(())
}

Prefer a task pipeline? The new token-classification pipeline gives you NER out of the box:

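use trustformers::pipeline;

fn main() -> anyhow::Result<()> {
    // Build the task pipeline, then run it on raw text.
    let ner = pipeline("token-classification")?;
    let entities = ner.run("KitaSan founded COOLJAPAN OÜ in Tallinn.")?;
    for ent in entities {
        // Each entity pairs a matched word with its predicted label.
        println!("{} -> {}", ent.word, ent.entity);
    }
    Ok(())
}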

What’s New in 0.1.2

Serving and deploy

Real, configurable Kubernetes manifests: Deployment, Ingress, Service, NetworkPolicy (labels, annotations, selectors).
Azure Container Instances artifacts: ARM template plus CLI deploy script (generate_aci_artifacts).
OpenShift artifacts: BuildConfig, DeploymentConfig, Service, Route, and an oc script (generate_openshift_artifacts).
Hub UI repository CRUD (update_repository / delete_repository / update_version / delete_version) plus HTTP handlers.
MultiCloudOrchestrator instance-selection improvements and new regression tests.

Performance and debugging

SDPA chain detection in KernelFusionEngine::find_attention_patterns (MatMul to element-wise to Softmax to MatMul).
Disconnected-node detection in GraphDebugger::find_disconnected_nodes with edge cross-validation.
EnhancedProfiler exports: Flamegraph, OpenTelemetry OTLP JSON, Jaeger trace JSON.
Real tensor-based compute_entropy / compute_variance / compute_sparsity in DynamicArchitectureManager.
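As a rough picture of what disconnected-node detection with edge cross-validation can look like, here is a small standalone sketch. It is not the GraphDebugger implementation: Graph, Report, and find_disconnected are hypothetical, and "cross-validation" is interpreted here as checking every edge endpoint against the set of declared nodes, so that edges pointing at unknown nodes are reported instead of silently counting as connections.

```rust
use std::collections::BTreeSet;

/// Hypothetical graph shape for this sketch: node ids plus directed edges.
struct Graph {
    nodes: Vec<u32>,
    edges: Vec<(u32, u32)>,
}

#[derive(Debug, PartialEq)]
struct Report {
    /// Declared nodes that no valid edge touches.
    disconnected: Vec<u32>,
    /// Edges with at least one endpoint that is not a declared node.
    dangling_edges: Vec<(u32, u32)>,
}

fn find_disconnected(graph: &Graph) -> Report {
    let known: BTreeSet<u32> = graph.nodes.iter().copied().collect();
    let mut touched = BTreeSet::new();
    let mut dangling_edges = Vec::new();
    for &(from, to) in &graph.edges {
        // Cross-validate the edge against the node set before trusting it.
        if known.contains(&from) && known.contains(&to) {
            touched.insert(from);
            touched.insert(to);
        } else {
            dangling_edges.push((from, to));
        }
    }
    // Whatever is declared but never touched is disconnected.
    let disconnected = known.difference(&touched).copied().collect();
    Report { disconnected, dangling_edges }
}

fn example_graph() -> Graph {
    Graph {
        nodes: vec![1, 2, 3, 4],
        // The edge (3, 9) refers to a node that was never declared.
        edges: vec![(1, 2), (2, 3), (3, 9)],
    }
}

fn main() {
    println!("{:?}", find_disconnected(&example_graph()));
}
```

Reporting dangling edges separately matters because an edge to a missing node often explains why a neighbouring node looks connected when it is not.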

New ops and pipelines

Tensor::softmax_entropy_normalized() (normalized entropy in [0, 1]).
ActivationType enum with apply() / from_config_str_or() in the common module of the trustformers-models crate.
Phi-3 RotaryEmbedding::half_dim() accessor.
Token-classification pipeline; NMS in ObjectDetectionPipeline.
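For context on the NMS item, greedy Non-Maximum Suppression is short enough to sketch in full. The code below is a generic single-class version, not the ObjectDetectionPipeline code: Detection, iou, and non_max_suppression are hypothetical names, and boxes are assumed to be (x1, y1, x2, y2) corners with x1 <= x2 and y1 <= y2.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Detection {
    x1: f32,
    y1: f32,
    x2: f32,
    y2: f32,
    score: f32,
}

fn area(d: &Detection) -> f32 {
    (d.x2 - d.x1).max(0.0) * (d.y2 - d.y1).max(0.0)
}

/// Intersection over union of two axis-aligned boxes.
fn iou(a: &Detection, b: &Detection) -> f32 {
    let w = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let h = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = w * h;
    let union = area(a) + area(b) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy NMS: visit boxes from highest to lowest score and keep a box only
/// if it overlaps every already-kept box by at most `iou_threshold`.
fn non_max_suppression(mut dets: Vec<Detection>, iou_threshold: f32) -> Vec<Detection> {
    dets.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(k, &d) <= iou_threshold) {
            kept.push(d);
        }
    }
    kept
}

fn example_detections() -> Vec<Detection> {
    vec![
        Detection { x1: 1.0, y1: 1.0, x2: 11.0, y2: 11.0, score: 0.8 },
        Detection { x1: 0.0, y1: 0.0, x2: 10.0, y2: 10.0, score: 0.9 },
        Detection { x1: 20.0, y1: 20.0, x2: 30.0, y2: 30.0, score: 0.7 },
    ]
}

fn main() {
    for d in non_max_suppression(example_detections(), 0.5) {
        println!("{d:?}");
    }
}
```

In the example, the first two boxes overlap with an IoU of 81/119, about 0.68, so the lower-scoring one is suppressed at a 0.5 threshold while the distant third box survives. Multi-class detectors typically run this per class so that overlapping objects of different classes are not suppressed against each other.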

Quantization and mobile

Q2_K and Q3_K GGUF block quantization with round-trip dequantize tests.
PNG heatmap visualization for sampled layers in LargeModelVisualizer.
Android backend (NNAPI bindings, OpenGL ES, Vulkan); federated-learning v2 (DP, aggregation, secure comms, crypto).

Removed

The tpu feature flag and tpu_impl.rs — the TPU backend was stub-only (every FFI binding body unimplemented), removed to avoid misleading capability claims.

Dependency bumps

SciRS2 0.5.0 (scirs2-core, scirs2-linalg), OxiARC 0.3.3 (zstd/deflate/lz4/archive), oxicode 0.2.4.

Tips

Generate deployment manifests instead of hand-writing YAML. Let trustformers-serve emit your Kubernetes Deployment/Ingress/Service/NetworkPolicy, or call generate_aci_artifacts / generate_openshift_artifacts for Azure and OpenShift. Pass your own labels, annotations, and selectors so the output drops straight into your cluster.
Wire profiles into the observability stack you already run. EnhancedProfiler exports Flamegraph folded stacks, OTLP JSON spans, and Jaeger trace JSON — point them at your existing flamegraph viewer, OpenTelemetry collector, or Jaeger UI without writing glue.
Gauge prediction confidence with one call. logits.softmax_entropy_normalized() returns a value in [0, 1]; treat low entropy as a confident prediction and high entropy as a candidate for human review or a fallback path.
Quantize aggressively when footprint is king. Reach for the new Q2_K / Q3_K GGUF block quantization for the smallest on-disk and in-memory footprint; the round-trip dequantize tests give you a baseline for the accuracy trade-off.
Enable the SDPA kernel-fusion paths. With find_attention_patterns now detecting MatMul to element-wise to Softmax to MatMul chains, fusion passes can collapse attention into fewer kernels — keep the configurable flags on for attention-heavy models.
TPU is gone — plan accordingly. There is no longer a tpu feature. Target CUDA (via OxiCUDA), Metal, or WebGPU for acceleration instead.
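On the profiler tip above: the folded-stack format that flamegraph tools consume is plain text, one line per unique call stack, with frames joined by semicolons and followed by a space and a count. The sketch below shows that format only; it is not the EnhancedProfiler exporter, and to_folded, example_samples, and the sample data are hypothetical.

```rust
use std::collections::BTreeMap;

/// Collapse (call stack, weight) samples into folded-stack lines of the
/// form `frame1;frame2;frame3 weight`, summing weights of identical stacks.
fn to_folded(samples: &[(Vec<&str>, u64)]) -> String {
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for (stack, weight) in samples {
        *totals.entry(stack.join(";")).or_insert(0) += weight;
    }
    totals
        .iter()
        .map(|(stack, total)| format!("{stack} {total}\n"))
        .collect()
}

/// Made-up samples: outermost frame first, weight in microseconds.
fn example_samples() -> Vec<(Vec<&'static str>, u64)> {
    vec![
        (vec!["forward", "attention", "matmul"], 120),
        (vec!["forward", "attention", "softmax"], 30),
        (vec!["forward", "attention", "matmul"], 80),
        (vec!["forward", "mlp"], 200),
    ]
}

fn main() {
    print!("{}", to_folded(&example_samples()));
}
```

Because the output is line-oriented text, it can be written to a file and handed to any flamegraph renderer that accepts folded stacks, with no bespoke glue code.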

This is the foundation

TrustformeRS sits inside the COOLJAPAN ecosystem as it stands in mid-2026: built on SciRS2 0.5.0, OxiBLAS, Oxicode, OxiARC 0.3.3, and OxiONNX, it pairs naturally with OxiCUDA for GPU serving and sits beside OxiLLaMa, ToRSh, SkleaRS, and TenfloweRS as part of one Pure Rust stack. Every source file is kept under the 2000-line policy with SplitRS, so the codebase stays readable as it grows: 100% Pure Rust, no FFI surprises.

Repository: https://github.com/cool-japan/trustformers

Star the repo if you want transformer inference and serving you can deploy from a single binary, audit line by line, and trust to tell you the truth about what it can and cannot do. Sovereign AI, all the way down.

— KitaSan at COOLJAPAN OÜ June 21, 2026