Your transformer models now deploy themselves — auto-generated Kubernetes, ACI, and OpenShift manifests; real kernel-fusion and graph-debug analytics; broader GGUF quantization — and a stub backend deleted rather than dressed up.
Today we released TrustformeRS 0.1.2 — a patch that turns production deployment into real artifact generation, makes kernel-fusion and graph debugging actually detect what they advertise, widens GGUF quantization down to Q2_K/Q3_K, and removes a stub-only TPU backend so the capability list stays honest.
TrustformeRS is Hugging Face Transformers in Pure Rust: transformer and LLM loading and inference, tokenizers, and model-hub access, with no Python and no PyTorch underneath.
No PyTorch. No Python. No CUDA-C. No libtorch shared object to chase across container images. TrustformeRS compiles to a single static binary or a WASM module, and the serving stack ships its own deployment manifests instead of leaning on a Python web framework. This release also leans into a quieter form of honesty: rather than ship a TPU backend whose FFI binding bodies were all unimplemented, we deleted it. A capability you cannot run is not a feature — it is a liability, and we would rather show you exactly what the framework does today.
Why 0.1.2 matters
Three things in 0.1.2 cross the line from “stub” to “real,” and one thing crosses out.
- Production deployment becomes real. DeploymentManifest, IngressManifest, ServiceManifest, and NetworkPolicyManifest previously emitted minimal or incorrect YAML stubs. They now generate correct, configurable manifests with user-supplied labels, annotations, and selectors. Azure Container Instances (ARM template plus CLI script) and OpenShift (BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script) join as first-class targets.
- Observability becomes real. EnhancedProfiler now exports Flamegraph folded stacks, OpenTelemetry OTLP JSON spans, and Jaeger trace JSON, three formats your existing tooling already understands.
- Inference gets smarter. KernelFusionEngine::find_attention_patterns used to always return an empty vector. It now detects scaled-dot-product attention chains (MatMul to element-wise to Softmax to MatMul) with configurable flags, so fusion passes have something to fuse.
- The project audits its own claims. The TPU stub is gone, so every capability on the list is one you can actually run.
Maturity backs this up: 18,008 tests pass via cargo nextest run --all-features (119 skipped), across a 100% Pure Rust codebase.
Technical Deep Dive
(a) Serving and deployment — trustformers-serve. The headline of 0.1.2. The four core Kubernetes manifest generators — DeploymentManifest, IngressManifest, ServiceManifest, NetworkPolicyManifest — were stubs that produced minimal or incorrect YAML; they now emit correct, configurable manifests honoring caller-supplied labels, annotations, and selectors. generate_aci_artifacts produces an Azure Container Instances ARM template plus a CLI deploy script. generate_openshift_artifacts produces a BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script. The Hub UI gained full repository CRUD — update_repository, delete_repository, update_version, delete_version — with HTTP handlers that previously returned NOT_IMPLEMENTED and now delegate to working state methods. MultiCloudOrchestrator improved its instance-selection logic and picked up regression tests for model integration and error handling.
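To make "configurable selectors" concrete, here is a minimal sketch of rendering a Deployment manifest from caller-supplied labels, annotations, and a selector. DeploymentSpec, write_map, and render_deployment are hypothetical illustrations, not the trustformers-serve API. The one rule the sketch enforces, that the selector must match the pod template labels, is a real Kubernetes requirement for apps/v1 Deployments.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical input type (not the trustformers-serve API).
pub struct DeploymentSpec {
    pub name: String,
    pub image: String,
    pub replicas: u32,
    pub labels: BTreeMap<String, String>,
    pub annotations: BTreeMap<String, String>,
    pub selector: BTreeMap<String, String>,
}

/// Writes `key:` followed by one quoted `k: "v"` line per entry; skips empty maps.
fn write_map(out: &mut String, indent: &str, key: &str, map: &BTreeMap<String, String>) {
    if map.is_empty() {
        return;
    }
    writeln!(out, "{indent}{key}:").unwrap();
    for (k, v) in map {
        // Quoting keeps values such as "true" or "8080" as YAML strings.
        writeln!(out, "{indent}  {k}: \"{v}\"").unwrap();
    }
}

/// Renders an apps/v1 Deployment; BTreeMap keeps key order deterministic.
pub fn render_deployment(spec: &DeploymentSpec) -> Result<String, String> {
    if spec.selector.is_empty() {
        return Err("selector must not be empty".into());
    }
    // Kubernetes rejects a Deployment whose selector does not match its pod template labels.
    for (k, v) in &spec.selector {
        if spec.labels.get(k) != Some(v) {
            return Err(format!("selector {k}={v} does not match pod template labels"));
        }
    }
    let mut out = String::new();
    writeln!(out, "apiVersion: apps/v1").unwrap();
    writeln!(out, "kind: Deployment").unwrap();
    writeln!(out, "metadata:").unwrap();
    writeln!(out, "  name: {}", spec.name).unwrap();
    write_map(&mut out, "  ", "labels", &spec.labels);
    write_map(&mut out, "  ", "annotations", &spec.annotations);
    writeln!(out, "spec:").unwrap();
    writeln!(out, "  replicas: {}", spec.replicas).unwrap();
    writeln!(out, "  selector:").unwrap();
    write_map(&mut out, "    ", "matchLabels", &spec.selector);
    writeln!(out, "  template:").unwrap();
    writeln!(out, "    metadata:").unwrap();
    write_map(&mut out, "      ", "labels", &spec.labels);
    writeln!(out, "    spec:").unwrap();
    writeln!(out, "      containers:").unwrap();
    writeln!(out, "        - name: {}", spec.name).unwrap();
    writeln!(out, "          image: {}", spec.image).unwrap();
    Ok(out)
}
```

Validating before rendering means a bad selector fails at generation time rather than at kubectl apply time.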
(b) Performance and graph tooling. Two long-standing “always returns empty” bugs are fixed. KernelFusionEngine::find_attention_patterns now walks the graph for the SDPA chain (MatMul to element-wise to Softmax to MatMul) instead of returning nothing. GraphDebugger::find_disconnected_nodes now performs disconnected-node detection with edge cross-validation, rather than returning an empty vector unconditionally. EnhancedProfiler gained the Flamegraph/OTLP/Jaeger export formats described above. And DynamicArchitectureManager::compute_entropy, compute_variance, and compute_sparsity now have real tensor-based implementations — previously they returned hardcoded 0.5, 0.3, and 0.2.
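The two graph fixes can be illustrated on a toy graph. This sketch assumes hypothetical Op and Graph types (the real KernelFusionEngine and GraphDebugger types are not shown in this post). It searches for the MatMul, element-wise, Softmax, MatMul chain, and it reports nodes that no valid edge touches, ignoring edges whose endpoints do not exist, which is one plausible reading of "edge cross-validation".

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical op kinds for a toy graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    MatMul,
    Scale,
    Add,
    Softmax,
    Other,
}

impl Op {
    /// Element-wise ops that can sit between Q·Kᵀ and the softmax (scaling, masking).
    fn is_elementwise(self) -> bool {
        matches!(self, Op::Scale | Op::Add)
    }
}

/// Toy graph: node id -> op, plus directed (producer, consumer) edges.
pub struct Graph {
    pub nodes: HashMap<usize, Op>,
    pub edges: Vec<(usize, usize)>,
}

impl Graph {
    /// Cross-validation: keep only edges whose endpoints are both real nodes.
    fn valid_edges(&self) -> impl Iterator<Item = (usize, usize)> + '_ {
        let nodes = &self.nodes;
        self.edges
            .iter()
            .copied()
            .filter(move |(a, b)| nodes.contains_key(a) && nodes.contains_key(b))
    }

    fn successors(&self, id: usize) -> Vec<usize> {
        let mut out: Vec<usize> =
            self.valid_edges().filter(|&(a, _)| a == id).map(|(_, b)| b).collect();
        out.sort_unstable();
        out
    }

    /// Returns every [MatMul, element-wise, Softmax, MatMul] chain as node ids.
    pub fn attention_chains(&self) -> Vec<[usize; 4]> {
        let mut ids: Vec<usize> = self.nodes.keys().copied().collect();
        ids.sort_unstable();
        let mut found = Vec::new();
        for &q in &ids {
            if self.nodes[&q] != Op::MatMul {
                continue;
            }
            for e in self.successors(q) {
                if !self.nodes[&e].is_elementwise() {
                    continue;
                }
                for s in self.successors(e) {
                    if self.nodes[&s] != Op::Softmax {
                        continue;
                    }
                    for v in self.successors(s) {
                        if self.nodes[&v] == Op::MatMul {
                            found.push([q, e, s, v]);
                        }
                    }
                }
            }
        }
        found
    }

    /// Nodes that no valid edge touches, in ascending id order.
    pub fn disconnected_nodes(&self) -> Vec<usize> {
        let mut touched = HashSet::new();
        for (a, b) in self.valid_edges() {
            touched.insert(a);
            touched.insert(b);
        }
        let mut out: Vec<usize> =
            self.nodes.keys().copied().filter(|id| !touched.contains(id)).collect();
        out.sort_unstable();
        out
    }
}
```

Note that a node whose only edge points at a missing node still counts as disconnected, since that dangling edge is discarded before the check.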
(c) Ops and pipelines. Tensor::softmax_entropy_normalized() computes normalized softmax entropy bounded in [0, 1]. A new ActivationType enum lands in trustformers-models::common with apply() and from_config_str_or() helpers, and the Phi-3 model exposes RotaryEmbedding::half_dim(). On the inference side, a token-classification pipeline arrives, and ObjectDetectionPipeline gains Non-Maximum Suppression for cleaner detections.
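Assuming the usual normalization, normalized softmax entropy divides the Shannon entropy of softmax(z) by ln(n), its maximum over n classes, which is why the result lands in [0, 1]. The free function below is a hypothetical stand-in for Tensor::softmax_entropy_normalized() over a plain f64 slice; the real method's edge-case behavior may differ.

```rust
/// Hypothetical free-function stand-in for Tensor::softmax_entropy_normalized().
/// Returns H(softmax(z)) / ln(n), or None when n < 2 (ln(1) = 0 leaves no scale).
pub fn softmax_entropy_normalized(logits: &[f64]) -> Option<f64> {
    let n = logits.len();
    if n < 2 {
        return None;
    }
    // Subtract the max logit before exponentiating to avoid overflow.
    let max = logits.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    // Shannon entropy in nats; terms with p = 0 contribute 0 by convention.
    let entropy: f64 = exps
        .iter()
        .map(|&e| e / sum)
        .filter(|&p| p > 0.0)
        .map(|p| -p * p.ln())
        .sum();
    // Clamp guards against tiny floating-point overshoot outside [0, 1].
    Some((entropy / (n as f64).ln()).clamp(0.0, 1.0))
}
```

Uniform logits give 1.0 (maximally uncertain); one dominant logit gives a value near 0.0.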
(d) Quantization and mobile. GGUF gains Q2_K and Q3_K block quantization methods, each covered by round-trip dequantize tests. LargeModelVisualizer can render PNG heatmaps for sampled layers. An Android backend module arrives with NNAPI bindings plus OpenGL ES and Vulkan GPU backends, and a federated-learning v2 module brings differential privacy, aggregation, secure communication, and crypto submodules.
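The idea behind low-bit block quantization can be shown with a simplified scheme: each block stores one scale and one minimum, and each value becomes a 2-bit or 3-bit code. This is an assumption-laden sketch, not the GGUF Q2_K/Q3_K layout, which packs values into 256-element super-blocks with their own quantized sub-block scales. QBlock, quantize, and dequantize are hypothetical names; the round-trip check mirrors the kind of dequantize test the release describes.

```rust
/// One block of a simplified k-bit scheme: codes in 0..2^bits plus affine parameters.
/// Hypothetical; GGUF Q2_K/Q3_K use a different, packed layout.
pub struct QBlock {
    pub scale: f32,
    pub min: f32,
    pub codes: Vec<u8>,
}

/// Quantizes `values` block by block to `bits`-bit codes (x ≈ min + scale * code).
pub fn quantize(values: &[f32], bits: u32, block_size: usize) -> Vec<QBlock> {
    assert!((1..=8).contains(&bits) && block_size > 0);
    let levels = ((1u32 << bits) - 1) as f32; // largest code: 3 for 2-bit, 7 for 3-bit
    values
        .chunks(block_size)
        .map(|block| {
            let min = block.iter().copied().fold(f32::INFINITY, f32::min);
            let max = block.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            // A constant block gets scale 0 and decodes back to `min` exactly.
            let scale = if max > min { (max - min) / levels } else { 0.0 };
            let codes = block
                .iter()
                .map(|&x| {
                    if scale == 0.0 {
                        0
                    } else {
                        ((x - min) / scale).round().clamp(0.0, levels) as u8
                    }
                })
                .collect();
            QBlock { scale, min, codes }
        })
        .collect()
}

/// Reconstructs approximate values; up to float rounding, error is at most scale / 2.
pub fn dequantize(blocks: &[QBlock]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.codes.iter().map(move |&c| b.min + b.scale * f32::from(c)))
        .collect()
}
```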
Under the hood, this release rides the mid-2026 COOLJAPAN stack: scirs2-core and scirs2-linalg moved from 0.4.2 to 0.5.0, the OxiARC family (oxiarc-zstd, oxiarc-deflate, oxiarc-lz4, oxiarc-archive) moved from 0.2.7 to 0.3.3, and oxicode moved to 0.2.4. OxiBLAS (0.2.1) and OxiONNX (Pure Rust ONNX) round out the foundation. A small but real build fix: the hardware-acceleration benchmark moved criterion_main! to the crate root, resolving E0601 (no main function found) under #[cfg(not(feature = "cuda"))].
Getting Started
```shell
cargo add trustformers anyhow
```
The core flow is unchanged — load with from_pretrained, encode, forward — and 0.1.2 adds new surfaces on top. Here we run the model, then read prediction confidence straight off the logits with the new softmax_entropy_normalized():
```rust
use trustformers::{AutoModel, AutoTokenizer};

fn main() -> anyhow::Result<()> {
    // Same loading flow as always: no Python, no PyTorch.
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;

    let inputs = tokenizer.encode("TrustformeRS runs entirely in Rust.", None)?;
    let logits = model.forward(&inputs)?;

    // New in 0.1.2: normalized softmax entropy in [0, 1].
    // Low entropy => confident prediction; high entropy => uncertain.
    let entropy = logits.softmax_entropy_normalized()?;
    println!("normalized entropy: {entropy:.3}");
    Ok(())
}
```
Prefer a task pipeline? The new token-classification pipeline gives you NER out of the box:
```rust
use trustformers::pipeline;

fn main() -> anyhow::Result<()> {
    let ner = pipeline("token-classification")?;
    let entities = ner.run("KitaSan founded COOLJAPAN OÜ in Tallinn.")?;
    for ent in entities {
        println!("{} -> {}", ent.word, ent.entity);
    }
    Ok(())
}
```
What’s New in 0.1.2
Serving and deploy
- Real, configurable Kubernetes manifests: Deployment, Ingress, Service, NetworkPolicy (labels, annotations, selectors).
- Azure Container Instances artifacts: ARM template plus CLI deploy script (generate_aci_artifacts).
- OpenShift artifacts: BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script (generate_openshift_artifacts).
- Hub UI repository CRUD (update_repository / delete_repository / update_version / delete_version) plus HTTP handlers.
- MultiCloudOrchestrator instance-selection improvements and new regression tests.
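Here is a minimal sketch of what "handlers that delegate to working state methods" can look like, as opposed to handlers that answer 501 Not Implemented. HubState, Repo, add_version, and handle are hypothetical. The method names update_repository, delete_repository, and delete_version echo the release notes, but these signatures are assumptions, and update_version is left out for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical repository record; not the Hub UI's real types.
#[derive(Default)]
pub struct Repo {
    pub description: String,
    pub versions: Vec<String>,
}

/// Hypothetical in-memory hub state keyed by repository name.
#[derive(Default)]
pub struct HubState {
    repos: HashMap<String, Repo>,
}

impl HubState {
    pub fn create_repository(&mut self, name: &str) {
        self.repos.entry(name.to_string()).or_default();
    }

    fn repo_mut(&mut self, name: &str) -> Result<&mut Repo, String> {
        self.repos.get_mut(name).ok_or_else(|| format!("repository {name} not found"))
    }

    pub fn add_version(&mut self, name: &str, tag: &str) -> Result<(), String> {
        self.repo_mut(name)?.versions.push(tag.to_string());
        Ok(())
    }

    pub fn update_repository(&mut self, name: &str, description: &str) -> Result<(), String> {
        self.repo_mut(name)?.description = description.to_string();
        Ok(())
    }

    pub fn delete_repository(&mut self, name: &str) -> Result<(), String> {
        self.repos.remove(name).map(|_| ()).ok_or_else(|| format!("repository {name} not found"))
    }

    pub fn delete_version(&mut self, name: &str, tag: &str) -> Result<(), String> {
        let repo = self.repo_mut(name)?;
        let before = repo.versions.len();
        repo.versions.retain(|v| v != tag);
        if repo.versions.len() < before {
            Ok(())
        } else {
            Err(format!("version {tag} not found"))
        }
    }
}

/// Routes a request to a state method and maps the outcome to an HTTP status
/// (200 OK, 204 No Content, 404 Not Found) instead of a blanket 501.
pub fn handle(state: &mut HubState, method: &str, path: &str, body: &str) -> u16 {
    let parts: Vec<&str> = path.trim_matches('/').split('/').collect();
    let result = match (method, parts.as_slice()) {
        ("PUT", ["repos", name]) => state.update_repository(name, body),
        ("DELETE", ["repos", name]) => state.delete_repository(name),
        ("DELETE", ["repos", name, "versions", tag]) => state.delete_version(name, tag),
        _ => return 404, // unknown route
    };
    match result {
        Ok(()) if method == "DELETE" => 204,
        Ok(()) => 200,
        Err(_) => 404,
    }
}
```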
Performance and debugging
- SDPA chain detection in KernelFusionEngine::find_attention_patterns (MatMul to element-wise to Softmax to MatMul).
- Disconnected-node detection in GraphDebugger::find_disconnected_nodes with edge cross-validation.
- EnhancedProfiler exports: Flamegraph, OpenTelemetry OTLP JSON, Jaeger trace JSON.
- Real tensor-based compute_entropy / compute_variance / compute_sparsity in DynamicArchitectureManager.
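Before 0.1.2 these three statistics were the constants 0.5, 0.3, and 0.2. The sketch below shows one plausible tensor-based definition of each over a flat f32 slice. The function names are hypothetical, and the definitions (entropy over normalized magnitudes, population variance, fraction of near-zero values) are assumptions, not the exact formulas in DynamicArchitectureManager.

```rust
/// Entropy (nats) of |x| normalized into a distribution; 0 for empty or all-zero input.
pub fn magnitude_entropy(values: &[f32]) -> f32 {
    let total: f32 = values.iter().map(|x| x.abs()).sum();
    if total == 0.0 {
        return 0.0;
    }
    values
        .iter()
        .map(|x| x.abs() / total)
        .filter(|&p| p > 0.0)
        .map(|p| -p * p.ln())
        .sum()
}

/// Population variance (divides by n); 0 for empty input.
pub fn variance(values: &[f32]) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    let n = values.len() as f32;
    let mean = values.iter().sum::<f32>() / n;
    values.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n
}

/// Fraction of entries whose magnitude is at most `eps`; 0 for empty input.
pub fn sparsity(values: &[f32], eps: f32) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    values.iter().filter(|x| x.abs() <= eps).count() as f32 / values.len() as f32
}
```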
New ops and pipelines
- Tensor::softmax_entropy_normalized() (normalized entropy in [0, 1]).
- ActivationType enum with apply() / from_config_str_or() in trustformers-models::common.
- Phi-3 RotaryEmbedding::half_dim() accessor.
- Token-classification pipeline; NMS in ObjectDetectionPipeline.
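Greedy Non-Maximum Suppression keeps the highest-scoring box and drops any remaining box whose intersection-over-union (IoU) with an already-kept box exceeds a threshold. The Detection type and non_max_suppression function below are a hypothetical, class-agnostic sketch; the thresholds and per-class handling in ObjectDetectionPipeline are not described in this post.

```rust
/// Hypothetical axis-aligned box with corners (x1, y1)-(x2, y2) and a confidence score.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Detection {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
}

/// Intersection over union of two boxes; 0 when the union is empty.
fn iou(a: &Detection, b: &Detection) -> f32 {
    let ix = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let iy = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = ix * iy;
    let area = |d: &Detection| (d.x2 - d.x1).max(0.0) * (d.y2 - d.y1).max(0.0);
    let union = area(a) + area(b) - inter;
    if union <= 0.0 { 0.0 } else { inter / union }
}

/// Greedy NMS: visit boxes by descending score, keep one only if it does not
/// overlap any kept box by more than `iou_threshold`.
pub fn non_max_suppression(mut dets: Vec<Detection>, iou_threshold: f32) -> Vec<Detection> {
    dets.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(k, &d) <= iou_threshold) {
            kept.push(d);
        }
    }
    kept
}
```

Detectors often run this per class so that overlapping boxes of different classes both survive; the class-agnostic form is kept here for brevity.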
Quantization and mobile
- Q2_K and Q3_K GGUF block quantization with round-trip dequantize tests.
- PNG heatmap visualization for sampled layers in LargeModelVisualizer.
- Android backend (NNAPI bindings, OpenGL ES, Vulkan); federated-learning v2 (DP, aggregation, secure comms, crypto).
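Aggregation in federated learning is commonly FedAvg, a sample-weighted mean of client updates, and differentially private variants clip each update's L2 norm before averaging and then add calibrated noise. This sketch assumes that reading of the federated-learning v2 module: clip_l2 and fed_avg are hypothetical helpers, and the noise step is deliberately omitted so the example stays deterministic.

```rust
/// Scales `update` down so its L2 norm is at most `max_norm` (the DP clipping step).
pub fn clip_l2(update: &mut [f32], max_norm: f32) {
    let norm = update.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > max_norm && norm > 0.0 {
        let factor = max_norm / norm;
        update.iter_mut().for_each(|x| *x *= factor);
    }
}

/// FedAvg: mean of client updates weighted by each client's local sample count.
/// Returns None for no clients, zero total samples, or mismatched dimensions.
pub fn fed_avg(updates: &[(Vec<f32>, usize)]) -> Option<Vec<f32>> {
    let dim = updates.first()?.0.len();
    let total: usize = updates.iter().map(|(_, n)| n).sum();
    if total == 0 || updates.iter().any(|(u, _)| u.len() != dim) {
        return None;
    }
    let mut avg = vec![0.0f32; dim];
    for (u, n) in updates {
        let w = *n as f32 / total as f32;
        for (a, x) in avg.iter_mut().zip(u) {
            *a += w * x;
        }
    }
    Some(avg)
}
```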
Removed
- The tpu feature flag and tpu_impl.rs: the TPU backend was stub-only (every FFI binding body unimplemented) and was removed to avoid misleading capability claims.
Dependency bumps
- SciRS2 0.5.0 (scirs2-core, scirs2-linalg), OxiARC 0.3.3 (zstd/deflate/lz4/archive), oxicode 0.2.4.
Tips
- Generate deployment manifests instead of hand-writing YAML. Let trustformers-serve emit your Kubernetes Deployment/Ingress/Service/NetworkPolicy, or call generate_aci_artifacts / generate_openshift_artifacts for Azure and OpenShift. Pass your own labels, annotations, and selectors so the output drops straight into your cluster.
- Wire profiles into the observability stack you already run. EnhancedProfiler exports Flamegraph folded stacks, OTLP JSON spans, and Jaeger trace JSON; point them at your existing flamegraph viewer, OpenTelemetry collector, or Jaeger UI without writing glue.
- Gauge prediction confidence with one call. logits.softmax_entropy_normalized() returns a value in [0, 1]; treat low entropy as a confident prediction and high entropy as a candidate for human review or a fallback path.
- Quantize aggressively when footprint is king. Reach for the new Q2_K / Q3_K GGUF block quantization for the smallest on-disk and in-memory footprint; the round-trip dequantize tests are a starting point for measuring the accuracy trade-off on your own models.
- Enable the SDPA kernel-fusion paths. With find_attention_patterns now detecting MatMul to element-wise to Softmax to MatMul chains, fusion passes can collapse attention into fewer kernels; keep the configurable flags on for attention-heavy models.
- TPU is gone, so plan accordingly. There is no longer a tpu feature. Target CUDA (via OxiCUDA), Metal, or WebGPU for acceleration instead.
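The folded-stack format behind the observability tip is simple: one line per distinct call stack, frames joined root-first by semicolons, followed by a sample count. A minimal sketch of producing it from raw samples follows. fold_stacks is a hypothetical helper, not part of EnhancedProfiler; its output is the kind of text flamegraph tooling such as flamegraph.pl or inferno consumes.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not the EnhancedProfiler API): collapses sampled call
/// stacks, each listed root-first, into folded-stack lines `a;b;c count`.
pub fn fold_stacks(samples: &[Vec<&str>]) -> String {
    // BTreeMap keeps the output sorted, which makes it deterministic.
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for stack in samples {
        if stack.is_empty() {
            continue; // an empty stack has no frame to attribute the sample to
        }
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
        .iter()
        .map(|(stack, n)| format!("{stack} {n}"))
        .collect::<Vec<_>>()
        .join("\n")
}
```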
This is the foundation
TrustformeRS sits inside the COOLJAPAN ecosystem as it stands in mid-2026. Built on SciRS2 0.5.0, OxiBLAS, Oxicode, OxiARC 0.3.3, and OxiONNX, it pairs naturally with OxiCUDA for GPU serving and runs alongside OxiLLaMa, ToRSh, SkleaRS, and TenfloweRS as part of one Pure Rust stack. Every source file is kept under the 2000-line policy with SplitRS, so the codebase stays readable as it grows: 100% Pure Rust, no FFI surprises.
Repository: https://github.com/cool-japan/trustformers
Star the repo if you want transformer inference and serving you can deploy from a single binary, audit line by line, and trust to tell you the truth about what it can and cannot do. Sovereign AI, all the way down.
— KitaSan at COOLJAPAN OÜ June 21, 2026