
TrustformeRS 0.1.2 Released — Production Serving, Kernel Fusion, and Honest Capabilities

TrustformeRS 0.1.2 patch — full Kubernetes/ACI/OpenShift deploy manifests, EnhancedProfiler exports (Flamegraph/OTLP/Jaeger), scaled-dot-product attention kernel fusion, NMS + token-classification pipelines, Q2_K/Q3_K GGUF quant; the stub-only TPU backend is removed for honesty.

Tags: release, trustformers, rust, transformers, llm, serving, kubernetes, quantization

Your transformer models now deploy themselves — auto-generated Kubernetes, ACI, and OpenShift manifests; real kernel-fusion and graph-debug analytics; broader GGUF quantization — and a stub backend deleted rather than dressed up.

Today we released TrustformeRS 0.1.2 — a patch that turns production deployment into real artifact generation, makes kernel-fusion and graph debugging actually detect what they advertise, widens GGUF quantization down to Q2_K/Q3_K, and removes a stub-only TPU backend so the capability list stays honest.

TrustformeRS is a Pure Rust counterpart to Hugging Face Transformers: transformer and LLM loading plus inference, tokenizers, and model-hub access, with no Python and no PyTorch underneath.

No PyTorch. No Python. No CUDA-C. No libtorch shared object to chase across container images. TrustformeRS compiles to a single static binary or a WASM module, and the serving stack ships its own deployment manifests instead of leaning on a Python web framework. This release also leans into a quieter form of honesty: rather than ship a TPU backend whose FFI binding bodies were all unimplemented, we deleted it. A capability you cannot run is not a feature — it is a liability, and we would rather show you exactly what the framework does today.

Why 0.1.2 matters

Three things in 0.1.2 cross the line from “stub” to “real,” and one thing crosses out.

Maturity backs this up: 18,008 tests pass via cargo nextest run --all-features (119 skipped), across a 100% Pure Rust codebase.

Technical Deep Dive

(a) Serving and deployment — trustformers-serve. The headline of 0.1.2. The four core Kubernetes manifest generators — DeploymentManifest, IngressManifest, ServiceManifest, NetworkPolicyManifest — were stubs that produced minimal or incorrect YAML; they now emit correct, configurable manifests honoring caller-supplied labels, annotations, and selectors. generate_aci_artifacts produces an Azure Container Instances ARM template plus a CLI deploy script. generate_openshift_artifacts produces a BuildConfig, DeploymentConfig, Service, Route, and an oc deploy script. The Hub UI gained full repository CRUD — update_repository, delete_repository, update_version, delete_version — with HTTP handlers that previously returned NOT_IMPLEMENTED and now delegate to working state methods. MultiCloudOrchestrator improved its instance-selection logic and picked up regression tests for model integration and error handling.
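To make "honoring caller-supplied labels, annotations, and selectors" concrete, here is a minimal sketch of a Deployment renderer. It is not the trustformers-serve API: `DeploymentSpec`, `yaml_map`, and `render_deployment` are hypothetical names, and the YAML is built with plain string formatting from the standard library. The one rule it enforces is that every selector entry must also appear in the pod labels, since Kubernetes rejects a Deployment whose selector does not match its pod template.

```rust
use std::collections::BTreeMap;

/// Hypothetical input type for this sketch; not the trustformers-serve API.
pub struct DeploymentSpec {
    pub name: String,
    pub image: String,
    pub replicas: u32,
    pub labels: BTreeMap<String, String>,
    pub annotations: BTreeMap<String, String>,
    pub selector: BTreeMap<String, String>,
}

/// Render a string map as indented `key: "value"` YAML lines.
/// Assumes values contain no double quotes or backslashes (no escaping here).
fn yaml_map(map: &BTreeMap<String, String>, indent: usize) -> String {
    let pad = " ".repeat(indent);
    map.iter()
        .map(|(k, v)| format!("{pad}{k}: \"{v}\"\n"))
        .collect()
}

/// Render an apps/v1 Deployment as YAML text, or explain why the spec is invalid.
pub fn render_deployment(spec: &DeploymentSpec) -> Result<String, String> {
    if spec.selector.is_empty() {
        return Err("selector must not be empty".to_string());
    }
    // Every selector entry must be present in the pod template labels.
    for (k, v) in &spec.selector {
        if spec.labels.get(k) != Some(v) {
            return Err(format!("selector {k}={v} is missing from the pod labels"));
        }
    }

    let mut out = String::from("apiVersion: apps/v1\nkind: Deployment\nmetadata:\n");
    out += &format!("  name: {}\n", spec.name);
    out += "  labels:\n";
    out += &yaml_map(&spec.labels, 4);
    // Annotations are optional; omit the key entirely when none are supplied.
    if !spec.annotations.is_empty() {
        out += "  annotations:\n";
        out += &yaml_map(&spec.annotations, 4);
    }
    out += &format!("spec:\n  replicas: {}\n", spec.replicas);
    out += "  selector:\n    matchLabels:\n";
    out += &yaml_map(&spec.selector, 6);
    out += "  template:\n    metadata:\n      labels:\n";
    out += &yaml_map(&spec.labels, 8);
    out += "    spec:\n      containers:\n";
    out += &format!(
        "        - name: {}\n          image: {}\n",
        spec.name, spec.image
    );
    Ok(out)
}
```

Calling `render_deployment` with a selector that is a subset of the labels returns the YAML text; a mismatched selector returns an `Err` instead of emitting a manifest the API server would refuse.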

(b) Performance and graph tooling. Two long-standing “always returns empty” bugs are fixed. KernelFusionEngine::find_attention_patterns now walks the graph for the SDPA chain (MatMul → element-wise → Softmax → MatMul) instead of returning nothing. GraphDebugger::find_disconnected_nodes now performs disconnected-node detection with edge cross-validation, rather than returning an empty vector unconditionally. EnhancedProfiler gained the Flamegraph/OTLP/Jaeger export formats described above. And DynamicArchitectureManager::compute_entropy, compute_variance, and compute_sparsity now have real tensor-based implementations; previously they returned hardcoded 0.5, 0.3, and 0.2.
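The SDPA chain walk can be sketched on a toy graph. This is an illustration under stated assumptions, not the KernelFusionEngine implementation: `Op`, `Graph`, and `find_sdpa_chains` are hypothetical names, the graph is a flat op list plus producer-to-consumer edges, and exactly one element-wise op (a scale or a mask) is expected between the first MatMul and the Softmax.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    MatMul,
    Mul,
    Div,
    Add,
    Softmax,
    Other,
}

impl Op {
    /// Element-wise ops that typically sit between QK^T and Softmax (scale, mask).
    fn is_elementwise(self) -> bool {
        matches!(self, Op::Mul | Op::Div | Op::Add)
    }
}

/// Hypothetical toy graph: `ops[i]` is the op of node `i`,
/// and each edge is `(producer, consumer)`. Edges must index into `ops`.
pub struct Graph {
    pub ops: Vec<Op>,
    pub edges: Vec<(usize, usize)>,
}

impl Graph {
    /// All nodes that consume the output of `node`.
    fn consumers(&self, node: usize) -> impl Iterator<Item = usize> + '_ {
        self.edges
            .iter()
            .filter(move |&&(src, _)| src == node)
            .map(|&(_, dst)| dst)
    }
}

/// Return every `[matmul, elementwise, softmax, matmul]` chain in the graph.
pub fn find_sdpa_chains(g: &Graph) -> Vec<[usize; 4]> {
    let mut found = Vec::new();
    for (qk, &op) in g.ops.iter().enumerate() {
        if op != Op::MatMul {
            continue;
        }
        for scale in g.consumers(qk).filter(|&n| g.ops[n].is_elementwise()) {
            for sm in g.consumers(scale).filter(|&n| g.ops[n] == Op::Softmax) {
                for av in g.consumers(sm).filter(|&n| g.ops[n] == Op::MatMul) {
                    found.push([qk, scale, sm, av]);
                }
            }
        }
    }
    found
}
```

A matcher like this returns an empty vector only when no chain exists, which is the behavioral difference from a stub that returns empty unconditionally. A fuller matcher would also accept a run of several element-wise ops (scale followed by mask) before the Softmax.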

(c) Ops and pipelines. Tensor::softmax_entropy_normalized() computes normalized softmax entropy bounded in [0, 1]. A new ActivationType enum lands in trustformers-models::common with apply() and from_config_str_or() helpers, and the Phi-3 model exposes RotaryEmbedding::half_dim(). On the inference side, a token-classification pipeline arrives, and ObjectDetectionPipeline gains Non-Maximum Suppression for cleaner detections.
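For readers new to Non-Maximum Suppression, the classic greedy variant is sketched below. This is an assumption about the general technique, not a copy of what ObjectDetectionPipeline does internally: `Detection`, `iou`, and `nms` are hypothetical names, boxes are axis-aligned `(x1, y1, x2, y2)` corners, and a box is suppressed when its IoU with an already-kept box exceeds the threshold.

```rust
/// Hypothetical detection type: corner coordinates plus a confidence score.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Detection {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
}

fn area(d: &Detection) -> f32 {
    (d.x2 - d.x1).max(0.0) * (d.y2 - d.y1).max(0.0)
}

/// Intersection over union of two boxes; 0.0 when they do not overlap.
pub fn iou(a: &Detection, b: &Detection) -> f32 {
    let w = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let h = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = w * h;
    let union = area(a) + area(b) - inter;
    if union > 0.0 {
        inter / union
    } else {
        0.0
    }
}

/// Greedy NMS: visit boxes by descending score and keep a box only if it
/// does not overlap any already-kept box by more than `iou_threshold`.
pub fn nms(mut dets: Vec<Detection>, iou_threshold: f32) -> Vec<Detection> {
    dets.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(k, &d) <= iou_threshold) {
            kept.push(d);
        }
    }
    kept
}
```

With a threshold of 0.5, two heavily overlapping boxes collapse to the higher-scoring one while a distant box survives. Real pipelines usually run this per class; the sketch treats all boxes as one class.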

(d) Quantization and mobile. GGUF gains Q2_K and Q3_K block quantization methods, each covered by round-trip dequantize tests. LargeModelVisualizer can render PNG heatmaps for sampled layers. An Android backend module arrives with NNAPI bindings plus OpenGL ES and Vulkan GPU backends, and a federated-learning v2 module brings differential privacy, aggregation, secure communication, and crypto submodules.
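A round-trip dequantize test is easiest to picture on a deliberately simplified 2-bit block. The sketch below is not the GGUF Q2_K layout (which, as far as we understand the format, groups weights into 256-element super-blocks with separately quantized sub-block scales and minimums); it uses one f32 scale and one f32 minimum per 16 values. `Block2Bit`, `quantize_block`, and `dequantize_block` are hypothetical names for illustration only.

```rust
pub const BLOCK: usize = 16;

/// Simplified 2-bit block: NOT the real Q2_K layout, just the core idea.
pub struct Block2Bit {
    pub scale: f32,
    pub min: f32,
    /// Four 2-bit codes packed into each byte, lowest bits first.
    pub packed: [u8; BLOCK / 4],
}

/// Map each value to one of four levels: min, min+scale, min+2*scale, min+3*scale.
pub fn quantize_block(values: &[f32; BLOCK]) -> Block2Bit {
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant block has zero range; any non-zero scale works because all codes are 0.
    let scale = if max > min { (max - min) / 3.0 } else { 1.0 };
    let mut packed = [0u8; BLOCK / 4];
    for (i, &v) in values.iter().enumerate() {
        let code = (((v - min) / scale).round() as u8).min(3);
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    Block2Bit { scale, min, packed }
}

/// Unpack the 2-bit codes and reconstruct approximate values.
pub fn dequantize_block(b: &Block2Bit) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (i, slot) in out.iter_mut().enumerate() {
        let code = (b.packed[i / 4] >> ((i % 4) * 2)) & 0b11;
        *slot = b.min + b.scale * code as f32;
    }
    out
}
```

A round-trip test then asserts that every reconstructed value lies within half a quantization step (`scale / 2`) of the original, which is the tightest bound this scheme can promise.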

Under the hood, this release rides the mid-2026 COOLJAPAN stack: scirs2-core and scirs2-linalg moved 0.4.2 to 0.5.0, the OxiARC family (oxiarc-zstd, oxiarc-deflate, oxiarc-lz4, oxiarc-archive) moved 0.2.7 to 0.3.3, and oxicode moved to 0.2.4. OxiBLAS (0.2.1) and OxiONNX (Pure Rust ONNX) round out the foundation. A small but real build fix: the hardware-acceleration benchmark moved criterion_main! to the crate root to resolve E0601 under #[cfg(not(feature = "cuda"))].

Getting Started

cargo add trustformers

The core flow is unchanged — load with from_pretrained, encode, forward — and 0.1.2 adds new surfaces on top. Here we run the model, then read prediction confidence straight off the logits with the new softmax_entropy_normalized():

use trustformers::{AutoModel, AutoTokenizer};

fn main() -> anyhow::Result<()> {
    // Same loading flow as always — no Python, no PyTorch.
    let tokenizer = AutoTokenizer::from_pretrained("bert-base-uncased")?;
    let model = AutoModel::from_pretrained("bert-base-uncased")?;

    let inputs = tokenizer.encode("TrustformeRS runs entirely in Rust.", None)?;
    let logits = model.forward(&inputs)?;

    // New in 0.1.2: normalized softmax entropy in [0, 1].
    // Low entropy => confident prediction; high entropy => uncertain.
    let entropy = logits.softmax_entropy_normalized()?;
    println!("normalized entropy: {entropy:.3}");

    Ok(())
}
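If you are wondering what "normalized softmax entropy" means numerically, the usual definition is the Shannon entropy of the softmax distribution divided by ln(n), its maximum for n classes. The standalone function below sketches that definition on a plain slice; it is an assumption about the math, not the Tensor method's source, and `normalized_softmax_entropy` is a hypothetical name. Which axis the real method reduces over is not covered here.

```rust
/// Normalized softmax entropy of one logit vector, in [0, 1].
/// 0 means all probability mass on one class; 1 means a uniform distribution.
pub fn normalized_softmax_entropy(logits: &[f32]) -> f32 {
    let n = logits.len();
    // With fewer than two classes there is no uncertainty to measure.
    if n < 2 {
        return 0.0;
    }
    // Subtract the max logit before exponentiating for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Shannon entropy H = -sum(p * ln p), skipping zero-probability terms.
    let entropy: f32 = exps
        .iter()
        .map(|&e| {
            let p = e / sum;
            if p > 0.0 {
                -p * p.ln()
            } else {
                0.0
            }
        })
        .sum();
    // Divide by the maximum possible entropy, ln(n), and clamp rounding noise.
    (entropy / (n as f32).ln()).clamp(0.0, 1.0)
}
```

Equal logits give a value of 1 (maximally uncertain), while one dominant logit drives the value toward 0.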

Prefer a task pipeline? The new token-classification pipeline gives you NER out of the box:

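use trustformers::pipeline;

fn main() -> anyhow::Result<()> {
    // Build the new token-classification pipeline and run NER on one sentence.
    let ner = pipeline("token-classification")?;
    let entities = ner.run("KitaSan founded COOLJAPAN OÜ in Tallinn.")?;
    for ent in entities {
        println!("{} -> {}", ent.word, ent.entity);
    }

    Ok(())
}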

What’s New in 0.1.2

Serving and deploy

Performance and debugging

New ops and pipelines

Quantization and mobile

Removed

Dependency bumps

Tips

This is the foundation

TrustformeRS sits inside the COOLJAPAN ecosystem as it stands in mid-2026: built on SciRS2 0.5.0, OxiBLAS, Oxicode, OxiARC 0.3.3, and OxiONNX, it pairs naturally with OxiCUDA for GPU serving and sits beside OxiLLaMa, ToRSh, SkleaRS, and TenfloweRS as part of one Pure Rust stack. Every source file is kept under the 2000-line policy with SplitRS, so the codebase stays readable as it grows: 100% Pure Rust, no FFI surprises.

Repository: https://github.com/cool-japan/trustformers

Star the repo if you want transformer inference and serving you can deploy from a single binary, audit line by line, and trust to tell you the truth about what it can and cannot do. Sovereign AI, all the way down.

KitaSan at COOLJAPAN OÜ, June 21, 2026
