OxiBonsai 0.2.1 Released — Minutes-Long Numeric Tests, Now Fast (and a VAE File Fix)

A quality-of-life and correctness release for OxiBonsai: optimized test/dev compile profiles that make minutes-long numeric tests fast while keeping float parity bit-stable, a VAE precheck fix that finally accepts a .safetensors file, and corrected HuggingFace asset paths. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

release oxibonsai llm inference pure-rust quantization developer-experience testing text-to-image

The fastest way to make a heavy-numeric Rust workspace fun again is two opt-level settings in Cargo.toml, and that is the headline of this release.

Today we released OxiBonsai 0.2.1, a quality-of-life and correctness release that cuts minutes off the test suite, fixes a VAE precheck that wrongly rejected a valid .safetensors file, and points the docs at the correct HuggingFace asset layout.

OxiBonsai is the first Pure Rust, zero-FFI, C/C++/Fortran-free inference engine for PrismML’s sub-2-bit Bonsai models — the 1-bit (Q1_0_g128) and ternary (TQ2_0_g128) lines — and, since 0.1.5, the Bonsai-Image text-to-image pipeline. The previous release, 0.2.0, brought a concurrent engine pool for /serve and a native CUDA imagen backend. 0.2.1 is pure polish on top of that.

No llama.cpp. No BLAS. No C/C++/Fortran. Everything below is Rust, all the way down to the DEFLATE that writes the PNG.

Why OxiBonsai 0.2.1 matters

If you have ever worked on a heavy-numeric Rust workspace, you know the trap: cargo test builds with opt-level = 0, and your real numeric tests — the parity golden references, the model forward passes — run completely unoptimized. In OxiBonsai that meant the DiT-shape joint-attention CPU reference (~14.5 GFLOP) and the speculative decoder’s ~240 forward passes over a 151,936-row vocabulary took minutes, every run. That tax lands on every contributor, every CI job, every time.

0.2.1 removes it with a profile change, not a code change — and does so without touching float results. The VAE fix and the doc corrections close two papercuts that bit anyone trying to follow the Bonsai-Image getting-started path.

Technical Deep Dive

The profile opt-level trick. OxiBonsai’s Cargo.toml now sets:

[profile.test]
opt-level = 2

[profile.dev.package."*"]
opt-level = 3

[profile.test] opt-level = 2 optimizes everything built for cargo test: the test binaries themselves and the workspace crates they link, such as oxibonsai-model and oxibonsai-kernels. [profile.dev.package."*"] raises optimization to 3 for every dependency that is not a workspace member (Cargo's "*" override deliberately skips workspace members). Together they ensure the numeric kernels are compiled and autovectorized instead of running as scalar opt-level = 0 code. Crucially, a plain cargo build or cargo run of the crate you are actively editing stays at the dev default of opt-level = 0, so incremental compiles of your own code remain fast: you optimize the heavy dependencies once, then iterate cheaply.

The important caveat, and the reason this is safe: float results are unchanged. Rust does not enable fast-math, so a higher opt-level does not reassociate floating-point reductions. The parity gates (cosine ≥ 0.999) stay bit-stable across opt-level settings — you get the speed without paying for it in numerical drift.
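To see why reassociation would matter, here is a minimal sketch, not OxiBonsai's actual test code. It shows that f32 addition is order-sensitive, which is why an optimizer that preserves IEEE semantics must keep the summation order written in the source. The helper names (sum_sequential, cosine, passes_parity) are assumptions made for illustration; only the 0.999 threshold comes from the text above.

```rust
/// Left-to-right f32 sum: exactly the order the source code spells out.
fn sum_sequential(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0_f32, |acc, &x| acc + x)
}

/// Cosine similarity accumulated in f64; returns 0.0 if either norm is zero.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let (mut dot, mut na, mut nb) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Hypothetical parity gate using the 0.999 threshold quoted in the post.
fn passes_parity(reference: &[f32], candidate: &[f32]) -> bool {
    cosine(reference, candidate) >= 0.999
}

fn main() {
    // Same three values, two summation orders, two different f32 results:
    // near 1e8 the spacing between adjacent f32 values is 8, so adding 1.0
    // to 1e8 is rounded away, while adding it to 0.0 is not.
    let kept = sum_sequential(&[1.0e8, -1.0e8, 1.0]);
    let lost = sum_sequential(&[1.0e8, 1.0, -1.0e8]);
    println!("order A = {kept}, order B = {lost}");

    // A gate like this only stays meaningful if the order never changes
    // behind your back between opt-levels.
    let reference = [0.25_f32, -1.5, 3.0, 0.125];
    println!("parity ok: {}", passes_parity(&reference, &reference));
}
```

Because the compiler may not regroup these additions without fast-math style flags, the same source produces the same sums at opt-level 0, 2, or 3 on a given target.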

VAE precheck now accepts a file (#9). The text-to-image precheck in oxibonsai-image’s src/pipeline.rs used is_dir(), so a valid .safetensors file passed via --vae / OXI_VAE_WEIGHTS was rejected with “VAE weights dir not found” before any loading — even though VaeWeights::open and the docs both accept a file. The precheck now accepts a file or a directory (is_file() || is_dir()), the error wording is corrected, the stale doc-comment is fixed, and test_issue_9_* regression tests lock the behavior in.
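The shape of that fix can be sketched as follows. This is an illustrative stand-in, not the code in src/pipeline.rs: the function name precheck_vae_path and the error text are assumptions, and only the is_file() || is_dir() condition is taken from the description above.

```rust
use std::path::Path;

/// Hypothetical precheck: accept either a weights file or a weights directory.
/// The pre-fix behavior described above corresponds to checking only `is_dir()`.
fn precheck_vae_path(path: &Path) -> Result<(), String> {
    if path.is_file() || path.is_dir() {
        Ok(())
    } else {
        // Illustrative wording only; the real error message is not quoted in the post.
        Err(format!("VAE weights path not found: {}", path.display()))
    }
}

fn main() {
    // Path taken from the getting-started commands in this post.
    let vae = Path::new("./bonsai/vae/diffusion_pytorch_model.safetensors");
    match precheck_vae_path(vae) {
        Ok(()) => println!("precheck passed: {}", vae.display()),
        Err(e) => eprintln!("{e}"),
    }
}
```

Both std::path::Path::is_file and is_dir follow symlinks and return false when the path does not exist, so a missing path still fails the check as before.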

Pure-Rust PNG backend tracked forward. oxiarc-deflate is bumped 0.3.2 → 0.3.3 per the COOLJAPAN Latest-crates policy. It is the Pure-Rust DEFLATE backend behind Bonsai-Image’s PNG output; the substantive 0.3.3 OxiARC fixes land in sibling crates, so for OxiBonsai this is a version-tracking bump with no change to DEFLATE/PNG behavior.

Getting Started

Nothing new to install — if you are already on OxiBonsai, you are set. The change here is that the docs now point at the correct HuggingFace layout. hf download preserves the repo subfolder, so the files land under the paths you pass to the CLI:

pip install huggingface_hub      # the only non-Rust step — download only

# DiT, text encoder + tokenizer, and the bundled VAE all live in ONE repo:
hf download prism-ml/bonsai-image-ternary-4B-mlx-2bit \
    transformer-packed-mflux/diffusion_pytorch_model.safetensors \
    text_encoder-mlx-4bit/model.safetensors text_encoder-mlx-4bit/tokenizer.json \
    vae/diffusion_pytorch_model.safetensors --local-dir ./bonsai

# Convert the DiT to GGUF (Pure Rust; defaults to tq2_0_g128):
cargo run -p oxibonsai-model --example mlx_image_convert --release -- \
    ./bonsai/transformer-packed-mflux/diffusion_pytorch_model.safetensors ./bonsai-dit.gguf
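If you want to confirm the download landed where the CLI expects it, a small standalone check along these lines may help. It is a sketch under stated assumptions: missing_assets is a hypothetical helper that is not part of OxiBonsai, and the relative paths are copied from the hf download command above.

```rust
use std::path::{Path, PathBuf};

/// Repo-relative paths copied from the `hf download` command in this post.
const ASSETS: [&str; 4] = [
    "transformer-packed-mflux/diffusion_pytorch_model.safetensors",
    "text_encoder-mlx-4bit/model.safetensors",
    "text_encoder-mlx-4bit/tokenizer.json",
    "vae/diffusion_pytorch_model.safetensors",
];

/// Hypothetical helper: return every expected file that is absent under `local_dir`.
fn missing_assets(local_dir: &Path) -> Vec<PathBuf> {
    ASSETS
        .iter()
        .map(|rel| local_dir.join(rel))
        .filter(|p| !p.is_file())
        .collect()
}

fn main() {
    // `./bonsai` matches the --local-dir used in the download command.
    let missing = missing_assets(Path::new("./bonsai"));
    if missing.is_empty() {
        println!("all {} assets present", ASSETS.len());
    } else {
        for p in &missing {
            eprintln!("missing: {}", p.display());
        }
    }
}
```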

Then generate an image, passing the VAE as a single .safetensors file — the thing 0.2.1 makes work:

oxibonsai image \
    --prompt "a tiny bonsai tree in a ceramic pot" --out bonsai.png \
    --dit ./bonsai-dit.gguf \
    --te  ./bonsai/text_encoder-mlx-4bit/model.safetensors \
    --vae ./bonsai/vae/diffusion_pytorch_model.safetensors \
    --seed 42 --steps 4

What’s New in 0.2.1

Tips

This is the foundation

OxiBonsai is built entirely on the COOLJAPAN ecosystem — SciRS2 for numerics, OxiBLAS for linear algebra, OxiFFT for transforms, OxiARC (via oxiarc-deflate) for the Pure-Rust PNG path, and OxiONNX for model ingestion — serving the PrismML Bonsai sub-2-bit model family with zero FFI and no C/Fortran runtime.

Repository: https://github.com/cool-japan/oxibonsai

Star the repo if you want sub-2-bit inference with a test suite that respects your time and a docs path that just works.

Pure Rust sovereign AI inference is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ June 6, 2026
