The fastest way to make a heavy-numeric Rust workspace fun again is two lines in Cargo.toml — and that is the headline of this release.
Today we released OxiBonsai 0.2.1: a quality-of-life and correctness release that cuts minutes off the test suite, fixes a VAE precheck that wrongly rejected a valid .safetensors file, and points the docs at the correct HuggingFace asset layout.
OxiBonsai is the first Pure Rust, zero-FFI, C/C++/Fortran-free inference engine for PrismML’s sub-2-bit Bonsai models — the 1-bit (Q1_0_g128) and ternary (TQ2_0_g128) lines — and, since 0.1.5, the Bonsai-Image text-to-image pipeline. The previous release, 0.2.0, brought a concurrent engine pool for /serve and a native CUDA imagen backend. 0.2.1 is pure polish on top of that.
No llama.cpp. No external BLAS. No C/C++/Fortran. Everything below is Rust, all the way down to the DEFLATE that writes the PNG.
Why OxiBonsai 0.2.1 matters
If you have ever worked on a heavy-numeric Rust workspace, you know the trap: cargo test builds with opt-level = 0, and your real numeric tests — the parity golden references, the model forward passes — run completely unoptimized. In OxiBonsai that meant the DiT-shape joint-attention CPU reference (~14.5 GFLOP) and the speculative decoder’s ~240 forward passes over a 151,936-row vocabulary took minutes, every run. That tax lands on every contributor, every CI job, every time.
0.2.1 removes it with a profile change, not a code change — and does so without touching float results. The VAE fix and the doc corrections close two papercuts that bit anyone trying to follow the Bonsai-Image getting-started path.
Technical Deep Dive
The profile opt-level trick. OxiBonsai’s Cargo.toml now sets:
```toml
[profile.test]
opt-level = 2

[profile.dev.package."*"]
opt-level = 3
```
[profile.test] opt-level = 2 optimizes the test binaries themselves. [profile.dev.package."*"] opt-level = 3 raises optimization for every dependency that is not a workspace member, because Cargo's "*" override never matches workspace members. Together, the two settings compile and autovectorize the heavy numeric code instead of running it as scalar opt-level = 0 code. Crucially, the workspace crates you are actively editing keep the dev profile's opt-level = 0 for ordinary builds, so incremental compiles of your own code remain fast. You optimize the heavy dependencies once, then iterate cheaply.
The important caveat, and the reason this is safe: float results are unchanged. Rust does not enable fast-math, so a higher opt-level does not reassociate floating-point reductions. The parity gates (cosine ≥ 0.999) stay bit-stable across opt-level settings — you get the speed without paying for it in numerical drift.
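A few lines of plain Rust show why reassociation matters. The sketch below uses illustrative function names, not OxiBonsai code. It sums the same four f32 values twice. The first pass goes strictly left to right, which is the order the compiler must preserve at any opt-level without fast-math. The second pass uses four independent lanes, the kind of reordering a fast-math vectorizer would be free to introduce. The two results differ, and that is the drift bit-exactness checks are sensitive to.

```rust
/// Left-to-right sum in source order. Without fast-math flags, LLVM may not
/// reorder these additions, so the result is the same at opt-level 0, 2, or 3.
fn sequential_sum(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// The same sum split across four accumulators and combined pairwise at the
/// end: a reassociated order of the kind a fast-math vectorizer may pick.
fn four_lane_sum(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 4] += x;
    }
    (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])
}

fn main() {
    // 1e8 is exactly representable in f32, but the spacing between adjacent
    // f32 values near it is 8, so adding 1.0 to it rounds back to 1e8.
    let xs = [1.0e8f32, 1.0, -1.0e8, 1.0];
    let seq = sequential_sum(&xs);
    let lanes = four_lane_sum(&xs);
    println!("sequential = {seq}, four-lane = {lanes}"); // sequential = 1, four-lane = 0
    assert_ne!(seq.to_bits(), lanes.to_bits());
}
```

In Rust, the four-lane order only appears when you write it yourself, as here. Opting into it is a deliberate, visible code change rather than a side effect of raising opt-level.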
VAE precheck now accepts a file (#9). The text-to-image precheck in oxibonsai-image’s src/pipeline.rs used is_dir(), so a valid .safetensors file passed via --vae / OXI_VAE_WEIGHTS was rejected with “VAE weights dir not found” before any loading — even though VaeWeights::open and the docs both accept a file. The precheck now accepts a file or a directory (is_file() || is_dir()), the error wording is corrected, the stale doc-comment is fixed, and test_issue_9_* regression tests lock the behavior in.
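The fix is small enough to sketch with std::path alone. This is a hypothetical stand-alone version, not the actual code in src/pipeline.rs. The name precheck_vae_path and its error message are illustrative. The sketch shows why a .safetensors file failed the old is_dir()-only check and passes a file-or-directory check.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical precheck mirroring the 0.2.1 behavior: accept an existing
/// file (e.g. a single .safetensors) or an existing directory.
fn precheck_vae_path(path: &Path) -> Result<(), String> {
    if path.is_file() || path.is_dir() {
        Ok(())
    } else {
        Err(format!("VAE weights not found: {}", path.display()))
    }
}

fn main() -> io::Result<()> {
    // Scratch directory with an empty placeholder file; the precheck only
    // looks at existence and file type, never at the contents.
    let dir = std::env::temp_dir().join(format!("vae_precheck_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let file = dir.join("diffusion_pytorch_model.safetensors");
    fs::write(&file, b"")?;

    // An is_dir()-only check is false for a file, which is the old bug.
    assert!(!file.is_dir());
    assert!(precheck_vae_path(&file).is_ok());
    assert!(precheck_vae_path(&dir).is_ok());
    assert!(precheck_vae_path(&dir.join("missing.safetensors")).is_err());

    fs::remove_dir_all(&dir)?;
    println!("precheck accepts both a file and a directory");
    Ok(())
}
```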
Pure-Rust PNG backend tracked forward. oxiarc-deflate is bumped 0.3.2 → 0.3.3 per the COOLJAPAN Latest-crates policy. It is the Pure-Rust DEFLATE backend behind Bonsai-Image’s PNG output; the substantive 0.3.3 OxiARC fixes land in sibling crates, so for OxiBonsai this is a version-tracking bump with no change to DEFLATE/PNG behavior.
Getting Started
Nothing new to install: if you are already on OxiBonsai, you are set. The change here is that the docs now point at the correct HuggingFace layout. hf download preserves the repo's subfolders under --local-dir, so each file lands at the same relative path you later pass to the CLI:
```bash
pip install huggingface_hub  # the only non-Rust step; download only
# DiT, text encoder + tokenizer, and the bundled VAE all live in ONE repo:
hf download prism-ml/bonsai-image-ternary-4B-mlx-2bit \
  transformer-packed-mflux/diffusion_pytorch_model.safetensors \
  text_encoder-mlx-4bit/model.safetensors text_encoder-mlx-4bit/tokenizer.json \
  vae/diffusion_pytorch_model.safetensors --local-dir ./bonsai
# Convert the DiT to GGUF (Pure Rust; defaults to tq2_0_g128):
cargo run -p oxibonsai-model --example mlx_image_convert --release -- \
  ./bonsai/transformer-packed-mflux/diffusion_pytorch_model.safetensors ./bonsai-dit.gguf
```
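The local layout mirrors the repo, so a missing or misplaced file is easy to catch before running the image pipeline. The helper below is hypothetical and not part of OxiBonsai. It joins each repo-relative path from the hf download command onto ./bonsai and reports any file that is not there.

```rust
use std::path::{Path, PathBuf};

/// Repo-relative paths requested in the hf download command; hf download
/// keeps these subfolders under --local-dir.
const BONSAI_FILES: [&str; 4] = [
    "transformer-packed-mflux/diffusion_pytorch_model.safetensors",
    "text_encoder-mlx-4bit/model.safetensors",
    "text_encoder-mlx-4bit/tokenizer.json",
    "vae/diffusion_pytorch_model.safetensors",
];

/// Where a repo file should land locally: the subfolder is preserved.
fn local_path(local_dir: &Path, repo_file: &str) -> PathBuf {
    local_dir.join(repo_file)
}

/// Every expected file that is not present as a regular file.
fn missing_files(local_dir: &Path) -> Vec<PathBuf> {
    BONSAI_FILES
        .iter()
        .map(|f| local_path(local_dir, f))
        .filter(|p| !p.is_file())
        .collect()
}

fn main() {
    let missing = missing_files(Path::new("./bonsai"));
    if missing.is_empty() {
        println!("all Bonsai-Image files are in place");
    } else {
        for p in &missing {
            println!("missing: {}", p.display());
        }
    }
}
```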
Then generate an image, passing the VAE as a single .safetensors file, which is exactly what 0.2.1 makes work:
```bash
oxibonsai image \
  --prompt "a tiny bonsai tree in a ceramic pot" --out bonsai.png \
  --dit ./bonsai-dit.gguf \
  --te ./bonsai/text_encoder-mlx-4bit/model.safetensors \
  --vae ./bonsai/vae/diffusion_pytorch_model.safetensors \
  --seed 42 --steps 4
```
What’s New in 0.2.1
- Optimized test/dev compile profiles. [profile.test] opt-level = 2 and [profile.dev.package."*"] opt-level = 3 make test binaries and non-workspace dependencies optimized and autovectorized. Numeric test runs that used to take minutes are now fast, while the crate you are editing stays at opt-level = 0 for fast incremental builds. Float parity is bit-stable (no fast-math, no reduction reassociation).
- VAE .safetensors file fix (#9). --vae / OXI_VAE_WEIGHTS now accepts a single .safetensors file, not just a directory; the precheck no longer rejects valid files with a "dir not found" error. Regression-tested.
- Corrected HuggingFace asset paths (#8). The DiT lives under transformer-packed-mflux/. The text encoder and tokenizer ship inside the main prism-ml/bonsai-image-ternary-4B-mlx-2bit repo under text_encoder-mlx-4bit/ (the standalone prism-ml/text_encoder-mlx-4bit repo does not exist). docs/IMAGEN.md and the image-crate README now match the real hf download layout, and the choice between the bundled VAE and the gated FLUX.2-dev VAE is clarified. Verified against the live HF API.
- oxiarc-deflate 0.3.2 → 0.3.3: Latest-crates version tracking for the Pure-Rust DEFLATE/PNG backend.
Tips
- Borrow the profile trick for any heavy-numeric Rust workspace. Add [profile.test] opt-level = 2 plus [profile.dev.package."*"] opt-level = 3, and your dependency kernels get autovectorized while the crate you are editing stays fast to recompile. Remember the caveat that makes it safe: Rust has no fast-math, so this does not reassociate float reductions, and your parity/bit-exactness checks stay stable.
- Pass a single .safetensors file to --vae. As of 0.2.1 you can point --vae (or OXI_VAE_WEIGHTS) straight at vae/diffusion_pytorch_model.safetensors; no directory needed. The loader auto-detects the input: a .safetensors file uses the native loader, and a directory selects the legacy .npy reader.
- Grab everything from one repo. The DiT, the 4-bit text encoder, the tokenizer, and the bundled (non-gated) VAE all live in prism-ml/bonsai-image-ternary-4B-mlx-2bit. A single hf download fetches them all, with no HuggingFace login required.
- Follow docs/IMAGEN.md for the exact paths. It now matches the live HF layout end-to-end, including the distinction between Option A (bundled VAE) and Option B (gated FLUX.2-dev VAE).
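The VAE auto-detection described in the tips amounts to a dispatch on the path's type. Below is a minimal sketch, assuming a hypothetical VaeSource enum and detect_vae_source function; OxiBonsai's real loader types are not shown in this post. A file with a .safetensors extension selects the native loader, a directory selects the legacy .npy reader, and anything else is an error.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical tag for which VAE reader a path selects.
#[derive(Debug, PartialEq)]
enum VaeSource {
    /// A single .safetensors file, read by the native loader.
    Safetensors(PathBuf),
    /// A directory, read by the legacy .npy reader.
    NpyDir(PathBuf),
}

fn detect_vae_source(path: &Path) -> Result<VaeSource, String> {
    let is_safetensors = path.extension().and_then(|e| e.to_str()) == Some("safetensors");
    if path.is_file() && is_safetensors {
        Ok(VaeSource::Safetensors(path.to_path_buf()))
    } else if path.is_dir() {
        Ok(VaeSource::NpyDir(path.to_path_buf()))
    } else {
        Err(format!("not a .safetensors file or a directory: {}", path.display()))
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join(format!("vae_detect_{}", std::process::id()));
    std::fs::create_dir_all(&dir)?;
    let file = dir.join("diffusion_pytorch_model.safetensors");
    std::fs::write(&file, b"")?; // empty placeholder; only the path type matters here

    assert_eq!(detect_vae_source(&file), Ok(VaeSource::Safetensors(file.clone())));
    assert_eq!(detect_vae_source(&dir), Ok(VaeSource::NpyDir(dir.clone())));
    assert!(detect_vae_source(&dir.join("missing")).is_err());

    std::fs::remove_dir_all(&dir)?;
    println!("dispatch ok");
    Ok(())
}
```

An enum keeps the decision separate from the actual loading. Each variant carries the path its reader needs, and adding another on-disk format means adding one variant and one branch.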
This is the foundation
OxiBonsai is built entirely on the COOLJAPAN ecosystem — SciRS2 for numerics, OxiBLAS for linear algebra, OxiFFT for transforms, OxiARC (via oxiarc-deflate) for the Pure-Rust PNG path, and OxiONNX for model ingestion — serving the PrismML Bonsai sub-2-bit model family with zero FFI and no C/Fortran runtime.
Repository: https://github.com/cool-japan/oxibonsai
Star the repo if you want sub-2-bit inference with a test suite that respects your time and a docs path that just works.
Pure Rust AI inference is here: fast, safe, and sovereign.
KitaSan at COOLJAPAN OÜ, June 6, 2026