
OxiBonsai 0.1.5 Released — OxiBonsai Goes Multimodal: a Pure-Rust FLUX.2-Klein Text-to-Image Pipeline

OxiBonsai 0.1.5 adds the oxibonsai-image crate — the first pure-Rust, zero-FFI, C/C++/Fortran-free FLUX.2-Klein text-to-image pipeline (DiT + VAE + Qwen3-4B text encoder), parity-validated at cos≥0.999, with Metal flash-attention and ~52–62s end-to-end on an M3. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem, now spanning text and image.


OxiBonsai stopped being a thing that only writes words. Today it draws.

Today we released OxiBonsai 0.1.5 — introducing the new oxibonsai-image crate, a complete Pure-Rust text-to-image pipeline for PrismML Bonsai-Image (FLUX.2-Klein 4B) that turns a prompt into a PNG without a single line of Python at runtime.

No PyTorch. No diffusers. No Python at inference time. No C, no C++, no Fortran. No llama.cpp. No native BLAS. Even the PNG encoder is Pure Rust, written on top of OxiARC.

OxiBonsai is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family — the 1-bit Q1_0_g128 line and the ternary TQ2_0_g128 line — running on CPU SIMD, Apple Silicon Metal, and NVIDIA CUDA. Until now it was an LLM-only engine; 0.1.4 hardened it into a production runtime with controllers, observability, K-quant/FP8 families, CUDA batch prefill, and constrained decoding. 0.1.5 makes it multimodal.

Why OxiBonsai 0.1.5 matters

Generating an image today means standing up a stack: PyTorch, diffusers, a CUDA C++ kernel library, a Python interpreter, and a pile of native wheels that fight your toolchain. The model is small; the dependency tree is enormous.

OxiBonsai 0.1.5 collapses all of that into one Pure-Rust binary. The oxibonsai-image crate is the first pure-Rust, C/C++/Fortran-free, zero-FFI implementation of the Bonsai-Image FLUX.2-Klein text-to-image pipeline, built entirely on the COOLJAPAN ecosystem. Every stage — text encoder, diffusion transformer, VAE decoder — is parity-validated against the MLX reference at cos≥0.999. It runs on Apple Silicon Metal by default, falls back cleanly to CPU, and never asks for Python at runtime.

Technical Deep Dive

The whole pipeline is a single straight line from prompt to pixels:

prompt
  │
  ▼  Text Encoder ── Qwen3-4B, 4-bit (open_mlx_4bit)
  │
  ▼  DiT ── FLUX.2-Klein ternary, TQ2_0_g128 (Flux2Transformer2DModel)
  │
  ▼  VAE decoder ── AutoencoderKLFlux2 (Pure-Rust Conv2d)
  │
  ▼  PNG ── oxiarc-deflate (Pure-Rust DEFLATE)
  │
  ▼  out.png

Every model stage in that line — text encoder, DiT, and VAE — is independently parity-validated at cos≥0.999 against the MLX reference. Nothing here is a Pure-Rust approximation that quietly drifts; each block is held to the original numerics.
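To make the cos≥0.999 bar concrete, here is a minimal sketch of a cosine-similarity parity check. The function names and the shape of the tap list are hypothetical and are not taken from the OxiBonsai source; the sketch only shows what comparing one tensor against a reference at a threshold might look like.

```rust
/// Cosine similarity between two equal-length f32 slices, accumulated in f64.
/// Returns None if the lengths differ, the input is empty, or a norm is zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na.sqrt() * nb.sqrt()))
}

/// Hypothetical gate: every (got, want) tap must reach the threshold.
pub fn parity_gate(taps: &[(&[f32], &[f32])], threshold: f64) -> bool {
    taps.iter().all(|(got, want)| {
        cosine_similarity(got, want).map_or(false, |c| c >= threshold)
    })
}
```

Accumulating in f64 keeps the metric itself from adding rounding noise when the threshold sits as close to 1.0 as the gates described in this post.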

The DiT — Flux2Transformer2DModel

The diffusion transformer is the heart of the generator: 5 double-stream blocks followed by 20 single-stream blocks, all carrying ternary TQ2_0_g128 weights.

Ternary is the whole point here. A diffusion transformer is normally a multi-gigabyte fp16 model. Carrying its weights at sub-2-bit ternary is what lets Bonsai-Image fit alongside the text encoder inside a real ~3.5 GB footprint instead of a workstation’s worth of VRAM.
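As an illustration of what group-wise ternary weights mean, the sketch below quantizes 128 weights to {-1, 0, +1} with one shared scale and expands them again. The actual TQ2_0_g128 storage layout is not described in this post, so everything here is an assumption for readability: the 2-bit codes, the four-per-byte packing order, the absmax scale, the f32 scale type, and all names.

```rust
/// Group size, read off the `g128` suffix of the format name.
pub const GROUP: usize = 128;

/// One hypothetical ternary group: 128 two-bit codes (4 per byte) plus a scale.
pub struct TernaryGroup {
    pub scale: f32,
    pub packed: [u8; GROUP / 4],
}

/// Quantize with an absmax scale: round w/scale to -1, 0 or +1.
pub fn quantize_group(w: &[f32; GROUP]) -> TernaryGroup {
    let scale = w.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let mut packed = [0u8; GROUP / 4];
    for (i, &v) in w.iter().enumerate() {
        let t = if scale > 0.0 { (v / scale).round() as i8 } else { 0 };
        let code = (t + 1) as u8; // map -1, 0, +1 to codes 0, 1, 2
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    TernaryGroup { scale, packed }
}

/// Expand a group back to f32: (code - 1) * scale.
pub fn dequantize_group(g: &TernaryGroup) -> [f32; GROUP] {
    let mut out = [0.0f32; GROUP];
    for (i, o) in out.iter_mut().enumerate() {
        let code = (g.packed[i / 4] >> ((i % 4) * 2)) & 0b11;
        *o = (code as i8 - 1) as f32 * g.scale;
    }
    out
}
```

In this illustrative layout a group costs 32 bytes of codes plus one scale, which is where the size gap against fp16 weights comes from.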

Latents are 128-channel. The block internals are a faithful FLUX.2-Klein port:

Two kernels make it fast on Apple Silicon:

Correctness is pinned by the dit_parity gate: 59 taps, all required to hold cos≥0.999 against the reference. That matters because ternary weights are exactly where a Pure-Rust port could silently lose precision; with all 59 taps gated, such a regression fails the check before it can ship.

The VAE decoder — AutoencoderKLFlux2

The decoder is a from-scratch Pure-Rust convolutional network. Conv2d is implemented as im2col + GEMM, and the rest of the stack is built up layer by layer:

This is the stage that turns latents back into pixels, so it runs in full fp32 and is held to the same parity bar as the rest of the pipeline.
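The im2col + GEMM formulation of Conv2d mentioned above can be sketched in a few dozen lines. This is not the oxibonsai-image implementation: it assumes a single CHW image, stride 1, square kernels, zero padding, [O,I,kH,kW] weights, and a naive triple-loop GEMM. All function names are hypothetical.

```rust
/// Unfold a CHW input into a row-major [C*k*k, H_out*W_out] column matrix.
pub fn im2col(x: &[f32], c: usize, h: usize, w: usize, k: usize, pad: usize)
    -> (Vec<f32>, usize, usize)
{
    let ho = h + 2 * pad - k + 1;
    let wo = w + 2 * pad - k + 1;
    let mut cols = vec![0.0f32; c * k * k * ho * wo];
    for ci in 0..c {
        for ky in 0..k {
            for kx in 0..k {
                let row = (ci * k + ky) * k + kx;
                for oy in 0..ho {
                    for ox in 0..wo {
                        // Input coordinates after removing the padding offset.
                        let iy = (oy + ky) as isize - pad as isize;
                        let ix = (ox + kx) as isize - pad as isize;
                        if iy >= 0 && ix >= 0 && (iy as usize) < h && (ix as usize) < w {
                            cols[row * ho * wo + oy * wo + ox] =
                                x[(ci * h + iy as usize) * w + ix as usize];
                        }
                    }
                }
            }
        }
    }
    (cols, ho, wo)
}

/// Naive row-major GEMM: C[m,n] = A[m,k] * B[k,n].
pub fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    c
}

/// Conv2d as im2col followed by one GEMM, then a per-channel bias add.
pub fn conv2d(x: &[f32], weight: &[f32], bias: &[f32], c: usize, h: usize, w: usize,
              o: usize, k: usize, pad: usize) -> (Vec<f32>, usize, usize)
{
    let (cols, ho, wo) = im2col(x, c, h, w, k, pad);
    // [O, C*k*k] weights times [C*k*k, H_out*W_out] columns.
    let mut y = gemm(weight, &cols, o, c * k * k, ho * wo);
    for (oc, row) in y.chunks_mut(ho * wo).enumerate() {
        for v in row.iter_mut() {
            *v += bias[oc];
        }
    }
    (y, ho, wo)
}
```

The trade-off is visible in the `cols` buffer: it is k×k times larger than the input, which is the memory cost an im2col-free, implicit-GEMM convolution avoids.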

On Metal the VAE is default-on and cuts decode time from 22.5s on CPU to 6.9s, about 3.3×. The last part of that gain comes from an implicit-GEMM, im2col-free convolution (vae_conv_implicit.rs, routed in encode_conv2d_f32 for kernels k≥3), which brought decode down from 9.1s to 6.9s. You can opt out with OXI_VAE_GPU=0. The vae_parity gate holds 11 taps at cos≥0.999.

A native VAE safetensors loader (src/vae/safetensors.rs) reads FLUX.2 .safetensors directly: bf16→f32 is lossless (a pure bit-shift, zero rounding), and conv weights are transposed [O,I,kH,kW]→[O,kH,kW,I] on load. That eliminates the old Python .npy export step entirely.

Operationally, that is a real simplification. The old path needed Python and a conversion step just to stage the weights; now the engine ingests the original checkpoint as shipped, and there is no offline export to keep in sync.
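The two load-time transforms described above are small enough to sketch directly. This is an illustration, not the code in src/vae/safetensors.rs: the function names are hypothetical, and the sketch assumes the tensor bytes have already been sliced out of the file (safetensors stores tensor data little-endian).

```rust
/// bf16 is the upper 16 bits of an IEEE-754 f32, so widening is a left shift
/// with zero-filled low mantissa bits. No rounding takes place.
pub fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian bf16 byte buffer into f32 values.
pub fn bf16_bytes_to_f32(raw: &[u8]) -> Vec<f32> {
    raw.chunks_exact(2)
        .map(|b| bf16_to_f32(u16::from_le_bytes([b[0], b[1]])))
        .collect()
}

/// Reorder conv weights from [O, I, kH, kW] to [O, kH, kW, I].
pub fn oihw_to_ohwi(w: &[f32], o: usize, i: usize, kh: usize, kw: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; w.len()];
    for oc in 0..o {
        for ic in 0..i {
            for y in 0..kh {
                for x in 0..kw {
                    let src = ((oc * i + ic) * kh + y) * kw + x;
                    let dst = ((oc * kh + y) * kw + x) * i + ic;
                    out[dst] = w[src];
                }
            }
        }
    }
    out
}
```

Putting the input-channel axis last means each kernel tap reads a contiguous run of channels, which tends to suit a channels-last convolution kernel.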

The text encoder — Qwen3-4B, 4-bit

The prompt encoder is Qwen3-4B loaded through open_mlx_4bit. It reads native 2.1 GB MLX 4-bit safetensors directly — down from a 15 GB f32 .npy dump — and dequantizes the mlx-packed-affine 4-bit weights to f32 on demand.
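For readers unfamiliar with packed affine 4-bit weights, a minimal dequantization sketch follows. It assumes the usual affine scheme, w ≈ scale · q + bias per group, with eight 4-bit codes packed into each u32 starting from the least significant bits. The group size is a parameter, scales and biases are taken as already widened to f32, and the function name is hypothetical rather than the open_mlx_4bit API.

```rust
/// Expand packed 4-bit codes to f32 using per-group scale and bias.
/// `packed` holds 8 codes per u32, lowest nibble first (assumed layout).
pub fn dequantize_4bit_affine(
    packed: &[u32],
    scales: &[f32],
    biases: &[f32],
    group_size: usize,
) -> Vec<f32> {
    let n = packed.len() * 8;
    let mut out = Vec::with_capacity(n);
    for idx in 0..n {
        let word = packed[idx / 8];
        let q = (word >> ((idx % 8) * 4)) & 0xF; // 4-bit code in 0..=15
        let g = idx / group_size; // group that owns this weight
        out.push(scales[g] * q as f32 + biases[g]);
    }
    out
}
```

Dequantizing on demand keeps the 4-bit tensors as the resident copy, so only the layer currently being evaluated needs an f32 expansion.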

With the 4-bit text encoder in place, the real Bonsai-Image footprint lands at roughly 3.5 GB. Enable this path by pointing OXI_TE_4BIT at the MLX 4-bit safetensors.

The te_parity gate is the strictest in the pipeline: cos≥0.999999 against the MLX oracle. The text encoder sets the conditioning for everything downstream, so a near-bit-exact match here is what keeps the rest of the pipeline faithful to the reference.

Scheduler, RNG, and PNG

Sampling uses a flow-match Euler scheduler with dynamic μ-shift (sequence-length-dependent exponential time-shift), native init noise, img_ids/txt_ids, and sigmas/timesteps generation. Noise comes from an MLX-exact Threefry-2×32 RNG port (src/sample/mlx_rng.rs): 5 rounds, exact rotation constants, per-round key-inject — with --seed 42 it byte-matches the official mflux reference.
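A flow-match Euler sampler with an exponential time-shift is compact enough to sketch. The version below is an assumption-laden illustration, not the oxibonsai-image scheduler: it uses the common form σ' = e^μ / (e^μ + (1/σ − 1)), a linear ramp from 1 to 1/steps before shifting, and a final σ of 0. μ is taken as an input because the post does not spell out how it is derived from sequence length. All names are hypothetical.

```rust
/// Exponential time-shift: larger mu pushes sigmas towards 1 (more time at high noise).
pub fn shift_sigma(sigma: f64, mu: f64) -> f64 {
    let e = mu.exp();
    e / (e + (1.0 / sigma - 1.0))
}

/// Shifted sigma schedule with `steps` entries plus a trailing 0.0.
pub fn sigma_schedule(steps: usize, mu: f64) -> Vec<f64> {
    let mut s: Vec<f64> = (0..steps)
        .map(|i| {
            // Linear ramp from 1.0 down to 1/steps before shifting.
            let span = 1.0 - 1.0 / steps as f64;
            let lin = 1.0 - i as f64 * span / (steps.max(2) - 1) as f64;
            shift_sigma(lin, mu)
        })
        .collect();
    s.push(0.0);
    s
}

/// One Euler step along the predicted velocity: x += (sigma_next - sigma) * v.
pub fn euler_step(x: &mut [f32], v: &[f32], sigma: f64, sigma_next: f64) {
    let dt = (sigma_next - sigma) as f32;
    for (xi, vi) in x.iter_mut().zip(v) {
        *xi += dt * *vi;
    }
}
```

A sampling loop would walk consecutive pairs of the schedule, call the DiT for the velocity at each σ, and apply `euler_step`; with a constant velocity equal to noise minus data, the steps telescope and land exactly on the data.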

The final image is written by oxiarc-deflate, a Pure-Rust DEFLATE encoder from the COOLJAPAN ecosystem (no flate2, no zstd, no zip), and the output is parity-validated against a reference PNG at 512×512.
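Around the DEFLATE stream, the PNG container itself is just a sequence of length-prefixed, CRC-protected chunks. The sketch below shows that framing only; it does not use or imitate the oxiarc-deflate API, the IDAT payload (the zlib-wrapped DEFLATE data) is left out, and the 8-bit RGB header is an assumption about the output format. Function names are hypothetical.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by PNG chunks.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Serialize one chunk: big-endian length, 4-byte type, data, CRC over type + data.
pub fn png_chunk(kind: &[u8; 4], data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + data.len());
    out.extend_from_slice(&(data.len() as u32).to_be_bytes());
    out.extend_from_slice(kind);
    out.extend_from_slice(data);
    let crc = crc32(&out[4..]); // the length field is not covered by the CRC
    out.extend_from_slice(&crc.to_be_bytes());
    out
}

/// IHDR payload for an 8-bit RGB, non-interlaced image (assumed output format).
pub fn ihdr_rgb8(width: u32, height: u32) -> Vec<u8> {
    let mut d = Vec::with_capacity(13);
    d.extend_from_slice(&width.to_be_bytes());
    d.extend_from_slice(&height.to_be_bytes());
    // Bit depth 8, color type 2 (RGB), compression 0, filter 0, interlace 0.
    d.extend_from_slice(&[8, 2, 0, 0, 0]);
    d
}
```

A complete file is the 8-byte PNG signature, an IHDR chunk, one or more IDAT chunks, and an empty IEND chunk, in that order.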

A CUDA imagen backend also lands here for Linux/Windows, authored as a blind mirror of the Metal path — parity-first plain-FP32, additive, leaving the Metal bytes unchanged. An early steps=4 benchmark on an A4000-class GPU projects ~101s → ~31.7s (3.2×); full compile and cos≥0.999 CUDA parity validation are deferred to CUDA hardware, so treat this as the backend landing rather than a GA number. CUDA imagen GA is the story for a later release.

Getting Started

cargo install oxibonsai-cli           # installs the `oxibonsai` binary

# Configure the imagen assets once (copy the template, then edit paths):
cp .env.example .env
#   OXI_DIT_GGUF=...         # FLUX.2-Klein ternary DiT (GGUF, TQ2_0_g128)
#   OXI_VAE_WEIGHTS=...      # AutoencoderKLFlux2 .safetensors (or a .npy dir)
#   OXI_TE_4BIT=...          # Qwen3-4B text encoder, 2.1 GB MLX 4-bit safetensors
#   OXI_TE_TOKENIZER_DIR=... # Qwen3 tokenizer dir

# Generate a 512×512 PNG (Metal default-on; ~52–62s on an M3, steps=4):
oxibonsai image --prompt "a tiny bonsai tree in a ceramic pot" --out bonsai.png

docs/IMAGEN.md walks through fetching the checkpoints from Hugging Face. That download is the only non-Rust step in the whole flow, and it is used purely to fetch weights. All conversion and all inference are Pure Rust.

What’s New in 0.1.5

Here is everything that landed in this release:

Tips

This is the foundation

OxiBonsai rides on the COOLJAPAN ecosystem: SciRS2 for the numerics, OxiBLAS for the linear algebra, OxiFFT, OxiARC (which powers the Pure-Rust PNG/DEFLATE encoder), and OxiONNX. It already served PrismML’s sub-2-bit Bonsai LLMs; with 0.1.5 it also serves PrismML Bonsai-Image. One Pure-Rust engine now spans both text and image — multimodal sovereign inference, no Python in the loop, no FFI under the hood.

Repository: https://github.com/cool-japan/oxibonsai

Star the repo if a single Pure-Rust binary that both writes and draws is the future you want to build on.

Pure Rust sovereign inference, for words and for pixels, is here: fast, safe, and now multimodal.

KitaSan at COOLJAPAN OÜ, June 2, 2026
