Every image used to mean reloading multi-gigabyte weights from scratch. Now you load them once, type prompts, and watch the pictures appear right in your terminal.
Today we released OxiBonsai 0.2.2 — an interactive image REPL that keeps the model resident across renders and draws straight into the terminal.
No llama.cpp. No BLAS. No C, no C++, no Fortran runtime. OxiBonsai is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family — the 1-bit line (Q1_0_g128) and the ternary line (TQ2_0_g128) — running on CPU SIMD (AVX2/AVX-512/NEON/WASM), Apple Silicon (Metal), and NVIDIA (CUDA NVRTC), all on top of SciRS2, OxiBLAS, OxiFFT, OxiARC, and OxiONNX. Even the terminal graphics encoder in this release is Pure Rust.
Why OxiBonsai 0.2.2 matters
The previous release, 0.2.1, was a quiet DX-and-hardening pass: optimized test/dev profiles and two fixes from the field — a VAE .safetensors-file path bug (#9) and clearer HuggingFace asset-path docs (#8). It made the imaging path smoother to live with. 0.2.2 changes how you actually use it.
Until now, oxibonsai image was a one-shot command: every prompt paid the full cold-start price. Loading the ternary DiT, the VAE, and the Qwen3-4B text encoder — then dequantising the encoder’s weights — is the expensive part of generating an image. Doing it once is fine. Doing it on every prompt while you’re dialing in a composition is a tax on iteration: you sit through a multi-gigabyte warm-up to find out a seed was wrong, fix the seed, and pay the warm-up again.
oxibonsai repl removes that tax. It loads everything once into a resident ImageSession, then renders as many prompts as you like against the already-warm weights. Per-prompt time collapses to just compute — text-encode, DiT sampling, VAE decode — because the load and the encoder warm-up are behind you. And on a kitty-graphics terminal like Ghostty, you never leave the shell to look at the result: the PNG is painted inline, in place.
Technical Deep Dive
ImageSession — load once, render many. The new session (oxibonsai-image, driven from oxibonsai-cli) owns the DiT, the VAE, and the text encoder for its whole lifetime. The first render pays the cold cost; every render after that reuses the resident weights. Each render returns a RenderOutcome carrying StageTimings, so you get the per-stage wall-clock split — how long text-encode, sampling, and VAE decode each took — instead of one opaque total. When you’re tuning, that breakdown tells you which stage your :steps or :size change actually moved.
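The load-once, render-many shape can be sketched with nothing but the standard library. `ResidentSession`, `Timings`, and `Outcome` are hypothetical stand-ins for ImageSession, StageTimings, and RenderOutcome, not the real oxibonsai-image API. The three stages only do placeholder arithmetic. The point is the structure: the weights are loaded once in `load`, and each `render` times only its own per-stage compute.

```rust
use std::time::{Duration, Instant};

/// Per-stage wall-clock split, in the spirit of StageTimings (hypothetical type).
#[derive(Debug, Default, Clone, Copy)]
pub struct Timings {
    pub text_encode: Duration,
    pub sampling: Duration,
    pub vae_decode: Duration,
}

impl Timings {
    pub fn total(&self) -> Duration {
        self.text_encode + self.sampling + self.vae_decode
    }
}

/// Pixels plus the timing breakdown (hypothetical, cf. RenderOutcome).
pub struct Outcome {
    pub rgb: Vec<u8>,
    pub timings: Timings,
}

/// Owns its "weights" for its whole lifetime.
pub struct ResidentSession {
    weights: Vec<f32>, // placeholder for DiT + VAE + text-encoder weights
}

impl ResidentSession {
    /// Cold start: paid once per session, not once per prompt.
    pub fn load() -> Self {
        ResidentSession { weights: vec![0.5_f32; 1024] }
    }

    /// Warm render: reuses `self.weights`; only per-prompt compute is timed.
    pub fn render(&self, prompt: &str, steps: u32, width: usize, height: usize) -> Outcome {
        let mut t = Timings::default();

        // Stage 1, text encode (placeholder: fold prompt bytes into one value).
        let start = Instant::now();
        let embedding = prompt.bytes().map(f32::from).sum::<f32>() / 255.0;
        t.text_encode = start.elapsed();

        // Stage 2, sampling (placeholder: `steps` passes over a latent).
        let start = Instant::now();
        let mut latent = vec![embedding; width * height];
        for _ in 0..steps {
            for (x, w) in latent.iter_mut().zip(self.weights.iter().cycle()) {
                *x = (*x * w + w).fract();
            }
        }
        t.sampling = start.elapsed();

        // Stage 3, VAE decode (placeholder: one grey RGB pixel per latent value).
        let start = Instant::now();
        let rgb = latent
            .iter()
            .flat_map(|&v| {
                let b = (v.clamp(0.0, 1.0) * 255.0) as u8;
                [b, b, b]
            })
            .collect();
        t.vae_decode = start.elapsed();

        Outcome { rgb, timings: t }
    }
}
```

Call `ResidentSession::load()` once, then call `render` in a loop. After each call, print the `timings` fields to see which stage a `steps` or size change actually moved.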
Resident text encoder — TeWeights::set_resident. The text encoder is the heaviest warm-up: its dequantised f32 weights are roughly 16 GB. The one-shot CLI deliberately keeps those off the heap between forwards to preserve a low-RAM profile, re-dequantising as needed. The REPL is the opposite trade: TeWeights::set_resident(true) tells the Mlx4bit source to cache the dequantised f32 tensors across forwards, so each subsequent prompt skips re-dequantisation entirely. It is off by default — the one-shot path stays lean — and on for the REPL, where you’ve opted into trading RAM for iteration speed.
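Here is a minimal sketch of the resident-versus-lean trade, assuming a simplified 4-bit affine format: two codes per byte, low nibble first, and w = q * scale + bias per group, in the spirit of MLX-style quantisation. `QuantSource` and `with_weights` are hypothetical and are not the Mlx4bit or TeWeights types. With residency off, every call dequantises into a temporary that is freed afterwards. With residency on, the first call fills a cache and later calls reuse it.

```rust
/// Hypothetical 4-bit weight source (not the real Mlx4bit type).
pub struct QuantSource {
    packed: Vec<u8>,   // two 4-bit codes per byte, low nibble first
    scales: Vec<f32>,  // one scale per group
    biases: Vec<f32>,  // one bias per group
    group: usize,      // values per group
    resident: bool,
    cache: Option<Vec<f32>>,
    dequant_calls: u32,
}

impl QuantSource {
    pub fn new(packed: Vec<u8>, scales: Vec<f32>, biases: Vec<f32>, group: usize) -> Self {
        QuantSource { packed, scales, biases, group, resident: false, cache: None, dequant_calls: 0 }
    }

    /// Same idea as TeWeights::set_resident: trade RAM for iteration speed.
    pub fn set_resident(&mut self, on: bool) {
        self.resident = on;
        if !on {
            self.cache = None; // drop the f32 copy, back to the low-RAM profile
        }
    }

    pub fn dequant_calls(&self) -> u32 {
        self.dequant_calls
    }

    fn dequantise(&mut self) -> Vec<f32> {
        self.dequant_calls += 1;
        let mut out = Vec::with_capacity(self.packed.len() * 2);
        for &byte in &self.packed {
            for q in [byte & 0x0F, byte >> 4] {
                let g = out.len() / self.group;
                out.push(q as f32 * self.scales[g] + self.biases[g]);
            }
        }
        out
    }

    /// Resident: dequantise once and cache. Non-resident: dequantise into a
    /// temporary that is dropped as soon as `f` returns.
    pub fn with_weights<R>(&mut self, f: impl FnOnce(&[f32]) -> R) -> R {
        if self.resident {
            if self.cache.is_none() {
                let w = self.dequantise();
                self.cache = Some(w);
            }
            f(self.cache.as_deref().expect("cache filled above"))
        } else {
            let w = self.dequantise();
            f(&w)
        }
    }
}
```

Off by default matches the one-shot profile described above. Turning residency on changes the dequantisation count from one per forward to one per session.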
Inline images via a pure-Rust kitty graphics protocol. A new src/cli/term.rs implements the kitty graphics protocol, including a pure-Rust base64 encoder, to transmit a PNG straight into the terminal’s scrollback. kitty_supported() auto-detects a capable terminal by checking the GHOSTTY_*, TERM, and TERM_PROGRAM environment variables. On Ghostty the rendered image shows up inline; on terminals without graphics support, the REPL falls back cleanly to writing a PNG file (optionally opened in a viewer). No external image viewer, no ImageMagick, no C: the whole display path is Rust.
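The display path can be sketched in three small, std-only pieces: a base64 encoder, a kitty escape builder, and a capability check. The function names are hypothetical and are not the code in term.rs. The escape format follows the kitty graphics protocol: `ESC _G <keys> ; <base64> ESC \`, with `f=100` for PNG data and `a=T` for transmit-and-display. Each chunk holds at most 4096 base64 bytes, carries `m=1` if more chunks follow and `m=0` on the last, and only the first chunk carries the other control keys. The environment heuristics are assumptions: Ghostty typically sets TERM=xterm-ghostty and TERM_PROGRAM=ghostty, and kitty sets TERM=xterm-kitty.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648 alphabet) with '=' padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2; // 24 bits -> four 6-bit indices
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Escape sequences that transmit and display a PNG, chunked per the protocol.
pub fn kitty_png_escapes(png: &[u8]) -> Vec<String> {
    let b64 = base64_encode(png);
    let chunks: Vec<&[u8]> = b64.as_bytes().chunks(4096).collect(); // 4096 is a multiple of 4
    let last = chunks.len().saturating_sub(1);
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| {
            let more = if i < last { 1 } else { 0 };
            // Only the first chunk carries the format/action keys.
            let control = if i == 0 { format!("f=100,a=T,m={more}") } else { format!("m={more}") };
            let payload = std::str::from_utf8(chunk).expect("base64 output is ASCII");
            format!("\x1b_G{control};{payload}\x1b\\")
        })
        .collect()
}

/// Heuristic capability check over (name, value) environment pairs.
pub fn kitty_supported_from<K, V>(vars: impl IntoIterator<Item = (K, V)>) -> bool
where
    K: AsRef<str>,
    V: AsRef<str>,
{
    vars.into_iter().any(|(k, v)| {
        let (k, v) = (k.as_ref(), v.as_ref());
        k.starts_with("GHOSTTY_")
            || (k == "TERM" && (v.contains("kitty") || v.contains("ghostty")))
            || (k == "TERM_PROGRAM"
                && (v.eq_ignore_ascii_case("ghostty") || v.eq_ignore_ascii_case("kitty")))
    })
}
```

Writing each returned string to stdout and then flushing would paint the image on a capable terminal. `kitty_supported_from(std::env::vars())` runs the check against the real environment.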
Byte-identical pixels from both code paths. The CHW→HWC, f32→u8 conversion at the end of the pipeline is now a single shared helper, decoded_chw_to_rgb8 (pub(crate) in oxibonsai-image/pipeline.rs), called by both text_to_image (the one-shot path) and ImageSession::render (the REPL path). Sharing the conversion guarantees the REPL produces byte-identical pixels to the one-shot command — the resident path is not a second, drifting implementation; it is the same path, kept warm.
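To make the CHW→HWC, f32→u8 step concrete, here is a hypothetical `chw_to_rgb8_sketch`, not the real decoded_chw_to_rgb8. The release notes do not state the decoded value range, so this sketch assumes [-1, 1], maps it to [0, 255], rounds, and clamps. Planar input stores all red values, then all green, then all blue. Interleaved output stores R, G, B per pixel.

```rust
/// Convert a 3-channel planar CHW f32 image into interleaved HWC RGB8.
/// Assumption: inputs lie in [-1, 1]; out-of-range values are clamped.
pub fn chw_to_rgb8_sketch(chw: &[f32], width: usize, height: usize) -> Vec<u8> {
    let plane = width * height;
    assert_eq!(chw.len(), 3 * plane, "expected 3 planes of width*height");
    let mut out = vec![0u8; 3 * plane];
    for y in 0..height {
        for x in 0..width {
            let p = y * width + x; // pixel index within one plane
            for c in 0..3 {
                let v = chw[c * plane + p]; // planar read
                let scaled = ((v + 1.0) * 0.5 * 255.0).round().clamp(0.0, 255.0);
                out[p * 3 + c] = scaled as u8; // interleaved write
            }
        }
    }
    out
}
```

The design point is the one the paragraph makes. When both paths call one function, the rounding and clamping rules cannot diverge, so identical decoder output always yields identical bytes.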
Documented GPU flags, and a CUDA parity probe. The .env.example now documents the three image-generation GPU switches: OXI_DIT_ATTN_GPU (DiT flash-attention, default-ON on Apple Silicon), OXI_VAE_GPU (VAE convolutions, default-ON on Apple Silicon), and OXI_TE_GPU (text-encoder GPU — default-OFF, because CPU SIMD wins on Apple Silicon, though it may help on Windows/NVIDIA CUDA). On the kernels side, oxibonsai-kernels gains an isolated cuda_tq2_gemv_parity.rs probe behind cfg(feature = "cuda") for validating TQ2 GEMV output on Blackwell-class GPUs. And oxionnx-proto ticks 0.1.3 → 0.1.4.
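A GEMV parity probe boils down to computing a trusted CPU reference and checking a candidate output against it within a tolerance. This sketch assumes an illustrative ternary layout, not the real TQ2_0_g128 packing: unpacked i8 weights in {-1, 0, 1}, row-major, with one f32 scale per 128 consecutive weights in a row. `ternary_gemv_ref` and `within_tolerance` are hypothetical helpers, not the contents of cuda_tq2_gemv_parity.rs.

```rust
pub const GROUP: usize = 128;

/// CPU reference y = W x for ternary weights with one scale per GROUP weights.
pub fn ternary_gemv_ref(w: &[i8], scales: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(cols % GROUP, 0);
    assert_eq!(w.len(), rows * cols);
    assert_eq!(scales.len(), rows * cols / GROUP);
    assert_eq!(x.len(), cols);
    let groups_per_row = cols / GROUP;
    (0..rows)
        .map(|r| {
            let mut acc = 0.0f32;
            for g in 0..groups_per_row {
                let base = r * cols + g * GROUP;
                // Ternary dot product within the group, then apply its scale.
                let mut partial = 0.0f32;
                for k in 0..GROUP {
                    partial += w[base + k] as f32 * x[g * GROUP + k];
                }
                acc += scales[r * groups_per_row + g] * partial;
            }
            acc
        })
        .collect()
}

/// Every element must satisfy |a - b| <= atol + rtol * |a| (NaN fails).
pub fn within_tolerance(reference: &[f32], candidate: &[f32], atol: f32, rtol: f32) -> Result<(), String> {
    if reference.len() != candidate.len() {
        return Err(format!("length mismatch: {} vs {}", reference.len(), candidate.len()));
    }
    for (i, (&a, &b)) in reference.iter().zip(candidate).enumerate() {
        let diff = (a - b).abs();
        if !(diff <= atol + rtol * a.abs()) {
            return Err(format!("mismatch at {i}: {a} vs {b} (diff {diff})"));
        }
    }
    Ok(())
}
```

In a real probe, the candidate would come from the GPU kernel. A tolerance is used rather than exact equality because GPU reductions may sum in a different order than the CPU loop.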
Getting Started
Install the CLI:
cargo install oxibonsai-cli
Start a resident image REPL (model paths resolve flag → env → default, exactly like oxibonsai image):
oxibonsai repl --seed 42 --steps 4 --width 512 --height 512
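The flag → env → default precedence amounts to a short fallback chain. The sketch below is hypothetical: the real environment-variable names and defaults are not listed here, and treating an empty value as unset is an assumption.

```rust
use std::path::PathBuf;

/// Explicit flag wins, then the environment variable, then the built-in default.
/// Assumption: an empty string counts as "not set".
pub fn resolve_model_path(flag: Option<&str>, env_value: Option<&str>, default: &str) -> PathBuf {
    flag.filter(|s| !s.is_empty())
        .or(env_value.filter(|s| !s.is_empty()))
        .unwrap_or(default)
        .into()
}
```

A caller would pass something like `std::env::var(NAME).ok().as_deref()` as `env_value`, where `NAME` stands for whichever variable the CLI actually reads.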
Then iterate. A bare line is a prompt; :-prefixed lines are commands:
oxibonsai> :fast
oxibonsai> a tiny bonsai tree in a ceramic pot
oxibonsai> :seed 7
oxibonsai> a tiny bonsai tree in a ceramic pot
oxibonsai> :hq
oxibonsai> a tiny bonsai tree in a ceramic pot
:fast drops to a snappy 2-step 384×384 preview; once a prompt and seed look right, :hq finalizes at 8 steps and 512×512. On Ghostty each render appears inline; elsewhere it lands in a PNG. The one-shot oxibonsai image --prompt "…" --seed 42 --out bonsai.png is still there for scripts and CI — but the REPL is where 0.2.2 wants you to live while you’re composing.
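The REPL's line handling can be sketched as a small dispatcher: a bare line is a prompt, and a `:`-prefixed line updates settings. `Settings`, `Line`, and `handle_line` are hypothetical. Only a subset of the commands is covered, and the `:steps N` and `:seed N` argument syntax is assumed from the transcript above. The `:fast` and `:hq` presets use the values stated in the paragraph above.

```rust
/// Hypothetical render settings the REPL mutates between prompts.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Settings {
    pub steps: u32,
    pub width: u32,
    pub height: u32,
    pub seed: u64,
}

#[derive(Debug, PartialEq)]
pub enum Line {
    Prompt(String),  // render this text with the current settings
    Updated,         // a command changed the settings
    Quit,
    Unknown(String), // unrecognised or malformed command
    Empty,
}

pub fn handle_line(line: &str, s: &mut Settings) -> Line {
    let line = line.trim();
    if line.is_empty() {
        return Line::Empty;
    }
    // Anything without a leading ':' is a prompt.
    let Some(cmd) = line.strip_prefix(':') else {
        return Line::Prompt(line.to_string());
    };
    let mut parts = cmd.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("fast"), None) => {
            (s.steps, s.width, s.height) = (2, 384, 384); // quick preview preset
            Line::Updated
        }
        (Some("hq"), None) => {
            (s.steps, s.width, s.height) = (8, 512, 512); // finalize preset
            Line::Updated
        }
        (Some("steps"), Some(n)) => match n.parse::<u32>() {
            Ok(v) => { s.steps = v; Line::Updated }
            Err(_) => Line::Unknown(line.to_string()),
        },
        (Some("seed"), Some(n)) => match n.parse::<u64>() {
            Ok(v) => { s.seed = v; Line::Updated }
            Err(_) => Line::Unknown(line.to_string()),
        },
        (Some("quit"), None) => Line::Quit,
        _ => Line::Unknown(line.to_string()),
    }
}
```

The presets change steps and size but leave the seed alone. That separation is what makes the explore-with-`:fast`, finalize-with-`:hq` loop reproducible.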
What’s New in 0.2.2
- oxibonsai repl: resident interactive image REPL. ImageSession loads the DiT, VAE, and text encoder once and renders many prompts without re-paying the load/dequant cost. StageTimings and RenderOutcome surface per-stage wall-clock splits. Runtime commands: :steps, :seed, :size, :fast (2-step 384×384 preview), :hq (8-step 512×512), :out, :open, :help, :quit.
- TeWeights::set_resident(on: bool). Controls whether the Mlx4bit source caches dequantised f32 tensors (~16 GB) across forwards. Off by default to preserve the one-shot low-RAM profile; on for the REPL.
- Kitty graphics protocol support (src/cli/term.rs). Pure-Rust base64 encoder plus inline PNG display for Ghostty; kitty_supported() auto-detects via GHOSTTY_* / TERM / TERM_PROGRAM.
- GPU acceleration flags documented in .env.example. OXI_DIT_ATTN_GPU (flash-attention, default-ON on Apple Silicon), OXI_VAE_GPU (convolutions, default-ON on Apple Silicon), OXI_TE_GPU (text-encoder GPU, default-OFF: CPU SIMD wins on Apple Silicon; may help on Windows/NVIDIA CUDA).
- CUDA TQ2 GEMV parity test (oxibonsai-kernels). Isolated cuda_tq2_gemv_parity.rs probe for Blackwell GPU output validation, behind cfg(feature = "cuda").
- Shared decoded_chw_to_rgb8 helper. The CHW→HWC f32→u8 conversion is now shared by both text_to_image and ImageSession::render, guaranteeing byte-identical pixels from both paths.
- oxionnx-proto bumped 0.1.3 → 0.1.4.
Tips
- Keep the encoder resident for fast iteration. On a high-RAM machine, the REPL’s resident text encoder (TeWeights::set_resident, ~16 GB cached) means each prompt after the first skips re-dequantisation. That is exactly the trade you want when you’re cycling through prompts and seeds: RAM in exchange for not re-paying the warm-up.
- Work the :fast → :hq loop. Compose with :fast (2 steps, 384×384) to find the prompt and seed cheaply, then run :hq (8 steps, 512×512) once to finalize. You spend your steps where they matter and skip slow renders of the wrong image.
- Change :seed to explore, then lock it. :seed N re-rolls the same prompt; when one lands, keep the seed and switch to :hq. This pairs naturally with 0.2.0’s byte-exact --seed, so the finalized image is reproducible later.
- Flip the GPU flags to match your platform. On Apple Silicon, OXI_DIT_ATTN_GPU and OXI_VAE_GPU are already on and OXI_TE_GPU is off (CPU SIMD wins there). On Windows/NVIDIA CUDA, try turning OXI_TE_GPU on; that’s the one switch most likely to help off the Mac.
- Read StageTimings when tuning. Each RenderOutcome reports the per-stage split, so you can see whether a :steps change hit sampling or a :size change hit VAE decode, and tune the stage that’s actually costing you.
- Run in Ghostty for the inline workflow. On a kitty-graphics terminal the image appears in place via the pure-Rust protocol: no external viewer, no leaving the shell. Elsewhere it falls back to a file (use :open on to pop it in a viewer).
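The GPU-flag tip can be read as a resolution rule: a platform default, overridden by the environment. This sketch uses the three documented variable names and the stated Apple Silicon defaults. Several parts are assumptions, not the .env.example semantics: the accepted values ("1"/"true"/"on" and "0"/"false"/"off"), the treatment of anything else as "use the default", and the all-OFF defaults off Apple Silicon. `GpuFlags` and `resolve_gpu_flags` are hypothetical.

```rust
/// Hypothetical resolved view of the three image-generation GPU switches.
#[derive(Debug, PartialEq)]
pub struct GpuFlags {
    pub dit_attn: bool,
    pub vae: bool,
    pub text_encoder: bool,
}

/// Assumed syntax: on/off words, anything else (or unset) keeps the default.
fn parse_switch(value: Option<&str>, default: bool) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("1" | "true" | "on") => true,
        Some("0" | "false" | "off") => false,
        _ => default,
    }
}

/// `lookup` abstracts the environment so the rule is testable.
pub fn resolve_gpu_flags(lookup: impl Fn(&str) -> Option<String>, apple_silicon: bool) -> GpuFlags {
    GpuFlags {
        // Default-ON on Apple Silicon per the release notes; OFF elsewhere is assumed.
        dit_attn: parse_switch(lookup("OXI_DIT_ATTN_GPU").as_deref(), apple_silicon),
        vae: parse_switch(lookup("OXI_VAE_GPU").as_deref(), apple_silicon),
        // Default-OFF everywhere: CPU SIMD wins on Apple Silicon.
        text_encoder: parse_switch(lookup("OXI_TE_GPU").as_deref(), false),
    }
}
```

Against the real environment, a call might look like `resolve_gpu_flags(|k| std::env::var(k).ok(), cfg!(all(target_os = "macos", target_arch = "aarch64")))`.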
This is the foundation
OxiBonsai is the inference end of the COOLJAPAN ecosystem — sub-2-bit Bonsai models from PrismML, served and rendered on top of SciRS2, OxiBLAS, OxiFFT, OxiARC, and OxiONNX, with no FFI and no C/C++/Fortran runtime anywhere underneath. 0.2.2 extends that all the way to the terminal: the model is resident, the iteration loop is interactive, and even the pixels reach your screen through Pure Rust.
Repository: https://github.com/cool-japan/oxibonsai
Star the repo if you believe generating an image should be an interactive loop in your own terminal — fast, reproducible, and sovereign, without a line of C.
Pure Rust sub-2-bit image generation that loads once and draws inline is here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ June 8, 2026