OxiBonsai 0.2.2 Released — An Interactive Image REPL with Inline Terminal Rendering

OxiBonsai 0.2.2 adds `oxibonsai repl`: a resident ImageSession that loads the DiT, VAE, and text encoder once and iterates on prompts without re-paying the load/dequant cost — with images shown inline in Ghostty via a pure-Rust kitty graphics protocol, a `:fast`/`:hq` preview→finalize loop, and documented per-platform GPU flags. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

release oxibonsai llm inference pure-rust quantization text-to-image repl terminal-graphics metal cuda

Generating one image meant reloading multi-gigabyte weights from scratch. Today you load them once, type prompts, and watch the pictures appear right in your terminal.

Today we released OxiBonsai 0.2.2, which adds an interactive image REPL that keeps the model resident across renders and draws each result straight into the terminal.

No llama.cpp. No BLAS. No C, no C++, no Fortran runtime. OxiBonsai is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family — the 1-bit line (Q1_0_g128) and the ternary line (TQ2_0_g128) — running on CPU SIMD (AVX2/AVX-512/NEON/WASM), Apple Silicon (Metal), and NVIDIA (CUDA NVRTC), all on top of SciRS2, OxiBLAS, OxiFFT, OxiARC, and OxiONNX. Even the terminal graphics encoder in this release is Pure Rust.

Why OxiBonsai 0.2.2 matters

The previous release, 0.2.1, was a quiet DX-and-hardening pass: optimized test/dev profiles and two fixes from the field — a VAE .safetensors-file path bug (#9) and clearer HuggingFace asset-path docs (#8). It made the imaging path smoother to live with. 0.2.2 changes how you actually use it.

Until now, oxibonsai image was a one-shot command: every prompt paid the full cold-start price. Loading the ternary DiT, the VAE, and the Qwen3-4B text encoder — then dequantising the encoder’s weights — is the expensive part of generating an image. Doing it once is fine. Doing it on every prompt while you’re dialing in a composition is a tax on iteration: you sit through a multi-gigabyte warm-up to find out a seed was wrong, fix the seed, and pay the warm-up again.

oxibonsai repl removes that tax. It loads everything once into a resident ImageSession, then renders as many prompts as you like against the already-warm weights. Per-prompt time collapses to just compute — text-encode, DiT sampling, VAE decode — because the load and the encoder warm-up are behind you. And on a kitty-graphics terminal like Ghostty, you never leave the shell to look at the result: the PNG is painted inline, in place.

Technical Deep Dive

ImageSession — load once, render many. The new session (oxibonsai-image, driven from oxibonsai-cli) owns the DiT, the VAE, and the text encoder for its whole lifetime. The first render pays the cold cost; every render after that reuses the resident weights. Each render returns a RenderOutcome carrying StageTimings, so you get the per-stage wall-clock split — how long text-encode, sampling, and VAE decode each took — instead of one opaque total. When you’re tuning, that breakdown tells you which stage your :steps or :size change actually moved.
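To make the load-once shape concrete, here is a minimal sketch that assumes a simplified session whose three stages are placeholders rather than real models. The type names mirror the post (ImageSession, RenderOutcome, StageTimings), but every field, method, and stage body below is hypothetical and is not the oxibonsai-image API. It shows the pattern: the constructor is the only place a load happens, and each render timestamps the boundaries between text-encode, sampling, and VAE decode so the per-stage split comes back alongside the pixels.

```rust
use std::time::{Duration, Instant};

/// Per-stage wall-clock split (field names are hypothetical).
#[derive(Debug, Default, Clone, Copy)]
pub struct StageTimings {
    pub text_encode: Duration,
    pub sampling: Duration,
    pub vae_decode: Duration,
}

impl StageTimings {
    pub fn total(&self) -> Duration {
        self.text_encode + self.sampling + self.vae_decode
    }
}

pub struct RenderOutcome {
    pub rgb: Vec<u8>,
    pub timings: StageTimings,
}

/// Hypothetical stand-in: the real session would own the DiT, VAE and text encoder.
pub struct ImageSession {
    loads: usize,   // how many times weights were loaded (stays at 1)
    renders: usize, // how many prompts were rendered against them
}

impl ImageSession {
    /// The only place the cold-start cost would be paid.
    pub fn load() -> Self {
        ImageSession { loads: 1, renders: 0 }
    }

    pub fn loads(&self) -> usize {
        self.loads
    }

    pub fn renders(&self) -> usize {
        self.renders
    }

    /// Renders one prompt against the already-resident weights.
    pub fn render(&mut self, prompt: &str, width: usize, height: usize) -> RenderOutcome {
        let t0 = Instant::now();
        // Placeholder text-encode: bytes -> floats.
        let embedding: Vec<f32> = prompt.bytes().map(|b| b as f32).collect();
        let t1 = Instant::now();
        // Placeholder sampling: collapse the embedding to one "latent" value.
        let latent: f32 = embedding.iter().sum::<f32>();
        let t2 = Instant::now();
        // Placeholder VAE decode: fill an RGB buffer deterministically.
        let value = (latent as u32 % 256) as u8;
        let rgb = vec![value; width * height * 3];
        let t3 = Instant::now();

        self.renders += 1;
        RenderOutcome {
            rgb,
            timings: StageTimings {
                text_encode: t1 - t0,
                sampling: t2 - t1,
                vae_decode: t3 - t2,
            },
        }
    }
}
```

Calling render twice on one session leaves the load count at 1 while the render count climbs, which is the property that lets the REPL skip the cold start after the first prompt.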

Resident text encoder — TeWeights::set_resident. The text encoder is the heaviest warm-up: its dequantised f32 weights are roughly 16 GB. The one-shot CLI deliberately keeps those off the heap between forwards to preserve a low-RAM profile, re-dequantising as needed. The REPL is the opposite trade: TeWeights::set_resident(true) tells the Mlx4bit source to cache the dequantised f32 tensors across forwards, so each subsequent prompt skips re-dequantisation entirely. It is off by default — the one-shot path stays lean — and on for the REPL, where you’ve opted into trading RAM for iteration speed.
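The resident/transient trade can be sketched as a weight source with an opt-in cache. For scale: roughly 4 billion encoder parameters at 4 bytes each in f32 is about 16 GB, which is why caching stays opt-in. The type below, TeWeightsSketch, is hypothetical, as are its toy 4-bit layout (two codes per byte, one global scale) and its with_f32 method; it is not the real TeWeights or Mlx4bit code, only an illustration of set_resident-style behavior under those assumptions.

```rust
/// Hypothetical weight source with an opt-in resident f32 cache.
/// Toy layout (an assumption): two unsigned 4-bit codes per byte, one global scale.
pub struct TeWeightsSketch {
    quantized: Vec<u8>,
    scale: f32,
    resident: bool,
    cache: Option<Vec<f32>>,
    pub dequant_count: usize, // how many times dequantization actually ran
}

impl TeWeightsSketch {
    pub fn new(quantized: Vec<u8>, scale: f32) -> Self {
        TeWeightsSketch { quantized, scale, resident: false, cache: None, dequant_count: 0 }
    }

    /// Off: dequantize per forward and free afterwards. On: keep the f32 copy.
    pub fn set_resident(&mut self, on: bool) {
        self.resident = on;
        if !on {
            self.cache = None; // give the RAM back
        }
    }

    fn dequantize(&mut self) -> Vec<f32> {
        self.dequant_count += 1;
        let mut out = Vec::with_capacity(self.quantized.len() * 2);
        for &byte in &self.quantized {
            // Low nibble first, then high nibble.
            for nibble in [byte & 0x0F, byte >> 4] {
                // Map code 0..=15 to a signed value centred on 8.
                out.push((nibble as f32 - 8.0) * self.scale);
            }
        }
        out
    }

    /// Runs `f` over the f32 weights for one forward pass.
    pub fn with_f32<R>(&mut self, f: impl FnOnce(&[f32]) -> R) -> R {
        if self.resident {
            if self.cache.is_none() {
                let w = self.dequantize();
                self.cache = Some(w);
            }
            f(self.cache.as_deref().unwrap())
        } else {
            // Transient copy: dropped when this call returns.
            let w = self.dequantize();
            f(&w)
        }
    }
}
```

With residency off, every forward re-runs dequantization and drops the result; with it on, only the first forward pays, and switching it off again releases the cached buffer.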

Inline images via a pure-Rust kitty graphics protocol. A new src/cli/term.rs implements the kitty graphics protocol — including a pure-Rust base64 encoder — to transmit a PNG straight into the terminal’s scrollback. kitty_supported() auto-detects a capable terminal via GHOSTTY_*, TERM, and TERM_PROGRAM. On Ghostty the rendered image shows up inline; on terminals without graphics support, the PNG falls back cleanly to a file (optionally opened in a viewer). No external image viewer, no ImageMagick, no C — the whole display path is Rust.
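A minimal sketch of that display path, assuming the standard kitty graphics escape format: PNG payloads use f=100, a=T asks the terminal to transmit and display, and the base64 data is sent in chunks of at most 4096 bytes with m=1 on every chunk except the last. The function names (base64_encode, kitty_png_escape, detect_kitty) are hypothetical, and the detection rules are guesses built from the three variables the post names; the real term.rs and kitty_supported() may check them differently.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648 alphabet) with '=' padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Wraps PNG bytes in kitty graphics escapes: f=100 (PNG), a=T (transmit and display),
/// base64 payload split into <= 4096-byte chunks, m=1 on all but the last chunk.
pub fn kitty_png_escape(png: &[u8]) -> String {
    let payload = base64_encode(png);
    let chunks: Vec<&[u8]> = payload.as_bytes().chunks(4096).collect();
    let mut out = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        let more = if i + 1 < chunks.len() { 1 } else { 0 };
        let body = std::str::from_utf8(chunk).expect("base64 output is ASCII");
        if i == 0 {
            // First chunk carries the control keys.
            out.push_str(&format!("\x1b_Gf=100,a=T,m={more};{body}\x1b\\"));
        } else {
            // Continuation chunks carry only m.
            out.push_str(&format!("\x1b_Gm={more};{body}\x1b\\"));
        }
    }
    out
}

/// Heuristic detection over (name, value) pairs; the rules here are assumptions.
pub fn detect_kitty_from<'a>(env: impl IntoIterator<Item = (&'a str, &'a str)>) -> bool {
    env.into_iter().any(|(k, v)| {
        k.starts_with("GHOSTTY_")
            || (k == "TERM" && (v.contains("kitty") || v.contains("ghostty")))
            || (k == "TERM_PROGRAM" && v.eq_ignore_ascii_case("ghostty"))
    })
}

/// Reads the real process environment (non-UTF-8 values are converted lossily).
pub fn detect_kitty() -> bool {
    let vars: Vec<(String, String)> = std::env::vars_os()
        .map(|(k, v)| (k.to_string_lossy().into_owned(), v.to_string_lossy().into_owned()))
        .collect();
    detect_kitty_from(vars.iter().map(|(k, v)| (k.as_str(), v.as_str())))
}
```

Printing the returned string to stdout and flushing is all a capable terminal needs; when detection fails, the caller would write the PNG to disk instead, matching the fallback described above.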

Byte-identical pixels from both code paths. The CHW→HWC, f32→u8 conversion at the end of the pipeline is now a single shared helper, decoded_chw_to_rgb8 (pub(crate) in oxibonsai-image/pipeline.rs), called by both text_to_image (the one-shot path) and ImageSession::render (the REPL path). Sharing the conversion guarantees the REPL produces byte-identical pixels to the one-shot command — the resident path is not a second, drifting implementation; it is the same path, kept warm.
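The shared conversion is small enough to sketch. The version below is hypothetical: the name chw_to_rgb8, its argument order, and the assumption that the VAE emits values in [-1, 1] are ours, not taken from pipeline.rs. It shows the two jobs such a helper does: reindex planar CHW data into interleaved HWC, and scale, round, and clamp f32 into u8.

```rust
/// Hypothetical CHW f32 -> interleaved HWC RGB8 conversion.
/// Assumes decoder output in [-1, 1]; the real range may differ.
pub fn chw_to_rgb8(chw: &[f32], height: usize, width: usize) -> Vec<u8> {
    assert_eq!(chw.len(), 3 * height * width, "expected 3 x H x W planes");
    let plane = height * width;
    let mut rgb = vec![0u8; 3 * plane];
    for y in 0..height {
        for x in 0..width {
            let pixel = y * width + x;
            for c in 0..3 {
                // CHW: one full plane per channel, then row-major pixels.
                let v = chw[c * plane + pixel];
                // Map [-1, 1] -> [0, 255], round, clamp out-of-range values.
                // A NaN survives clamp and becomes 0 in the saturating `as u8` cast.
                let scaled = ((v + 1.0) * 0.5 * 255.0).round().clamp(0.0, 255.0);
                // HWC: pixels contiguous, channels interleaved.
                rgb[pixel * 3 + c] = scaled as u8;
            }
        }
    }
    rgb
}
```

Because the rounding and clamping policy lives in one function, the one-shot path and a session render produce the same bytes for the same decoded tensor; two hand-written copies could easily disagree on something as small as truncation versus rounding.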

Documented GPU flags, and a CUDA parity probe. The .env.example now documents the three image-generation GPU switches: OXI_DIT_ATTN_GPU (DiT flash-attention, default-ON on Apple Silicon), OXI_VAE_GPU (VAE convolutions, default-ON on Apple Silicon), and OXI_TE_GPU (text-encoder GPU — default-OFF, because CPU SIMD wins on Apple Silicon, though it may help on Windows/NVIDIA CUDA). On the kernels side, oxibonsai-kernels gains an isolated cuda_tq2_gemv_parity.rs probe behind cfg(feature = "cuda") for validating TQ2 GEMV output on Blackwell-class GPUs. And oxionnx-proto ticks 0.1.3 → 0.1.4.
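One plausible way to resolve the three switches is sketched below. The variable names and the Apple Silicon defaults come from the post; everything else is an assumption, including the accepted spellings ("1", "true", "on", "yes" and their negatives), treating the DiT and VAE switches as default-OFF on other platforms, and the helper names resolve_gpu_flags and resolve_from_env. The environment lookup is injected as a closure so the logic can be tested without touching process state.

```rust
/// Resolved image-generation GPU switches (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct GpuFlags {
    pub dit_attn: bool,
    pub vae: bool,
    pub text_encoder: bool,
}

/// Parses an optional env value; unset or unrecognized keeps the default.
fn parse_flag(value: Option<&str>, default: bool) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if matches!(v.as_str(), "1" | "true" | "on" | "yes") => true,
        Some(v) if matches!(v.as_str(), "0" | "false" | "off" | "no") => false,
        _ => default,
    }
}

pub fn resolve_gpu_flags(get: impl Fn(&str) -> Option<String>, apple_silicon: bool) -> GpuFlags {
    GpuFlags {
        // Default-ON on Apple Silicon (per the post); assumed OFF elsewhere.
        dit_attn: parse_flag(get("OXI_DIT_ATTN_GPU").as_deref(), apple_silicon),
        vae: parse_flag(get("OXI_VAE_GPU").as_deref(), apple_silicon),
        // Default-OFF: CPU SIMD wins on Apple Silicon; opt in elsewhere if it helps.
        text_encoder: parse_flag(get("OXI_TE_GPU").as_deref(), false),
    }
}

/// Reads the real environment and detects Apple Silicon at compile time.
pub fn resolve_from_env() -> GpuFlags {
    let apple_silicon = cfg!(all(target_os = "macos", target_arch = "aarch64"));
    resolve_gpu_flags(|k| std::env::var(k).ok(), apple_silicon)
}
```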

Getting Started

Install the CLI:

cargo install oxibonsai-cli

Start a resident image REPL (model paths resolve flag → env → default, exactly like oxibonsai image):

oxibonsai repl --seed 42 --steps 4 --width 512 --height 512
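The flag → env → default order is a small Option chain. In this sketch resolve_path is a hypothetical helper, and treating an empty environment value as unset is our assumption; the CLI's actual variable names and default paths are not given here, so none are filled in.

```rust
use std::path::PathBuf;

/// Hypothetical "flag -> env -> default" resolution for one model path.
pub fn resolve_path(flag: Option<PathBuf>, env_value: Option<String>, default: &str) -> PathBuf {
    flag
        // An explicit flag always wins; otherwise fall back to a non-empty env value.
        .or_else(|| env_value.filter(|s| !s.is_empty()).map(PathBuf::from))
        // Nothing supplied: use the built-in default.
        .unwrap_or_else(|| PathBuf::from(default))
}
```

A call site would pass the parsed flag, the result of std::env::var for the relevant variable (via .ok()), and the built-in default, once per model (DiT, VAE, text encoder).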

Then iterate. A bare line is a prompt; :-prefixed lines are commands:

oxibonsai> :fast
oxibonsai> a tiny bonsai tree in a ceramic pot
oxibonsai> :seed 7
oxibonsai> a tiny bonsai tree in a ceramic pot
oxibonsai> :hq
oxibonsai> a tiny bonsai tree in a ceramic pot

:fast drops to a snappy 2-step 384×384 preview; once a prompt and seed look right, :hq finalizes at 8 steps and 512×512. On Ghostty each render appears inline; elsewhere it lands in a PNG. The one-shot oxibonsai image --prompt "…" --seed 42 --out bonsai.png is still there for scripts and CI — but the REPL is where 0.2.2 wants you to live while you’re composing.
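The REPL's line handling can be sketched as a parser plus a settings update. Only the commands the session above demonstrates are modeled; the preset values (2 steps at 384×384 for :fast, 8 steps at 512×512 for :hq) come from the text, while the Line enum, the Settings struct, and the handling of blank or unrecognized input are hypothetical.

```rust
/// Hypothetical render settings carried between prompts.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub seed: u64,
    pub steps: u32,
    pub width: u32,
    pub height: u32,
}

/// One parsed input line: a bare line is a prompt, ':' introduces a command.
#[derive(Debug, PartialEq)]
pub enum Line {
    Prompt(String),
    Fast,
    Hq,
    Seed(u64),
    Empty,
    Unknown(String),
}

pub fn parse_line(input: &str) -> Line {
    let line = input.trim();
    if line.is_empty() {
        return Line::Empty;
    }
    let Some(cmd) = line.strip_prefix(':') else {
        return Line::Prompt(line.to_string());
    };
    let mut parts = cmd.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("fast"), None) => Line::Fast,
        (Some("hq"), None) => Line::Hq,
        (Some("seed"), Some(n)) => n
            .parse()
            .map(Line::Seed)
            .unwrap_or_else(|_| Line::Unknown(line.to_string())),
        _ => Line::Unknown(line.to_string()),
    }
}

impl Settings {
    /// Applies a command; returns the prompt to render, if the line was one.
    pub fn apply(&mut self, line: Line) -> Option<String> {
        match line {
            // Preview preset: 2 steps at 384x384.
            Line::Fast => { self.steps = 2; self.width = 384; self.height = 384; None }
            // Finalize preset: 8 steps at 512x512.
            Line::Hq => { self.steps = 8; self.width = 512; self.height = 512; None }
            Line::Seed(s) => { self.seed = s; None }
            Line::Prompt(p) => Some(p),
            Line::Empty | Line::Unknown(_) => None,
        }
    }
}
```

Keeping parse_line free of side effects makes it easy to test against a scripted session like the one above, and leaves apply as the single place where the presets are defined.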


This is the foundation

OxiBonsai is the inference end of the COOLJAPAN ecosystem — sub-2-bit Bonsai models from PrismML, served and rendered on top of SciRS2, OxiBLAS, OxiFFT, OxiARC, and OxiONNX, with no FFI and no C/C++/Fortran runtime anywhere underneath. 0.2.2 extends that all the way to the terminal: the model is resident, the iteration loop is interactive, and even the pixels reach your screen through Pure Rust.

Repository: https://github.com/cool-japan/oxibonsai

Star the repo if you believe generating an image should be an interactive loop in your own terminal — fast, reproducible, and sovereign, without a line of C.

Pure Rust sub-2-bit image generation that loads once and draws inline is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ, June 8, 2026
