7 posts
OxiBonsai 0.2.2 adds `oxibonsai repl`: a resident ImageSession that loads the DiT, VAE, and text encoder once and iterates on prompts without re-paying the load/dequant cost — with images shown inline in Ghostty via a pure-Rust kitty graphics protocol, a `:fast`/`:hq` preview→finalize loop, and documented per-platform GPU flags. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
OxiBonsai 0.2.0 opens the 0.2 series: a concurrent engine pool that shares one 1.16 GB embedding table across replicas, a CPU↔Metal byte-identical parity guard, a parity-first CUDA imagen backend (~3.2× faster, down to ~31.7s on an A4000), `--seed` for byte-exact reproducible images, and a stable-toolchain build. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
OxiBonsai 0.1.5 adds the oxibonsai-image crate — the first pure-Rust, zero-FFI, C/C++/Fortran-free FLUX.2-Klein text-to-image pipeline (DiT + VAE + Qwen3-4B text encoder), parity-validated at cos≥0.999, with Metal flash-attention and ~52–62s end-to-end on an M3. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem, now spanning text and image.
OxiGDAL 0.1.5 is a focused fix release: a stray padding field in the RayMarchUniforms WGSL layout was shifting every field by 4 bytes, making the GPU compute kernel read a billion-step max and hang indefinitely on macOS Metal. With the layout corrected, the ray-march GPU/CPU parity test passes in 0.127s. 78 Pure Rust workspace crates, 14,605 passing tests.
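To make the failure mode concrete, here is a minimal sketch of how a single stray 4-byte padding field can turn a sane step limit into a value of about a billion. The release note does not list the actual `RayMarchUniforms` fields, so the field names, their order, and the values below are assumptions chosen for illustration. `HostUniforms` stands in for the bytes the CPU uploads and `ShaderView` for a shader-side layout that declares one extra leading pad. With that pad, the shader reads `max_steps` 4 bytes too late and picks up the bit pattern of the neighbouring `f32`.

```rust
use std::mem::offset_of;

/// Hypothetical host-side layout: the bytes the CPU uploads.
#[repr(C)]
#[allow(dead_code)]
struct HostUniforms {
    width: u32,     // offset 0
    max_steps: u32, // offset 4
    step_size: f32, // offset 8
    _pad: u32,      // offset 12
}

/// Hypothetical shader-side view with one stray pad at the front,
/// which pushes every following field 4 bytes later.
#[repr(C)]
#[allow(dead_code)]
struct ShaderView {
    _stray_pad: u32, // offset 0 (should not exist)
    width: u32,      // offset 4
    max_steps: u32,  // offset 8
    step_size: f32,  // offset 12
}

/// Serialize the host struct field by field, in declaration order.
fn host_bytes(u: &HostUniforms) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&u.width.to_ne_bytes());
    out[4..8].copy_from_slice(&u.max_steps.to_ne_bytes());
    out[8..12].copy_from_slice(&u.step_size.to_ne_bytes());
    out[12..16].copy_from_slice(&u._pad.to_ne_bytes());
    out
}

/// Read a native-endian u32 from the uploaded bytes at a byte offset.
fn read_u32_at(bytes: &[u8; 16], offset: usize) -> u32 {
    let mut word = [0u8; 4];
    word.copy_from_slice(&bytes[offset..offset + 4]);
    u32::from_ne_bytes(word)
}

/// What the mis-declared shader layout would see as `max_steps`.
fn max_steps_seen_by_shader(u: &HostUniforms) -> u32 {
    read_u32_at(&host_bytes(u), offset_of!(ShaderView, max_steps))
}
```

With `max_steps: 256` and `step_size: 1.0` on the host side, `max_steps_seen_by_shader` returns 1,065,353,216, the bit pattern of `1.0_f32` (`0x3F800000`) read as an integer. A loop bounded by that value would look like a hang. Comparing `offset_of!` values for the two layouts in a unit test is one inexpensive way to catch this class of drift.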
OxiBonsai 0.1.3 makes sub-2-bit serving smarter: a prefix-cache-aware engine that reuses KV-cache across requests with byte-identical cold/warm parity, runtime tokenizer auto-detection, and a GPU weight cache that uploads once. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
Five days after its 1-bit debut, OxiBonsai grows GPUs: a native CUDA NVRTC backend (~21.9 tok/s on Ternary-Bonsai-1.7B, RTX 3060) and a fused Metal full-forward path (~50 tok/s, ~13x speedup) — plus the new ternary TQ2_0_g128 quant family, with NEON/AVX2/AVX-512 GEMV so it flies on CPU too. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem, still with no llama.cpp, no BLAS, no C/Fortran.
A complete, type-safe, memory-safe rewrite of the entire NVIDIA CUDA Toolkit in pure Rust: cuBLAS/cuDNN/cuFFT/cuSPARSE/cuSOLVER/cuRAND and more, all in 253k SLoC across 28 crates. The only runtime dependency is the NVIDIA driver. PTX codegen + autotuner, 7 GPU backends (including Metal/Vulkan/WebGPU/ROCm/LevelZero). At least 90–95% of native CUDA performance. The sovereign GPU computing layer for SciRS2 and the entire COOLJAPAN ecosystem (now 21M+ SLoC total).