COOLJAPAN

Posts tagged #ternary

1 post

Apr 18, 2026 · 7 min

OxiBonsai 0.1.1 Released — Sub-2-Bit Inference Goes GPU, and the Ternary Line Lands

Five days after its 1-bit debut, OxiBonsai gains GPU backends: a native CUDA NVRTC backend (~21.9 tok/s on Ternary-Bonsai-1.7B, RTX 3060) and a fused Metal full-forward path (~50 tok/s, a ~13x speedup). The release also lands the new ternary TQ2_0_g128 quant family, with NEON/AVX2/AVX-512 GEMV kernels so it flies on CPU too. Sub-2-bit, Pure Rust, sovereign AI inference for the COOLJAPAN ecosystem, still with no llama.cpp, no BLAS, and no C/Fortran.

#release #oxibonsai #llm