COOLJAPAN

Posts tagged #serving

2 posts

Jun 3, 2026 · 7 min

OxiBonsai 0.2.0 Released — Concurrent /serve, Byte-Identical CPU↔Metal, and Reproducible Images

OxiBonsai 0.2.0 opens the 0.2 series with:

- a concurrent engine pool whose replicas share a single 1.16 GB embedding table;
- a parity guard that keeps CPU and Metal output byte-identical;
- a parity-first CUDA imagen backend (~3.2× faster, down to ~31.7 s on an A4000);
- byte-exact reproducible images via --seed;
- a build on the stable toolchain.

Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

#release #oxibonsai #llm

May 3, 2026 · 8 min

OxiBonsai 0.1.3 Released — Prefix-Cache-Aware Serving with Byte-Identical Warm Paths

OxiBonsai 0.1.3 makes sub-2-bit serving smarter with:

- a prefix-cache-aware engine that reuses the KV cache across requests, with byte-identical output on cold and warm paths;
- automatic tokenizer detection at runtime;
- a GPU weight cache that uploads weights only once.

Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

#release #oxibonsai #llm