COOLJAPAN

Posts tagged #tokenizer

1 posts

May 3, 2026 · 8 min

OxiBonsai 0.1.3 Released — Prefix-Cache-Aware Serving with Byte-Identical Warm Paths

OxiBonsai 0.1.3 makes sub-2-bit serving smarter: a prefix-cache-aware engine that reuses KV-cache across requests with byte-identical cold/warm parity, runtime tokenizer auto-detection, and a GPU weight cache that uploads once. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.

releaseoxibonsaillm