2 posts
OxiBonsai 0.2.0 opens the 0.2 series with a concurrent engine pool that shares one 1.16 GB embedding table across replicas, a CPU↔Metal byte-identical parity guard, a parity-first CUDA imagen backend (~3.2× faster, down to ~31.7 s on an A4000), byte-exact reproducible images via --seed, and a stable-toolchain build. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.
OxiBonsai 0.1.3 makes sub-2-bit serving smarter: a prefix-cache-aware engine that reuses the KV cache across requests with byte-identical cold/warm parity, runtime tokenizer auto-detection, and a GPU weight cache that uploads weights only once. Sub-2-bit Pure Rust sovereign AI inference for the COOLJAPAN ecosystem.