COOLJAPAN

Posts tagged #fp8

1 post

May 16, 2026 · 9 min

OxiBonsai 0.1.4 Released — Production-Grade Sovereign Serving: Self-Tuning Runtime, Prometheus + X-Request-ID Observability, FP8 & K-Quant, and Grammar-Constrained Output

OxiBonsai 0.1.4 makes pure-Rust sub-2-bit inference production-grade for serving. It adds adaptive KV-cache compression and adaptive speculative decoding that self-tune under load, full Prometheus observability with per-request X-Request-ID tracing, new support for FP8 and K-quant GGUF models, and grammar-constrained decoding for guaranteed-valid JSON: sovereign AI inference for the COOLJAPAN ecosystem.

release · oxibonsai · llm