OxiBonsai 0.1.2 Released — Import onnx-community Ternary ONNX to GGUF in One Command, No Python

OxiBonsai 0.1.2 adds ONNX ingestion: pull an onnx-community Ternary ONNX release (MatMulNBits, bits=2) and repack it straight to OxiBonsai's GGUF TQ2_0_g128 with a single command — driven by the pure-Rust oxionnx-proto reader, no Python and no onnxruntime. Sub-2-bit sovereign AI inference for the COOLJAPAN ecosystem.

Tags: release oxibonsai llm inference pure-rust quantization onnx interoperability gguf

The Hugging Face onnx-community Ternary ONNX releases just became a first-class source for OxiBonsai — one command turns them into a ready-to-run GGUF, and not a line of Python is involved.

Today we released OxiBonsai 0.1.2 — the ONNX ingestion release, where oxibonsai convert --onnx reads onnx-community Ternary ONNX models (MatMulNBits, bits=2) and repacks them directly into OxiBonsai’s native GGUF TQ2_0_g128 format.

OxiBonsai (オキシ盆栽) is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family: the 1-bit line (Q1_0_g128) and the ternary line (TQ2_0_g128). No llama.cpp. No BLAS. No C/C++/Fortran runtime. As of 0.1.2, even the ONNX import path holds that line: it runs on oxionnx-proto, a pure-Rust ONNX protobuf reader, not onnxruntime and not a C library. The engine that 0.1.1 extended with CUDA NVRTC (fused Q1+TQ2, ~21.9 tok/s on Ternary-Bonsai-1.7B on an RTX 3060), fused Metal TQ2 (~50 tok/s, ~13×), ternary CPU SIMD, and the TQ2_0_g128 ternary line now also reads ONNX directly.

Why OxiBonsai 0.1.2 matters

Until now, the path into OxiBonsai went through PrismML’s unpacked safetensors: download, then oxibonsai convert --quant tq2_0_g128. That works, but it ties you to one publishing channel. Meanwhile, a lot of ternary weights also ship as ONNX: the onnx-community namespace mirrors these models as ONNX with MatMulNBits quantization baked in.

0.1.2 makes those releases a first-class input. You point the converter at an onnx-community/Ternary-Bonsai-1.7B-ONNX download and get the exact same GGUF you would have produced from safetensors. No Python toolchain, no optimum, no onnxruntime session — just the oxibonsai binary doing the whole job in one shot. Interoperability without giving up sovereignty.

Technical Deep Dive

The ONNX path is built on oxionnx-proto, the OxiONNX ecosystem’s pure-Rust ONNX protobuf crate. It parses the ONNX graph and its initializers in Rust, so the import path stays zero-FFI like every other part of the engine. This release upgrades the oxionnx-proto workspace dependency to 0.1.2.

The key insight is that ONNX MatMulNBits with bits=2 lines up directly with OxiBonsai’s native ternary block format, TQ2_0_g128 — the same on-disk layout the safetensors converter and the runtime already use. So the ONNX import is a repack, not a re-quantization: the quantized blocks are read and rewritten into GGUF without round-tripping through floats and re-quantizing. Bit-for-bit, the ternary weights carry through.
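To make the "repack, not re-quantization" idea concrete, here is a minimal Rust sketch. It is an illustration under stated assumptions, not OxiBonsai's actual code: it assumes each byte holds four 2-bit codes with the first code in the lowest bits, and that codes 1, 2, 3 stand for -1, 0, +1 (a zero point of 2). The function names are hypothetical, and the real TQ2_0_g128 byte order and per-group scale handling may differ.

```rust
/// Number of weights per quantization group (the "g128" in TQ2_0_g128).
const GROUP_SIZE: usize = 128;
/// Four 2-bit codes fit in one byte, so one group needs 32 bytes of codes.
const BYTES_PER_GROUP: usize = GROUP_SIZE / 4;

/// Unpack one byte into four 2-bit codes, lowest bits first (assumed order).
fn unpack_codes(byte: u8) -> [u8; 4] {
    [
        byte & 0b11,
        (byte >> 2) & 0b11,
        (byte >> 4) & 0b11,
        (byte >> 6) & 0b11,
    ]
}

/// Pack four 2-bit codes back into one byte, lowest bits first.
fn pack_codes(codes: [u8; 4]) -> u8 {
    (codes[0] & 0b11)
        | ((codes[1] & 0b11) << 2)
        | ((codes[2] & 0b11) << 4)
        | ((codes[3] & 0b11) << 6)
}

/// Map a 2-bit code to a ternary value, assuming a zero point of 2.
/// Code 0 is treated as unused in this sketch and yields None.
fn code_to_ternary(code: u8) -> Option<i8> {
    match code {
        1 => Some(-1),
        2 => Some(0),
        3 => Some(1),
        _ => None,
    }
}

/// Repack one group of codes without ever converting to floats.
/// Returns None if the input is not exactly one group long or if it
/// contains a code that is not a valid ternary value.
fn repack_group(src: &[u8]) -> Option<Vec<u8>> {
    if src.len() != BYTES_PER_GROUP {
        return None;
    }
    let mut dst = Vec::with_capacity(BYTES_PER_GROUP);
    for &byte in src {
        let codes = unpack_codes(byte);
        // Validate each code; the integer codes themselves are never altered.
        for &c in &codes {
            code_to_ternary(c)?;
        }
        dst.push(pack_codes(codes));
    }
    Some(dst)
}
```

Because the codes are only moved and never rounded, the output of `repack_group` is fully determined by the input integers, which is what lets ternary weights carry through unchanged.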

To make that mapping work, 0.1.2 adds a Qwen3 ONNX tensor role mapping. All Bonsai models share the Qwen3 architecture, so the converter needs to know which ONNX initializers correspond to which Qwen3 weights — attention/GQA projections, the SwiGLU MLP, RoPE, RMSNorm, the embeddings, and the lm_head. The role mapping teaches the converter exactly that.
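A role mapping of this kind can be pictured as a small name classifier. The sketch below is hypothetical: the enum, the function names, and the name fragments are illustrative assumptions modelled on Hugging Face style Qwen3 parameter names, and the initializer names in an actual onnx-community export may look different. It covers only a subset of roles.

```rust
/// Roles a Qwen3 weight tensor can play (illustrative subset).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TensorRole {
    TokenEmbedding,
    AttnQ,
    AttnK,
    AttnV,
    AttnOut,
    MlpGate,
    MlpUp,
    MlpDown,
    RmsNorm,
    LmHead,
}

/// Name fragments paired with the role they indicate.
/// These fragments are assumptions, not names confirmed by the release.
const RULES: &[(&str, TensorRole)] = &[
    ("embed_tokens", TensorRole::TokenEmbedding),
    ("q_proj", TensorRole::AttnQ),
    ("k_proj", TensorRole::AttnK),
    ("v_proj", TensorRole::AttnV),
    ("o_proj", TensorRole::AttnOut),
    ("gate_proj", TensorRole::MlpGate),
    ("up_proj", TensorRole::MlpUp),
    ("down_proj", TensorRole::MlpDown),
    ("norm", TensorRole::RmsNorm),
    ("lm_head", TensorRole::LmHead),
];

/// Classify an initializer by the first rule whose fragment occurs in its name.
fn classify(name: &str) -> Option<TensorRole> {
    RULES
        .iter()
        .find(|(frag, _)| name.contains(*frag))
        .map(|&(_, role)| role)
}

/// Extract the layer index from names such as
/// "model.layers.12.self_attn.q_proj.weight".
fn layer_index(name: &str) -> Option<usize> {
    let rest = name.split("layers.").nth(1)?;
    rest.split('.').next()?.parse().ok()
}
```

A table of rules keeps the mapping easy to audit: an unrecognized initializer returns `None` instead of being silently assigned to the wrong weight.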

The result: two conversion paths, one output. Whether you come in via safetensors (--quant tq2_0_g128) or via ONNX (--onnx), you land on the same TQ2_0_g128 GGUF, and the runtime, tokenizer, and server treat it identically.

Getting Started

Install the CLI:

cargo install oxibonsai-cli       # installs the `oxibonsai` binary

Convert an onnx-community Ternary ONNX release straight to GGUF:

oxibonsai convert --onnx \
  --from path/to/model.onnx \
  --to models/Ternary-Bonsai-1.7B.gguf

--from accepts an ONNX directory or a single .onnx file, e.g. an onnx-community/Ternary-Bonsai-1.7B-ONNX download.

Then run the converted model:

oxibonsai run --model models/Ternary-Bonsai-1.7B.gguf \
  --prompt "Explain ternary quantization in one sentence."

For comparison, the safetensors path (unchanged, predates 0.1.2) produces the identical GGUF:

oxibonsai convert --from <unpacked-safetensors-dir> --to models/Ternary-Bonsai-1.7B.gguf --quant tq2_0_g128
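To sanity-check a converted file independently of the CLI, you can read the fixed-size header that every GGUF file starts with: the ASCII magic "GGUF", a little-endian u32 version, then u64 tensor and metadata key/value counts (GGUF version 2 and later). The sketch below is a standalone illustration; the struct and function names are hypothetical and not part of OxiBonsai.

```rust
/// Fixed-size fields at the start of a GGUF file (version 2 and later).
#[derive(Debug, PartialEq, Eq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Read a little-endian u32 from the first 4 bytes of `b`.
fn read_u32_le(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[..4]);
    u32::from_le_bytes(a)
}

/// Read a little-endian u64 from the first 8 bytes of `b`.
fn read_u64_le(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

/// Parse the first 24 bytes of a GGUF file. Returns None if the buffer is
/// too short or does not begin with the magic bytes "GGUF".
fn parse_gguf_header(bytes: &[u8]) -> Option<GgufHeader> {
    if bytes.len() < 24 || !bytes.starts_with(b"GGUF") {
        return None;
    }
    Some(GgufHeader {
        version: read_u32_le(&bytes[4..8]),
        tensor_count: read_u64_le(&bytes[8..16]),
        metadata_kv_count: read_u64_le(&bytes[16..24]),
    })
}
```

In practice you would load the file with `std::fs::read` and pass the bytes to `parse_gguf_header`. Running it on the outputs of both conversion paths gives a quick first check that they agree on tensor and metadata counts before doing a full byte-for-byte comparison.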

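What’s New in 0.1.2

This release adds oxibonsai convert --onnx, which imports onnx-community Ternary ONNX models (MatMulNBits, bits=2) and repacks them into GGUF TQ2_0_g128; a Qwen3 ONNX tensor role mapping; and an upgrade of the oxionnx-proto workspace dependency to 0.1.2.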

This is the foundation

OxiBonsai sits squarely in the COOLJAPAN ecosystem — SciRS2, OxiBLAS, OxiFFT, OxiARC, and now, prominently, OxiONNX via oxionnx-proto for the ONNX import path — running PrismML’s Bonsai models with no FFI and no foreign runtime anywhere in the stack. With 0.1.2, the on-ramp widens from one publishing channel to two, and the pure-Rust guarantee holds across both.

Repository: https://github.com/cool-japan/oxibonsai

Star the repo if a no-Python, pure-Rust ONNX-to-GGUF pipeline for sub-2-bit models is the kind of interoperability you’ve been waiting for.

Pure Rust sovereign inference now reads ONNX on its own terms — fast, safe, sovereign, and open to the wider ecosystem.

KitaSan at COOLJAPAN OÜ, April 19, 2026
