The Hugging Face onnx-community Ternary ONNX releases just became a first-class source for OxiBonsai — one command turns them into a ready-to-run GGUF, and not a line of Python is involved.
Today we released OxiBonsai 0.1.2 — the ONNX ingestion release, where oxibonsai convert --onnx reads onnx-community Ternary ONNX models (MatMulNBits, bits=2) and repacks them directly into OxiBonsai’s native GGUF TQ2_0_g128 format.
OxiBonsai (オキシ盆栽) is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family — the 1-bit line (Q1_0_g128) and the ternary line (TQ2_0_g128). No llama.cpp. No BLAS. No C/C++/Fortran runtime. And as of 0.1.2, even the ONNX import path holds that line: it runs on oxionnx-proto, a pure-Rust ONNX protobuf reader — not onnxruntime, not a C library. The same engine that was extended in 0.1.1 with CUDA NVRTC (fused Q1+TQ2, ~21.9 tok/s on Ternary-Bonsai-1.7B on an RTX 3060), fused Metal TQ2 (~50 tok/s, ~13×), ternary CPU SIMD, and the TQ2_0_g128 ternary line now also speaks ONNX end to end.
Why OxiBonsai 0.1.2 matters
Until now, the path into OxiBonsai went through PrismML’s unpacked safetensors: download, then oxibonsai convert --quant tq2_0_g128. That works, but it tied you to one publishing channel. The catch is that a lot of ternary weights ship as ONNX through the community — the onnx-community namespace mirrors these models as ONNX with MatMulNBits quantization baked in.
0.1.2 makes those releases a first-class input. You point the converter at an onnx-community/Ternary-Bonsai-1.7B-ONNX download and get the exact same GGUF you would have produced from safetensors. No Python toolchain, no optimum, no onnxruntime session — just the oxibonsai binary doing the whole job in one shot. Interoperability without giving up sovereignty.
Technical Deep Dive
The ONNX path is built on oxionnx-proto, the OxiONNX ecosystem’s pure-Rust ONNX protobuf crate. It parses the ONNX graph and its initializers in Rust, so the import path stays zero-FFI like every other part of the engine. This release upgrades the oxionnx-proto workspace dependency to 0.1.2.
The key insight is that ONNX MatMulNBits with bits=2 lines up directly with OxiBonsai’s native ternary block format, TQ2_0_g128 — the same on-disk layout the safetensors converter and the runtime already use. So the ONNX import is a repack, not a re-quantization: the quantized blocks are read and rewritten into GGUF without round-tripping through floats and re-quantizing. Bit-for-bit, the ternary weights carry through.
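To make the "repack, not re-quantize" idea concrete, here is a minimal sketch in Rust of what moving one 128-weight group could look like. It is an illustration under stated assumptions, not OxiBonsai's actual converter: the `TernaryBlock` struct, the lowest-bits-first packing order, the code-to-value mapping (zero point of 1), and storing the scale as raw f16 bits are all hypothetical, and the real TQ2_0_g128 layout may differ. The point it shows is that the 2-bit codes are validated and copied as integers, and the scale is carried as raw bits, so no float arithmetic is involved.

```rust
/// Weights per quantization group (the "g128" in TQ2_0_g128).
const GROUP: usize = 128;
/// 2 bits per weight means 4 weights per byte, so 32 bytes per group.
const BYTES_PER_GROUP: usize = GROUP * 2 / 8;

/// Hypothetical destination block: a raw scale plus the packed 2-bit codes.
/// The real TQ2_0_g128 field order and scale type may differ.
#[derive(Debug, Clone, PartialEq)]
struct TernaryBlock {
    /// Scale carried as raw f16 bits; it is never decoded to a float here.
    scale_bits: u16,
    codes: [u8; BYTES_PER_GROUP],
}

/// Extract the i-th 2-bit code (assumed packing order: lowest bits first).
fn code_at(packed: &[u8], i: usize) -> u8 {
    (packed[i / 4] >> ((i % 4) * 2)) & 0b11
}

/// Map a 2-bit code to a ternary value, assuming a zero point of 1.
/// Code 3 has no ternary meaning under this assumption.
fn ternary(code: u8) -> Option<i8> {
    match code {
        0 => Some(-1),
        1 => Some(0),
        2 => Some(1),
        _ => None,
    }
}

/// Helper for building sample input: pack 2-bit codes, lowest bits first.
fn pack_codes(codes: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (codes.len() + 3) / 4];
    for (i, c) in codes.iter().enumerate() {
        out[i / 4] |= (c & 0b11) << ((i % 4) * 2);
    }
    out
}

/// Repack one group: check that every code is ternary, then copy the bytes.
/// Nothing is dequantized, so the weights carry through bit-for-bit.
fn repack_group(packed: &[u8], scale_bits: u16) -> Result<TernaryBlock, String> {
    if packed.len() != BYTES_PER_GROUP {
        return Err(format!("expected {BYTES_PER_GROUP} bytes, got {}", packed.len()));
    }
    for i in 0..GROUP {
        if ternary(code_at(packed, i)).is_none() {
            return Err(format!("weight {i} has a non-ternary code"));
        }
    }
    let mut codes = [0u8; BYTES_PER_GROUP];
    codes.copy_from_slice(packed);
    Ok(TernaryBlock { scale_bits, codes })
}
```

Calling `repack_group(&pack_codes(&codes), scale_bits)` on 128 codes drawn from {0, 1, 2} returns a block whose `codes` bytes equal the input bytes, which is the property that distinguishes a repack from a re-quantization.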
To make that mapping work, 0.1.2 adds a Qwen3 ONNX tensor role mapping. All Bonsai models share the Qwen3 architecture, so the converter needs to know which ONNX initializers correspond to which Qwen3 weights — attention/GQA projections, the SwiGLU MLP, RoPE, RMSNorm, the embeddings, and the lm_head. The role mapping teaches the converter exactly that.
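As a rough illustration of what a tensor role mapping does, the sketch below classifies tensor names into Qwen3 roles by substring. It is hypothetical throughout: the `Role` enum, the `classify` function, and the name patterns (borrowed from Hugging Face-style Qwen3 weight names such as `q_proj` and `embed_tokens`) are assumptions, and the initializer names in an actual ONNX export, as well as OxiBonsai's real mapping, may look different. It covers only a subset of roles.

```rust
/// Hypothetical roles a Qwen3 weight tensor can play.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    TokenEmbedding,
    AttnQ,
    AttnK,
    AttnV,
    AttnO,
    MlpGate,
    MlpUp,
    MlpDown,
    RmsNorm,
    LmHead,
}

/// Pull the decoder layer index out of a name like "model.layers.3.…".
/// Returns None for tensors that do not belong to a layer.
fn layer_index(name: &str) -> Option<usize> {
    let rest = name.split("layers.").nth(1)?;
    rest.split('.').next()?.parse().ok()
}

/// Classify a tensor name into (role, layer index), or None if unknown.
/// Patterns are checked in order; the first match wins.
fn classify(name: &str) -> Option<(Role, Option<usize>)> {
    // Assumed name fragments; a real export may use different ones.
    const TABLE: &[(&str, Role)] = &[
        ("embed_tokens", Role::TokenEmbedding),
        ("lm_head", Role::LmHead),
        ("q_proj", Role::AttnQ),
        ("k_proj", Role::AttnK),
        ("v_proj", Role::AttnV),
        ("o_proj", Role::AttnO),
        ("gate_proj", Role::MlpGate),
        ("up_proj", Role::MlpUp),
        ("down_proj", Role::MlpDown),
        ("norm", Role::RmsNorm),
    ];
    TABLE
        .iter()
        .find(|(needle, _)| name.contains(*needle))
        .map(|&(_, role)| (role, layer_index(name)))
}
```

With these assumed patterns, `classify("model.layers.3.self_attn.q_proj.weight")` yields the attention query role for layer 3, while a name that matches nothing returns `None`, so the caller can fail loudly instead of silently dropping a tensor.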
The result: two conversion paths, one output. Whether you come in via safetensors (--quant tq2_0_g128) or via ONNX (--onnx), you land on the same TQ2_0_g128 GGUF, and the runtime, tokenizer, and server treat it identically.
Getting Started
Install the CLI:
cargo install oxibonsai-cli # installs the `oxibonsai` binary
Convert an onnx-community Ternary ONNX release straight to GGUF:
oxibonsai convert --onnx \
--from path/to/model.onnx \
--to models/Ternary-Bonsai-1.7B.gguf
--from accepts an ONNX directory or a single .onnx file, e.g. an onnx-community/Ternary-Bonsai-1.7B-ONNX download.
Then run the converted model:
oxibonsai run --model models/Ternary-Bonsai-1.7B.gguf \
--prompt "Explain ternary quantization in one sentence."
For comparison, the safetensors path (unchanged, predates 0.1.2) produces the identical GGUF:
oxibonsai convert --from <unpacked-safetensors-dir> --to models/Ternary-Bonsai-1.7B.gguf --quant tq2_0_g128
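If you want to check the "identical GGUF" claim yourself, write the two conversions to different output paths and compare them byte for byte. The sketch below is one way to do that in Rust using only the standard library; `first_difference` and `compare_files` are illustrative helper names, not part of OxiBonsai, and the approach reads both files fully into memory, which is simple but not ideal for very large models.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the offset of the first differing byte, or None if the two
/// buffers are identical. A length mismatch counts as a difference at
/// the end of the shorter buffer.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    if let Some(i) = a.iter().zip(b).position(|(x, y)| x != y) {
        return Some(i);
    }
    if a.len() != b.len() {
        Some(a.len().min(b.len()))
    } else {
        None
    }
}

/// Compare two files on disk. Both are read fully into memory, so this
/// is a convenience check rather than a streaming comparison.
fn compare_files(left: &Path, right: &Path) -> io::Result<Option<usize>> {
    Ok(first_difference(&fs::read(left)?, &fs::read(right)?))
}
```

Calling `compare_files` on the two output paths returns `Ok(None)` when the files match and `Ok(Some(offset))` pointing at the first mismatch otherwise.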
What’s New in 0.1.2
- ONNX MatMulNBits (bits=2) ingestion: oxibonsai convert --onnx reads onnx-community Ternary releases directly and repacks them as GGUF (TQ2_0_g128).
- Qwen3 ONNX tensor role mapping for the converter, covering attention/GQA, SwiGLU MLP, RoPE, RMSNorm, embeddings, and lm_head.
- Upgraded oxionnx-proto workspace dependency to 0.1.2.
- Workspace version bump to 0.1.2 across all nine subcrates and [workspace.dependencies].
- Alpha → Stable uplift for oxibonsai-tokenizer, oxibonsai-rag, oxibonsai-eval, and oxibonsai-serve.
Tips
- Pick the right path. Use --onnx with --from/--to for the onnx-community ONNX releases; use --quant tq2_0_g128 for PrismML’s unpacked safetensors. Both yield byte-identical TQ2_0_g128 GGUF, so choose whichever source you have on hand.
- --from is flexible. It takes either a directory from an onnx-community ONNX download or a single .onnx file, so no manual unpacking step is required.
- Any Qwen3 ternary ONNX maps cleanly. Because the Bonsai models all share the Qwen3 architecture, the new Qwen3 ONNX tensor role mapping lets the converter ingest any Qwen3-architecture Ternary ONNX release without per-model wiring.
- Depend on the now-Stable crates directly. oxibonsai-tokenizer (pure-Rust HuggingFace tokenizer.json parser), oxibonsai-rag (retrieval), oxibonsai-eval (evaluation), and oxibonsai-serve graduated to Stable in this release and are ready to pull into your own builds.
- Serve the converted model. oxibonsai serve --model models/Ternary-Bonsai-1.7B.gguf, powered by the now-Stable serve crate, exposes an OpenAI-compatible REST surface over the GGUF you just produced.
This is the foundation
OxiBonsai sits squarely in the COOLJAPAN ecosystem — SciRS2, OxiBLAS, OxiFFT, OxiARC, and now, prominently, OxiONNX via oxionnx-proto for the ONNX import path — running PrismML’s Bonsai models with no FFI and no foreign runtime anywhere in the stack. With 0.1.2, the on-ramp widens from one publishing channel to two, and the pure-Rust guarantee holds across both.
Repository: https://github.com/cool-japan/oxibonsai
Star the repo if a no-Python, pure-Rust ONNX-to-GGUF pipeline for sub-2-bit models is the kind of interoperability you’ve been waiting for.
Pure Rust sovereign inference now reads ONNX on its own terms — fast, safe, sovereign, and open to the wider ecosystem.
— KitaSan at COOLJAPAN OÜ April 19, 2026