OxiBonsai 0.1.2 Released — Import onnx-community Ternary ONNX to GGUF in One Command, No Python

The Hugging Face onnx-community Ternary ONNX releases just became a first-class source for OxiBonsai — one command turns them into a ready-to-run GGUF, and not a line of Python is involved.

Today we released OxiBonsai 0.1.2 — the ONNX ingestion release, where oxibonsai convert --onnx reads onnx-community Ternary ONNX models (MatMulNBits, bits=2) and repacks them directly into OxiBonsai’s native GGUF TQ2_0_g128 format.

OxiBonsai (オキシ盆栽) is the first Pure Rust, zero-FFI inference engine for PrismML’s sub-2-bit Bonsai model family — the 1-bit line (Q1_0_g128) and the ternary line (TQ2_0_g128). No llama.cpp. No BLAS. No C/C++/Fortran runtime. And as of 0.1.2, even the ONNX import path holds that line: it runs on oxionnx-proto, a pure-Rust ONNX protobuf reader — not onnxruntime, not a C library. The same engine that was extended in 0.1.1 with CUDA NVRTC (fused Q1+TQ2, ~21.9 tok/s on Ternary-Bonsai-1.7B on an RTX 3060), fused Metal TQ2 (~50 tok/s, ~13×), ternary CPU SIMD, and the TQ2_0_g128 ternary line now also speaks ONNX end to end.

Why OxiBonsai 0.1.2 matters

Until now, the path into OxiBonsai went through PrismML’s unpacked safetensors: download, then oxibonsai convert --quant tq2_0_g128. That works, but it tied you to one publishing channel. The catch is that a lot of ternary weights ship as ONNX through the community — the onnx-community namespace mirrors these models as ONNX with MatMulNBits quantization baked in.

0.1.2 makes those releases a first-class input. You point the converter at an onnx-community/Ternary-Bonsai-1.7B-ONNX download and get the exact same GGUF you would have produced from safetensors. No Python toolchain, no optimum, no onnxruntime session — just the oxibonsai binary doing the whole job in one shot. Interoperability without giving up sovereignty.

Technical Deep Dive

The ONNX path is built on oxionnx-proto, the OxiONNX ecosystem’s pure-Rust ONNX protobuf crate. It parses the ONNX graph and its initializers in Rust, so the import path stays zero-FFI like every other part of the engine. This release upgrades the oxionnx-proto workspace dependency to 0.1.2.

The key insight is that ONNX MatMulNBits with bits=2 lines up directly with OxiBonsai’s native ternary block format, TQ2_0_g128 — the same on-disk layout the safetensors converter and the runtime already use. So the ONNX import is a repack, not a re-quantization: the quantized blocks are read and rewritten into GGUF without round-tripping through floats and re-quantizing. Bit-for-bit, the ternary weights carry through.

To make that mapping work, 0.1.2 adds a Qwen3 ONNX tensor role mapping. All Bonsai models share the Qwen3 architecture, so the converter needs to know which ONNX initializers correspond to which Qwen3 weights — attention/GQA projections, the SwiGLU MLP, RoPE, RMSNorm, the embeddings, and the lm_head. The role mapping teaches the converter exactly that.

The result: two conversion paths, one output. Whether you come in via safetensors (--quant tq2_0_g128) or via ONNX (--onnx), you land on the same TQ2_0_g128 GGUF, and the runtime, tokenizer, and server treat it identically.

Getting Started

Install the CLI:

cargo install oxibonsai-cli       # installs the `oxibonsai` binary

Convert an onnx-community Ternary ONNX release straight to GGUF:

oxibonsai convert --onnx \
  --from path/to/model.onnx \
  --to models/Ternary-Bonsai-1.7B.gguf

--from accepts an ONNX directory or a single .onnx file, e.g. an onnx-community/Ternary-Bonsai-1.7B-ONNX download.

Then run the converted model:

oxibonsai run --model models/Ternary-Bonsai-1.7B.gguf \
  --prompt "Explain ternary quantization in one sentence."

For comparison, the safetensors path (unchanged, predates 0.1.2) produces the identical GGUF:

oxibonsai convert --from <unpacked-safetensors-dir> --to models/Ternary-Bonsai-1.7B.gguf --quant tq2_0_g128

What’s New in 0.1.2

ONNX MatMulNBits (bits=2) ingestion — oxibonsai convert --onnx reads onnx-community Ternary releases directly and repacks them as GGUF (TQ2_0_g128).
Qwen3 ONNX tensor role mapping for the converter, covering attention/GQA, SwiGLU MLP, RoPE, RMSNorm, embeddings, and lm_head.
Upgraded oxionnx-proto workspace dependency to 0.1.2.
Workspace version bump to 0.1.2 across all nine subcrates and [workspace.dependencies].
Alpha → Stable uplift for oxibonsai-tokenizer, oxibonsai-rag, oxibonsai-eval, and oxibonsai-serve.

Tips

Pick the right path. Use --onnx with --from/--to for the community onnx-community ONNX releases; use --quant tq2_0_g128 for PrismML’s unpacked safetensors. Both yield byte-identical TQ2_0_g128 GGUF, so choose by whichever source you have on hand.
--from is flexible. It takes either a directory from an onnx-community ONNX download or a single .onnx file — no manual unpacking step required.
Any Qwen3 ternary ONNX maps cleanly. Because the Bonsai models all share the Qwen3 architecture, the new Qwen3 ONNX tensor role mapping lets the converter ingest any Qwen3-architecture Ternary ONNX release without per-model wiring.
Depend on the now-Stable crates directly. oxibonsai-tokenizer (pure-Rust HuggingFace tokenizer.json parser), oxibonsai-rag (retrieval), oxibonsai-eval (evaluation), and oxibonsai-serve graduated to Stable in this release and are ready to pull into your own builds.
Serve the converted model. oxibonsai serve --model models/Ternary-Bonsai-1.7B.gguf, powered by the now-Stable serve crate, exposes an OpenAI-compatible REST surface over the GGUF you just produced.

This is the foundation

OxiBonsai sits squarely in the COOLJAPAN ecosystem — SciRS2, OxiBLAS, OxiFFT, OxiARC, and now, prominently, OxiONNX via oxionnx-proto for the ONNX import path — running PrismML’s Bonsai models with no FFI and no foreign runtime anywhere in the stack. With 0.1.2, the on-ramp widens from one publishing channel to two, and the pure-Rust guarantee holds across both.

Repository: https://github.com/cool-japan/oxibonsai

Star the repo if a no-Python, pure-Rust ONNX-to-GGUF pipeline for sub-2-bit models is the kind of interoperability you’ve been waiting for.

Pure Rust sovereign inference now reads ONNX on its own terms — fast, safe, sovereign, and open to the wider ecosystem.

— KitaSan at COOLJAPAN OÜ April 19, 2026