OxiRAG 0.1.1 Released — Real OxiZ SMT Verification, Candle SLM Inference, and up to 9x Faster Similarity

The four-layer Pure Rust RAG engine gets teeth: real formal verification, real small-model inference, and SIMD that screams.

Today we released OxiRAG 0.1.1 — an incremental release that turns three of OxiRAG’s most ambitious layers from scaffolding into production code: OxiZ-backed SMT verification, Candle small-language-model inference, and dramatically faster similarity search.

No C. No Python. No Fortran.

OxiRAG still does the whole RAG stack in Rust — retrieval, draft verification, formal logic checking, and graph traversal — with no interpreter and no FFI. Where Python RAG stacks like LangChain and LlamaIndex orchestrate over FAISS for vectors and PyTorch for inference, OxiRAG keeps all of it in one process and compiles to a single static binary or to WASM. 0.1.1 doesn’t change that promise; it makes the parts behind it real.

Why OxiRAG 0.1.1 is a game changer

0.1.0 proved the four-layer architecture. 0.1.1 makes it land. The Judge layer now runs the genuine OxiZ SMT solver instead of a mock — formal verification of claims is real and timed. The Speculator can now load and run actual Phi-2 and Phi-3 models through Candle. And the Echo layer’s similarity math went from “fine” to “fast,” with measured wins of 5.6x to 9.0x on cosine similarity and 8x on a 5,000-document search. None of these are projections — they’re the numbers in this release, backed by a test suite that grew to 1,472 tests.

Technical Deep Dive: The Four-Layer Pipeline, hardened

Each of OxiRAG’s four layers got sharper in this release.

Layer 1 — Echo, now SIMD-accelerated. A new src/layer1_echo/similarity_simd.rs (568 lines) adds hand-tuned SIMD paths: ARM NEON for Apple Silicon M-series, and AVX + SSE2 on x86_64. The result is a 5.6x–9.0x speedup for cosine similarity and an 8x speedup for a 5,000-document search. The hot paths also got #[inline], and top_k_similar() now uses partial sorting and fewer allocations.

Layer 2 — Speculator, now running real models. A new src/layer2_speculator/candle_slm.rs (843 lines) introduces CandleSLM with Phi-2 and Phi-3 support and genuine inference (~2.7–3.8 GB models). It does device selection across CPU / CUDA / Metal with automatic HuggingFace Hub downloads, and implements the full SmallLanguageModel trait (generate, get_logprobs, verify_text), covered by 13 tests.

Layer 3 — Judge, now backed by OxiZ. The headline change. A new src/layer3_judge/oxiz_verifier.rs wires the OxiZ SMT solver into the Judge with proper timeout handling. It ships 26 comprehensive tests spanning every claim type — predicate, numeric, temporal, causal, and modal — plus batch verification, consistency checking, and 9 solver benchmarks. Formal verification in OxiRAG is now OxiZ doing real proofs.

Distillation — now with real LoRA. A new src/distillation/candle_lora.rs (998 lines) implements honest low-rank A/B matrices for parameter-efficient fine-tuning (0.1–1% of parameters), Kaiming-uniform initialization for A and zeros for B, training-job management, and checkpoint save/load — 15 tests, plus a runnable examples/lora_training_example.rs.

Underneath all of this, confidence grew: property-based testing arrived via proptest 1.10.0 with 60+ property tests covering vector commutativity and normalization, LRU cache eviction correctness, graph shortest-path and BFS, SMT-LIB generation, and query-normalization idempotence. The codebase moved from 91 to 95 Rust files and from 60,001 to 63,885 lines, and the full suite now stands at 1,472 tests.

Getting Started

Install OxiRAG with the layers you want:

cargo add oxirag --features "judge speculator"

Then wire the real OxiZ-backed Judge (and, optionally, a Candle small model) into the pipeline. The new oxiz_verifier module powers Layer 3:

use oxirag::prelude::*;

// Layer 3 (Judge) now backed by the real OxiZ SMT solver.
// `AdvancedClaimExtractor` pulls claims out of natural language,
// and the new oxiz_verifier module formally checks them via OxiZ.
let judge = JudgeImpl::new(
    AdvancedClaimExtractor::new(),
    // OxiZ-backed verifier (set a solver timeout to stay responsive):
    OxiZVerifier::default(),
    JudgeConfig::default(),
);

// Layer 2 (Speculator) can now run a real Phi-2 model via Candle.
// CandleSLM auto-downloads from HuggingFace Hub and picks CPU/CUDA/Metal.
let speculator = CandleSLM::phi2()?;

(Construct the rest of the pipeline with PipelineBuilder exactly as in 0.1.0 — the new pieces drop straight into the same with_judge / with_speculator slots.)

What’s New in 0.1.1

OxiZ SMT Solver Integration for Layer 3 Judge, with timeout handling — 26 comprehensive tests across all claim types (predicate, numeric, temporal, causal, modal), batch verification, consistency checking, and 9 solver benchmarks. New module: src/layer3_judge/oxiz_verifier.rs.
SIMD performance — ARM NEON (Apple Silicon M-series) plus x86_64 AVX & SSE2, delivering a 5.6x–9.0x speedup for cosine similarity and an 8x speedup for 5,000-document search. New module: src/layer1_echo/similarity_simd.rs (568 lines).
Property-based testing — proptest 1.10.0 with 60+ property tests (vector commutativity/normalization, LRU cache eviction, graph shortest-path/BFS, SMT-LIB generation, query-normalization idempotence).
Candle SLM integration (production) — CandleSLM with Phi-2 and Phi-3, real inference (~2.7–3.8 GB models), CPU/CUDA/Metal device selection with automatic HuggingFace Hub downloads, and the full SmallLanguageModel trait (generate, get_logprobs, verify_text) — 13 tests. New module: src/layer2_speculator/candle_slm.rs (843 lines).
Candle LoRA training — real low-rank A/B matrices, parameter-efficient fine-tuning (0.1–1% of params), Kaiming-uniform init for A / zeros for B, training-job management, and checkpoint save/load — 15 tests. New module: src/distillation/candle_lora.rs (998 lines), with examples/lora_training_example.rs.
Changed — #[inline] on SIMD similarity; top_k_similar() now uses partial sorting; reduced allocations on hot paths. Test suite updated to 1,472 tests; codebase grew 91→95 Rust files and 60,001→63,885 lines.
Fixed — eliminated .unwrap() from production in src/distillation/progressive.rs; conditional timeout thresholds (5s debug / 1s release).

Tips

Build in release to unlock the SIMD wins automatically. The AVX / NEON paths and the partial-sort top_k_similar() shine in release builds — most of the 5.6x–9.0x cosine speedup is yours for free.
Lean on the 8x 5,000-doc search to scale in-memory. That speedup makes the in-memory Echo store viable for genuinely large corpora before you reach for anything external.
Enable speculator to run Phi-2/Phi-3 via CandleSLM. Models auto-download from HuggingFace Hub; pick the CPU, CUDA, or Metal device that fits your hardware.
Set a solver timeout on the OxiZ verifier. The new formal checks are real proofs — bound them (5s debug / 1s release out of the box) to keep the pipeline responsive.
Fine-tune a specialist with Candle LoRA. src/distillation/candle_lora.rs lets you train a small specialist on collected Q&A while touching only 0.1–1% of parameters.
Trust your invariants. The 60+ new proptest cases pin down vector, graph, and cache correctness, so you can build on top without second-guessing the foundations.

This is the foundation

OxiRAG 0.1.1 deepens its place in the COOLJAPAN Pure Rust stack. The defining tie remains OxiZ: with this release, Layer 3’s Judge runs the real OxiZ SMT solver, so formal verification of claims is genuine theorem-proving rather than a placeholder. Candle continues to serve as the external Rust ML runtime for the Echo and Speculator layers — a third-party crate doing the embedding and small-model inference. And OxiRAG keeps good company in the surrounding Pure Rust ecosystem — SciRS2, NumRS2, PandRS, OxiArc, OxiFFT — all built on the same conviction that the whole stack can be C-, Python-, and Fortran-free.

Repository: https://github.com/cool-japan/oxirag

Star the repo if real formal verification and real small-model inference belong in your RAG pipeline.

Pure Rust RAG is here — fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ February 7, 2026