COOLJAPAN
← All posts

OxiRAG 0.1.1 Released — Real OxiZ SMT Verification, Candle SLM Inference, and up to 9x Faster Similarity

OxiRAG 0.1.1 lands production OxiZ SMT verification for Layer 3, real Candle Phi-2/Phi-3 inference and LoRA training, and 5.6x–9.0x faster cosine similarity — all in a Pure Rust RAG engine backed by 1,472 tests.

release oxirag rag retrieval-augmented-generation rust llm smt vector-search simd

The four-layer Pure Rust RAG engine gets teeth: real formal verification, real small-model inference, and SIMD that screams.

Today we released OxiRAG 0.1.1 — an incremental release that turns three of OxiRAG’s most ambitious layers from scaffolding into production code: OxiZ-backed SMT verification, Candle small-language-model inference, and dramatically faster similarity search.

No C. No Python. No Fortran.

OxiRAG still does the whole RAG stack in Rust — retrieval, draft verification, formal logic checking, and graph traversal — with no interpreter and no FFI. Where Python RAG stacks like LangChain and LlamaIndex orchestrate over FAISS for vectors and PyTorch for inference, OxiRAG keeps all of it in one process and compiles to a single static binary or to WASM. 0.1.1 doesn’t change that promise; it makes the parts behind it real.

Why OxiRAG 0.1.1 is a game changer

0.1.0 proved the four-layer architecture. 0.1.1 makes it land. The Judge layer now runs the genuine OxiZ SMT solver instead of a mock — formal verification of claims is real and timed. The Speculator can now load and run actual Phi-2 and Phi-3 models through Candle. And the Echo layer’s similarity math went from “fine” to “fast,” with measured wins of 5.6x to 9.0x on cosine similarity and 8x on a 5,000-document search. None of these are projections — they’re the numbers in this release, backed by a test suite that grew to 1,472 tests.

Technical Deep Dive: The Four-Layer Pipeline, hardened

Each of OxiRAG’s four layers got sharper in this release.

Layer 1 — Echo, now SIMD-accelerated. A new src/layer1_echo/similarity_simd.rs (568 lines) adds hand-tuned SIMD paths: ARM NEON for Apple Silicon M-series, and AVX + SSE2 on x86_64. The result is a 5.6x–9.0x speedup for cosine similarity and an 8x speedup for a 5,000-document search. The hot paths also got #[inline], and top_k_similar() now uses partial sorting and fewer allocations.

Layer 2 — Speculator, now running real models. A new src/layer2_speculator/candle_slm.rs (843 lines) introduces CandleSLM with Phi-2 and Phi-3 support and genuine inference (~2.7–3.8 GB models). It does device selection across CPU / CUDA / Metal with automatic HuggingFace Hub downloads, and implements the full SmallLanguageModel trait (generate, get_logprobs, verify_text), covered by 13 tests.

Layer 3 — Judge, now backed by OxiZ. The headline change. A new src/layer3_judge/oxiz_verifier.rs wires the OxiZ SMT solver into the Judge with proper timeout handling. It ships 26 comprehensive tests spanning every claim type — predicate, numeric, temporal, causal, and modal — plus batch verification, consistency checking, and 9 solver benchmarks. Formal verification in OxiRAG is now OxiZ doing real proofs.

Distillation — now with real LoRA. A new src/distillation/candle_lora.rs (998 lines) implements honest low-rank A/B matrices for parameter-efficient fine-tuning (0.1–1% of parameters), Kaiming-uniform initialization for A and zeros for B, training-job management, and checkpoint save/load — 15 tests, plus a runnable examples/lora_training_example.rs.

Underneath all of this, confidence grew: property-based testing arrived via proptest 1.10.0 with 60+ property tests covering vector commutativity and normalization, LRU cache eviction correctness, graph shortest-path and BFS, SMT-LIB generation, and query-normalization idempotence. The codebase moved from 91 to 95 Rust files and from 60,001 to 63,885 lines, and the full suite now stands at 1,472 tests.

Getting Started

Install OxiRAG with the layers you want:

cargo add oxirag --features "judge speculator"

Then wire the real OxiZ-backed Judge (and, optionally, a Candle small model) into the pipeline. The new oxiz_verifier module powers Layer 3:

use oxirag::prelude::*;

// Layer 3 (Judge) now backed by the real OxiZ SMT solver.
// `AdvancedClaimExtractor` pulls claims out of natural language,
// and the new oxiz_verifier module formally checks them via OxiZ.
let judge = JudgeImpl::new(
    AdvancedClaimExtractor::new(),
    // OxiZ-backed verifier (set a solver timeout to stay responsive):
    OxiZVerifier::default(),
    JudgeConfig::default(),
);

// Layer 2 (Speculator) can now run a real Phi-2 model via Candle.
// CandleSLM auto-downloads from HuggingFace Hub and picks CPU/CUDA/Metal.
let speculator = CandleSLM::phi2()?;

(Construct the rest of the pipeline with PipelineBuilder exactly as in 0.1.0 — the new pieces drop straight into the same with_judge / with_speculator slots.)

What’s New in 0.1.1

Tips

This is the foundation

OxiRAG 0.1.1 deepens its place in the COOLJAPAN Pure Rust stack. The defining tie remains OxiZ: with this release, Layer 3’s Judge runs the real OxiZ SMT solver, so formal verification of claims is genuine theorem-proving rather than a placeholder. Candle continues to serve as the external Rust ML runtime for the Echo and Speculator layers — a third-party crate doing the embedding and small-model inference. And OxiRAG keeps good company in the surrounding Pure Rust ecosystem — SciRS2, NumRS2, PandRS, OxiArc, OxiFFT — all built on the same conviction that the whole stack can be C-, Python-, and Fortran-free.

Repository: https://github.com/cool-japan/oxirag

Star the repo if real formal verification and real small-model inference belong in your RAG pipeline.

Pure Rust RAG is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ February 7, 2026

↑ Back to all posts