OxiRAG 0.1.0 Released — A Four-Layer Pure Rust RAG Engine with SMT Verification

Retrieval-Augmented Generation, rebuilt from the ground up in Rust — no interpreter, no FFI, no compromises.

Today we released OxiRAG 0.1.0 — the first real release of a Pure Rust RAG engine that runs retrieval, verification, formal logic checking, and knowledge-graph traversal through one coherent four-layer pipeline.

No C. No Python. No Fortran.

The RAG world today is a tower of borrowed languages. LangChain and LlamaIndex orchestrate in Python. FAISS does vector math in C/C++. Embeddings and LLM serving lean on Python and PyTorch. Every layer brings a runtime, a packaging headache, and a dependency you didn’t write. OxiRAG takes the opposite path: it is written end to end in Rust. The result compiles to a single static binary you can drop on a server or, because the crate ships as cdylib + rlib, to WebAssembly you can run in the browser or at the edge.

This is an early release, but it is a solid one. The architecture is real, the layers are real, and the whole thing is honestly small enough to read.

Why this matters

The usual RAG pitch stops at “retrieve some chunks and stuff them into a prompt.” That is Layer 1 of OxiRAG, and it is only the beginning. The pain with naive RAG is that retrieval is treated as truth: whatever the vector store returns becomes the answer, hallucinations and all. OxiRAG instead treats retrieval as a draft to be checked, claims as logic to be verified, and entities as a graph to be traversed. Each layer adds a guard the layer below it lacks — and you only pay for the layers you turn on.

Concretely, with 0.1.0 you get semantic search out of the box, a draft-verification stage that can accept, revise, or reject cache hits, optional formal logic verification, and optional knowledge-graph retrieval — all in one Rust process, with SIMD-accelerated similarity and streaming results underneath.

Technical Deep Dive: The Four-Layer Pipeline

OxiRAG’s design is a pipeline of four cooperating layers, each in its own module.

Layer 1 — Echo (src/layer1_echo/). Semantic search via vector embeddings. It supports cosine, euclidean, and dot-product metrics, ships an in-memory vector store, and exposes a pluggable embedding-provider interface — a Candle BERT provider for real work, and a mock provider for fast tests.
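To make the three metrics concrete, here is a minimal, dependency-free sketch of cosine, euclidean, and dot-product scoring over plain f32 slices. The function names are illustrative assumptions, not OxiRAG's API, and the crate's own SIMD-accelerated implementation will look different.

```rust
/// Dot product of two equal-length vectors (higher means more similar).
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance (lower means more similar).
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity in [-1, 1]. Returns 0.0 when either vector has zero
/// norm, a convention chosen for this sketch to avoid dividing by zero.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot(a, b) / (norm_a * norm_b)
}
```

Note the direction of each score: cosine and dot product rank higher values first, while euclidean distance ranks lower values first. If embeddings are normalized to unit length, cosine and dot product produce the same ranking.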

Layer 2 — Speculator (src/layer2_speculator/). Draft verification. Instead of trusting cache results, the Speculator treats them as drafts and runs them through an Accept / Revise / Reject pipeline before anything is finalized. It is rule-based by default and can be backed by a small language model (Candle) when you enable the speculator feature.
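The Accept / Revise / Reject idea can be sketched as a score plus two thresholds. Everything below is hypothetical and self-contained: Verdict, Thresholds, overlap_score, and classify are names invented for this illustration, and the term-overlap rule is just one simple stand-in for whatever rules the real Speculator applies.

```rust
use std::collections::HashSet;

/// Outcome of checking a draft (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Accept,
    Revise,
    Reject,
}

/// Two cut-offs on a score in [0, 1] (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub accept: f32,
    pub reject: f32,
}

/// A toy rule: the fraction of distinct query terms that also occur in the draft.
pub fn overlap_score(query: &str, draft: &str) -> f32 {
    // Lowercase and split on anything that is not a letter or digit.
    let tokens = |s: &str| -> HashSet<String> {
        s.split(|c: char| !c.is_alphanumeric())
            .filter(|w| !w.is_empty())
            .map(|w| w.to_lowercase())
            .collect()
    };
    let q = tokens(query);
    if q.is_empty() {
        return 0.0;
    }
    let d = tokens(draft);
    q.intersection(&d).count() as f32 / q.len() as f32
}

/// Accept at or above `accept`, reject at or below `reject`, revise in between.
pub fn classify(score: f32, t: Thresholds) -> Verdict {
    if score >= t.accept {
        Verdict::Accept
    } else if score <= t.reject {
        Verdict::Reject
    } else {
        Verdict::Revise
    }
}
```

With thresholds of 0.8 and 0.3, the query "What is the capital of France?" against the draft "Paris is known for the Eiffel Tower." shares only "is" and "the" (2 of 6 terms), so it lands in the Revise band. That also shows why a pure word-overlap rule is weak: stop words inflate the score, which is one reason to back the rule with a small language model.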

Layer 3 — Judge (src/layer3_judge/). SMT-based logic verification. The Judge extracts claims from natural language, encodes them into SMT-LIB, and hands them to the OxiZ SMT solver for formal verification — covering temporal, causal, and modal claims, not just simple predicates. This is the layer that turns “sounds right” into “provably consistent.”
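As a rough illustration of what "encoding claims into SMT-LIB" can mean, the sketch below turns simple "X happened before Y" claims into an SMT-LIB 2 script over integer timestamps. The Before struct and to_smtlib function are assumptions made up for this example; they are not the Judge's actual encoder and say nothing about how OxiZ is invoked.

```rust
use std::collections::BTreeSet;

/// A hypothetical temporal claim: `earlier` happened strictly before `later`.
/// Event names are assumed to be valid SMT-LIB symbol fragments
/// (letters, digits, underscores).
pub struct Before<'a> {
    pub earlier: &'a str,
    pub later: &'a str,
}

/// Emit an SMT-LIB 2 script with one Int timestamp per event and one
/// strict-ordering assertion per claim.
pub fn to_smtlib(claims: &[Before<'_>]) -> String {
    // BTreeSet gives a deduplicated, deterministic declaration order.
    let events: BTreeSet<&str> = claims
        .iter()
        .flat_map(|c| [c.earlier, c.later])
        .collect();

    let mut out = String::from("(set-logic QF_LIA)\n");
    for e in &events {
        out.push_str(&format!("(declare-const t_{e} Int)\n"));
    }
    for c in claims {
        out.push_str(&format!("(assert (< t_{} t_{}))\n", c.earlier, c.later));
    }
    out.push_str("(check-sat)\n");
    out
}
```

Given the claims "a before b", "b before c", and "c before a", the script asserts a cycle of strict inequalities, so any SMT-LIB 2 solver answers unsat: the three claims cannot all be true. Drop the last claim and the answer becomes sat.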

Layer 4 — GraphRAG (src/layer4_graph/). Knowledge-graph retrieval. It extracts entities and relationships, stores them in an in-memory graph, and supports BFS traversal, shortest-path, and N-hop queries — plus a hybrid vector + graph search that fuses Layer 1 and Layer 4.
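An N-hop query is a breadth-first search that stops expanding at a fixed depth. The sketch below runs it over a plain adjacency map; the Graph alias and n_hop function are illustrative assumptions and not the GraphRAG layer's own types.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Directed adjacency list: entity name -> names of related entities.
/// (Hypothetical representation for this sketch.)
pub type Graph = HashMap<String, Vec<String>>;

/// Every node reachable from `start` within `max_hops` edges, paired with its
/// hop distance, in BFS discovery order. `start` itself is not included.
pub fn n_hop(graph: &Graph, start: &str, max_hops: usize) -> Vec<(String, usize)> {
    let mut seen: HashSet<&str> = HashSet::new();
    seen.insert(start);

    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    queue.push_back((start, 0));

    let mut found = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        // Nodes at the hop limit are reported but not expanded further.
        if depth == max_hops {
            continue;
        }
        if let Some(neighbors) = graph.get(node) {
            for next in neighbors {
                // `insert` returns false for already-visited nodes.
                if seen.insert(next.as_str()) {
                    found.push((next.clone(), depth + 1));
                    queue.push_back((next.as_str(), depth + 1));
                }
            }
        }
    }
    found
}
```

Because BFS visits nodes in order of distance, the first time a node is discovered is also its shortest hop count in an unweighted graph, which is why the same traversal underlies shortest-path queries.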

Around these four layers sit the supporting modules that make the pipeline practical: src/prefix_cache/ for context-aware KV-cache management (LRU + TTL with prefix matching), src/distillation/ for on-the-fly teacher-student, progressive, and feature-based model distillation (with Q&A collection and LoRA training-example export), and src/hidden_states/. The top-level source files round it out: pipeline.rs, query_expansion.rs, reranker.rs, streaming.rs, simd_similarity.rs, circuit_breaker.rs, and connection_pool.rs.
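To show how LRU, TTL, and prefix matching can fit together, here is a small sketch of a cache keyed by token sequences. PrefixCache and its methods are hypothetical names for this illustration, the linear scan is chosen for brevity, and the current time is passed in explicitly so the behavior is easy to test. The real src/prefix_cache/ module may be organized quite differently.

```rust
use std::time::{Duration, Instant};

struct Entry<V> {
    key: Vec<u32>,     // token IDs of the cached prefix
    value: V,          // whatever is cached for that prefix
    inserted: Instant, // used for TTL expiry
    last_used: u64,    // logical timestamp used for LRU eviction
}

/// Hypothetical prefix cache with a capacity limit (LRU) and a time-to-live.
pub struct PrefixCache<V> {
    entries: Vec<Entry<V>>,
    capacity: usize,
    ttl: Duration,
    clock: u64,
}

impl<V> PrefixCache<V> {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { entries: Vec::new(), capacity, ttl, clock: 0 }
    }

    /// Insert or replace `key`, dropping expired entries first and evicting
    /// the least recently used entry if the cache is still full.
    pub fn insert(&mut self, key: Vec<u32>, value: V, now: Instant) {
        self.clock += 1;
        let ttl = self.ttl;
        self.entries
            .retain(|e| now.duration_since(e.inserted) < ttl && e.key != key);
        if self.entries.len() >= self.capacity {
            let lru = self
                .entries
                .iter()
                .enumerate()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(i, _)| i);
            if let Some(i) = lru {
                self.entries.swap_remove(i);
            }
        }
        self.entries.push(Entry { key, value, inserted: now, last_used: self.clock });
    }

    /// Longest unexpired cached key that is a prefix of `tokens`.
    /// Returns the matched length and the cached value, and marks the entry as used.
    pub fn longest_prefix(&mut self, tokens: &[u32], now: Instant) -> Option<(usize, &V)> {
        self.clock += 1;
        let (ttl, clock) = (self.ttl, self.clock);
        let best = self
            .entries
            .iter_mut()
            .filter(|e| now.duration_since(e.inserted) < ttl && tokens.starts_with(&e.key))
            .max_by_key(|e| e.key.len())?;
        best.last_used = clock;
        Some((best.key.len(), &best.value))
    }
}
```

The matched length tells the caller how many leading tokens are already covered, so only the remaining suffix needs fresh computation. A production cache would typically use a trie or hashed prefixes rather than a linear scan.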

Feature flags keep all of this modular. The default is ["native", "echo"] — native async on Tokio plus Layer 1. From there you opt into speculator (Candle SLM), judge (OxiZ SMT), graphrag, prefix-cache, distillation, full, wasm, cuda, or metal. The crate targets Rust edition 2024.

Getting Started

Add OxiRAG to your project:

cargo add oxirag

Then wire the layers into a pipeline, index a couple of documents, and run a query:

use oxirag::prelude::*;

#[tokio::main]
async fn main() -> Result<(), OxiRagError> {
    let echo = EchoLayer::new(
        MockEmbeddingProvider::new(384),
        InMemoryVectorStore::new(384),
    );
    let speculator = RuleBasedSpeculator::default();
    let judge = JudgeImpl::new(
        AdvancedClaimExtractor::new(),
        MockSmtVerifier::default(),
        JudgeConfig::default(),
    );

    let mut pipeline = PipelineBuilder::new()
        .with_echo(echo)
        .with_speculator(speculator)
        .with_judge(judge)
        .build()?;

    pipeline.index(Document::new("The capital of France is Paris.")).await?;
    pipeline.index(Document::new("Paris is known for the Eiffel Tower.")).await?;

    let query = Query::new("What is the capital of France?");
    let result = pipeline.process(query).await?;
    println!("Answer: {}", result.final_answer);
    println!("Confidence: {:.2}", result.confidence);
    println!("Layers used: {:?}", result.layers_used);
    Ok(())
}

The mock providers make this compile and run with zero downloads — swap them for Candle BERT and OxiZ when you go to production.

What’s inside

Everything in this first release:

The full four-layer pipeline — Echo (semantic search), Speculator (draft verification), Judge (SMT logic verification), and GraphRAG (knowledge-graph retrieval).
WASM support — the crate ships as cdylib + rlib, so RAG can run in the browser or at the edge.
Native async on Tokio.
SIMD-optimized similarity for fast vector comparison.
Prefix caching with paging and invalidation.
Model distillation — teacher-student, progressive, and feature-based.
Query expansion and reranking to sharpen retrieval.
Streaming results with progress reporting.
Circuit breaker and connection pooling for resilient I/O.
A comprehensive benchmarking suite so the performance story stays honest.
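
For readers new to the circuit-breaker pattern named in the list above, here is a generic sketch of the usual closed / open / half-open state machine. It is a textbook version under assumed names (CircuitBreaker, State, allow, record_success, record_failure) and does not claim to mirror circuit_breaker.rs.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    /// Calls flow normally.
    Closed,
    /// Calls are refused until the cooldown has elapsed.
    Open { since: Instant },
    /// Trial calls are allowed; the next result decides whether to close or reopen.
    HalfOpen,
}

/// Hypothetical breaker that trips after `max_failures` failures since the last success.
pub struct CircuitBreaker {
    state: State,
    failures: u32,
    max_failures: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    pub fn new(max_failures: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed, failures: 0, max_failures, cooldown }
    }

    /// Whether a call may proceed at `now`. An open breaker moves to
    /// half-open once the cooldown has passed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if let State::Open { since } = self.state {
            if now.duration_since(since) >= self.cooldown {
                self.state = State::HalfOpen;
            } else {
                return false;
            }
        }
        true
    }

    /// A successful call resets the failure count and closes the breaker.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    /// A failed call reopens a half-open breaker immediately, or opens a
    /// closed one once the failure limit is reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == State::HalfOpen || self.failures >= self.max_failures {
            self.state = State::Open { since: now };
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

A caller checks allow before each outbound request and reports the outcome afterwards, so a failing backend is given time to recover instead of being hammered with retries.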

Tips

A few ways to get the most out of OxiRAG from day one:

Start with the echo default and grow from there. Add judge (which pulls in OxiZ), graphrag, or speculator only when you actually need them — keeping the build lean and the binary small.
Use the mocks for testing, Candle + OxiZ for production. MockEmbeddingProvider and MockSmtVerifier make tests fast and deterministic; swap to Candle BERT and the real OxiZ verifier when correctness matters.
Tune the Speculator’s thresholds. Because it treats retrieved cache entries as drafts, you control how strict it is via SpeculatorConfig (accept_threshold / reject_threshold) to balance precision against recall.
Enable prefix-cache to reuse premise knowledge. Context-aware KV-cache reuse across queries means repeated “premise” context doesn’t get recomputed every time.
Build for wasm to ship RAG to the edge. Since the crate is cdylib + rlib, you can compile the same pipeline for the browser or an edge worker.
Compose the graph layer with GraphLayerBuilder when you want hybrid vector + graph retrieval instead of pure semantic search.

This is the foundation

OxiRAG slots into the wider COOLJAPAN Pure Rust stack. Its most important tie is OxiZ, the SMT solver that powers Layer 3 (Judge) — formal claim verification in OxiRAG is OxiZ doing the proving, not a bolt-on. For the machine-learning runtime in Layers 1 and 2, OxiRAG uses Candle, a capable third-party Rust ML crate (BERT embeddings and the small language model live there). And it sits alongside the broader Pure Rust ecosystem — SciRS2, NumRS2, PandRS, OxiArc, OxiFFT — that shares the same goal: a numerical and data stack with no C, Python, or Fortran underneath.

This is 0.1.0. It is early, but the bones are real and the layers work together today.

Repository: https://github.com/cool-japan/oxirag

Star the repo if a sovereign, Pure Rust RAG engine is something you’ve been waiting for.

Pure Rust RAG is here — fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ January 24, 2026