OxiRS 0.2.0 Released — ~10x Faster SPARQL, Full-Text Search, and 1000-Node Clusters

Same triples, ten times the throughput.

Today we released OxiRS 0.2.0 — a major release that delivers a roughly 10x cumulative SPARQL query speedup and adds 26 new functional modules across 15 development rounds, spanning performance, search, clustering, AI, and quality.

No JVM. No Fortran. No native search library bolted on through JNI. OxiRS remains a Rust-native alternative to Apache Jena + Fuseki — and to Juniper-style GraphQL servers — that compiles to a single static binary (or WASM) and brings full-text search, geospatial reasoning, and distributed clustering in-tree rather than dragging in a C++ index or a JVM-hosted Lucene. Everything in 0.2.0 is backward compatible with 0.1.0 and feature-gated, so you adopt the new machinery on your own schedule.

Why OxiRS 0.2.0 is a game changer

The pain with a mature triple store is rarely correctness; it is the wall you hit when query volume, dataset size, or cluster size grows. Jena scales, but scaling it means JVM heap tuning, an external Lucene/Elasticsearch for text, and bespoke sharding. OxiRS 0.2.0 attacks that wall directly, and the numbers come straight from the release’s own benchmarks:

~10x cumulative query speedup, built from a stack of independently measured wins: 2.5x from intelligent cache invalidation, 1.75x from ML-based cost prediction (with a 14.2x cache speedup), 1.4x from streaming execution, 1.3x from distributed caching, and 1.25x from adaptive re-optimization.
Production-grade full-text search, integrated natively: BM25 ranking with stemming and tokenization, fuzzy matching, phrase queries, and incremental indexing — no separate search cluster.
1000+ node clusters, up from 500, with adaptive batching reported at 15–50x speedup on large clusters, plus LZ4/Zstd/LZMA compression for 40–60% bandwidth reduction and AES-256-GCM encrypted Raft log entries.
3D GeoSPARQL — 26 topological predicates (sfContains3D, sfIntersects3D, sfWithin3D, …) over a 3D coordinate system with R-tree indexing for sub-millisecond spatial queries.

And it’s measured: 0.2.0 adds 74 integration tests and 39+ benchmarks, bringing the suite to 39,468 tests passing with zero warnings across all 26 crates.
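
The ~10x headline can be sanity-checked from the per-layer figures quoted above. The sketch below assumes the five multipliers compose multiplicatively, which is the natural reading of "cumulative" but is an assumption about how the benchmarks were combined; the function name is illustrative only.

```rust
/// Multiply independent per-layer speedups into one cumulative factor.
/// Assumption: the layers compose multiplicatively.
fn cumulative_speedup(factors: &[f64]) -> f64 {
    factors.iter().product()
}

fn main() {
    // Per-layer multipliers as quoted in the release notes.
    let layers = [
        ("cache invalidation", 2.5),
        ("ML cost prediction", 1.75),
        ("streaming execution", 1.4),
        ("distributed caching", 1.3),
        ("adaptive re-optimization", 1.25),
    ];
    let factors: Vec<f64> = layers.iter().map(|(_, f)| *f).collect();
    let total = cumulative_speedup(&factors);
    // 2.5 * 1.75 * 1.4 * 1.3 * 1.25 = 9.953125
    println!("cumulative speedup: {total:.2}x");
}
```

Under that assumption the product is about 9.95x, which matches the "roughly 10x" figure. Your own workload may not exercise every layer equally, so treat the product as an upper-level summary rather than a guarantee.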

Technical Deep Dive: where the speed comes from

The performance story is an optimization stack, each layer earning its multiplier:

Cost prediction — a Random Forest query-cost predictor with 25 feature extractors, reported at 95.4% accuracy on production workloads, retraining automatically on 10K+ query samples. It replaces hand-tuned heuristics with a model that learns your workload.
Cache invalidation — a dependency-tracking engine that invalidates at the granularity of affected predicates for INSERT/DELETE/UPDATE, so writes don’t blow away the whole cache. Backed by 758 invalidation tests.
Streaming execution — memory-bounded operators with automatic spill-to-disk (85% memory savings) and 3-stage pipeline parallelism, so large result sets don’t have to fit in RAM.
Distributed cache — an L1 (process-local, write-through) + L2 (Redis) hierarchy with a coherence protocol using optimistic locking and CAS, so a cluster shares cache state without corrupting it.
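
To make the predicate-granularity idea concrete, here is a minimal sketch of dependency-tracked invalidation. It is not the OxiRS implementation or API: `PredicateCache` and its methods are hypothetical names, and the cache is keyed by a plain query string for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical result cache (not the OxiRS API). Each cached query
/// records which predicates it read, so a write can evict selectively.
#[derive(Default)]
struct PredicateCache {
    results: HashMap<String, Vec<String>>,
    // predicate IRI -> queries whose results depend on it
    dependents: HashMap<String, HashSet<String>>,
}

impl PredicateCache {
    fn insert(&mut self, query: &str, predicates: &[&str], rows: Vec<String>) {
        self.results.insert(query.to_string(), rows);
        for p in predicates {
            self.dependents
                .entry(p.to_string())
                .or_default()
                .insert(query.to_string());
        }
    }

    fn get(&self, query: &str) -> Option<&Vec<String>> {
        self.results.get(query)
    }

    /// Called when a write touches `predicate`: evict only the entries
    /// that depend on it. Returns how many cached results were dropped.
    fn invalidate(&mut self, predicate: &str) -> usize {
        let Some(queries) = self.dependents.remove(predicate) else {
            return 0;
        };
        let mut evicted = 0;
        for q in queries {
            if self.results.remove(&q).is_some() {
                evicted += 1;
            }
        }
        evicted
    }
}

fn main() {
    let mut cache = PredicateCache::default();
    cache.insert("names", &["foaf:name"], vec!["Alice".to_string()]);
    cache.insert("ages", &["foaf:age"], vec!["42".to_string()]);

    // An INSERT on foaf:name evicts "names" but leaves "ages" cached.
    let evicted = cache.invalidate("foaf:name");
    println!("evicted: {evicted}, ages cached: {}", cache.get("ages").is_some());
}
```

The design point is that a write pays only for the queries it can actually affect, which is why a write-heavy workload does not have to flush the whole cache.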

On the AI side, 0.2.0 hardens the parts that used to be fragile: LLM provider fallback chains (OpenAI → Anthropic Claude → Ollama) with circuit breakers and token-budget management, and GraphRAG upgraded from Louvain to Leiden community detection for higher-quality partitions. GraphRAG is also cache-aware, with a reported 90% hit rate. This release also adds an S3 storage backend (including S3-compatible MinIO and DigitalOcean Spaces) for cloud deployments. Notably, it removes 27,237 lines of unimplemented “vaporware” modules, trimming the codebase to what actually ships and passes tests.
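
The fallback-chain behavior described above can be sketched in a few lines. This is an illustration under stated assumptions, not the OxiRS API: `Provider`, `complete`, and `FAILURE_THRESHOLD` are hypothetical, the provider calls are stubbed with plain functions, and the breaker never resets (a production breaker would normally move to a half-open state after a timeout). Token-budget management is left out.

```rust
/// Hypothetical provider handle; `call` stands in for a real network request.
struct Provider {
    name: &'static str,
    consecutive_failures: u32,
    call: fn(&str) -> Result<String, String>,
}

/// Assumed threshold: the breaker opens after this many consecutive failures.
const FAILURE_THRESHOLD: u32 = 3;

impl Provider {
    fn new(name: &'static str, call: fn(&str) -> Result<String, String>) -> Self {
        Self { name, consecutive_failures: 0, call }
    }

    fn is_open(&self) -> bool {
        self.consecutive_failures >= FAILURE_THRESHOLD
    }
}

/// Try providers in order, skipping any whose breaker is open.
/// Returns the name of the provider that answered, plus its reply.
fn complete(chain: &mut [Provider], prompt: &str) -> Result<(&'static str, String), String> {
    for p in chain.iter_mut() {
        if p.is_open() {
            continue; // breaker open: do not spend a request on this provider
        }
        match (p.call)(prompt) {
            Ok(reply) => {
                p.consecutive_failures = 0;
                return Ok((p.name, reply));
            }
            Err(_) => p.consecutive_failures += 1,
        }
    }
    Err("all providers failed or are circuit-broken".to_string())
}

// Stub backends for the demo.
fn always_down(_prompt: &str) -> Result<String, String> {
    Err("timeout".to_string())
}

fn local_model(prompt: &str) -> Result<String, String> {
    Ok(format!("local answer to: {prompt}"))
}

fn main() {
    let mut chain = [
        Provider::new("openai", always_down),
        Provider::new("anthropic", always_down),
        Provider::new("ollama", local_model),
    ];
    let (who, reply) = complete(&mut chain, "What is RDF?").unwrap();
    println!("{who}: {reply}");
}
```

Once a provider has failed three times in a row, later requests skip it entirely, so an outage costs a few failed calls rather than a timeout on every request.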

Getting Started

The library is on crates.io as oxirs-core:

cargo add oxirs-core

Or drive a dataset from the shell with the CLI:

# Install the CLI tool
cargo install oxirs

# Initialize a new knowledge graph
oxirs init mykg

# Import RDF data (automatically persisted to mykg/data.nq)
oxirs import mykg data.ttl --format turtle

# Query the data (loaded automatically from disk)
oxirs query mykg "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

# Start the Fuseki-style server
oxirs serve mykg/oxirs.toml --port 3030

Open http://localhost:3030 for the admin UI, or http://localhost:3030/graphql for GraphiQL.

What’s New in 0.2.0

Performance (~10x cumulative) — intelligent cache invalidation (2.5x), ML cost prediction (1.75x), streaming execution (1.4x), distributed L1+L2 cache (1.3x), adaptive re-optimization (1.25x).
Tantivy full-text search — BM25, stemming, fuzzy and phrase queries, incremental indexing with automatic commit batching.
3D GeoSPARQL — 26 topological predicates, 3D coordinates with elevation, R-tree indexing; 505 tests.
Multimodal fusion — hybrid search over text, vector, and spatial results via Reciprocal Rank Fusion.
1000+ node clusters — adaptive batching, pipelined replication, chaos-tested node failures, compression and AES-256-GCM encryption, cross-region locality-aware routing.
AI hardening — LLM fallback chains with circuit breakers; GraphRAG with Leiden community detection; physics RDF integration with a SAMM Aspect Model parser and PROV-O provenance.
Cloud integration — S3 backend (incl. MinIO / DigitalOcean Spaces) and Excel export.
Advanced SPARQL algebra — EXISTS/MINUS evaluators, subquery builder, service clause, LATERAL join.
Configurable RDFS rules — builder-pattern reasoning configuration.
Fixes — Turtle parsing delegated to oxttl for full syntax coverage, SHACL language-tag handling corrected, and the RETE engine’s remove_fact switched to unification matching.
Cleanup — 27,237 lines of unimplemented modules removed; ~18,200 lines of new production code added.
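
For readers unfamiliar with the Reciprocal Rank Fusion named in the multimodal fusion item, here is a minimal sketch of the general technique. It is not taken from the OxiRS source: the function name and the document ids are illustrative, and the constant k = 60 is the value commonly used for RRF, assumed here rather than confirmed for OxiRS.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank),
/// with 1-based ranks. Documents missing from a ranking contribute nothing.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Three hypothetical result lists, best match first.
    let text = vec!["doc:a", "doc:b", "doc:c"];
    let vector = vec!["doc:b", "doc:a", "doc:d"];
    let spatial = vec!["doc:b", "doc:d"];

    for (doc, score) in reciprocal_rank_fusion(&[text, vector, spatial], 60.0) {
        println!("{doc}\t{score:.4}");
    }
}
```

Because RRF uses only ranks, it needs no score normalization across text, vector, and spatial backends, which is what makes it a reasonable default over hand-tuned weights.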

Tips

Adopt the speedups incrementally. Every performance layer is feature-gated for gradual rollout. Turn on cache invalidation and ML cost prediction first — they carry the largest multipliers — and validate on your own workload before enabling the distributed L1+L2 cache.
Drop your external search cluster. With Tantivy in-tree you get BM25, fuzzy, and phrase queries against your RDF directly; one fewer service to run, index, and keep in sync.
Fuse modalities for retrieval. When you have text, vector, and spatial signals, let the new Reciprocal Rank Fusion merge them instead of hand-weighting — it’s what GraphRAG uses internally for hybrid search.
Make LLM calls fault-tolerant. Configure the fallback chain (OpenAI → Claude → Ollama) so a provider outage degrades to a local model instead of failing the request; the circuit breaker and token budget keep you inside cost limits.
Going past ~500 nodes? Enable adaptive batching and payload compression on the cluster — that’s where the 15–50x batching speedup and 40–60% bandwidth savings show up, and AES-256-GCM secures the Raft log on the wire.
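
When validating the distributed L1+L2 cache from the first tip, it helps to have a mental model of optimistic locking with compare-and-swap. The sketch below is a single-process illustration only: `SharedStore` is a hypothetical stand-in for the shared L2 tier, and a real deployment would need the check and the write to be atomic on the store itself (in Redis, for example, via `WATCH` with a transaction).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a shared L2 store; not the OxiRS API.
#[derive(Default)]
struct SharedStore {
    // key -> (version, value); versions start at 1
    entries: HashMap<String, (u64, String)>,
}

impl SharedStore {
    fn read(&self, key: &str) -> Option<(u64, String)> {
        self.entries.get(key).cloned()
    }

    /// Compare-and-swap: write only if the stored version still equals
    /// `expected` (0 means "the key must be absent").
    /// Returns Ok(new_version) on success, Err(current_version) on conflict.
    fn compare_and_swap(&mut self, key: &str, expected: u64, value: &str) -> Result<u64, u64> {
        let current = self.entries.get(key).map(|(v, _)| *v).unwrap_or(0);
        if current != expected {
            return Err(current);
        }
        self.entries
            .insert(key.to_string(), (current + 1, value.to_string()));
        Ok(current + 1)
    }
}

fn main() {
    let mut store = SharedStore::default();
    store.compare_and_swap("q1", 0, "rows-v1").unwrap();

    // Two nodes read the same version, then both try to write.
    let (seen, _) = store.read("q1").unwrap();
    let node_a = store.compare_and_swap("q1", seen, "rows-from-node-a");
    let node_b = store.compare_and_swap("q1", seen, "rows-from-node-b");

    // The first writer wins; the second sees a conflict and must re-read.
    println!("node a: {node_a:?}, node b: {node_b:?}");
}
```

The losing writer learns the current version from the error and can re-read and retry, so no node silently overwrites another node's newer entry.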

This is the foundation

OxiRS 0.2.0 sits on a Pure Rust base from the COOLJAPAN ecosystem. Its numerics run on SciRS2 (the NumPy/SciPy-class stack), binary serialization uses Oxicode instead of bincode, and compression/archiving now goes through OxiARC (oxiarc-archive, oxiarc-zstd, oxiarc-lz4) rather than C zlib/zstd bindings — keeping the default build free of C and Fortran. These are the actual dependencies this release pulls in, not aspirations: the whole stack compiles to one static binary with no external runtime.

Repository: https://github.com/cool-japan/oxirs

Star the repo if a JVM-free knowledge graph that does its own full-text search and scales past a thousand nodes is what your stack has been missing.

Pure Rust Semantic Web is here — fast, safe, and sovereign.

KitaSan at COOLJAPAN OÜ, March 8, 2026