AmateRS 0.2.2 Released — Horizontal Sharding Goes Live, with OpenTelemetry Tracing and an io_uring WAL

The “patch” that quietly turned AmateRS into a horizontally-sharded, fully-observable distributed database.

Today we released AmateRS 0.2.2 — a patch on paper, but the largest real changelog of the 0.2.x line: horizontal sharding is now live in the public API, distributed traces flow across nodes via OpenTelemetry, and the write path can ride a kernel-bypass io_uring WAL on Linux.

No C. No Fortran. No plaintext leaving your control. While the FHE incumbents lean on TFHE-rs, Microsoft SEAL, and OpenFHE — and the database incumbents still assume a server that can read your data — AmateRS keeps computing in the dark, inside a single static binary, 100% Pure Rust, Apache-2.0. Like Amaterasu retreating into the heavenly rock cave while her light still shines, your data stays inside its cryptographic shell while the queries keep running. Serialization rides oxicode (no bincode), and the new typed cluster log uses postcard for compact, no_std-friendly ClusterCommand encoding on the Raft path.

Why AmateRS 0.2.2 is a game changer

This release crosses the line from “single-node FHE database with a consensus foundation” to “distributed FHE database you can actually shard.”

Horizontal sharding is now in the public API. shard.rs and partitioner.rs (~1,805 lines that were already written but never declared in lib.rs, so they were not part of the build) are now public, bringing consistent-hashing, range, and hash partitioning, plus a QueryRouter to fan requests out and a k-way ResultMerger to fan them back in.
A placement scheduler that thinks for itself. A stateless PlacementCoordinator turns a ShardRegistry snapshot into a deterministic PlacementPlan — detecting hot shards to split, cold adjacent shards to merge, and imbalance to rebalance — and a background PlacementScheduler proposes those PlacementActions as Raft log entries from the leader.
End-to-end distributed tracing. With the telemetry feature on, AmateRS exports OTLP over gRPC and propagates W3C TraceContext through gRPC metadata, so a single trace follows a request across nodes.
A kernel-bypass io_uring WAL. On Linux, the io-uring feature swaps in UringWalWriter for a high-throughput write-ahead-log path on a dedicated tokio_uring runtime.
An FHE circuit cache. Repeated identical predicates no longer recompile — a blake3-keyed LRU CircuitCache short-circuits compilation at the FILTER and UPDATE sites.
Constant-time API-key auth. Raw-key validation is now a constant-time scan, closing a timing side-channel oracle.

All of this lands green: 2,224 tests passed, 29 skipped, and 0 failed across the workspace, including 10 chaos scenarios in the cluster crate and 116 pytest cases in the Python SDK.

Technical Deep Dive

(a) Sharding & placement (Ukehi)

The sharding machinery was already written — it just wasn’t switched on. In 0.2.2, shard.rs and partitioner.rs become part of the public surface, exposing consistent-hashing / range / hash partitioning, the QueryRouter, and a k-way ResultMerger. A ShardRegistry tracks the topology, and the shard lifecycle is modeled explicitly as ShardSplit, ShardMerge, and ShardTransfer.

Driving that lifecycle is placement.rs: a stateless PlacementCoordinator that consumes a ShardRegistry snapshot and emits a deterministic PlacementPlan — split detection for hot shards (is_hot), merge detection for cold adjacent shards on the same node, and imbalance detection that proposes rebalance transfers. It is a pure function: no I/O, fully testable, fully reproducible.
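To make the "pure function" property concrete, here is a minimal sketch of a coordinator-style planner. The ShardInfo fields, the qps thresholds, the `busy` set, and the greedy one-shard-at-a-time rebalance are assumptions for illustration, not AmateRS's PlacementCoordinator API. The rebalance loop is bounded by n_shards + 1 iterations, in the spirit of the detect_rebalance() fix listed under Fixed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical snapshot entry; a real registry carries far more state.
#[derive(Clone, Debug)]
pub struct ShardInfo {
    pub id: u64,
    pub node: u64,
    pub qps: u64,
}

/// Hypothetical action set mirroring split / merge / transfer.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PlacementAction {
    Split { shard: u64 },
    Merge { left: u64, right: u64 },
    Transfer { shard: u64, from: u64, to: u64 },
}

/// Pure function: same snapshot in, same plan out, no I/O.
/// `shards` is assumed to be sorted by key range, so neighbours are adjacent.
pub fn plan(shards: &[ShardInfo], nodes: &[u64], hot_qps: u64, cold_qps: u64) -> Vec<PlacementAction> {
    let mut actions = Vec::new();
    let mut busy = BTreeSet::new(); // shards already named by an earlier action

    // 1. Split detection: every shard above the hot threshold.
    for s in shards.iter().filter(|s| s.qps > hot_qps) {
        actions.push(PlacementAction::Split { shard: s.id });
        busy.insert(s.id);
    }

    // 2. Merge detection: two adjacent cold shards living on the same node.
    let mut i = 0;
    while i + 1 < shards.len() {
        let (a, b) = (&shards[i], &shards[i + 1]);
        if a.qps < cold_qps && b.qps < cold_qps && a.node == b.node {
            actions.push(PlacementAction::Merge { left: a.id, right: b.id });
            busy.extend([a.id, b.id]);
            i += 2; // a shard joins at most one merge
        } else {
            i += 1;
        }
    }

    // 3. Rebalance: move shards from the fullest node to the emptiest one.
    let mut owner: Vec<u64> = shards.iter().map(|s| s.node).collect();
    let mut load: BTreeMap<u64, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for &n in &owner {
        *load.entry(n).or_insert(0) += 1;
    }
    // Bounded loop: terminates even when there are fewer shards than nodes.
    for _ in 0..=shards.len() {
        let Some((&max_node, &max_load)) = load.iter().max_by_key(|e| *e.1) else { break };
        let Some((&min_node, &min_load)) = load.iter().min_by_key(|e| *e.1) else { break };
        if max_load <= min_load + 1 {
            break; // balanced to within one shard
        }
        let candidate = (0..shards.len()).find(|&j| owner[j] == max_node && !busy.contains(&shards[j].id));
        let Some(idx) = candidate else { break }; // only busy shards left on the fullest node
        actions.push(PlacementAction::Transfer { shard: shards[idx].id, from: max_node, to: min_node });
        owner[idx] = min_node;
        busy.insert(shards[idx].id);
        *load.get_mut(&max_node).unwrap() -= 1;
        *load.get_mut(&min_node).unwrap() += 1;
    }
    actions
}
```

Because `plan` reads only its arguments and iterates a BTreeMap in key order, calling it twice on the same snapshot yields the same actions, which is what makes this style cheap to property-test.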

The active half lives in placement_scheduler.rs. The background async PlacementScheduler runs on the Raft leader, takes the plan’s PlacementActions, and proposes them as ClusterCommand entries through Raft. Its PlacementSchedulerHandle stops the loop on drop, and you wire it in with attach_placement_scheduler().
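The stop-on-drop behaviour of a handle like PlacementSchedulerHandle is ordinary RAII. The sketch below assumes a std thread and an AtomicBool in place of the real async task on the Raft leader; `spawn_scheduler` and `SchedulerHandle` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle: dropping it stops the background loop and waits for it.
pub struct SchedulerHandle {
    stop: Arc<AtomicBool>,
    worker: Option<JoinHandle<()>>,
}

impl Drop for SchedulerHandle {
    fn drop(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join(); // no tick can run after drop returns
        }
    }
}

/// Calls `tick` (for example "build a plan, propose its actions") every
/// `interval` until the returned handle is dropped.
pub fn spawn_scheduler<F>(interval: Duration, mut tick: F) -> SchedulerHandle
where
    F: FnMut() + Send + 'static,
{
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let worker = thread::spawn(move || {
        while !flag.load(Ordering::SeqCst) {
            tick();
            thread::sleep(interval);
        }
    });
    SchedulerHandle { stop, worker: Some(worker) }
}
```

Joining in `drop` means dropping the handle can block for up to one `interval`; an async version would typically signal a cancellation token and await the task instead.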

That cluster log is now properly typed. cluster_command.rs defines ClusterCommand with 7 variants — DataPut, DataDelete, PlaceSplit, PlaceMerge, PlaceTransfer, MembershipAdd, MembershipRemove — serialized with postcard (compact, no_std-compatible), replacing the previous raw byte encoding. And because shard transfers can move a lot of state, large snapshots now stream in configurable chunks (snapshot_chunk_threshold_bytes, snapshot_chunk_size_bytes) via SnapshotStreamer / SnapshotReceiver, with per-follower streamers tracked and auto-cleaned; small snapshots still ship single-shot.
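Chunked streaming with a single-shot fallback can be sketched as follows. Only the two config knob names come from the release notes; `SnapshotChunk`, `split_snapshot`, and `ChunkReceiver` are hypothetical, and the real SnapshotStreamer / SnapshotReceiver also manage per-follower state and the Raft transport.

```rust
/// Hypothetical config mirroring the two knobs named in the release notes.
pub struct SnapshotConfig {
    pub snapshot_chunk_threshold_bytes: usize,
    pub snapshot_chunk_size_bytes: usize,
}

/// One message on the wire; `offset` lets the receiver detect gaps and reordering.
#[derive(Debug, PartialEq, Eq)]
pub struct SnapshotChunk<'a> {
    pub offset: usize,
    pub data: &'a [u8],
    pub done: bool,
}

/// Small snapshots ship single-shot; large ones are cut into fixed-size chunks.
pub fn split_snapshot<'a>(snapshot: &'a [u8], cfg: &SnapshotConfig) -> Vec<SnapshotChunk<'a>> {
    if snapshot.len() <= cfg.snapshot_chunk_threshold_bytes {
        return vec![SnapshotChunk { offset: 0, data: snapshot, done: true }];
    }
    let size = cfg.snapshot_chunk_size_bytes.max(1); // guard against a zero chunk size
    let count = snapshot.chunks(size).len();
    snapshot
        .chunks(size)
        .enumerate()
        .map(|(i, data)| SnapshotChunk { offset: i * size, data, done: i + 1 == count })
        .collect()
}

/// Reassembles chunks in order and returns the full snapshot on the final chunk.
#[derive(Default)]
pub struct ChunkReceiver {
    buf: Vec<u8>,
}

impl ChunkReceiver {
    pub fn accept(&mut self, chunk: &SnapshotChunk<'_>) -> Result<Option<Vec<u8>>, String> {
        if chunk.offset != self.buf.len() {
            return Err(format!("gap: expected offset {}, got {}", self.buf.len(), chunk.offset));
        }
        self.buf.extend_from_slice(chunk.data);
        Ok(chunk.done.then(|| std::mem::take(&mut self.buf)))
    }
}
```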

(b) Observability: OpenTelemetry

Tracing is now first-class. Behind the telemetry feature in amaters-core, telemetry.rs gives you TelemetryConfig and a TelemetryGuard: an OTLP gRPC exporter (opentelemetry-otlp), a batch SdkTracerProvider, and tracing-subscriber integration. The guard calls shutdown() on drop so in-flight spans are flushed cleanly.

To make traces span more than one node, amaters-net adds W3C TraceContext propagation for gRPC (otel_propagator.rs, also behind telemetry). TraceparentExtractor reads traceparent / tracestate from incoming gRPC metadata, inject_trace_context writes them onto outgoing calls, and TraceContextPropagatorLayer — a Tower Layer — wires it into the stack so you get end-to-end distributed traces across the cluster.
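The traceparent header itself is defined by the W3C Trace Context specification as `version-traceid-parentid-flags` in lowercase hex. The dependency-free parser and formatter below (version 00 only) shows what crosses each gRPC hop; it is a sketch, not the TraceparentExtractor implementation, and production code would normally rely on the OpenTelemetry propagator rather than hand-rolling this.

```rust
/// Parsed W3C `traceparent` header.
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: [u8; 16],
    pub parent_id: [u8; 8],
    pub flags: u8,
}

/// Decodes exactly 2*N lowercase hex characters into N bytes.
fn hex_to_bytes<const N: usize>(s: &str) -> Option<[u8; N]> {
    if s.len() != 2 * N || !s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b)) {
        return None; // the spec requires lowercase hex of exact length
    }
    let mut out = [0u8; N];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

impl TraceParent {
    /// Parses a version-00 header; rejects malformed input and all-zero IDs.
    pub fn parse(header: &str) -> Option<Self> {
        let mut parts = header.trim().split('-');
        let (version, trace, parent, flags) = (parts.next()?, parts.next()?, parts.next()?, parts.next()?);
        if version != "00" || parts.next().is_some() {
            return None;
        }
        let trace_id = hex_to_bytes::<16>(trace)?;
        let parent_id = hex_to_bytes::<8>(parent)?;
        let flags = hex_to_bytes::<1>(flags)?[0];
        if trace_id == [0; 16] || parent_id == [0; 8] {
            return None; // all-zero IDs are invalid per the spec
        }
        Some(TraceParent { trace_id, parent_id, flags })
    }

    /// Formats the header for an outgoing call.
    pub fn to_header(&self) -> String {
        let hex = |bytes: &[u8]| bytes.iter().map(|b| format!("{b:02x}")).collect::<String>();
        format!("00-{}-{}-{:02x}", hex(&self.trace_id[..]), hex(&self.parent_id[..]), self.flags)
    }
}
```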

(c) Storage & compute perf

On Linux, the WAL can now bypass the kernel buffer cache. wal_uring.rs (feature io-uring in amaters-core) introduces UringWalWriter, which runs tokio_uring I/O on a dedicated OS thread with its own tokio_uring::start runtime and bridges to the async caller over an mpsc channel. UringWalConfig exposes ring_size, batch_size, direct_io, and channel_capacity, and the handle is Send + Sync + Clone so it can be shared without a mutex. Crucially, tokio-uring is now a cfg(target_os = "linux") conditional dependency, so macOS and Windows builds compile fine.
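The threading shape described above (a dedicated I/O thread fed by a bounded channel, fronted by a cloneable handle) can be sketched with std alone. Here a plain thread and a `Vec<u8>` stand in for the tokio_uring runtime and the log file, and `WalHandle`, `WalConfig`, and the length-prefixed record format are assumptions; only `batch_size` and `channel_capacity` echo UringWalConfig.

```rust
use std::sync::mpsc::{self, Receiver, Sender, SyncSender};
use std::thread;

/// Hypothetical subset of UringWalConfig.
pub struct WalConfig {
    pub batch_size: usize,
    pub channel_capacity: usize,
}

struct WriteReq {
    record: Vec<u8>,
    ack: Sender<u64>, // replies with the record's sequence number once written
}

/// Cloneable handle: SyncSender is Send + Sync + Clone, so no Mutex is needed.
#[derive(Clone)]
pub struct WalHandle {
    tx: SyncSender<WriteReq>,
}

impl WalHandle {
    /// Spawns the dedicated I/O thread (where the real writer runs tokio_uring::start).
    pub fn spawn(cfg: WalConfig) -> (Self, thread::JoinHandle<Vec<u8>>) {
        let (tx, rx) = mpsc::sync_channel::<WriteReq>(cfg.channel_capacity);
        let batch = cfg.batch_size.max(1);
        let worker = thread::spawn(move || io_loop(rx, batch));
        (WalHandle { tx }, worker)
    }

    /// Blocks until the record's batch has been written; returns its sequence number.
    pub fn append(&self, record: &[u8]) -> Option<u64> {
        let (ack, done) = mpsc::channel();
        self.tx.send(WriteReq { record: record.to_vec(), ack }).ok()?;
        done.recv().ok()
    }
}

fn io_loop(rx: Receiver<WriteReq>, batch_size: usize) -> Vec<u8> {
    let mut log = Vec::new();
    let mut seq = 0u64;
    // Block for one request, then opportunistically drain up to batch_size.
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        while batch.len() < batch_size {
            match rx.try_recv() {
                Ok(req) => batch.push(req),
                Err(_) => break,
            }
        }
        // One "submission" per batch: length-prefixed records, then ack them all.
        for req in &batch {
            log.extend_from_slice(&(req.record.len() as u32).to_le_bytes());
            log.extend_from_slice(&req.record);
        }
        for req in batch {
            seq += 1;
            let _ = req.ack.send(seq);
        }
    }
    log // every handle was dropped, so the channel closed and the loop ended
}
```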

Secondary indexes also got smarter. An IndexExtractor trait plus an IndexedField type derive secondary-index entries straight from (Key, CipherBlob) without parsing the ciphertext, and IndexManager::apply_extracted applies batched, diff-based updates. Both LsmTreeStorage and MemoryStorage take an index manager through the builder methods with_index_manager / with_index_extractor / register_index; once one is attached, put, delete, and atomic_update keep the indexes consistent transparently under the update_lock, and with no manager attached there is zero overhead.
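A minimal sketch of extractor-driven, diff-based index maintenance. The trait shape, the `Entries` alias, and the example `PrefixAndSize` extractor (key prefix plus a size bucket taken from the ciphertext length, never its contents) are hypothetical; the point is that only entries that differ between the old and new extraction are touched.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Opaque ciphertext; the extractor never decrypts or parses it.
pub struct CipherBlob(pub Vec<u8>);

/// (index name, index value) pairs derived for one record.
pub type Entries = BTreeSet<(String, String)>;

/// Hypothetical extractor shape: derive entries from key and blob metadata only.
pub trait IndexExtractor {
    fn extract(&self, key: &[u8], blob: &CipherBlob) -> Entries;
}

/// Example extractor: key prefix before ':' plus a coarse KiB size bucket.
pub struct PrefixAndSize;

impl IndexExtractor for PrefixAndSize {
    fn extract(&self, key: &[u8], blob: &CipherBlob) -> Entries {
        let prefix = key.split(|&b| b == b':').next().unwrap_or(key);
        let mut e = Entries::new();
        e.insert(("prefix".into(), String::from_utf8_lossy(prefix).into_owned()));
        e.insert(("size_kib".into(), (blob.0.len() / 1024).to_string()));
        e
    }
}

/// Inverted index: (name, value) -> keys holding that value.
#[derive(Default)]
pub struct IndexManager {
    index: BTreeMap<(String, String), BTreeSet<Vec<u8>>>,
}

impl IndexManager {
    /// Diff-based update: remove stale entries, add fresh ones, leave the rest alone.
    /// A delete is `apply_extracted(key, &old, &Entries::new())`.
    pub fn apply_extracted(&mut self, key: &[u8], old: &Entries, new: &Entries) {
        for stale in old.difference(new) {
            if let Some(keys) = self.index.get_mut(stale) {
                keys.remove(key);
                if keys.is_empty() {
                    self.index.remove(stale);
                }
            }
        }
        for fresh in new.difference(old) {
            self.index.entry(fresh.clone()).or_default().insert(key.to_vec());
        }
    }

    pub fn lookup(&self, name: &str, value: &str) -> Vec<Vec<u8>> {
        self.index
            .get(&(name.to_string(), value.to_string()))
            .map(|keys| keys.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```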

For compute, the new FHE CircuitCache (circuit_cache.rs in amaters-net) is a thread-safe LRU for compiled FHE circuits (a HashMap + VecDeque behind a parking_lot::Mutex, wrapped in an Arc so clones share one cache), keyed by the blake3 hash of the predicate and defaulting to 256 entries. The FILTER and UPDATE predicate sites in server.rs call circuit_cache.get_or_compile(), so repeated same-predicate requests skip PredicateCompiler::compile entirely.
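A `get_or_compile` LRU of this shape might look like the following. To stay dependency-free the sketch uses `std::sync::Mutex` instead of parking_lot and a 64-bit `DefaultHasher` instead of blake3 (a 64-bit key makes collisions far more plausible than a 256-bit blake3 digest, so treat it as illustration only); `Circuit` is a stand-in type.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Stand-in for a compiled FHE circuit.
#[derive(Debug, PartialEq, Eq)]
pub struct Circuit(pub String);

struct Inner {
    map: HashMap<u64, Arc<Circuit>>,
    order: VecDeque<u64>, // front = least recently used
    capacity: usize,
}

/// Cloneable, thread-safe LRU; all clones share the same Inner.
#[derive(Clone)]
pub struct CircuitCache {
    inner: Arc<Mutex<Inner>>,
}

fn predicate_key(predicate: &str) -> u64 {
    // Sketch only: a 64-bit std hash standing in for blake3.
    let mut h = DefaultHasher::new();
    predicate.hash(&mut h);
    h.finish()
}

impl CircuitCache {
    pub fn new(capacity: usize) -> Self {
        let inner = Inner { map: HashMap::new(), order: VecDeque::new(), capacity: capacity.max(1) };
        CircuitCache { inner: Arc::new(Mutex::new(inner)) }
    }

    /// Returns the cached circuit, or compiles it, inserts it, and evicts the LRU entry.
    pub fn get_or_compile<F>(&self, predicate: &str, compile: F) -> Arc<Circuit>
    where
        F: FnOnce(&str) -> Circuit,
    {
        let key = predicate_key(predicate);
        let mut g = self.inner.lock().unwrap();
        if let Some(hit) = g.map.get(&key).cloned() {
            g.order.retain(|k| *k != key); // O(n) move-to-back, fine for small caches
            g.order.push_back(key);
            return hit;
        }
        let circuit = Arc::new(compile(predicate));
        if g.map.len() >= g.capacity {
            if let Some(lru) = g.order.pop_front() {
                g.map.remove(&lru);
            }
        }
        g.map.insert(key, Arc::clone(&circuit));
        g.order.push_back(key);
        circuit
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().map.len()
    }
}
```

Compiling while holding the lock keeps the sketch simple but serializes concurrent misses; a production cache might compile outside the lock and tolerate an occasional duplicate compile.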

(d) Operability & safety

API-key validation no longer leaks timing. In auth.rs and middleware.rs, the old HashMap lookup on raw key strings is replaced with a constant-time linear scan via the constant_time_eq crate, killing the timing side-channel oracle on stored key characters. (The hashed path, hash_keys=true, was never affected.)
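The constant-time scan can be sketched as below. `ct_eq` is a hand-rolled stand-in for the constant_time_eq crate (which also takes measures against the optimizer reintroducing early exits), and `validate_api_key` is a hypothetical name; note that it deliberately compares against every stored key instead of stopping at the first match.

```rust
/// Equality that touches every byte: XOR-accumulate differences instead of
/// returning at the first mismatch. Key length is not treated as secret here.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Visits every stored key without short-circuiting, so timing does not reveal
/// which key matched or how many leading characters were correct.
pub fn validate_api_key(presented: &str, stored: &[String]) -> bool {
    let mut found = false;
    for key in stored {
        found |= ct_eq(presented.as_bytes(), key.as_bytes());
    }
    found
}
```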

Operations get an alerting brain: alert_rules.rs adds a RuleEngine that evaluates AlertRules against AlertEvents, classifies them as AlertSeverity::Info / Warning / Critical, dedups within a window (rule name + dedup key), and fans FiredAlerts out to AlertSinks. An AlertSink trait plus a LogSink (emitting via tracing) ship in the box; each FiredAlert carries its rule_name, severity, event, and dedup_key.
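A rule engine with window-based dedup keyed on (rule name, dedup key) might be shaped like this. The threshold fields on `AlertRule`, the seconds-based clock passed in as `now`, and `VecSink` (an in-memory stand-in for LogSink) are assumptions; only the type names, the three severities, and the dedup key come from the release notes.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AlertSeverity { Info, Warning, Critical }

#[derive(Clone, Debug)]
pub struct AlertEvent { pub metric: String, pub value: f64, pub dedup_key: String }

/// Hypothetical rule: thresholds on a single metric.
pub struct AlertRule { pub name: String, pub metric: String, pub info_at: f64, pub warn_at: f64, pub crit_at: f64 }

#[derive(Debug)]
pub struct FiredAlert { pub rule_name: String, pub severity: AlertSeverity, pub event: AlertEvent, pub dedup_key: String }

pub trait AlertSink { fn send(&mut self, alert: &FiredAlert); }

/// In-memory sink for the sketch; a LogSink would emit via `tracing` instead.
#[derive(Default)]
pub struct VecSink(pub Vec<String>);

impl AlertSink for VecSink {
    fn send(&mut self, a: &FiredAlert) {
        self.0.push(format!("{:?} {} {}", a.severity, a.rule_name, a.dedup_key));
    }
}

pub struct RuleEngine {
    rules: Vec<AlertRule>,
    window_secs: u64,
    last_fired: HashMap<(String, String), u64>, // (rule name, dedup key) -> last fire time
}

impl RuleEngine {
    pub fn new(rules: Vec<AlertRule>, window_secs: u64) -> Self {
        RuleEngine { rules, window_secs, last_fired: HashMap::new() }
    }

    /// Evaluates matching rules at time `now` (seconds) and fans non-duplicate
    /// alerts out to every sink. Returns how many alerts fired.
    pub fn evaluate(&mut self, event: &AlertEvent, now: u64, sinks: &mut [&mut dyn AlertSink]) -> usize {
        let mut fired = 0;
        for rule in self.rules.iter().filter(|r| r.metric == event.metric) {
            let severity = if event.value >= rule.crit_at {
                AlertSeverity::Critical
            } else if event.value >= rule.warn_at {
                AlertSeverity::Warning
            } else if event.value >= rule.info_at {
                AlertSeverity::Info
            } else {
                continue; // below every threshold
            };
            let dedup = (rule.name.clone(), event.dedup_key.clone());
            if let Some(&last) = self.last_fired.get(&dedup) {
                if now.saturating_sub(last) < self.window_secs {
                    continue; // same rule + dedup key still inside the window
                }
            }
            self.last_fired.insert(dedup, now);
            let alert = FiredAlert {
                rule_name: rule.name.clone(),
                severity,
                event: event.clone(),
                dedup_key: event.dedup_key.clone(),
            };
            for sink in sinks.iter_mut() {
                sink.send(&alert);
            }
            fired += 1;
        }
        fired
    }
}
```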

Schema evolution gets a framework, too. In amaters-server, migration.rs adds a MigrationRegistry where you register version→version Migration steps; plan(from, to) runs a BFS shortest-path search to produce a MigrationPlan of zero-copy step references. The Migration trait exposes from_version / to_version / description / migrate(&mut MigrationContext), and MigrationContext wraps a serde_json::Value document (get / set / remove / into_doc) threaded through the steps so they compose without copying.
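Breadth-first search over the version graph yields a fewest-steps migration path. In this sketch a `HashMap<String, String>` stands in for the serde_json::Value behind MigrationContext, `Rename` is a hypothetical example step, and `description()` is omitted; `plan` returns borrowed `&dyn Migration` references rather than cloned steps.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stand-in for the document carried by MigrationContext.
pub type Doc = HashMap<String, String>;

pub trait Migration {
    fn from_version(&self) -> u32;
    fn to_version(&self) -> u32;
    fn migrate(&self, doc: &mut Doc);
}

/// Hypothetical example step: rename one field.
pub struct Rename { pub from: u32, pub to: u32, pub old: &'static str, pub new: &'static str }

impl Migration for Rename {
    fn from_version(&self) -> u32 { self.from }
    fn to_version(&self) -> u32 { self.to }
    fn migrate(&self, doc: &mut Doc) {
        if let Some(v) = doc.remove(self.old) {
            doc.insert(self.new.to_string(), v);
        }
    }
}

#[derive(Default)]
pub struct MigrationRegistry {
    steps: Vec<Box<dyn Migration>>,
}

impl MigrationRegistry {
    pub fn register(&mut self, m: Box<dyn Migration>) {
        self.steps.push(m);
    }

    /// BFS over versions: the first time `to` is reached is a fewest-steps path.
    /// Returns None when `to` is unreachable from `from`.
    pub fn plan(&self, from: u32, to: u32) -> Option<Vec<&dyn Migration>> {
        let mut prev: HashMap<u32, usize> = HashMap::new(); // version -> index of step that reached it
        let mut seen = HashSet::from([from]);
        let mut queue = VecDeque::from([from]);
        while let Some(v) = queue.pop_front() {
            if v == to {
                break;
            }
            for (i, step) in self.steps.iter().enumerate() {
                let next = step.to_version();
                if step.from_version() == v && seen.insert(next) {
                    prev.insert(next, i);
                    queue.push_back(next);
                }
            }
        }
        if from != to && !prev.contains_key(&to) {
            return None;
        }
        // Walk predecessor links back from `to`, then reverse into execution order.
        let mut path: Vec<&dyn Migration> = Vec::new();
        let mut v = to;
        while v != from {
            let i = prev[&v];
            path.push(self.steps[i].as_ref());
            v = self.steps[i].from_version();
        }
        path.reverse();
        Some(path)
    }
}
```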

And for day-to-day debugging, the CLI gains an explain command in the REPL: it prints the QueryPlanner logical and physical plans locally, with no server round-trip, for get / set / delete / range, using amaters_core::compute::QueryPlanner to render a tree-formatted LogicalPlan + PhysicalPlan with cost estimates.

Getting Started

Add the crate (or pin the SDK explicitly) and start a server:

# Library / facade meta-crate
cargo add amaters
# or pin the Rust SDK directly:
#   amaters-sdk-rust = "0.2.2"

# Start a node (run from a checkout of the amaters repository)
cargo run --bin amaters-server -- start --data-dir ./data

Then poke at the new local query debugger inside the REPL:

cargo run --bin amaters-cli -- repl
# inside the REPL:
explain get my_key      # prints the logical + physical query plan locally (no server round-trip)

To turn on distributed tracing, build with the telemetry feature and point the OTLP exporter at your collector; on Linux, enable the io-uring feature for the kernel-bypass WAL path:

cargo run --bin amaters-server --features "telemetry,io-uring" -- start --data-dir ./data

The Python SDK is fully async now, including pool stats. amaters.PoolStats (a #[pyclass], PyPoolStats under the hood) exposes .total_connections / .active_connections / .idle_connections / .max_connections, and both pool_stats() and close() are awaitable.

What’s New in 0.2.2

Sharding & Placement

Horizontal sharding activated: shard.rs + partitioner.rs made public — consistent-hashing / range / hash partitioning, QueryRouter, k-way ResultMerger, ShardRegistry, and ShardSplit / ShardMerge / ShardTransfer lifecycle.
Stateless PlacementCoordinator → deterministic PlacementPlan (hot-shard split, cold-adjacent merge, imbalance rebalance).
Background PlacementScheduler runs on the Raft leader and proposes PlacementActions as ClusterCommand log entries; attach via attach_placement_scheduler().
Typed ClusterCommand Raft log (7 variants: DataPut, DataDelete, PlaceSplit, PlaceMerge, PlaceTransfer, MembershipAdd, MembershipRemove), now postcard-encoded.
Chunked snapshot streaming via SnapshotStreamer / SnapshotReceiver for large state transfers.

Observability — OpenTelemetry

TelemetryConfig + TelemetryGuard (feature telemetry): OTLP gRPC exporter, batch SdkTracerProvider, flush-on-drop.
W3C TraceContext propagation for gRPC (TraceparentExtractor, inject_trace_context, TraceContextPropagatorLayer) for end-to-end cross-node traces.

Storage — io_uring WAL + index automation

UringWalWriter (feature io-uring, Linux): tokio_uring kernel-bypass I/O on a dedicated runtime; Send + Sync + Clone handle.
tokio-uring is now a cfg(target_os = "linux") conditional dep so macOS/Windows compile.
Secondary-index automation: IndexExtractor / IndexedField / IndexManager::apply_extracted, maintained transparently under update_lock with zero overhead when unattached.

Network — FHE CircuitCache

CircuitCache: thread-safe LRU (default 256 entries) keyed by blake3 hash of the predicate; FILTER + UPDATE sites call get_or_compile() to skip recompilation.

Security

Constant-time API-key validation via constant_time_eq — kills the timing side-channel oracle.
pyo3 0.28.3 → 0.29 fixes RUSTSEC-2026-0176 (OOB read on nth/nth_back for PyList/PyTuple iterators) and RUSTSEC-2026-0177 (missing Sync bound on PyCFunction::new_closure) in amaters-sdk-python.

Cluster — alert RuleEngine

RuleEngine evaluates AlertRules → AlertSeverity, dedups within a window, fans FiredAlerts out to AlertSinks (AlertSink trait + LogSink).

Server — migration framework

MigrationRegistry with BFS shortest-path MigrationPlan over registered Migration steps; MigrationContext over serde_json::Value.

Python SDK — async

Fully async pool_stats + close (removes the last block_on); amaters.PoolStats exported; 116 pytest cases across 5 files (10 Hypothesis property tests).

CLI — explain

explain <command> REPL debugger renders the local QueryPlanner logical + physical plan with cost estimates.

Testing

2,224 tests passed / 29 skipped / 0 failed; 10 chaos scenarios (cluster), 5 #[ignore] load tests (server), 15 proptest cases (5 LSM, 5 shard, 5 placement), 116 pytest.
CircuitCache benchmarks added to the net bench suite; 9 cluster Criterion bench groups.

Dep bumps

tokio 1.50 → 1.52, dashmap 6.1 → 6.2, rayon 1.11 → 1.12, serial_test 3.2 → 3.5, similar 3.1.0 → 3.1.1.

Fixed

KeyRange::midpoint() off-by-one for unequal-length keys — pad to max length, big-endian averaging with carry; now start ≤ mid < end.
Infinite loop in detect_rebalance() when n_shards < n_nodes — early exit + bound of n_shards + 1 iterations; placement proptests now run in under 1 ms.
Large-file refactors under the 2000-line policy: wal.rs, server.rs (1941→1195), tls.rs (1954→1183), health.rs (1973→1196), optimizer.rs (1977→1174); node_tests split.
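The midpoint fix above (zero-pad to a common length, then average as big-endian integers with carry) can be sketched as a free function. The name and signature are hypothetical rather than KeyRange::midpoint itself, and it assumes start < end after padding; keys that differ only by trailing zero bytes pad to equal values and would need separate handling.

```rust
/// Midpoint of two byte-string keys: right-pad with zeros to a common length,
/// add as big-endian integers, then halve, shifting the top carry back in.
/// Assumes start < end once both keys are padded.
pub fn midpoint(start: &[u8], end: &[u8]) -> Vec<u8> {
    let len = start.len().max(end.len());
    let pad = |k: &[u8]| {
        let mut v = k.to_vec();
        v.resize(len, 0);
        v
    };
    let (a, b) = (pad(start), pad(end));

    // Big-endian addition: walk from the least significant (last) byte.
    let mut sum = vec![0u8; len];
    let mut carry = 0u16;
    for i in (0..len).rev() {
        let s = a[i] as u16 + b[i] as u16 + carry;
        sum[i] = (s & 0xFF) as u8;
        carry = s >> 8;
    }

    // Halve the (8*len + 1)-bit sum: shift right by one, top carry first.
    let mut mid = vec![0u8; len];
    let mut rem = carry; // 0 or 1
    for i in 0..len {
        let cur = (rem << 8) | sum[i] as u16;
        mid[i] = (cur >> 1) as u8;
        rem = cur & 1;
    }
    mid
}
```

Because the result is floor((start + end) / 2) over the padded values, it satisfies start ≤ mid < end, and mid < padded end also implies mid < end in plain lexicographic order.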

Tips

Turn on cross-node tracing. Build with the telemetry feature and point the OTLP exporter at your collector — the TraceContextPropagatorLayer carries traceparent across gRPC hops so one trace spans the whole cluster.
On Linux, flip on io-uring. The io-uring feature routes the WAL through UringWalWriter’s kernel-bypass path; tune it via UringWalConfig (ring_size, batch_size, direct_io, channel_capacity).
Let hot shards split themselves. Attach the scheduler to your Raft node with attach_placement_scheduler(), and the leader will propose splits/merges/transfers as ClusterCommand entries automatically.
Inspect before you run. Use explain <cmd> in the REPL to see a query’s logical + physical plan and cost estimates locally — no server round-trip required.
Lean on the circuit cache. The FHE CircuitCache (default 256 entries) makes repeated identical predicates skip recompilation; it’s already wired into the FILTER and UPDATE sites.
Secondary indexes are opt-in and free when unused. Enable them with with_index_manager / register_index; with no manager attached there’s zero overhead, and once attached put / delete / atomic_update keep them consistent for you.
Stress it. The server load tests are #[ignore]d by default — run them with --include-ignored to drive the 1M put/get, 320k concurrent writer/reader, and mixed 80/20 scenarios.

This is the foundation

AmateRS sits inside the COOLJAPAN ecosystem (June 2026, full ecosystem), and 0.2.2 leans on it concretely: oxicode still handles serialization (no bincode in the default path) and OxiARC provides LZ4 + DEFLATE compression, while the postcard crate gives the new typed cluster log its compact, no_std-friendly encoding on the Raft replication path. The four mythic layers, Iwato (storage), Yata (FHE compute), Ukehi (consensus), and Musubi (network), now carry a sharding plane and an observability plane on top, all in Pure Rust.

Repository: https://github.com/cool-japan/amaters

Star the repo if you believe a database should be able to compute on your data without ever reading it. Sharding is live, traces are flowing, and the dark just got a lot more scalable.

— KitaSan at COOLJAPAN OÜ June 19, 2026