sklears 0.1.1 Released — Correctness Fixes for HDBSCAN, Streaming, and Pipelines

Stable means stable — so the first thing after 0.1.0 is making the edges as trustworthy as the core.

Today we released sklears 0.1.1 — a correctness and stability patch that hardens clustering, streaming preprocessing, pipelines, and serialization across the pure-Rust scikit-learn surface.

sklears is the pure-Rust alternative to scikit-learn, and the sovereignty story is unchanged: No C. No Fortran. No Cython. No GIL. scikit-learn leans on a tower of compiled extensions; sklears is plain Rust on top of the SciRS2 stack, with OxiBLAS for BLAS/LAPACK and Oxicode for serialization. This release does not move that line — it polishes what already sits behind it.

Why 0.1.1 matters

0.1.0 was a big release: 36 crates, type-safe Untrained → Trained state machines, builder APIs, SIMD via std::simd, Rayon work-stealing parallelism, and >99% coverage of the scikit-learn API, measured end-to-end against scikit-learn’s ~v1.5 feature set. Shipping that much surface in one stable cut means the next job is the hardening pass: finding the corners where behavior was subtly wrong and nailing them down. That is exactly what 0.1.1 is. The test suite that guards it still stands at 11,586+ passing tests across 36 crates, with coverage held at >99%.

Here is what got fixed, and why each one matters:

HDBSCAN cluster persistence. Root-node detection and the order in which cluster-persistence values propagated were corrected. Persistence is what HDBSCAN uses to decide which clusters survive condensation, so getting the root and the propagation order right is the difference between trustworthy density clustering and quietly wrong labels.
Streaming Default drift. StreamingStandardScaler and StreamingSimpleImputer had hand-written Default impls; they now use #[derive(Default)]. Manual defaults drift from the field defaults over time — deriving them removes a whole class of “the constructed value didn’t match what I declared” bugs.
Pipeline mutable access. Pipeline::get_step_mut had lifetime elision that didn’t borrow-check cleanly for dyn PipelineStep. Fixing the elision makes mutable access to a fitted step ergonomic instead of a fight with the borrow checker.
Deterministic spectral graph clustering. SpectralGraphConfig was missing a random_seed field that its tests assumed; adding it makes spectral graph clustering reproducible run to run.
Serialization and GPU-accel field fixes. Arrow StringArray collection from an Option<&str> iterator was corrected, and a struct field-name mismatch in hardware_acceleration.rs was fixed so the GpuAcceleration path compiles against the right names.
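
To make the reproducibility point concrete, here is a minimal std-only sketch of why a seed field matters. Only the field name random_seed comes from the release notes; GraphConfigSketch, SplitMix64, and pick_initial_centers are hypothetical names invented for illustration and are not the sklears API. The idea is simply that any randomized initialization should draw from a generator whose state is fully determined by the config.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical config for illustration; only the `random_seed`
/// field name is taken from the release notes.
#[derive(Debug, Clone, Default)]
struct GraphConfigSketch {
    n_clusters: usize,
    random_seed: Option<u64>,
}

/// SplitMix64, a small well-known PRNG, used so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Pick `n_clusters` starting indices in `0..n_points` (repeats allowed, for brevity).
/// With `Some(seed)` the result depends only on the config; with `None`
/// the sketch falls back to the clock, so reruns may differ.
fn pick_initial_centers(cfg: &GraphConfigSketch, n_points: usize) -> Vec<usize> {
    if n_points == 0 {
        return Vec::new();
    }
    let seed = cfg.random_seed.unwrap_or_else(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0)
    });
    let mut rng = SplitMix64(seed);
    (0..cfg.n_clusters)
        .map(|_| (rng.next_u64() % n_points as u64) as usize)
        .collect()
}

fn main() {
    let cfg = GraphConfigSketch { n_clusters: 3, random_seed: Some(42) };
    // Two runs with the same seeded config produce the same indices.
    println!("{:?}", pick_initial_centers(&cfg, 100));
    println!("{:?}", pick_initial_centers(&cfg, 100));
}
```

Tests that compare labels across runs can only be stable when every source of randomness is routed through a seed like this, which is presumably why the spectral graph tests expected the field.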

None of this adds an algorithm. All of it makes the algorithms you already had behave the way the docs promised.

What’s New in 0.1.1

In plain language, this is a bug-fix release plus the version bump — no new estimators:

Clustering: HDBSCAN persistence extraction now detects the root correctly and propagates persistence in the right order.
Streaming preprocessing: StreamingStandardScaler and StreamingSimpleImputer use derived Default; StreamingSimpleImputer uses the ? operator for Option early-return.
Pipelines: get_step_mut borrow-checks correctly for dyn PipelineStep.
Graph clustering: SpectralGraphConfig gained random_seed for deterministic results.
Serialization: Arrow StringArray collection from an Option<&str> iterator is fixed.
GPU acceleration: field-name mismatch in hardware_acceleration.rs resolved.
Dependencies: SciRS2 crates bumped to 0.4.2 (with oxicode 0.2 and oxifft 0.3.0 underneath).
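
The two streaming changes are small Rust idioms, and a hedged sketch may help readers who have not used them. ImputerSketch, its fields, and impute are hypothetical and do not mirror the real StreamingSimpleImputer; the sketch only shows what a derived Default gives you and how the ? operator short-circuits on Option.

```rust
/// Hypothetical imputer state for illustration (not the sklears type).
/// `#[derive(Default)]` builds the value from each field's own default
/// (empty Vec, zero counter), so it cannot drift from the declaration.
#[derive(Debug, Default, PartialEq)]
struct ImputerSketch {
    /// Learned fill value per column; `None` if nothing was learned for it.
    fill_values: Vec<Option<f64>>,
    n_seen: u64,
}

impl ImputerSketch {
    /// Return `value` if present, otherwise the learned fill for `col`.
    /// Each `?` returns `None` early: first when the column index is
    /// out of range, then when no fill value was learned for it.
    fn impute(&self, col: usize, value: Option<f64>) -> Option<f64> {
        if let Some(v) = value {
            return Some(v);
        }
        let fill = (*self.fill_values.get(col)?)?;
        Some(fill)
    }
}

fn main() {
    let fresh = ImputerSketch::default();
    println!("{:?}", fresh.impute(0, None)); // nothing learned yet

    let fitted = ImputerSketch { fill_values: vec![Some(1.5), None], n_seen: 10 };
    println!("{:?}", fitted.impute(0, None)); // falls back to the learned fill
}
```

Compared with nested match or if-let blocks, the ? form keeps the happy path flat, which is the usual reason to prefer it in functions that already return Option.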

Getting Started

Install:

cargo add sklears

The example below exercises the first area 0.1.1 hardened, density clustering (the HDBSCAN persistence fix):

use sklears::prelude::*;
use sklears::cluster::HDBSCAN;

fn main() -> Result<()> {
    // Generate a small synthetic dataset
    let dataset = sklears::dataset::make_blobs(300, 2, 3, 0.6)?;

    // 0.1.1 fixes HDBSCAN cluster-persistence extraction (root detection + ordering)
    let labels = HDBSCAN::new()
        .min_cluster_size(10)
        .fit_predict(&dataset.data)?;

    let n_clusters = labels.iter().filter(|&&l| l >= 0).max().map_or(0, |m| m + 1);
    println!("found {} clusters", n_clusters);
    Ok(())
}
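
If you also want to know how many points were rejected as noise, a small std-only helper can summarize the label vector. This is a sketch under two assumptions that the snippet above implies but does not state: labels are i32, and noise is marked with a negative label (conventionally -1). summarize_labels is a hypothetical helper, not part of sklears. It counts distinct non-negative labels rather than taking max + 1, so it stays correct even if labels are not contiguous.

```rust
use std::collections::BTreeSet;

/// Returns (number of distinct clusters, number of noise points).
/// Assumes noise is marked with a negative label (conventionally -1).
fn summarize_labels(labels: &[i32]) -> (usize, usize) {
    let clusters: BTreeSet<i32> = labels.iter().copied().filter(|&l| l >= 0).collect();
    let noise = labels.iter().filter(|&&l| l < 0).count();
    (clusters.len(), noise)
}

fn main() {
    let labels = [0, 0, 1, -1, 2, 2, -1];
    let (n_clusters, n_noise) = summarize_labels(&labels);
    println!("{n_clusters} clusters, {n_noise} noise points");
}
```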

The same release also smooths the preprocessing-into-model flow — a Pipeline chaining a StandardScaler into a LinearRegression, with get_step_mut now available to reach back into a fitted step:

use sklears::prelude::*;
use sklears::pipeline::Pipeline;

let mut pipe = Pipeline::new()
    .add_step("scale", StandardScaler::new())
    .add_step("model", LinearRegression::new());

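// `x` and `y` are the feature matrix and targets, prepared elsewhere;
// this fragment is assumed to sit inside a function that returns Result.
pipe.fit(&x, &y)?;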

// 0.1.1 fixes get_step_mut lifetime elision for `dyn PipelineStep`
if let Some(step) = pipe.get_step_mut("scale") {
    // mutate the fitted step in place
}
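
For readers curious what a lifetime problem around dyn steps can look like, here is a self-contained sketch with a toy pipeline. Step, Scale, Offset, and PipelineSketch are hypothetical stand-ins, and whether this matches the exact change made in sklears is an assumption. The relevant language rule is that Box<dyn Step> defaults to dyn Step + 'static, while an elided &mut dyn Step in a return type ties the trait object's lifetime bound to the reference; because &mut is invariant, the two do not always line up. Spelling the bound out in the signature is one way to keep them consistent.

```rust
/// Hypothetical stand-in for a pipeline step trait (not the sklears `PipelineStep`).
trait Step {
    fn apply(&self, x: f64) -> f64;
    fn set_param(&mut self, value: f64);
}

struct Scale { factor: f64 }
impl Step for Scale {
    fn apply(&self, x: f64) -> f64 { x * self.factor }
    fn set_param(&mut self, value: f64) { self.factor = value; }
}

struct Offset { by: f64 }
impl Step for Offset {
    fn apply(&self, x: f64) -> f64 { x + self.by }
    fn set_param(&mut self, value: f64) { self.by = value; }
}

#[derive(Default)]
struct PipelineSketch {
    steps: Vec<(String, Box<dyn Step>)>,
}

impl PipelineSketch {
    fn add_step(mut self, name: &str, step: impl Step + 'static) -> Self {
        self.steps.push((name.to_string(), Box::new(step)));
        self
    }

    /// Run every step in insertion order.
    fn run(&self, x: f64) -> f64 {
        self.steps.iter().fold(x, |acc, (_, s)| s.apply(acc))
    }

    /// The explicit `+ 'static` matches what `Box<dyn Step>` actually stores,
    /// instead of relying on the elided trait-object lifetime.
    fn get_step_mut(&mut self, name: &str) -> Option<&mut (dyn Step + 'static)> {
        for (n, s) in self.steps.iter_mut() {
            if n.as_str() == name {
                return Some(&mut **s);
            }
        }
        None
    }
}

fn main() {
    let mut pipe = PipelineSketch::default()
        .add_step("scale", Scale { factor: 2.0 })
        .add_step("offset", Offset { by: 1.0 });
    println!("{}", pipe.run(3.0)); // 3 * 2 + 1

    // Adjust one step in place without rebuilding the pipeline.
    if let Some(step) = pipe.get_step_mut("scale") {
        step.set_param(10.0);
    }
    println!("{}", pipe.run(3.0)); // 3 * 10 + 1
}
```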

Tips

Trust density clustering again. With persistence extraction fixed, HDBSCAN::new().min_cluster_size(...).fit_predict(...) is the go-to for data where you do not know the cluster count up front. Tune min_cluster_size to set the smallest group that counts as a cluster; points in smaller groups are labeled as noise.
Make graph clustering reproducible. Set random_seed on SpectralGraphConfig (and on any other spectral or graph clustering config that exposes one) so reruns and CI produce identical labels.
Go out-of-core with streaming estimators. Use StreamingStandardScaler and StreamingSimpleImputer when the data does not fit in memory — now that their defaults are derived, the constructed state matches what you declared.
Reach into fitted pipelines. Pipeline::get_step_mut("name") lets you adjust a step after fit without rebuilding the whole pipeline.
Pin to the SciRS2 0.4.2 line. 0.1.1 tracks scirs2-* 0.4.2 (oxicode 0.2, oxifft 0.3.0); pin to it so your numerical backbone matches what these tests were validated against.
Enable only the feature flags you need. Keep build times and binary size down by turning on just the estimator and backend features your project actually uses.
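
As background for the streaming tip, the sketch below shows the general technique an out-of-core scaler can rely on: Welford's online update, which folds each chunk into a running mean and variance without keeping earlier chunks in memory. RunningScalerSketch and its methods are hypothetical, handle a single feature, and are not the StreamingStandardScaler implementation; whether sklears uses this exact update is an assumption.

```rust
/// Hypothetical single-feature streaming scaler (not the sklears type).
/// The derived Default starts from zero observations.
#[derive(Debug, Default)]
struct RunningScalerSketch {
    count: u64,
    mean: f64,
    /// Running sum of squared deviations from the mean.
    m2: f64,
}

impl RunningScalerSketch {
    /// Fold one chunk into the running statistics (Welford's update).
    fn partial_fit(&mut self, chunk: &[f64]) {
        for &x in chunk {
            self.count += 1;
            let delta = x - self.mean;
            self.mean += delta / self.count as f64;
            self.m2 += delta * (x - self.mean);
        }
    }

    /// Population standard deviation; `None` until at least one value is seen.
    fn std_dev(&self) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        Some((self.m2 / self.count as f64).sqrt())
    }

    /// Standardize one value; `None` if nothing has been fitted yet.
    fn transform(&self, x: f64) -> Option<f64> {
        let sd = self.std_dev()?;
        if sd == 0.0 {
            return Some(0.0); // constant feature: avoid dividing by zero
        }
        Some((x - self.mean) / sd)
    }
}

fn main() {
    let mut scaler = RunningScalerSketch::default();
    // Feed the data in two chunks, as if it did not fit in memory at once.
    scaler.partial_fit(&[2.0, 4.0, 4.0, 4.0]);
    scaler.partial_fit(&[5.0, 5.0, 7.0, 9.0]);
    println!("mean = {}, std = {:?}", scaler.mean, scaler.std_dev());
}
```

The result after the second chunk is the same as fitting on all eight values at once, which is the property that makes chunked fitting safe.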

A maturing foundation

As of 2026-04-27, sklears sits squarely inside the COOLJAPAN ecosystem: built on SciRS2 0.4.2 with OxiBLAS for linear algebra and Oxicode for serialization, drawing on NumRS2 and PandRS for array and dataframe work. sklears covers classical machine learning, alongside TenfloweRS and TrustformeRS for deep learning — one pure-Rust stack from data wrangling through models. 0.1.1 is the kind of quiet release that makes the rest of that stack worth standing on.

Repository: https://github.com/cool-japan/sklears

Star the repo if a pure-Rust, no-GIL scikit-learn is something you want to build on — every star helps the ecosystem grow. Thanks for reading, and happy clustering.

— KitaSan at COOLJAPAN OÜ April 27, 2026