sklears 0.1.2 Released — Preprocessing Completions, SciRS2 0.5.1, and AVX2 Quicksort

0.1.2 is where the last big category of stubs became real code.

Today we released sklears 0.1.2 — a feature-completion release that finalises preprocessing, hardens SIMD, upgrades the entire SciRS2 numerical backbone to 0.5.1, and ships a new categorical imputation suite and a full benchmarking regression-detection framework.

sklears is the pure-Rust alternative to scikit-learn: No C. No Fortran. No Cython. No GIL.
scikit-learn’s pipeline depends on NumPy, SciPy, and a zoo of compiled extensions that make packaging a perpetual headache; sklears is plain Rust on the SciRS2 stack, with OxiBLAS for BLAS/LAPACK and Oxicode for serialization. It compiles to a single static binary (or to WASM) and runs wherever Rust does.

Why sklears 0.1.2 is a step forward

The scikit-learn preprocessing module is one of the most-used parts of the library. The pain:

MinMaxScaler, MaxAbsScaler, and friends work, until you realise they run on NumPy’s compiled C routines: one more native dependency to build and ship.
Imputers like KNNImputer and IterativeImputer are powerful but depend on NumPy’s C routines for inner loops.
OrdinalEncoder and TargetEncoder require careful Python packaging across platforms.

sklears 0.1.2 ends all of that. The headline wins:

12 preprocessing implementations moved from stub to real: five scalers (MinMaxScaler, MaxAbsScaler, UnitVectorScaler, FeatureWiseScaler, OutlierAwareScaler), five imputers (SimpleImputer, NaN-aware KNNImputer, IterativeImputer with MICE ridge, MultipleImputer, GAINImputer), and two encoders (OrdinalEncoder, TargetEncoder with category smoothing) are now genuine Rust implementations.
SciRS2 0.4.2 → 0.5.1 across all 17 numerical crates — the biggest backend upgrade since 0.1.0.
AVX2 quicksort in sklears-simd — quicksort_avx2_impl with buffered partition and a precomputed compress LUT, plus 6 new hardening tests covering already-sorted, reverse-sorted, all-equal, heavy-duplicates, non-multiple-of-8, and large arrays.
12,242 tests passing across 36 crates — up from 11,586+ at 0.1.1.
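The release notes do not spell out how TargetEncoder's category smoothing works. A common formulation, shown here as a minimal sketch rather than the sklears implementation, blends each category's mean target with the global mean: enc(c) = (n_c * mean_c + m * global_mean) / (n_c + m), where n_c is the category's row count and m is a smoothing weight. The function name smoothed_target_encoding and the parameter m are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of additive (m-estimate) target smoothing.
/// Categories with few rows are pulled toward the global target mean;
/// `m` is an assumed smoothing weight, not a documented sklears parameter.
fn smoothed_target_encoding(categories: &[&str], targets: &[f64], m: f64) -> HashMap<String, f64> {
    assert_eq!(categories.len(), targets.len());
    let global_mean = targets.iter().sum::<f64>() / targets.len() as f64;

    // Per-category (sum of targets, row count).
    let mut stats: HashMap<String, (f64, f64)> = HashMap::new();
    for (&c, &y) in categories.iter().zip(targets) {
        let entry = stats.entry(c.to_string()).or_insert((0.0, 0.0));
        entry.0 += y;
        entry.1 += 1.0;
    }

    stats
        .into_iter()
        .map(|(c, (sum, count))| {
            // (n_c * mean_c + m * global_mean) / (n_c + m), using n_c * mean_c = sum
            (c, (sum + m * global_mean) / (count + m))
        })
        .collect()
}
```

With m = 0 this reduces to the raw per-category mean; a larger m pulls rare categories harder toward the global mean, which limits overfitting on categories seen only a few times.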

Technical Deep Dive: what moved in 0.1.2

Preprocessing completions (sklears-preprocessing). Each scaler now owns its own Rust arithmetic: no C calls, no opaque extension module, just array iteration plus SIMD-accelerated kernels where applicable. IterativeImputer implements MICE (multivariate imputation by chained equations) with ridge regression as the per-feature model; GAINImputer uses GAIN, a generative adversarial network approach to imputation. SIMD paths are re-enabled with real AVX kernels: simd_threshold_mask, simd_axpy, and simd_mahalanobis now route through simd_dot_product.
Categorical imputation suite (sklears-impute). Four new additions: three categorical imputers, CategoricalClusteringImputer (k-means), CategoricalRandomForestImputer (MissForest/CART), and AssociationRuleImputer (Apriori), plus validate_imputer for K-fold MAE cross-validation. They target non-numeric missing data, where scikit-learn’s built-in imputers offer little beyond most-frequent or constant fill.
Benchmarking regression detection (sklears-compose). comprehensive_benchmarking ships a 15-trait regression-detection subsystem: AdaptiveThresholds, AlertSuppression, BaselineComparisons, BusinessImpactAssessment, EffectSizeAnalysis, PatternRecognition, RegressionAlertSystem, RegressionCache, RegressionDetector, RegressionDetectorConfig, RegressionMetadata, SeverityAssessment, SignificanceTesting, SmartSuppression, ThresholdManagement. Also: time_series_pipelines (LagFeatures, RollingWindow, Differencing, TemporalTrainTestSplit) and real CSR-based sparse column selection via scirs2-sparse.
Backend upgrades and migrations. sklears-svm fully migrated from nalgebra → scirs2-linalg; sklears-metrics fully migrated from sprs → scirs2-sparse with the sparse feature re-enabled. The workspace now carries oxicuda-backend, oxicuda-memory, oxicuda-blas, oxicuda-solver, and related v0.3 crates, replacing direct wgpu/cudarc/candle-core dependencies.
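To make "MICE ridge" concrete, here is a minimal plain-Rust sketch of deterministic chained-equations imputation, assuming NaN marks a missing cell: mean-fill first, then repeatedly regress each incomplete column on the others with ridge regression and overwrite only the originally missing cells. It is a single-imputation variant (no posterior sampling), and the names iterative_impute, mean_of, and solve are hypothetical, not the sklears-preprocessing API.

```rust
/// Solve the small dense system A w = b by Gaussian elimination with partial pivoting.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let n = b.len();
    for c in 0..n {
        // Swap in the row with the largest pivot for numerical stability.
        let p = (c..n).max_by(|&i, &j| a[i][c].abs().total_cmp(&a[j][c].abs())).unwrap();
        a.swap(c, p);
        b.swap(c, p);
        for r in c + 1..n {
            let f = a[r][c] / a[c][c];
            for k in c..n {
                let v = a[c][k];
                a[r][k] -= f * v;
            }
            let bc = b[c];
            b[r] -= f * bc;
        }
    }
    // Back substitution on the upper-triangular system.
    let mut w = vec![0.0_f64; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| a[i][k] * w[k]).sum();
        w[i] = (b[i] - s) / a[i][i];
    }
    w
}

/// Mean of column `col` over the given row indices.
fn mean_of(data: &[Vec<f64>], rows: &[usize], col: usize) -> f64 {
    rows.iter().map(|&i| data[i][col]).sum::<f64>() / rows.len() as f64
}

/// Hypothetical MICE-style imputation with ridge regression (NaN = missing).
/// Assumes every column has at least one observed value.
fn iterative_impute(data: &mut [Vec<f64>], lambda: f64, max_iter: usize) {
    let (n, p) = (data.len(), data[0].len());
    // Remember which cells were missing before any filling happens.
    let missing: Vec<Vec<bool>> = data.iter().map(|r| r.iter().map(|v| v.is_nan()).collect()).collect();
    let observed = |j: usize| -> Vec<usize> { (0..n).filter(|&i| !missing[i][j]).collect() };

    // Initial pass: fill every missing cell with its column's observed mean.
    for j in 0..p {
        let mean = mean_of(data, &observed(j), j);
        for i in 0..n {
            if missing[i][j] {
                data[i][j] = mean;
            }
        }
    }

    for _ in 0..max_iter {
        for j in 0..p {
            let rows = observed(j);
            if rows.len() == n {
                continue; // column j had no missing cells
            }
            let others: Vec<usize> = (0..p).filter(|&k| k != j).collect();
            // Center on the observed rows so the intercept is not penalized.
            let mx: Vec<f64> = others.iter().map(|&k| mean_of(data, &rows, k)).collect();
            let my = mean_of(data, &rows, j);
            let q = others.len();
            // Ridge normal equations: (X^T X + lambda I) w = X^T y.
            let mut xtx = vec![vec![0.0_f64; q]; q];
            let mut xty = vec![0.0_f64; q];
            for &i in &rows {
                let x: Vec<f64> = others.iter().zip(&mx).map(|(&k, m)| data[i][k] - m).collect();
                for a in 0..q {
                    xty[a] += x[a] * (data[i][j] - my);
                    for b in 0..q {
                        xtx[a][b] += x[a] * x[b];
                    }
                }
            }
            for a in 0..q {
                xtx[a][a] += lambda;
            }
            let w = solve(xtx, xty);
            // Overwrite only the originally missing cells with ridge predictions.
            for i in 0..n {
                if missing[i][j] {
                    let pred = my
                        + others.iter().zip(&mx).zip(&w).map(|((&k, m), wk)| (data[i][k] - m) * wk).sum::<f64>();
                    data[i][j] = pred;
                }
            }
        }
    }
}
```

Centering on the observed rows keeps the intercept out of the ridge penalty. Here lambda is the ridge strength, and max_iter plays the same role as the .max_iter(10) setting used in the Getting Started example.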
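validate_imputer is described only as K-fold MAE cross-validation. One common way to do that, sketched here under that assumption, is to split the observed cells into K folds, hide one fold at a time, impute, and score the mean absolute error on the hidden cells only. kfold_mae and mean_impute are hypothetical helpers working on a single numeric column, not the sklears-impute signature.

```rust
/// Hypothetical K-fold masking check for an imputer on one numeric column
/// (NaN = missing): hide each fold of observed cells, impute, score MAE.
fn kfold_mae<F>(column: &[f64], k: usize, impute: F) -> f64
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    // Only cells with a known ground truth can be scored.
    let observed: Vec<usize> = (0..column.len()).filter(|&i| !column[i].is_nan()).collect();
    let mut fold_maes = Vec::with_capacity(k);
    for fold in 0..k {
        // Round-robin assignment of observed cells to folds.
        let hidden: Vec<usize> = observed
            .iter()
            .enumerate()
            .filter(|&(pos, _)| pos % k == fold)
            .map(|(_, &i)| i)
            .collect();
        if hidden.is_empty() {
            continue;
        }
        let mut masked = column.to_vec();
        for &i in &hidden {
            masked[i] = f64::NAN;
        }
        let filled = impute(&masked[..]);
        let abs_err: f64 = hidden.iter().map(|&i| (filled[i] - column[i]).abs()).sum();
        fold_maes.push(abs_err / hidden.len() as f64);
    }
    // Average of the per-fold MAEs.
    fold_maes.iter().sum::<f64>() / fold_maes.len() as f64
}

/// Baseline imputer for illustration: replace NaN with the observed mean.
fn mean_impute(col: &[f64]) -> Vec<f64> {
    let obs: Vec<f64> = col.iter().copied().filter(|v| !v.is_nan()).collect();
    let mean = obs.iter().sum::<f64>() / obs.len() as f64;
    col.iter().map(|&v| if v.is_nan() { mean } else { v }).collect()
}
```

Any imputer that maps a column containing NaNs to a filled column can be passed as the closure, so the same harness can compare, say, mean filling against an iterative imputer.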
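The regression-detection traits are listed by name only. As a hedged illustration of what a regression detector combined with effect-size analysis and a threshold typically computes, the sketch below flags a benchmark regression only when the mean slowdown exceeds a relative threshold and the effect size (Cohen's d with a pooled standard deviation) is also large. The function, the enum, and both threshold parameters are hypothetical; this is not the comprehensive_benchmarking API.

```rust
/// Mean and sample standard deviation (n - 1 denominator; needs >= 2 samples).
fn mean_std(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    NoRegression,
}

/// Hypothetical detector over timing samples (higher = slower). Returns the
/// verdict, the relative slowdown, and Cohen's d. Assumes nonzero pooled variance.
fn detect_regression(baseline: &[f64], current: &[f64], max_rel_slowdown: f64, min_effect: f64) -> (Verdict, f64, f64) {
    let (mb, sb) = mean_std(baseline);
    let (mc, sc) = mean_std(current);
    let rel = (mc - mb) / mb;
    // Pooled standard deviation across both sample sets.
    let (nb, nc) = (baseline.len() as f64, current.len() as f64);
    let pooled = (((nb - 1.0) * sb * sb + (nc - 1.0) * sc * sc) / (nb + nc - 2.0)).sqrt();
    let d = (mc - mb) / pooled;
    // Both conditions must hold before an alert is raised.
    let verdict = if rel > max_rel_slowdown && d > min_effect {
        Verdict::Regression
    } else {
        Verdict::NoRegression
    };
    (verdict, rel, d)
}
```

Requiring both conditions keeps a tiny but consistent slowdown, or a large but noisy one, from raising an alert on its own.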

Getting Started

cargo add sklears

The example below exercises two areas 0.1.2 completed, preprocessing (MinMaxScaler and IterativeImputer) and the new categorical imputation suite:

use sklears::prelude::*;
use sklears::preprocessing::{MinMaxScaler, IterativeImputer};
use sklears::impute::CategoricalClusteringImputer;

fn main() -> Result<()> {
    // MinMaxScaler is now a real Rust implementation — no C extension
    let scaler = MinMaxScaler::new().feature_range(0.0, 1.0);
    let dataset = sklears::dataset::make_classification(500, 8, 2, 42)?;
    let scaled = scaler.fit_transform(&dataset.data)?;

    // IterativeImputer (MICE ridge) for numeric missing values
    let imputer = IterativeImputer::new().max_iter(10);
    let imputed = imputer.fit_transform(&scaled)?;

    // CategoricalClusteringImputer for non-numeric columns; cat_data stands in
    // for your own categorical matrix, so the call is left commented out
    let _cat_imputer = CategoricalClusteringImputer::new().n_clusters(5);
    // _cat_imputer.fit_transform(&cat_data)?;

    println!("Preprocessed shape: {:?}", imputed.shape());
    Ok(())
}

AVX2 quicksort is available directly from sklears-simd:

use sklears_simd::sort::quicksort_avx2;

fn main() {
    let mut data: Vec<f32> = vec![3.1, 1.4, 1.5, 9.2, 6.5, 3.5, 8.9, 7.9];
    // Sorts the slice in place
    quicksort_avx2(&mut data);
    assert!(data.windows(2).all(|w| w[0] <= w[1]));
}
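AVX2, unlike AVX-512, has no native compress instruction, so vectorized quicksorts commonly use a 256-entry lookup table that maps an 8-lane comparison mask to the lane order that packs the selected lanes together, then apply it with a lane permute. The scalar sketch below emulates that idea to show how a precomputed compress LUT drives a buffered partition. build_compress_lut and partition_with_lut are hypothetical names and no intrinsics are used, so this is an illustration of the technique, not quicksort_avx2_impl itself.

```rust
/// For each 8-bit mask, the indices of the set lanes packed to the front
/// (remaining slots unused). With AVX2 a row like this would feed a lane
/// permute; here it is applied with plain indexing.
fn build_compress_lut() -> Vec<[u8; 8]> {
    (0u16..256)
        .map(|mask| {
            let mut row = [0u8; 8];
            let mut out = 0;
            for lane in 0..8u8 {
                if mask & (1u16 << lane) != 0 {
                    row[out] = lane;
                    out += 1;
                }
            }
            row
        })
        .collect()
}

/// Partition `data` around `pivot` (elements < pivot first) into a new
/// buffer, 8 elements per block; the tail that does not fill a block is
/// handled one element at a time. Returns the buffer and the split index.
fn partition_with_lut(data: &[f32], pivot: f32, lut: &[[u8; 8]]) -> (Vec<f32>, usize) {
    let mut left = Vec::with_capacity(data.len());
    let mut right = Vec::with_capacity(data.len());
    let mut blocks = data.chunks_exact(8);
    for block in &mut blocks {
        // Comparison mask: bit i set when block[i] < pivot (a movemask in SIMD).
        let mut mask = 0usize;
        for (i, &v) in block.iter().enumerate() {
            if v < pivot {
                mask |= 1 << i;
            }
        }
        let n_left = mask.count_ones() as usize;
        // Compress the "< pivot" lanes, then the complementary lanes.
        for &lane in &lut[mask][..n_left] {
            left.push(block[lane as usize]);
        }
        for &lane in &lut[!mask & 0xFF][..8 - n_left] {
            right.push(block[lane as usize]);
        }
    }
    for &v in blocks.remainder() {
        if v < pivot {
            left.push(v);
        } else {
            right.push(v);
        }
    }
    let split = left.len();
    left.extend(right);
    (left, split)
}
```

Because each LUT row lists lanes in ascending order, elements keep their relative order within each side. The remainder loop covers lengths that are not a multiple of 8, one of the edge cases the 0.1.2 hardening tests target.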

What’s New in 0.1.2

Preprocessing: 12 stub-to-real promotions across scalers, imputers, and encoders; SIMD AVX kernels re-enabled.
sklears-impute: 4 new additions, 3 categorical imputers (CategoricalClusteringImputer, CategoricalRandomForestImputer, AssociationRuleImputer) plus validate_imputer for K-fold MAE cross-validation.
sklears-simd: AVX2 quicksort with buffered partition and 6 hardening tests; 5 prior test failures fixed (MAE gradient sign bug, cross-product SSE2 shuffles, F32x4 stride test, AVX2 compress partition).
sklears-compose: comprehensive_benchmarking module (15-trait regression-detection subsystem), time_series_pipelines, enhanced WASM integration, sparse CSR column selection.
SciRS2 0.4.2 → 0.5.1 across all 17 numerical crates; oxicode 0.2 → 0.2.4; oxifft 0.3.0 → 0.3.2.
OxiCUDA v0.3 family added to workspace (oxicuda-backend, oxicuda-memory, oxicuda-blas, oxicuda-solver, oxicuda-manifold, oxicuda-dnn, oxicuda-driver, oxicuda-ptx, oxicuda-primitives).
sklears-svm: Full migration from nalgebra → scirs2-linalg; SVC conformal prediction restructured around an Option<SVC<Trained>> that returns an explicit NotTrained error when no model is present.
sklears-metrics: Full migration from sprs → scirs2-sparse; sparse feature re-enabled.
sklears-gaussian-process: Fixed Cholesky instability on indefinite saddle-point systems (SPD systems use regularized Cholesky, indefinite ones fall back to LU); 5 previously-ignored kriging tests re-enabled.
sklears-core: system_info module with SystemMemory reading real OS stats; DSL macros fully wired; trait_explorer GPU-context initialization now falls back to CPU explicitly.
Doctest fixes: sklears-covariance, sklears-cross-decomposition, sklears-isotonic, sklears-model-selection, sklears-neighbors, sklears-semi-supervised; flaky timing tests fixed across several crates.
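The Gaussian-process fix is summarized as "SPD via regularized Cholesky, indefinite via LU". A minimal sketch of that dispatch, assuming dense row-major matrices: attempt a Cholesky factorization of A + jitter * I, and if a non-positive pivot shows the system is not positive definite (as with a kriging saddle-point system), fall back to LU with partial pivoting. All function names and the jitter parameter are illustrative, not the sklears-gaussian-process API.

```rust
/// Lower-triangular Cholesky factor of a + jitter*I, or None when a
/// non-positive pivot shows the matrix is not positive definite.
fn cholesky(a: &[Vec<f64>], jitter: f64) -> Option<Vec<Vec<f64>>> {
    let n = a.len();
    let mut l = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = a[i][i] + jitter - s;
                if d <= 0.0 {
                    return None;
                }
                l[i][i] = d.sqrt();
            } else {
                l[i][j] = (a[i][j] - s) / l[j][j];
            }
        }
    }
    Some(l)
}

/// Solve L L^T x = b by forward then backward substitution.
fn cholesky_solve(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0_f64; n];
    for i in 0..n {
        let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
        y[i] = (b[i] - s) / l[i][i];
    }
    let mut x = vec![0.0_f64; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| l[k][i] * x[k]).sum();
        x[i] = (y[i] - s) / l[i][i];
    }
    x
}

/// LU (Gaussian elimination) with partial pivoting for indefinite systems.
fn lu_solve(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let (mut a, mut b) = (a.to_vec(), b.to_vec());
    for c in 0..n {
        let p = (c..n).max_by(|&i, &j| a[i][c].abs().total_cmp(&a[j][c].abs())).unwrap();
        a.swap(c, p);
        b.swap(c, p);
        for r in c + 1..n {
            let f = a[r][c] / a[c][c];
            for k in c..n {
                let v = a[c][k];
                a[r][k] -= f * v;
            }
            let bc = b[c];
            b[r] -= f * bc;
        }
    }
    let mut x = vec![0.0_f64; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| a[i][k] * x[k]).sum();
        x[i] = (b[i] - s) / a[i][i];
    }
    x
}

/// Try regularized Cholesky first; fall back to LU if it breaks down.
fn solve_system(a: &[Vec<f64>], b: &[f64], jitter: f64) -> Vec<f64> {
    match cholesky(a, jitter) {
        Some(l) => cholesky_solve(&l, b),
        None => lu_solve(a, b),
    }
}
```

A nonzero jitter slightly perturbs the SPD solution, trading a small bias for stability on nearly singular covariance matrices.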

Tips

Use IterativeImputer for tabular data where values are missing at random (MAR). It implements MICE with ridge regression; start with .max_iter(10) and raise it if the imputed values are still shifting between rounds.
Run MinMaxScaler before KNNImputer. KNN distances are scale-sensitive, so a feature with a large numeric range dominates the neighbour search unless every column is scaled first; MinMaxScaler (now pure Rust, not a C wrapper) keeps neighbours meaningful.
Use CategoricalClusteringImputer for mixed-type data. When numeric imputers don’t apply (string categories, ordinal labels), the new categorical imputers fill that gap without reaching for Python.
Enable AVX2 for sort-heavy workloads. quicksort_avx2 in sklears-simd is tested against already-sorted, reverse-sorted, all-equal, and heavy-duplicate inputs, plus lengths that are not a multiple of 8 and large arrays, so the usual edge cases of vectorized partitioning are covered.
Pin to SciRS2 0.5.1. 0.1.2 validates against the full scirs2-* 0.5.1 line. Mismatched SciRS2 versions across workspace members can surface subtle numerical differences; keep them in lockstep.
OxiCUDA v0.3 is in the workspace, but GPU paths remain behind feature flags. The gpu_support feature provides CPU fallbacks and explicit errors until GPU kernels land; enable it to wire up the OxiCUDA plumbing without committing to a GPU runtime today.
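The scale-before-KNN tip is easy to check numerically. This sketch min-max scales each column (ignoring NaN when finding the range and leaving NaN cells in place), measures distance with the NaN-aware Euclidean rule that scikit-learn documents for nan_euclidean_distances (squared distance over coordinates present in both rows, rescaled by total/present coordinates), and finds a row's nearest neighbour. The helper names are hypothetical; this is not the sklears KNNImputer code.

```rust
/// Min-max scale each column to [0, 1], ignoring NaN when finding the
/// column range and leaving NaN cells untouched for the imputer.
fn min_max_scale(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let p = rows[0].len();
    let mut out = rows.to_vec();
    for j in 0..p {
        let vals = rows.iter().map(|r| r[j]).filter(|v| !v.is_nan());
        let (lo, hi) = vals.fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), v| (lo.min(v), hi.max(v)));
        let range = if hi > lo { hi - lo } else { 1.0 }; // constant column: avoid dividing by zero
        for r in out.iter_mut() {
            if !r[j].is_nan() {
                r[j] = (r[j] - lo) / range;
            }
        }
    }
    out
}

/// NaN-aware Euclidean distance: sum squared differences over coordinates
/// present in both rows, then rescale by total / present coordinates.
fn nan_euclidean(a: &[f64], b: &[f64]) -> f64 {
    let mut sq = 0.0_f64;
    let mut present = 0usize;
    for (x, y) in a.iter().zip(b) {
        if !x.is_nan() && !y.is_nan() {
            sq += (x - y).powi(2);
            present += 1;
        }
    }
    if present == 0 {
        return f64::NAN;
    }
    (sq * a.len() as f64 / present as f64).sqrt()
}

/// Index of the row nearest to row `q` (excluding `q` itself).
fn nearest(rows: &[Vec<f64>], q: usize) -> usize {
    (0..rows.len())
        .filter(|&i| i != q)
        .min_by(|&i, &j| nan_euclidean(&rows[q], &rows[i]).total_cmp(&nan_euclidean(&rows[q], &rows[j])))
        .unwrap()
}
```

For instance, with rows [0, 1000, NaN], [0, 1300, 1], [1, 1000, 2], and [1, 5000, 3], nearest(&rows, 0) is row 2 on the raw data but row 1 after min_max_scale, because the 300-unit gap in the second column no longer outweighs the mismatch in the first.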

This is the foundation

As of June 30, 2026, sklears is the classical-ML layer of the COOLJAPAN stack, sitting on:

SciRS2 0.5.1 — the numerical backbone (linear algebra, stats, optimization, signal, FFT, sparse)
OxiBLAS — BLAS/LAPACK, pure Rust
Oxicode — serialization, pure Rust (replacing bincode)
OxiFFT 0.3.2 — FFT, pure Rust (replacing rustfft)
OxiARC — compression/archiving, pure Rust
OxiCUDA v0.3 — GPU abstraction layer, replacing wgpu/cudarc
NumRS2 / PandRS — array and dataframe substrate

Beyond the core, sklears feeds into TenfloweRS and TrustformeRS for deep learning and works alongside Celers for streaming data: one pure-Rust stack from raw data through trained models, no Python runtime required.

Repository: https://github.com/cool-japan/sklears

Star the repo if a production-ready, no-GIL, no-C-extension scikit-learn is something you want to build on — every star helps the ecosystem grow. Thanks for reading, and happy preprocessing.

— KitaSan at COOLJAPAN OÜ
June 30, 2026