COOLJAPAN
sklears 0.1.2 Released — Preprocessing Completions, SciRS2 0.5.1, and AVX2 Quicksort

sklears 0.1.2 brings 12 real preprocessing implementations (no more stubs), SciRS2 0.5.1 upgrade, AVX2 quicksort in sklears-simd, advanced categorical imputers, and a full benchmarking regression-detection subsystem — 12,242 tests across 36 crates, >99% scikit-learn API coverage held.

release sklears machine-learning scikit-learn rust classical-ml preprocessing simd scirs2

0.1.2 is where the last big category of stubs became real code.

Today we released sklears 0.1.2 — a feature-completion release that finalises preprocessing, hardens SIMD, upgrades the entire SciRS2 numerical backbone to 0.5.1, and ships a new categorical imputation suite and a full benchmarking regression-detection framework.

sklears is the pure-Rust alternative to scikit-learn: No C. No Fortran. No Cython. No GIL.
scikit-learn’s pipeline depends on NumPy, SciPy, and a zoo of compiled extensions that make packaging a perpetual headache; sklears is plain Rust on the SciRS2 stack, with OxiBLAS for BLAS/LAPACK and Oxicode for serialization. It compiles to a single static binary (or to WASM), so deployment needs no Python runtime and no compiled extensions.

Why sklears 0.1.2 is a step forward

The scikit-learn preprocessing module is one of the most-used parts of the library. The pain:

sklears 0.1.2 ends all of that. The headline wins:

Technical Deep Dive: what moved in 0.1.2

  1. Preprocessing completions (sklears-preprocessing). Each scaler now owns its own Rust arithmetic: no C calls, no opaque extension module, just array iteration and SIMD-accelerated kernels where applicable. IterativeImputer implements MICE with ridge regression (each incomplete feature is regressed on the others, round after round); GAINImputer uses a generative adversarial network imputation strategy. SIMD paths are re-enabled with real AVX kernels: simd_threshold_mask, simd_axpy, and simd_mahalanobis now route through simd_dot_product.

  2. Categorical imputation suite (sklears-impute). Four additions: three estimators, CategoricalClusteringImputer (k-means), CategoricalRandomForestImputer (MissForest/CART), and AssociationRuleImputer (Apriori), plus validate_imputer (K-fold MAE cross-validation) for scoring an imputer. These fill the gap scikit-learn leaves for non-numeric missing data.

  3. Benchmarking regression detection (sklears-compose). comprehensive_benchmarking ships a 15-trait regression-detection subsystem: AdaptiveThresholds, AlertSuppression, BaselineComparisons, BusinessImpactAssessment, EffectSizeAnalysis, PatternRecognition, RegressionAlertSystem, RegressionCache, RegressionDetector, RegressionDetectorConfig, RegressionMetadata, SeverityAssessment, SignificanceTesting, SmartSuppression, ThresholdManagement. Also: time_series_pipelines (LagFeatures, RollingWindow, Differencing, TemporalTrainTestSplit) and real CSR-based sparse column selection via scirs2-sparse.

  4. Backend upgrades and migrations. sklears-svm fully migrated from nalgebra to scirs2-linalg; sklears-metrics fully migrated from sprs to scirs2-sparse, with the sparse feature re-enabled. The workspace now carries oxicuda-backend, oxicuda-memory, oxicuda-blas, oxicuda-solver, and related v0.3 crates, replacing direct wgpu/cudarc/candle-core dependencies.
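
To make item 1 concrete, here is a minimal sketch of what "array iteration" means for a min-max scaler. It uses only the standard library; min_max_scale is a hypothetical helper written for this post, not the sklears-preprocessing API, and it assumes a rectangular row-major matrix.

```rust
/// Column-wise min-max scaling over a row-major matrix.
/// Hypothetical helper for illustration; assumes every row has the same length.
fn min_max_scale(rows: &[Vec<f64>], lo: f64, hi: f64) -> Vec<Vec<f64>> {
    let n_cols = rows.first().map_or(0, |r| r.len());

    // Fit: track the minimum and maximum of each column.
    let mut mins = vec![f64::INFINITY; n_cols];
    let mut maxs = vec![f64::NEG_INFINITY; n_cols];
    for row in rows {
        for (j, &x) in row.iter().enumerate() {
            mins[j] = mins[j].min(x);
            maxs[j] = maxs[j].max(x);
        }
    }

    // Transform: x' = lo + (x - min) / (max - min) * (hi - lo).
    rows.iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| {
                    let range = maxs[j] - mins[j];
                    if range == 0.0 {
                        // A constant column has no spread; map it to the lower bound.
                        lo
                    } else {
                        lo + (x - mins[j]) / range * (hi - lo)
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let data = vec![vec![1.0, 10.0], vec![2.0, 10.0], vec![3.0, 10.0]];
    let scaled = min_max_scale(&data, 0.0, 1.0);
    println!("{:?}", scaled);
}
```

Mapping a constant column to the lower bound is one possible convention for the zero-range case; the sketch picks it only to avoid a division by zero.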
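
The K-fold MAE idea mentioned in item 2 can be sketched in a few lines: hide a fold of known values, impute them, and measure the mean absolute error against the hidden truth. The names kfold_mae and mean_impute are hypothetical and the mean imputer is only a stand-in; how validate_imputer actually assigns folds or handles categorical columns is not described here, so this is an illustration under those assumptions.

```rust
/// K-fold masking validation for an imputer on one numeric column.
/// `impute` receives the column with hidden entries set to None and must
/// return a fully filled column of the same length.
/// Hypothetical helper; folds are interleaved (index % k), not shuffled.
fn kfold_mae<F>(column: &[f64], k: usize, impute: F) -> f64
where
    F: Fn(&[Option<f64>]) -> Vec<f64>,
{
    assert!(k >= 2 && k <= column.len());
    let mut abs_err_sum = 0.0;
    for fold in 0..k {
        // Hide every k-th value, starting at index `fold`.
        let masked: Vec<Option<f64>> = column
            .iter()
            .enumerate()
            .map(|(i, &x)| if i % k == fold { None } else { Some(x) })
            .collect();
        let filled = impute(&masked);
        // Accumulate the error only on the entries that were hidden.
        for (i, &truth) in column.iter().enumerate() {
            if i % k == fold {
                abs_err_sum += (filled[i] - truth).abs();
            }
        }
    }
    // Each entry is hidden exactly once across all folds.
    abs_err_sum / column.len() as f64
}

/// Stand-in imputer: fill missing entries with the mean of the observed ones.
fn mean_impute(col: &[Option<f64>]) -> Vec<f64> {
    let observed: Vec<f64> = col.iter().flatten().copied().collect();
    let mean = observed.iter().sum::<f64>() / observed.len() as f64;
    col.iter().map(|v| v.unwrap_or(mean)).collect()
}

fn main() {
    let column = [1.0, 2.0, 3.0, 4.0];
    let mae = kfold_mae(&column, 2, mean_impute);
    println!("2-fold MAE of mean imputation: {mae}");
}
```

A lower MAE means the imputer reconstructs hidden values more closely, which is what makes the score usable for comparing imputers on the same column.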
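
For readers new to the time-series transforms named in item 3, the sketch below shows the textbook form of each one on a plain slice. The function names are hypothetical and the signatures are assumptions; the real LagFeatures, RollingWindow, Differencing, and TemporalTrainTestSplit types in sklears-compose may expose different options and edge-case behaviour.

```rust
/// Lag feature: the value at t - k, or None where there is not enough history.
fn lag_feature(series: &[f64], k: usize) -> Vec<Option<f64>> {
    (0..series.len())
        .map(|t| t.checked_sub(k).map(|i| series[i]))
        .collect()
}

/// First-order differencing: y[t] - y[t-1]; one element shorter than the input.
fn difference(series: &[f64]) -> Vec<f64> {
    series.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Rolling mean over full windows only (no padding at the start).
fn rolling_mean(series: &[f64], window: usize) -> Vec<f64> {
    assert!(window > 0, "window must be positive");
    series
        .windows(window)
        .map(|w| w.iter().sum::<f64>() / window as f64)
        .collect()
}

/// Temporal split: the earliest fraction is train, the rest is test.
/// Nothing is shuffled, so no future value leaks into training.
fn temporal_split(series: &[f64], train_fraction: f64) -> (&[f64], &[f64]) {
    let cut = (series.len() as f64 * train_fraction).floor() as usize;
    series.split_at(cut.min(series.len()))
}

fn main() {
    let y = [1.0, 3.0, 6.0, 10.0];
    println!("lag 1:        {:?}", lag_feature(&y, 1));
    println!("difference:   {:?}", difference(&y));
    println!("rolling mean: {:?}", rolling_mean(&y, 2));
    let (train, test) = temporal_split(&y, 0.75);
    println!("train: {:?}, test: {:?}", train, test);
}
```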

Getting Started

cargo add sklears

The example below uses the two areas 0.1.2 completed — the new preprocessing scalers and the new categorical imputer:

use sklears::prelude::*;
use sklears::preprocessing::{MinMaxScaler, IterativeImputer};
use sklears::impute::CategoricalClusteringImputer;

fn main() -> Result<()> {
    // MinMaxScaler is now a real Rust implementation — no C extension
    let scaler = MinMaxScaler::new().feature_range(0.0, 1.0);
    let dataset = sklears::dataset::make_classification(500, 8, 2, 42)?;
    let scaled = scaler.fit_transform(&dataset.data)?;

    // IterativeImputer (MICE ridge) for numeric missing values
    let imputer = IterativeImputer::new().max_iter(10);
    let imputed = imputer.fit_transform(&scaled)?;

    // CategoricalClusteringImputer for non-numeric columns
    let cat_imputer = CategoricalClusteringImputer::new().n_clusters(5);
    // cat_imputer.fit_transform(&cat_data)?;

    println!("Preprocessed shape: {:?}", imputed.shape());
    Ok(())
}

AVX2 quicksort is available directly from sklears-simd:

use sklears_simd::sort::quicksort_avx2;

fn main() {
    let mut data: Vec<f32> = vec![3.1, 1.4, 1.5, 9.2, 6.5, 3.5, 8.9, 7.9];
    quicksort_avx2(&mut data);
    // Every adjacent pair must be in non-decreasing order.
    assert!(data.windows(2).all(|w| w[0] <= w[1]));
}
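
If your binary may run on CPUs without AVX2, one common pattern is to check for the feature at runtime and fall back to the standard library sort. The sketch below uses only std; whether quicksort_avx2 already performs this check internally is not stated here, so treat sort_f32 and avx2_available as hypothetical wrappers, with the AVX2 branch standing in for the real kernel call.

```rust
/// True when the running CPU reports AVX2 support (x86 and x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_available() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On other architectures AVX2 is never available.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_available() -> bool {
    false
}

/// Sorts in place and reports which path was chosen.
/// Hypothetical wrapper: the "avx2" branch is where an AVX2 kernel would be
/// called; this sketch uses the standard library in both branches so that it
/// compiles without sklears-simd.
fn sort_f32(data: &mut [f32]) -> &'static str {
    if avx2_available() {
        data.sort_unstable_by(f32::total_cmp);
        "avx2"
    } else {
        // total_cmp gives f32 a total order, so NaN values cannot panic the sort.
        data.sort_unstable_by(f32::total_cmp);
        "scalar"
    }
}

fn main() {
    let mut data: Vec<f32> = vec![3.1, 1.4, 1.5, 9.2, 6.5, 3.5, 8.9, 7.9];
    let path = sort_f32(&mut data);
    println!("sorted via {path} path: {:?}", data);
}
```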

What’s New in 0.1.2

Tips

This is the foundation

As of June 30, 2026, sklears is the classical-ML layer of the COOLJAPAN stack, sitting on:

Beyond the core, sklears feeds into TenfloweRS and TrustformeRS for deep learning and sits alongside Celers for streaming data: one pure-Rust stack from raw data through trained models, no Python runtime required.

Repository: https://github.com/cool-japan/sklears

Star the repo if a production-ready, no-GIL, no-C-extension scikit-learn is something you want to build on — every star helps the ecosystem grow. Thanks for reading, and happy preprocessing.

KitaSan at COOLJAPAN OÜ
June 30, 2026
