0.1.2 is where the last big category of stubs became real code.
Today we released sklears 0.1.2 — a feature-completion release that finalises preprocessing, hardens SIMD, upgrades the entire SciRS2 numerical backbone to 0.5.1, and ships a new categorical imputation suite and a full benchmarking regression-detection framework.
sklears is the pure-Rust alternative to scikit-learn: No C. No Fortran. No Cython. No GIL.
scikit-learn’s pipeline depends on NumPy, SciPy, and a zoo of compiled extensions that make packaging a perpetual headache; sklears is plain Rust on the SciRS2 stack, with OxiBLAS for BLAS/LAPACK and Oxicode for serialization. It compiles to a single static binary (or WASM) and runs everywhere.
Why sklears 0.1.2 is a step forward
The scikit-learn preprocessing module is one of the most-used parts of the library. The pain:
- MinMaxScaler, MaxAbsScaler, and friends work, then you realise they’re wrappers around yet another C extension.
- Imputers like KNNImputer and IterativeImputer are powerful but depend on NumPy’s C routines for inner loops.
- OrdinalEncoder and TargetEncoder require careful Python packaging across platforms.
sklears 0.1.2 ends all of that. The headline wins:
- 12 preprocessing implementations moved from stub to real: five scalers (MinMaxScaler, MaxAbsScaler, UnitVectorScaler, FeatureWiseScaler, OutlierAwareScaler), five imputers (SimpleImputer, NaN-aware KNNImputer, MICE-ridge IterativeImputer, MultipleImputer, GAINImputer), and two encoders (OrdinalEncoder, TargetEncoder with category smoothing) are now genuine Rust implementations.
- SciRS2 0.4.2 → 0.5.1 across all 17 numerical crates, the biggest backend upgrade since 0.1.0.
- AVX2 quicksort in sklears-simd: quicksort_avx2_impl with buffered partition and a precomputed compress LUT, plus 6 new hardening tests covering already-sorted, reverse-sorted, all-equal, heavy-duplicates, non-multiple-of-8, and large arrays.
- 12,242 tests passing across 36 crates, up from 11,586+ at 0.1.1.
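To make "category smoothing" concrete, here is a minimal std-only sketch of smoothed target encoding. It assumes an m-estimate formula, which may or may not be the one sklears uses, and `target_encode` is a hypothetical helper, not the sklears API.

```rust
use std::collections::BTreeMap;

/// Smoothed target encoding via an m-estimate (an assumed formula, not
/// necessarily the one sklears implements):
///   encoded(c) = (sum_c + m * global_mean) / (n_c + m)
fn target_encode(categories: &[&str], targets: &[f64], m: f64) -> BTreeMap<String, f64> {
    assert_eq!(categories.len(), targets.len(), "one target per category value");
    assert!(!targets.is_empty(), "need at least one sample");
    let global_mean = targets.iter().sum::<f64>() / targets.len() as f64;

    // Accumulate (sum of targets, sample count) per category.
    let mut stats: BTreeMap<String, (f64, f64)> = BTreeMap::new();
    for (c, &y) in categories.iter().zip(targets) {
        let entry = stats.entry(c.to_string()).or_insert((0.0, 0.0));
        entry.0 += y;
        entry.1 += 1.0;
    }

    // Blend each category mean with the global mean; larger m means more shrinkage.
    stats
        .into_iter()
        .map(|(c, (sum, n))| (c, (sum + m * global_mean) / (n + m)))
        .collect()
}

fn main() {
    let cats = ["a", "a", "b", "b", "b", "b", "b", "b"];
    let ys = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0];
    let enc = target_encode(&cats, &ys, 2.0);
    // The rare category "a" has raw mean 1.0 but is pulled toward the global mean 0.5.
    println!("{:?}", enc);
}
```

The point of the smoothing term is that a category seen only twice should not be trusted as much as one seen hundreds of times; `m` controls how many samples it takes before the category mean dominates.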
Technical Deep Dive: what moved in 0.1.2
- Preprocessing completions (sklears-preprocessing). Each scaler now owns its own Rust arithmetic: no C calls, no opaque extension module, just array iteration and SIMD-accelerated kernels where applicable. IterativeImputer implements MICE ridge (an iterative regression approach); GAINImputer uses a generative adversarial network imputation strategy. SIMD paths are re-enabled with real AVX kernels: simd_threshold_mask, simd_axpy, and simd_mahalanobis now route through simd_dot_product.
- Categorical imputation suite (sklears-impute). Four new estimators: CategoricalClusteringImputer (k-means), CategoricalRandomForestImputer (MissForest/CART), AssociationRuleImputer (Apriori), and validate_imputer (K-fold MAE cross-validation). These fill the gap scikit-learn leaves for non-numeric missing data.
- Benchmarking regression detection (sklears-compose). comprehensive_benchmarking ships a 15-trait regression-detection subsystem: AdaptiveThresholds, AlertSuppression, BaselineComparisons, BusinessImpactAssessment, EffectSizeAnalysis, PatternRecognition, RegressionAlertSystem, RegressionCache, RegressionDetector, RegressionDetectorConfig, RegressionMetadata, SeverityAssessment, SignificanceTesting, SmartSuppression, ThresholdManagement. Also: time_series_pipelines (LagFeatures, RollingWindow, Differencing, TemporalTrainTestSplit) and real CSR-based sparse column selection via scirs2-sparse.
- Backend upgrades and migrations. sklears-svm fully migrated from nalgebra → scirs2-linalg; sklears-metrics fully migrated from sprs → scirs2-sparse with the sparse feature re-enabled. The workspace now carries oxicuda-backend, oxicuda-memory, oxicuda-blas, oxicuda-solver, and related v0.3 crates, replacing direct wgpu/cudarc/candle-core dependencies.
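For readers new to the time-series transforms named above, the sketch below shows what lagging, differencing, a rolling window, and a temporal train/test split compute on a plain slice. The function names are hypothetical and std-only; they illustrate the ideas and are not the time_series_pipelines API.

```rust
/// Lag-k feature: the value at t-k, or None where no history exists yet.
fn lag(series: &[f64], k: usize) -> Vec<Option<f64>> {
    (0..series.len())
        .map(|t| if t >= k { Some(series[t - k]) } else { None })
        .collect()
}

/// First-order differencing: x[t] - x[t-1]; the output is one element shorter.
fn difference(series: &[f64]) -> Vec<f64> {
    series.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Rolling mean over a fixed window; the output has len - window + 1 elements.
fn rolling_mean(series: &[f64], window: usize) -> Vec<f64> {
    assert!(window > 0, "window must be positive");
    series
        .windows(window)
        .map(|w| w.iter().sum::<f64>() / window as f64)
        .collect()
}

/// Temporal split: the earliest `train_fraction` of the series trains, the
/// rest tests. Nothing is shuffled, so the test set is strictly later in time.
fn temporal_split(series: &[f64], train_fraction: f64) -> (&[f64], &[f64]) {
    let cut = (series.len() as f64 * train_fraction).floor() as usize;
    series.split_at(cut.min(series.len()))
}

fn main() {
    let series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0];
    println!("lag 2:        {:?}", lag(&series, 2));
    println!("difference:   {:?}", difference(&series));
    println!("rolling mean: {:?}", rolling_mean(&series, 2));
    let (train, test) = temporal_split(&series, 0.75);
    println!("train: {:?}, test: {:?}", train, test);
}
```

The temporal split is the piece that matters most for correctness: a shuffled K-fold on time-ordered data leaks future values into training, which is why a dedicated splitter exists at all.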
Getting Started
cargo add sklears
The example below uses the two areas 0.1.2 completed — the new preprocessing scalers and the new categorical imputer:
use sklears::prelude::*;
use sklears::preprocessing::{MinMaxScaler, IterativeImputer};
use sklears::impute::CategoricalClusteringImputer;

fn main() -> Result<()> {
    // MinMaxScaler is now a real Rust implementation (no C extension)
    let scaler = MinMaxScaler::new().feature_range(0.0, 1.0);
    let dataset = sklears::dataset::make_classification(500, 8, 2, 42)?;
    let scaled = scaler.fit_transform(&dataset.data)?;

    // IterativeImputer (MICE ridge) for numeric missing values
    let imputer = IterativeImputer::new().max_iter(10);
    let imputed = imputer.fit_transform(&scaled)?;

    // CategoricalClusteringImputer for non-numeric columns
    let cat_imputer = CategoricalClusteringImputer::new().n_clusters(5);
    // cat_imputer.fit_transform(&cat_data)?;

    println!("Preprocessed shape: {:?}", imputed.shape());
    Ok(())
}
AVX2 quicksort is available directly from sklears-simd:
use sklears_simd::sort::quicksort_avx2;
let mut data: Vec<f32> = vec![3.1, 1.4, 1.5, 9.2, 6.5, 3.5, 8.9, 7.9];
quicksort_avx2(&mut data);
assert!(data.windows(2).all(|w| w[0] <= w[1]));
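If you want to run the same kind of edge-case check against any sort routine, a small harness along these lines may help. It is a sketch, not the sklears-simd test suite: `edge_cases`, `check_sort`, and `reference_sort` are hypothetical names, the input sizes are arbitrary, and the standard library sort stands in for the sort under test. It also assumes the sort takes `&mut [f32]`.

```rust
/// The six input shapes named in the hardening tests, built as plain vectors.
/// Sizes and values are arbitrary choices for illustration.
fn edge_cases() -> Vec<(&'static str, Vec<f32>)> {
    vec![
        ("already-sorted", (0..64).map(|i| i as f32).collect()),
        ("reverse-sorted", (0..64).rev().map(|i| i as f32).collect()),
        ("all-equal", vec![7.0; 64]),
        ("heavy-duplicates", (0..64).map(|i| (i % 3) as f32).collect()),
        ("non-multiple-of-8", (0..13).rev().map(|i| i as f32).collect()),
        (
            "large",
            // Cheap multiplicative scramble so the input is not trivially ordered.
            (0..100_000u32)
                .map(|i| (i.wrapping_mul(2_654_435_761) % 1000) as f32)
                .collect(),
        ),
    ]
}

/// Run `sort` on every edge case and compare against the std sort.
fn check_sort(sort: fn(&mut [f32])) -> Result<(), String> {
    for (name, input) in edge_cases() {
        let mut expected = input.clone();
        expected.sort_by(f32::total_cmp);
        let mut actual = input;
        sort(&mut actual);
        if actual != expected {
            return Err(format!("mismatch on {name} input"));
        }
    }
    Ok(())
}

/// Stand-in for the sort under test.
fn reference_sort(data: &mut [f32]) {
    data.sort_by(f32::total_cmp);
}

fn main() {
    // Swap `reference_sort` for the routine you actually want to validate.
    check_sort(reference_sort).expect("sort should handle every edge case");
    println!("all edge cases passed");
}
```

Comparing against a trusted reference sort is stricter than checking `windows(2)` ordering alone, because it also catches a sort that drops or duplicates elements.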
What’s New in 0.1.2
- Preprocessing: 12 stub-to-real promotions across scalers, imputers, and encoders; SIMD AVX kernels re-enabled.
- sklears-impute: 4 new additions: three categorical imputers (CategoricalClusteringImputer, CategoricalRandomForestImputer, AssociationRuleImputer) plus validate_imputer K-fold MAE cross-validation.
- sklears-simd: AVX2 quicksort with buffered partition and 6 hardening tests; 5 prior test failures fixed (MAE gradient sign bug, cross-product SSE2 shuffles, F32x4 stride test, AVX2 compress partition).
- sklears-compose: comprehensive_benchmarking module (15-trait regression-detection subsystem), time_series_pipelines, enhanced WASM integration, sparse CSR column selection.
- SciRS2 0.4.2 → 0.5.1 across all 17 numerical crates; oxicode 0.2 → 0.2.4; oxifft 0.3.0 → 0.3.2.
- OxiCUDA v0.3 family added to workspace (oxicuda-backend, oxicuda-memory, oxicuda-blas, oxicuda-solver, oxicuda-manifold, oxicuda-dnn, oxicuda-driver, oxicuda-ptx, oxicuda-primitives).
- sklears-svm: Full migration from nalgebra → scirs2-linalg; SVC conformal prediction restructured to Option<SVC<Trained>> with honest NotTrained error.
- sklears-metrics: Full migration from sprs → scirs2-sparse; sparse feature re-enabled.
- sklears-gaussian-process: Cholesky stability for indefinite saddle-point systems fixed (SPD via regularized Cholesky, indefinite via LU); 5 previously-ignored kriging tests re-enabled.
- sklears-core: system_info module with SystemMemory reading real OS stats; DSL macros fully wired; trait_explorer GPU-context init with honest CPU fallback.
- Doctest fixes: sklears-covariance, sklears-cross-decomposition, sklears-isotonic, sklears-model-selection, sklears-neighbors, sklears-semi-supervised; flaky timing tests fixed across several crates.
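The Option<SVC<Trained>> change in the sklears-svm entry is a typestate pattern, and a toy version shows why it gives an "honest" error. Everything below is hypothetical: `Model`, `ConformalWrapper`, and `PredictError` are illustrative stand-ins, and the "training" is just a mean, so treat it as a sketch of the shape rather than the actual sklears-svm code.

```rust
use std::marker::PhantomData;

// Marker types for the two states an estimator can be in.
struct Untrained;
struct Trained;

/// Hypothetical stand-in for a typestate estimator such as SVC<State>.
struct Model<State> {
    mean: f64,
    _state: PhantomData<State>,
}

impl Model<Untrained> {
    fn new() -> Self {
        Model { mean: 0.0, _state: PhantomData }
    }

    /// Toy "training": remember the mean of the targets. Consumes the
    /// untrained model and returns a trained one.
    fn fit(self, ys: &[f64]) -> Model<Trained> {
        assert!(!ys.is_empty(), "need at least one target");
        let mean = ys.iter().sum::<f64>() / ys.len() as f64;
        Model { mean, _state: PhantomData }
    }
}

impl Model<Trained> {
    /// Only a trained model exposes `predict`.
    fn predict(&self) -> f64 {
        self.mean
    }
}

#[derive(Debug, PartialEq)]
enum PredictError {
    NotTrained,
}

/// A wrapper that may or may not hold a trained model yet.
struct ConformalWrapper {
    inner: Option<Model<Trained>>,
}

impl ConformalWrapper {
    fn new() -> Self {
        Self { inner: None }
    }

    fn fit(&mut self, ys: &[f64]) {
        self.inner = Some(Model::<Untrained>::new().fit(ys));
    }

    /// Returns an explicit error instead of a placeholder prediction.
    fn predict(&self) -> Result<f64, PredictError> {
        self.inner
            .as_ref()
            .map(|m| m.predict())
            .ok_or(PredictError::NotTrained)
    }
}

fn main() {
    let mut wrapper = ConformalWrapper::new();
    println!("before fit: {:?}", wrapper.predict());
    wrapper.fit(&[1.0, 2.0, 3.0]);
    println!("after fit:  {:?}", wrapper.predict());
}
```

Holding `Option<Model<Trained>>` means there is no way to construct a wrapper that silently predicts from default weights: the untrained case has to be handled by the caller.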
Tips
- Use IterativeImputer for tabular data with MAR missingness. It implements MICE ridge; set .max_iter(10) to start and increase if convergence is slow. It is now a real Rust implementation, not a stub.
- Pair KNNImputer with MinMaxScaler first. KNN distance is sensitive to scale; run MinMaxScaler (now real, not a C wrapper) before KNNImputer to keep neighbours meaningful.
- Use CategoricalClusteringImputer for mixed-type data. When numeric imputers don’t apply (string categories, ordinal labels), the new categorical imputers fill that gap without reaching for Python.
- Enable AVX2 for sort-heavy workloads. quicksort_avx2 in sklears-simd is validated against already-sorted, reverse-sorted, all-equal, and heavy-duplicate inputs, so it is safe to use on production data without edge-case surprises.
- Pin to SciRS2 0.5.1. 0.1.2 validates against the full scirs2-* 0.5.1 line. Mismatching SciRS2 versions across workspace members can surface subtle numerical differences; keep them in lockstep.
- OxiCUDA v0.3 is in the workspace but GPU paths remain behind feature flags. The gpu_support feature provides CPU fallbacks and honest errors until GPU kernels land; enable it to wire up the OxiCUDA plumbing without committing to a GPU runtime today.
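The scaling tip is easy to see on a toy dataset. The std-only sketch below uses made-up rows of (age, income) and hypothetical helper names; it does not call sklears. It finds the nearest neighbour of the first row before and after min-max scaling.

```rust
type Row = [f64; 2]; // (age in years, income)

fn squared_distance(a: &Row, b: &Row) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Index of the row closest to row 0, excluding row 0 itself.
fn nearest_to_first(rows: &[Row]) -> usize {
    (1..rows.len())
        .min_by(|&i, &j| {
            squared_distance(&rows[0], &rows[i])
                .total_cmp(&squared_distance(&rows[0], &rows[j]))
        })
        .expect("need at least two rows")
}

/// Min-max scale every column to [0, 1]; constant columns map to 0.
fn min_max_scale(rows: &[Row]) -> Vec<Row> {
    let mut out = rows.to_vec();
    for col in 0..2 {
        let min = rows.iter().map(|r| r[col]).fold(f64::INFINITY, f64::min);
        let max = rows.iter().map(|r| r[col]).fold(f64::NEG_INFINITY, f64::max);
        let range = max - min;
        for r in out.iter_mut() {
            r[col] = if range == 0.0 { 0.0 } else { (r[col] - min) / range };
        }
    }
    out
}

fn main() {
    let rows: Vec<Row> = vec![
        [25.0, 50_000.0], // query row
        [60.0, 50_500.0], // very different age, similar income
        [26.0, 52_000.0], // similar age, similar income
        [40.0, 90_000.0], // far away on both
    ];
    // Unscaled, income dominates the distance, so the 60-year-old wins.
    println!("nearest (raw):    row {}", nearest_to_first(&rows));
    // Scaled, both features contribute comparably and the 26-year-old wins.
    println!("nearest (scaled): row {}", nearest_to_first(&min_max_scale(&rows)));
}
```

Because income spans tens of thousands while age spans tens, the raw distance is almost entirely an income distance; scaling puts both columns on the same footing before any neighbour search.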
This is the foundation
As of June 30, 2026, sklears is the classical-ML layer of the COOLJAPAN stack, sitting on:
- SciRS2 0.5.1 — the numerical backbone (linear algebra, stats, optimization, signal, FFT, sparse)
- OxiBLAS — BLAS/LAPACK, pure Rust
- Oxicode — serialization, pure Rust (replacing bincode)
- OxiFFT 0.3.2 — FFT, pure Rust (replacing rustfft)
- OxiARC — compression/archiving, pure Rust
- OxiCUDA v0.3 — GPU abstraction layer, replacing wgpu/cudarc
- NumRS2 / PandRS — array and dataframe substrate
Beyond the core, sklears feeds into TenfloweRS and TrustformeRS for deep learning and sits alongside Celers for streaming data: one pure-Rust stack from raw data through trained models, no Python runtime required.
Repository: https://github.com/cool-japan/sklears
Star the repo if a production-ready, no-GIL, no-C-extension scikit-learn is something you want to build on — every star helps the ecosystem grow. Thanks for reading, and happy preprocessing.
— KitaSan at COOLJAPAN OÜ
June 30, 2026