Classical machine learning, finally without the Python runtime, the GIL, or a single line of C or Fortran.
Today we released sklears 0.1.0 — a comprehensive, pure-Rust classical machine learning library inspired by scikit-learn’s intuitive API, rebuilt with Rust’s performance and safety on top of the SciRS2 numerical stack.
No C. No Fortran. No Cython. No GIL. scikit-learn is brilliant, but it rides on a deep stack of compiled glue — NumPy and SciPy over BLAS/LAPACK, Cython extensions, and a Python interpreter to coordinate it all. That stack is what makes pip install occasionally explode, what forces you to ship an interpreter to production, and what serializes your CPUs behind the GIL. sklears throws the whole thing out. It is built on OxiBLAS for BLAS/LAPACK, Oxicode for SIMD-optimized serialization, and the SciRS2 ecosystem for scientific computing — with ZERO C or Fortran system dependencies. The result compiles to a single static binary (or WASM)…
Why sklears 0.1.0 matters
scikit-learn’s pain points are structural, not incidental. You pay for a Python runtime everywhere you deploy. The GIL caps true parallelism unless a library drops into C. The C/Cython build chain turns “just install it” into a yak-shave on fresh machines and exotic targets. And because everything is dynamically typed, a wrong array shape or a forgotten .fit() surfaces as a runtime traceback — often in production, not in CI.
sklears trades all of that for compile-time guarantees and a single binary:
- >99% of scikit-learn’s API surface, organized across 36 focused crates, targeting end-to-end parity with roughly scikit-learn’s v1.5 feature set.
- Compile-time model-state safety: the Untrained → Trained type-state machine means you literally cannot call .predict() on an untrained model; it will not compile.
- No-GIL parallelism via Rayon’s work-stealing scheduler: set .n_jobs(-1) and your trees actually build in parallel across cores.
- SIMD acceleration through std::simd, so hot numeric loops vectorize without hand-written intrinsics.
- Single-binary deployment with no Python interpreter to install, package, or version-pin in production.
- A validated test base: 11,222 tests passing (100%), with 175 skipped.
Technical Deep Dive: a 36-crate workspace on SciRS2
sklears is a Cargo workspace of 36 focused members, each owning one slice of the scikit-learn surface. You depend on the sklears meta-crate for the full experience, or pull in only the families you need.
- Models: sklears-linear (LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression with L-BFGS/SAG/SAGA solvers, BayesianRidge, ARDRegression, the Gamma/Poisson/Tweedie GLMs, LinearSVC/LinearSVR), sklears-tree (DecisionTree, RandomForest, ExtraTrees), sklears-ensemble (Voting, Stacking, AdaBoost, GradientBoosting), and sklears-svm (SVC, SVR with multiple kernels).
- Clustering & neighbors: sklears-clustering (KMeans, DBSCAN, Hierarchical, MeanShift, SpectralClustering) and sklears-neighbors for nearest-neighbor methods.
- Decomposition & manifold: sklears-decomposition (PCA, IncrementalPCA, KernelPCA, ICA, NMF, FactorAnalysis), plus sklears-manifold, sklears-cross-decomposition, and sklears-mixture for Gaussian Mixture Models.
- Preprocessing & data: sklears-preprocessing (scalers, encoders, transformers, imputers), sklears-impute, sklears-feature-selection, sklears-feature-extraction, and sklears-datasets with memory-mapped dataset support and CSV/Parquet loaders.
- Model selection & metrics: sklears-model-selection (cross-validation, GridSearchCV, RandomizedSearchCV, BayesSearchCV) and sklears-metrics.
- And the long tail: sklears-naive-bayes, sklears-gaussian-process, sklears-discriminant-analysis, sklears-semi-supervised, sklears-covariance, sklears-isotonic, sklears-kernel-approximation, sklears-calibration, sklears-multiclass, sklears-multioutput, sklears-compose, sklears-inspection, sklears-dummy, sklears-neural (MLP, RBM, autoencoders via SciRS2 autograd), plus sklears-simd, sklears-utils, and sklears-core.
Threading it all together is the type-state machine in sklears-core: estimators move from Untrained to Trained as a type-level transition, so the trained-only methods simply do not exist on an untrained value. Underneath, every crate stands on the same foundation — SciRS2 (scirs2-core/-linalg/-stats/-cluster/-metrics/-optimize/-datasets/-sparse at 0.3.x) for the numerics, OxiBLAS for BLAS/LAPACK, and Oxicode for fast serialization. When you need to call from Python, the sklears-python crate exposes PyO3 bindings — Rust speed, familiar scikit-learn ergonomics.
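To make the type-state idea concrete, here is a minimal, standard-library-only sketch of the pattern. It is not the sklears-core implementation: the MeanModel type, its marker structs, and its method signatures are hypothetical names chosen for illustration. The point it shows is only that fit consumes an untrained value and returns a differently typed trained value, so predict has nowhere to live on the untrained type.

```rust
use std::marker::PhantomData;

// Marker types for the two states; they carry no data (hypothetical names).
struct Untrained;
struct Trained;

// A toy "predict the mean" model whose state lives in the type parameter.
struct MeanModel<State> {
    mean: f64,
    _state: PhantomData<State>,
}

impl MeanModel<Untrained> {
    fn new() -> Self {
        MeanModel { mean: 0.0, _state: PhantomData }
    }

    // `fit` consumes the untrained value and returns a trained one.
    fn fit(self, y: &[f64]) -> MeanModel<Trained> {
        let mean = if y.is_empty() {
            0.0
        } else {
            y.iter().sum::<f64>() / y.len() as f64
        };
        MeanModel { mean, _state: PhantomData }
    }
}

impl MeanModel<Trained> {
    // `predict` is defined only for the trained type.
    fn predict(&self, n: usize) -> Vec<f64> {
        vec![self.mean; n]
    }
}

fn main() {
    let model = MeanModel::<Untrained>::new();
    // model.predict(2); // would not compile: no `predict` on MeanModel<Untrained>
    let trained = model.fit(&[1.0, 2.0, 3.0]);
    println!("{:?}", trained.predict(2)); // prints [2.0, 2.0]
}
```

Because the state is a zero-sized marker, the check costs nothing at runtime; the trade-off is that a model's state must be known statically wherever it is stored or passed around.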
Getting Started
Install:
cargo add sklears
A minimal end-to-end example — generate data, split, train, and score:
use sklears::prelude::*;
use sklears::linear_model::LinearRegression;
use sklears::model_selection::train_test_split;

fn main() -> Result<()> {
    // Generate a regression dataset
    let dataset = sklears::dataset::make_regression(100, 10, 0.1)?;

    // Split into train/test sets (20% test, seed 42)
    let (x_train, x_test, y_train, y_test) =
        train_test_split(&dataset.data, &dataset.target, 0.2, Some(42))?;

    // Create and train the model (Untrained -> Trained at compile time)
    let model = LinearRegression::new()
        .fit_intercept(true)
        .fit(&x_train, &y_train)?;

    // Predict and evaluate
    let _predictions = model.predict(&x_test)?;
    let r2 = model.score(&x_test, &y_test)?;
    println!("R² score: {:.4}", r2);

    Ok(())
}
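For readers wondering what the printed R² number means, here is a small standalone sketch of the coefficient of determination, R² = 1 - SS_res / SS_tot. It assumes score follows the usual scikit-learn convention for regressors; the r2_score helper below is a hypothetical free function written for this post, not a sklears API.

```rust
/// Coefficient of determination: R² = 1 - SS_res / SS_tot.
/// Returns None for empty or mismatched inputs, or when the targets
/// are constant (SS_tot = 0), where the ratio is undefined.
fn r2_score(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    if y_true.is_empty() || y_true.len() != y_pred.len() {
        return None;
    }
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    // Residual sum of squares: how far predictions are from the targets.
    let ss_res: f64 = y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| (t - p).powi(2))
        .sum();
    // Total sum of squares: how far targets are from their own mean.
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    if ss_tot == 0.0 {
        return None;
    }
    Some(1.0 - ss_res / ss_tot)
}

fn main() {
    let y_true = [1.0, 2.0, 3.0];
    let perfect = [1.0, 2.0, 3.0];
    let mean_only = [2.0, 2.0, 2.0];
    println!("{:?}", r2_score(&y_true, &perfect));
    println!("{:?}", r2_score(&y_true, &mean_only));
}
```

A perfect fit scores 1, always predicting the mean of the targets scores 0, and a model that does worse than the mean goes negative.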
The type-state machine turns a whole class of bugs into compile errors:
let model = LinearRegression::new();
// model.predict(&x_test)?; // ❌ won't compile: predict() doesn't exist on an Untrained model
And the same builder ergonomics scale to parallel ensembles:
use sklears::ensemble::RandomForestClassifier;

let forest = RandomForestClassifier::new()
    .n_estimators(200)
    .n_jobs(-1) // use all cores via Rayon
    .fit(&x_train, &y_train)?;
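Why do forests parallelize so well? Each tree is built independently, so the work can be fanned out with no shared mutable state. The sketch below illustrates that shape using only std::thread::scope. It is a stand-in, not how sklears does it: sklears uses Rayon's work-stealing scheduler, while this version splits the work into fixed chunks. The build_tree and build_forest functions are hypothetical, and the "tree" is just a number derived from a seed.

```rust
use std::thread;

// Hypothetical stand-in for "train one tree": a deterministic function of the seed.
fn build_tree(seed: u64) -> u64 {
    seed.wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407)
}

// Build `n_estimators` independent "trees" across `n_workers` scoped threads.
// Static chunking, not work stealing: each worker gets one contiguous slice.
fn build_forest(n_estimators: usize, n_workers: usize) -> Vec<u64> {
    let seeds: Vec<u64> = (0..n_estimators as u64).collect();
    let workers = n_workers.max(1);
    // Ceiling division, clamped to 1 so `chunks` never sees a zero size.
    let chunk = ((seeds.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = seeds
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|&seed| build_tree(seed)).collect::<Vec<u64>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Roughly what "use all cores" would resolve to on this machine.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let forest = build_forest(200, cores);
    println!("built {} trees on {} workers", forest.len(), cores);
}
```

Fixed chunks are fine when every task costs about the same. Real trees vary in depth and cost, and that imbalance is the case a work-stealing scheduler handles better.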
What’s inside
- 36 crates, >99% scikit-learn API coverage — linear models, trees, SVM, ensembles, clustering, decomposition, manifold learning, Gaussian processes, naive Bayes, nearest neighbors, discriminant analysis, mixtures, and more.
- Type-safe state machines that catch “predict before fit” at compile time.
- Builder pattern for clear, discoverable estimator configuration.
- SIMD + Rayon for vectorized math and lock-free, work-stealing parallelism — no GIL in sight.
- Python bindings via the sklears-python PyO3 crate.
- Flexible data loading with memory-mapped datasets and CSV/Parquet loaders.
- AutoML & hyperparameter search — GridSearchCV, RandomizedSearchCV, BayesSearchCV.
- A Criterion benchmarking suite so performance claims stay honest across releases.
Tips
- Lean on the builder pattern. Chain .fit_intercept(true), .n_estimators(...), and friends to configure estimators readably before the final .fit(...).
- Trust the type-state. The Untrained → Trained transition means a forgotten .fit() is a compiler error, not a 2 a.m. production traceback; let it do the work.
- Parallelize for free. Pass .n_jobs(-1) on tree and forest estimators to fan out across all cores via Rayon.
- Compile only what you use. Enable just the families you need to keep build times and binaries lean:
    [dependencies]
    sklears = { version = "0.1.0", features = ["linear", "clustering", "parallel"] }
- Combine train_test_split with GridSearchCV for a clean model-selection loop, then report .score(...) on the held-out set.
- Reach for sklears-python when an existing Python codebase needs sklears speed without a full rewrite.
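The model-selection tip above boils down to a simple loop: fit one model per candidate hyperparameter, score each on data it did not train on, and keep the best. The sketch below shows that loop in plain Rust for a one-feature ridge regression without an intercept. Everything in it is an illustrative assumption rather than the sklears API: the helpers fit_ridge_1d, mse, and grid_search are hypothetical, and a single validation split stands in for the k-fold cross-validation that a GridSearchCV-style search would run.

```rust
// Closed-form 1-D ridge without intercept: w = sum(x*y) / (sum(x*x) + alpha).
// Assumes sum(x*x) + alpha is nonzero.
fn fit_ridge_1d(x: &[f64], y: &[f64], alpha: f64) -> f64 {
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sxx: f64 = x.iter().map(|a| a * a).sum();
    sxy / (sxx + alpha)
}

// Mean squared error of the prediction w * x against y.
fn mse(x: &[f64], y: &[f64], w: f64) -> f64 {
    x.iter().zip(y).map(|(a, b)| (w * a - b).powi(2)).sum::<f64>() / x.len() as f64
}

// Try every alpha, score on the validation split, return (best_alpha, best_mse).
fn grid_search(
    x_train: &[f64],
    y_train: &[f64],
    x_val: &[f64],
    y_val: &[f64],
    alphas: &[f64],
) -> Option<(f64, f64)> {
    alphas
        .iter()
        .map(|&alpha| {
            let w = fit_ridge_1d(x_train, y_train, alpha);
            (alpha, mse(x_val, y_val, w))
        })
        .min_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    // Noise-free data with y = 2x, so no regularization should win.
    let (x_train, y_train) = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]);
    let (x_val, y_val) = ([4.0, 5.0], [8.0, 10.0]);
    let best = grid_search(&x_train, &y_train, &x_val, &y_val, &[0.0, 1.0, 10.0]);
    println!("{:?}", best);
}
```

The final score belongs on a third, untouched test split, because the validation error of the winning candidate is optimistically biased by the selection itself.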
This is the foundation
sklears does not stand alone. It is built directly on SciRS2, the numerical backbone of the COOLJAPAN ecosystem, with NumRS2 and PandRS (DataFrames) feeding it data, OxiBLAS powering the linear algebra, and Oxicode handling serialization. And it has a sibling shipping the very same day: where sklears covers classical machine learning, TenfloweRS covers deep learning — two halves of a pure-Rust ML stack arriving together.
Repository: https://github.com/cool-japan/sklears
Star the repo if you want classical ML you can deploy as a single binary, without a Python runtime or a GIL in the way. Pure Rust classical machine learning is here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ March 20, 2026