A pandas-class DataFrame for Rust — without the C extensions, the Cython, or the Python GIL that have shadowed data analysis in Python for over a decade.
Today we released PandRS 0.1.0 — the first stable release of a high-performance, Pure Rust DataFrame library with a pandas-class API, SIMD optimization, parallel processing, and an on-ramp to distributed computing.
No C. No Cython. No pandas/NumPy C-extensions. No Python GIL. The DataFrame world has been built for years on pandas and its native lineage — Cython-compiled hot loops, NumPy’s C core, and a global interpreter lock that quietly caps how far parallelism can go. That stack is powerful, but it is also why “just install pandas” can turn into wheel-and-compiler trouble, and why scaling out so often means working around the GIL. PandRS takes a different path: it is plain Rust, it compiles to a single static binary (or WASM), and cargo add pandrs needs no Python, no system libraries, and no build toolchain beyond the Rust compiler. PandRS reached this first stable release after a long alpha and beta gestation that began back in spring 2025 — this is the point where it became something we were ready to call solid.
Why PandRS 0.1.0 matters
If you have ever shipped a pandas pipeline, you know the friction: a GIL that keeps real parallelism just out of reach, C-extension wheels that fail to build on the one machine that matters, memory bloat on wide, string-heavy frames, and no practical story for WASM or embedded targets. This first stable release matters because it removes that whole category of problem while keeping the API familiar, and because its speedups are measured rather than projected.
Measured against pandas (Python) on an AMD Ryzen 9 5950X with 64 GB RAM and NVMe storage:
- CSV Read (1M rows): 5.1x faster — a parallel reader that does not wait on the GIL.
- GroupBy Sum: 3.4x faster and Join: 4.1x faster — the everyday operations, vectorized.
- String Operations: 8.8x faster — where pandas tends to hurt the most.
- Rolling Window: 3.9x faster — window analytics without the per-row interpreter overhead.
- Memory: up to 89% reduction — via string pooling and categorical encoding.
These are the real numbers from this release. We would rather under-promise and let the benchmarks and the test suite speak: 1334+ tests pass with --all-targets --all-features, with zero clippy warnings under -D warnings.
Technical Deep Dive: how PandRS is built
PandRS is 175,000+ lines of Rust organized into a few honest layers.
1. Columnar core. The data model is built from Series, DataFrame, MultiIndex, and Categorical, backed by columnar storage. The columnar layout is what makes vectorized operations and string pooling possible, and it is the reason the 70+ pandas-compatible methods can target “100% Pandas API compatibility” for the core surface while running on a Rust foundation rather than a Python one.
2. Performance internals — SIMD + Rayon. Numeric kernels use automatic SIMD vectorization, and data-parallel work fans out across cores with Rayon. Columnar storage plus string pooling keeps memory tight, and lazy evaluation lets chains of operations fuse instead of materializing every intermediate frame. There is no GIL to step around, so parallelism is the default rather than the exception.
3. Modular helper layout. The method families live in focused helper modules so the codebase stays under control and easy to navigate: helpers/window_ops.rs (rolling and expanding windows, ewm), helpers/string_ops.rs (the str_* family), helpers/math_ops.rs, helpers/aggregations.rs (groupby aggregations), and helpers/comparison_ops.rs. Concretely that means window functions (rolling_mean/sum/var/median, expanding_*, ewm), groupby (groupby/agg/transform/groupby_apply), rich statistics (describe, corr/cov, geometric_mean, trimmed_mean), string ops (str_contains/str_replace/str_split and friends), and missing-data handling (fillna/ffill/bfill/dropna/isna).
4. I/O and optional feature surface. Out of the box PandRS reads and writes CSV (parallel reader/writer), Parquet (with compression), JSON (records and columnar), and Excel (XLSX/XLS), with SQL (PostgreSQL/MySQL/SQLite) behind the sql feature and zero-copy Arrow interop. Beyond that, optional features cover distributed (DataFusion), GPU (CUDA), JIT (Cranelift), visualization (text-based plus plotters), streaming, model serving, and WASM — each one opt-in, so the default build stays lean and Pure Rust.
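To make the data-parallel fan-out from the performance layer concrete, here is a minimal sketch that splits a numeric column into chunks and sums each chunk on its own thread. It uses only the standard library (std::thread::scope) so that it runs without dependencies; the function name parallel_sum and the chunking strategy are assumptions for illustration, not PandRS internals. Rayon, which PandRS names, expresses the same pattern through parallel iterators.

```rust
use std::thread;

/// Sum a numeric column by splitting it into contiguous chunks and
/// summing each chunk on a separate scoped thread.
/// Hypothetical helper for illustration; not part of the PandRS API.
fn parallel_sum(column: &[f64], workers: usize) -> f64 {
    if column.is_empty() {
        return 0.0;
    }
    // Guard against zero workers, then size chunks with a ceiling division.
    let workers = workers.max(1);
    let chunk_len = (column.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one worker per chunk; each borrows its slice of the column.
        let handles: Vec<_> = column
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<f64>()))
            .collect();

        // Combine the partial sums once every worker has finished.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Because a columnar layout stores each column contiguously, handing every worker a plain slice like this requires no copying and no locking.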
Getting Started
cargo add pandrs
use pandrs::{DataFrame, Series};

fn main() -> pandrs::error::Result<()> {
    let mut df = DataFrame::new();
    df.add_column(
        "name".to_string(),
        Series::from_vec(vec!["Alice", "Bob", "Carol"], Some("name")),
    )?;
    df.add_column(
        "age".to_string(),
        Series::from_vec(vec![30, 25, 35], Some("age")),
    )?;
    df.add_column(
        "salary".to_string(),
        Series::from_vec(vec![75000.0, 65000.0, 85000.0], Some("salary")),
    )?;

    // Filter rows with a string predicate, then aggregate a column
    let adults = df.filter("age > 25")?;
    let mean_age = df.column("age")?.mean()?;
    println!("{} adults, mean age {:.1}", adults.shape().0, mean_age);
    Ok(())
}
What’s inside
- The first stable release of PandRS — a Pure Rust, pandas-class DataFrame library.
- 70+ pandas-compatible DataFrame methods, targeting 100% Pandas API compatibility for core methods.
- Core data structures: Series, DataFrame, MultiIndex, and Categorical.
- I/O across CSV (parallel reader/writer), Parquet (compression), JSON (records + columnar), Excel (XLSX/XLS), SQL (PostgreSQL/MySQL/SQLite via the sql feature), and zero-copy Arrow.
- Performance internals: automatic SIMD vectorization, Rayon parallel processing, columnar storage with string pooling, and lazy evaluation.
- Up to 89% memory reduction via string pooling plus categorical encoding.
- Benchmarked wins over pandas: CSV read 5.1x, GroupBy sum 3.4x, join 4.1x, string ops 8.8x, rolling window 3.9x.
- 1334+ tests passing with --all-targets --all-features; zero clippy warnings under -D warnings.
- 175,000+ lines of Rust on Rust 1.75+ (MSRV 1.70.0), across Linux, macOS, and Windows (x86_64 and ARM64).
- Optional features for those who need them: distributed (DataFusion), GPU (CUDA), JIT (Cranelift), visualization, streaming, model serving, and WASM.
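The string pooling and categorical encoding listed above can be pictured with a small sketch: each distinct string is stored once in a pool, and every row holds only a compact integer code. The type PooledStringColumn and its methods are hypothetical names chosen for this illustration; how PandRS actually lays out its pooled columns is not shown here.

```rust
use std::collections::HashMap;

/// A string column stored as a pool of unique values plus one small
/// integer code per row (dictionary-style encoding).
/// Hypothetical type for illustration; not PandRS's actual layout.
#[derive(Debug, Default)]
struct PooledStringColumn {
    pool: Vec<String>,           // each distinct string is stored once
    index: HashMap<String, u32>, // value -> position in `pool`
    codes: Vec<u32>,             // one code per row
}

impl PooledStringColumn {
    /// Build a column from a slice of string values.
    fn from_values(values: &[&str]) -> Self {
        let mut col = Self::default();
        for value in values {
            col.push(value);
        }
        col
    }

    /// Append one row, reusing the pooled copy if the value was seen before.
    fn push(&mut self, value: &str) {
        if let Some(&code) = self.index.get(value) {
            self.codes.push(code);
            return;
        }
        let code = self.pool.len() as u32;
        self.pool.push(value.to_string());
        self.index.insert(value.to_string(), code);
        self.codes.push(code);
    }

    /// Look up the string stored at `row`, if the row exists.
    fn get(&self, row: usize) -> Option<&str> {
        self.codes.get(row).map(|&code| self.pool[code as usize].as_str())
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.codes.len()
    }

    /// Number of distinct strings held in the pool.
    fn distinct(&self) -> usize {
        self.pool.len()
    }
}
```

The saving grows with repetition: a column with millions of rows but only a few hundred distinct values pays for each string once, plus four bytes per row for the code.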
Tips
- The default build is 100% Pure Rust: there is no Python, no Cython, and no system library to configure. cargo add pandrs is the whole setup.
- Want the 89% memory win? Lean on string pooling and Categorical encoding for wide, string-heavy frames; that is where the savings live.
- Reach for the window functions for time-series and analytics work: rolling_mean, expanding_*, and ewm are first-class and run vectorized.
- For aggregation, prefer groupby(...).agg(...) (and transform/groupby_apply) over hand-rolled loops; the helper modules keep these on the parallel fast path.
- Enable heavy optional features deliberately, not by default: opt into distributed, jit, or cuda only when your workload actually calls for them, and keep the everyday build lean.
- Need SQL or Arrow? Turn on the sql feature for PostgreSQL/MySQL/SQLite, and use the zero-copy Arrow interop to hand data to the rest of your stack without a copy.
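For readers who want to see what the window and aggregation tips compute, the sketch below implements a rolling mean and a grouped sum in plain Rust with no dependencies. The function names mirror the method names mentioned above, but the signatures, the Option-based handling of incomplete windows, and the BTreeMap result are assumptions made for this example, not the PandRS API.

```rust
use std::collections::BTreeMap;

/// Rolling mean over a fixed window, kept O(1) per row with a running sum.
/// Rows without a full window yet are `None` (pandas reports NaN there by
/// default; using Option is this sketch's choice).
fn rolling_mean(values: &[f64], window: usize) -> Vec<Option<f64>> {
    if window == 0 {
        return vec![None; values.len()];
    }
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0;
    for (i, &v) in values.iter().enumerate() {
        sum += v;
        if i >= window {
            sum -= values[i - window]; // drop the value leaving the window
        }
        out.push(if i + 1 >= window {
            Some(sum / window as f64)
        } else {
            None
        });
    }
    out
}

/// Group `values` by the matching entry in `keys` and sum each group.
/// If the slices differ in length, the extra entries are ignored.
/// A BTreeMap keeps the result ordered by key.
fn groupby_sum<'a>(keys: &[&'a str], values: &[f64]) -> BTreeMap<&'a str, f64> {
    let mut groups = BTreeMap::new();
    for (&key, &value) in keys.iter().zip(values) {
        *groups.entry(key).or_insert(0.0) += value;
    }
    groups
}
```

With the input 1.0, 2.0, 3.0, 4.0 and a window of 2, rolling_mean yields None, 1.5, 2.5, 3.5. Grouping the keys a, b, a with the values 1.0, 2.0, 3.0 gives a sum of 4.0 for a and 2.0 for b.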
This is the foundation
PandRS 0.1.0 ships as part of a new Pure Rust scientific stack. It lands the same day as NumRS2 (the NumPy layer) and OptiRS (the optimizer layer), one day after SciRS2 (0.1.0, 2025-12-29) laid down the foundation. Three sovereign layers (DataFrame, NumPy, and optimizer) arrive together alongside the SciRS2 base. PandRS is the DataFrame core of that stack: deeper integration across these projects is on the roadmap, but from day one it stands on its own as a fast, native DataFrame engine.
Repository: https://github.com/cool-japan/pandrs
Star the repo if a Pure Rust, pandas-class DataFrame — without the C, the Cython, or the GIL — is something you have been waiting for.
Pure Rust DataFrames are here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ December 30, 2025