The DataFrame layer of the COOLJAPAN scientific stack just went a lot more Pure Rust.
Today we released PandRS 0.3.1 — a patch release that rips the hidden C-dependency chain out of DataFrame Excel and compression I/O and replaces it with an in-tree, OxiARC-backed xlsx engine.
No C. No Cython. No bundled zlib/xz/zstd C libs. No Python GIL.
Just a pandas-class DataFrame API — SIMD, parallel, and distributed — that compiles to a single static binary (or WASM) and runs everywhere, from laptops to edge devices to cloud clusters.
Why PandRS 0.3.1 matters
DataFrame Excel and compression stacks have a quiet C-dependency creep problem. Read an .xlsx file and you usually drag in zip, which drags in flate2, which drags in miniz_oxide or worse, a system zlib. Read a Parquet file and you can quietly pull zstd-sys, lz4-sys, and liblzma-sys (C libxz) through DataFusion and Arrow defaults. None of that shows up in your code — it shows up in your build, your supply chain, and your cross-compilation pain.
PandRS 0.3.1 closes those holes:
- Pure Rust Excel. The excel feature no longer pulls zip, flate2, or miniz_oxide. xlsx reading and writing now run on an in-tree engine built on oxiarc-archive (Pure Rust ZIP) + quick-xml.
- The dirs-sys crate is gone. Config-directory resolution is now an inline std::env-based implementation; dirs and dirs-sys are removed.
- liblzma, zstd-sys, and lz4-sys are pinned out of the default and most feature builds by switching datafusion, parquet, and arrow to default-features = false.
The public xlsx API is fully preserved — ExcelCell, ExcelCellFormat, NamedRange, and friends are all kept. This is a swap of the engine, not the interface.
Technical Deep Dive: A Pure Rust xlsx engine behind a preserved facade
1. The in-tree src/io/xlsx/ engine (OxiARC + quick-xml).
.xlsx is a ZIP container of XML parts. The old path used calamine (read) and simple_excel_writer (write), both of which lean on the zip/flate2/miniz_oxide C-flavored compression chain. We replaced both with a from-scratch reader/writer split across a new module — reader.rs, writer.rs, cell.rs, schema.rs, error.rs, mod.rs — built on oxiarc-archive for the Pure Rust ZIP layer and quick-xml for the XML. src/io/excel.rs is now a thin facade that forwards to crate::io::xlsx, so the public surface is unchanged. Advanced features behave as before; formula and named-range tracking is deferred to a follow-up. A new tests/excel_roundtrip_test.rs guards the write-then-read cycle. The net dependency change: oxiarc-archive 0.2.6 + quick-xml 0.39.2 added under excel; calamine and simple_excel_writer removed.
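The facade arrangement described above can be pictured with a minimal, std-only sketch. The module names xlsx and excel mirror the paths in the text, but everything inside them (the Cell enum, parse_row, read_row, and the comma-separated toy input) is hypothetical and is not PandRS's actual code; the real engine reads ZIP and XML parts rather than a plain string.

```rust
// Hypothetical sketch of a "thin facade over a swapped engine" layout.
// None of these items are PandRS's real types or functions.

/// Stand-in for the new engine module (src/io/xlsx/ in the text).
mod xlsx {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Cell {
        Number(f64),
        Text(String),
        Empty,
    }

    /// Toy parser: splits a comma-separated row and classifies each field.
    pub fn parse_row(line: &str) -> Vec<Cell> {
        line.split(',')
            .map(|field| {
                let field = field.trim();
                if field.is_empty() {
                    Cell::Empty
                } else if let Ok(n) = field.parse::<f64>() {
                    Cell::Number(n)
                } else {
                    Cell::Text(field.to_string())
                }
            })
            .collect()
    }
}

/// Stand-in for the public facade (src/io/excel.rs in the text).
mod excel {
    // Re-export the engine's type under the name callers already use.
    pub use super::xlsx::Cell as ExcelCell;

    /// The public entry point keeps its signature; the body only forwards.
    pub fn read_row(line: &str) -> Vec<ExcelCell> {
        super::xlsx::parse_row(line)
    }
}
```

Callers in this sketch only ever name excel::read_row and excel::ExcelCell, so the engine behind them can be replaced without touching call sites, which is the property the release relies on.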
2. The -sys purge.
The dirs crate pulled dirs-sys for config-directory lookups. We replaced it with an inline user_config_dir() in src/config/loader.rs honoring XDG / macOS / Windows conventions, returning Option<PathBuf> with identical semantics. After this, cargo build --no-default-features has zero -sys crates outside the unavoidable OS-API set — the only survivor is core-foundation-sys, pulled by iana-time-zone/chrono for macOS timezone resolution, which is genuine OS FFI.
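As a rough illustration of what a std::env-only lookup can look like, here is a minimal sketch. It is not the code in src/config/loader.rs: the helper config_dir_with, its os parameter, and the exact variables consulted (XDG_CONFIG_HOME, HOME, APPDATA) are assumptions based on the commonly documented platform conventions.

```rust
use std::path::PathBuf;

/// Hypothetical sketch of a std-only config-directory lookup.
/// `get` abstracts environment access so the logic can be tested;
/// pass `|k| std::env::var(k).ok()` for real use.
fn config_dir_with<F>(os: &str, get: F) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty values the same as unset ones.
    let var = |key: &str| get(key).filter(|v| !v.is_empty());

    match os {
        // Windows: roaming application data (assumed to come from %APPDATA%).
        "windows" => var("APPDATA").map(PathBuf::from),
        // macOS: ~/Library/Application Support.
        "macos" => var("HOME")
            .map(|h| PathBuf::from(h).join("Library").join("Application Support")),
        // Other Unix: $XDG_CONFIG_HOME if it is absolute, else ~/.config.
        _ => var("XDG_CONFIG_HOME")
            .map(PathBuf::from)
            .filter(|p| p.is_absolute())
            .or_else(|| var("HOME").map(|h| PathBuf::from(h).join(".config"))),
    }
}

/// Convenience wrapper over the real process environment.
fn user_config_dir() -> Option<PathBuf> {
    config_dir_with(std::env::consts::OS, |k| std::env::var(k).ok())
}
```

Returning Option<PathBuf> keeps the "no usable directory" case explicit, so callers can fall back to built-in defaults instead of panicking when neither variable is set.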
3. Pinning datafusion / parquet / arrow to drop C libs.
- datafusion 53.1.0 with default-features = false drops the compression feature, eliminating liblzma-sys (C libxz) from distributed/flight/serving/all-features, plus the bzip2/async-compression chain.
- parquet 58.1.0 with default-features = false and [arrow, snap, brotli, flate2-zlib-rs, lz4, base64, simdutf8] eliminates zstd-sys from --features stable. Here flate2-zlib-rs selects the Pure Rust zlib-rs, not miniz_oxide.
- arrow 58.1.0 with default-features = false and [csv, ipc, json] guards against default drift re-introducing ipc_compression → zstd-sys/lz4-sys.
User-visible DataFusion, Parquet, and Arrow APIs are unchanged.
4. Honest tech debt and intentional regressions.
We are not pretending this is free. Two regressions are deliberate under the Pure Rust policy:
- zstd-compressed Parquet is no longer readable on --features stable/parquet. Snappy (the pandas default), gzip, brotli, and lz4 still work.
- DataFusion’s built-in xz/bz2/zstd auto-decompression for CSV/JSON readers is disabled on distributed/flight/serving. Plain and gzip inputs still work.
Some feature-gated debt also remains upstream:
- parquet/distributed/flight still transitively pull flate2/lz4_flex/snap/brotli/miniz_oxide via Arrow/Parquet/DataFusion.
- --features distributed/flight still pull zstd-sys/miniz_oxide because DataFusion 53.1.0’s own Cargo.toml hardcodes default-features = true on parquet. Cargo features are additive, so a downstream crate cannot switch off what an upstream dependency enables; this needs an upstream fix.
- cloud-storage pulls ring (C+asm) via object_store 0.13.2.
The default build pulls none of these.
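One way to surface the Parquet regression early is a small pre-flight check in application code. The sketch below is not part of PandRS: Codec, parse_codec, readable_without_zstd, and check_codec are hypothetical names, and treating uncompressed files as readable is an assumption (they need no codec at all). It simply encodes the support matrix stated above.

```rust
/// Parquet compression codecs mentioned in the text, plus `Uncompressed`
/// (assumed readable, since it needs no codec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Uncompressed,
    Snappy,
    Gzip,
    Brotli,
    Lz4,
    Zstd,
}

/// Hypothetical helper: maps a codec name to `Codec`, case-insensitively.
fn parse_codec(name: &str) -> Option<Codec> {
    match name.to_ascii_lowercase().as_str() {
        "uncompressed" | "none" => Some(Codec::Uncompressed),
        "snappy" => Some(Codec::Snappy),
        "gzip" => Some(Codec::Gzip),
        "brotli" => Some(Codec::Brotli),
        "lz4" => Some(Codec::Lz4),
        "zstd" => Some(Codec::Zstd),
        _ => None,
    }
}

/// Mirrors the matrix stated for `--features stable`/`parquet`:
/// everything listed except zstd is readable.
fn readable_without_zstd(codec: Codec) -> bool {
    codec != Codec::Zstd
}

/// Returns an actionable message instead of failing deep inside a reader.
fn check_codec(name: &str) -> Result<Codec, String> {
    let codec = parse_codec(name).ok_or_else(|| format!("unknown codec: {name}"))?;
    if readable_without_zstd(codec) {
        Ok(codec)
    } else {
        Err(format!(
            "{name} Parquet is not readable in this build; \
             re-encode with snappy, gzip, brotli, or lz4"
        ))
    }
}
```

A pipeline that knows which codec its producers use could call check_codec once at startup and fail with a clear message, rather than discovering the gap on the first zstd file.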
Getting Started
cargo add pandrs --features excel
use pandrs::{DataFrame, Series};

fn main() -> pandrs::error::Result<()> {
    let mut df = DataFrame::new();
    df.add_column(
        "quarter".to_string(),
        Series::from_vec(vec!["Q1", "Q2", "Q3", "Q4"], Some("quarter")),
    )?;
    df.add_column(
        "revenue".to_string(),
        Series::from_vec(vec![120.5, 138.2, 151.0, 169.8], Some("revenue")),
    )?;

    // xlsx I/O is now Pure Rust: backed by OxiARC, no zip/flate2/miniz_oxide
    df.to_excel("report.xlsx", None)?;
    let reloaded = DataFrame::from_excel("report.xlsx", None)?;
    println!("round-tripped {} rows", reloaded.shape().0);
    Ok(())
}
What’s New in 0.3.1
- Pure Rust Excel/xlsx (in-tree, OxiARC-backed): replaced simple_excel_writer + calamine with an in-tree xlsx reader/writer on oxiarc-archive + quick-xml; the excel feature no longer pulls zip, flate2, or miniz_oxide. Public xlsx API fully preserved. New src/io/xlsx/ module; src/io/excel.rs is now a thin facade. New round-trip test tests/excel_roundtrip_test.rs. Added oxiarc-archive 0.2.6 + quick-xml 0.39.2 under excel; removed calamine + simple_excel_writer.
- -sys crate cleanup: replaced dirs with an inline std::env-based user_config_dir() in src/config/loader.rs (identical semantics), which removes dirs + dirs-sys. cargo build --no-default-features now has zero -sys crates outside the unavoidable OS-FFI set (core-foundation-sys for macOS timezone).
- Pinned datafusion 53.1.0 default-features = false: drops compression, eliminating liblzma-sys (C libxz) and the bzip2/async-compression chain from distributed/flight/serving/all-features.
- Pinned parquet 58.1.0 default-features = false with [arrow, snap, brotli, flate2-zlib-rs, lz4, base64, simdutf8] (Pure Rust zlib-rs): eliminates zstd-sys from --features stable.
- Pinned arrow 58.1.0 default-features = false with [csv, ipc, json] to guard against default drift re-introducing ipc_compression.
- Dependency bumps: scirs2-core/stats/linalg 0.4.0 → 0.4.2; datafusion 53.0.0 → 53.1.0; tokio 1.50 → 1.52; rayon 1.11.0 → 1.12.0; rand 0.10.0 → 0.10.1; cranelift* 0.130.0 → 0.130.1; uuid 1.23.0 → 1.23.1; lru 0.16.3 → 0.17.0; toml 1.1.0 → 1.1.2; wasm-bindgen 0.2.114 → 0.2.118; js-sys/web-sys 0.3.91 → 0.3.95.
- Fixed: pinned sha2 = "0.10" so Cargo resolves a digest 0.10.x shared with pbkdf2/aes-gcm (fixes a build error from sha2 0.10.9’s digest-contract shift); intra-doc link fix in src/io/excel.rs (rustdoc builds cleanly with -D warnings); refactored Excel I/O error handling and formatting (no behaviour change).
- Intentional regressions (Pure Rust policy): zstd-compressed Parquet no longer readable on --features stable/parquet (Snappy/gzip/brotli/lz4 still work); DataFusion’s built-in xz/bz2/zstd auto-decompression for CSV/JSON readers disabled on distributed/flight/serving (plain + gzip still work).
- Testing: 1809 tests passing (nextest, --all-features) and 117 doc tests passing. Zero clippy warnings with -D warnings. Rustdoc builds cleanly with -D warnings.
Tips
- Turn on excel without guilt. Now that the path is Pure Rust, cargo add pandrs --features excel no longer drags in zip/flate2/miniz_oxide. The public xlsx API (ExcelCell, ExcelCellFormat, NamedRange, …) is unchanged, so existing code keeps compiling.
- The default build is -sys-clean. Use cargo build --no-default-features for the leanest, most portable, easiest-to-cross-compile PandRS: zero -sys crates outside the unavoidable macOS timezone FFI.
- If you need zstd Parquet, plan around it. It’s intentionally dropped on --features stable for Pure Rust purity. Use Snappy (the pandas default), gzip, brotli, or lz4, or re-encode your .parquet files with one of those codecs before loading (Parquet compresses data inside the file, so there is no outer layer to simply decompress).
- cloud-storage still pulls ring (C+asm) via object_store 0.13.2. That’s an upstream blocker, not a default. Leave it off unless you actually need cloud object stores.
- Enable scirs2 for the scientific stack. It now rides scirs2-core/stats/linalg 0.4.2, keeping PandRS aligned with NumRS2 and SciRS2.
- Watch the feature surface. --features distributed/flight still transitively pull a few C/compression crates because DataFusion hardcodes default-features = true on parquet upstream. If supply-chain purity is critical, stick to the default + parquet/stable set, which is clean.
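To keep an eye on the feature surface in CI, one option is to scan the output of cargo tree --prefix none for crate names ending in -sys. The helper below is a hypothetical sketch, not something PandRS ships; it assumes each line starts with the crate name followed by whitespace, and it matches on names only, so bindings such as js-sys that are not C libraries would need to go on the allowlist.

```rust
use std::collections::BTreeSet;

/// Collects crate names ending in `-sys` from `cargo tree --prefix none`
/// output, skipping anything on the allowlist. Name matching is a heuristic:
/// it cannot tell a C binding from any other crate that uses the suffix.
fn unexpected_sys_crates(tree_output: &str, allowed: &[&str]) -> Vec<String> {
    let mut found = BTreeSet::new();
    for line in tree_output.lines() {
        // Each line is assumed to look like "name v1.2.3 (optional extras)".
        if let Some(name) = line.split_whitespace().next() {
            if name.ends_with("-sys") && !allowed.contains(&name) {
                found.insert(name.to_string());
            }
        }
    }
    // BTreeSet yields names sorted and de-duplicated.
    found.into_iter().collect()
}
```

Feeding it the output of cargo tree --no-default-features --prefix none with core-foundation-sys on the allowlist should return an empty list if the claim above holds for your target; running it again with the features you actually enable shows what they add.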
This is the foundation
PandRS is the DataFrame layer of the COOLJAPAN scientific stack — it pairs with NumRS2 for arrays and SciRS2 for the broader scientific/AI primitives (now on 0.4.2). With 0.3.1, the data-loading floor of that stack leans on OxiARC for Pure Rust archive and compression — the same OxiARC ZIP and codec work that backs the rest of the ecosystem. Around it sit period-accurate siblings: OptiRS for optimization, SkleaRS for classical ML, TenfloweRS and TrustformeRS for deep learning, OxiMedia for media/CV, and the lower-level OxiFFT / OxiZ / OxiBLAS / OxiCode crates. The point is sovereignty: every layer compiles from source, with no C/Cython/bundled-codec baggage.
Repository: https://github.com/cool-japan/pandrs
Star the repo if you want a pandas-class DataFrame without the hidden C compression libs or the Python GIL.
The era of “pip install pandas” — dragging in zlib, xz, and zstd C libraries you never asked for — is ending.
Pure Rust DataFrames are here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ April 19, 2026