Decades of scientific data live inside HDF5 files — and until now, opening one in Rust meant linking a C library.
Today we’re releasing OxiH5 0.1.3 — the COOLJAPAN Pure-Rust HDF5 reader, a library that parses real HDF5 files (the kind written by h5py and libhdf5) directly from their bytes, using nothing but std byte parsing.
No libhdf5. No FFI. No -sys crates. Just clean, memory-safe Rust that reads the binary HDF5 format byte by byte — #![forbid(unsafe_code)] on the core, a single static binary, no system libraries, and no C toolchain at build time.
The problem
HDF5 is the lingua franca of scientific computing. Climate models, particle-physics datasets, machine-learning checkpoints, satellite imagery, genomics — a staggering amount of the world’s structured numerical data sits in .h5 and NetCDF-4 files. But the format has historically had exactly one canonical implementation: the C library, libhdf5.
If you wanted to read an HDF5 file in Rust, you reached for hdf5-sys or hdf5, which means linking libhdf5 (and often netcdf-sys on top of it). That drags in a C dependency with its own build system, version-matching headaches, a native toolchain requirement, and an FFI boundary where memory-safety guarantees stop at the door. It does not cross-compile cleanly, it does not go to WASM, and it does not produce a single self-contained binary.
OxiH5 takes the other road. The HDF5 file format is a published specification, so OxiH5 implements a parser for it from scratch — the superblock, object headers, message lists, the various B-tree and array indices, heaps, filter pipelines, and every datatype class — entirely in safe Rust. The result reads files produced by real-world libhdf5 and h5py, with zero C anywhere in the default build.
A word on scope, because honesty matters: OxiH5 0.1.3 is primarily a reader. The read path is broad and battle-tested; there is also a minimal write path (more on that below), but the heart of this release — and where the coverage is deep — is reading.
What we built
OxiH5 is a four-crate workspace (~20.4k SLOC of Rust, with 459 unit and integration tests, all passing):
| Crate | Purpose |
|---|---|
| oxih5-core | Public types: Dataset, Dtype, ByteOrder, OxiH5Error, Attribute, FilterPipeline, Link, Group. Carries #![forbid(unsafe_code)]. |
| oxih5-format | The low-level binary parsers: superblock, object headers, messages, heaps, B-tree v1/v2, SNOD, fractal heap, extensible/fixed array indices, filters, global heap, chunked assembly. |
| oxih5 | The user-facing facade: open(), open_mmap(), read_dataset(), File, Group, FileWriter. |
| oxinetcdf | A Pure-Rust NetCDF-4 conventions reader/writer atop OxiH5: NcFile, NcGroup, NcVariable, NcDimension, NcFileWriter. |
The parsing pipeline walks the file the same way libhdf5 does, one layer at a time:
1. Superblock. OxiH5 reads superblock v0 (libver='earliest') and v2/v3 (libver='latest') to find the root group.
2. Object headers. Both flavors are supported — v1 (message list with continuation blocks) and v2 (OHDR + OCHK, with creation-order, timestamps, and phase-change tracking).
3. Groups. Old-style groups (B-tree v1 + local heap + symbol-table nodes) and new-style groups (Link Info / Link messages, fractal heaps for large groups, and B-tree v2 name indices) both resolve correctly, so hierarchical traversal works regardless of how the file was written.
4. Data layouts. Contiguous, compact (inline), and chunked datasets are all read. For chunked data, four chunk-index types are supported: B-tree v1, B-tree v2, the extensible array index, and the fixed array index.
5. Filters. The chunked filter pipeline decodes Deflate/gzip (id 1, via oxiarc-deflate), Shuffle (id 2), Fletcher32 (id 3), SZIP/AEC (id 4, via oxiarc-szip behind the szip feature), Nbit (id 5, integer bit-packing), and Scaleoffset (id 6, integer precision reduction).
6. Datatypes. All eleven HDF5 datatype classes are parsed, including fixed-point integers (i8 through u64, LE/BE), floats (f16/f32/f64, LE/BE), fixed-length strings (ASCII/UTF-8), bitfields, opaque blobs, compound types (named fields at offsets), references (object/region), enumerations (base type + member table), variable-length sequences (backed by the global heap), and arrays (base type + dimensions). Attributes (message type 0x000C, versions 1, 2, and 3) support every one of those classes too.
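To make the first step of that pipeline concrete, here is a minimal sketch of locating the superblock using only std. The 8-byte signature and the candidate offsets (0, then 512 and doubling) come from the HDF5 file format specification; the function name `locate_superblock` is hypothetical and is not part of the OxiH5 API.

```rust
/// The 8-byte HDF5 format signature that begins every superblock.
const HDF5_SIGNATURE: [u8; 8] = [0x89, b'H', b'D', b'F', b'\r', b'\n', 0x1a, b'\n'];

/// Hypothetical helper (not the OxiH5 API): find the superblock and return
/// its byte offset together with the superblock version number.
fn locate_superblock(bytes: &[u8]) -> Option<(usize, u8)> {
    // The superblock sits at offset 0, or at 512, 1024, 2048, ... (doubling
    // each time) when a user block precedes it.
    let mut offset = 0usize;
    loop {
        let end = offset.checked_add(HDF5_SIGNATURE.len())?;
        // The version byte follows the signature directly; if it is out of
        // range we have run past the end of the file and give up.
        let version = *bytes.get(end)?;
        if bytes[offset..end] == HDF5_SIGNATURE {
            return Some((offset, version));
        }
        offset = if offset == 0 { 512 } else { offset.checked_mul(2)? };
    }
}
```

Given a buffer with a 512-byte user block followed by the signature and a version byte of 2, this returns `Some((512, 2))`. A real parser would then branch on the version to decode the rest of the superblock.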
In keeping with COOLJAPAN policy, DEFLATE goes through oxiarc-deflate and SZIP through oxiarc-szip — never flate2, miniz, or zlib-ng. HDF5 FFI crates are banned workspace-wide via deny.toml.
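Of the filters in step 5, shuffle is simple enough to sketch in a few lines. On write, shuffle groups the first byte of every element together, then the second byte, and so on, which tends to help the compressor that runs after it. On read the pipeline runs in reverse, so the data is inflated first and unshuffled second. The helper below is a hypothetical illustration, not OxiH5's implementation, and it assumes that trailing bytes that do not fill a whole element are passed through unchanged.

```rust
/// Hypothetical helper (not the OxiH5 API): invert the HDF5 shuffle filter.
///
/// `shuffled` holds byte 0 of every element, then byte 1 of every element,
/// and so on. `elem_size` is the size of one element in bytes.
fn unshuffle(shuffled: &[u8], elem_size: usize) -> Vec<u8> {
    // Nothing to do for single-byte elements or buffers shorter than one element.
    if elem_size <= 1 || shuffled.len() < elem_size {
        return shuffled.to_vec();
    }
    let count = shuffled.len() / elem_size;
    let mut out = vec![0u8; shuffled.len()];
    for byte_pos in 0..elem_size {
        // The "plane" holding byte `byte_pos` of every element.
        let plane = &shuffled[byte_pos * count..(byte_pos + 1) * count];
        for (elem, &b) in plane.iter().enumerate() {
            out[elem * elem_size + byte_pos] = b;
        }
    }
    // Assumption: leftover bytes past the last whole element are copied as-is.
    let tail = count * elem_size;
    out[tail..].copy_from_slice(&shuffled[tail..]);
    out
}
```

For two 4-byte elements `[1, 2, 3, 4]` and `[5, 6, 7, 8]`, the shuffled form is `[1, 5, 2, 6, 3, 7, 4, 8]`, and `unshuffle` with `elem_size = 4` restores the original order.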
Getting started
```sh
cargo add oxih5
```
A minimal read, straight from the README:
```rust
use oxih5::{open, read_dataset};

// One-shot convenience
let ds = read_dataset("data.h5", "/temperature")?;
let values: Vec<f32> = ds.as_f32()?;
println!("shape: {:?}, {} elements", ds.shape, ds.len());

// File handle (for multiple datasets)
let f = open("data.h5")?;
for name in f.dataset_names()? {
    println!("{name}");
}
let ds = f.dataset("/pressure")?;
let values: Vec<f64> = ds.as_f64()?;

// Hierarchical groups
let grp = f.group("/sensors/imu")?;
let names = grp.datasets()?;
let ds = grp.dataset("accel_x")?;

// Multi-dimensional sub-region extraction
let region = f.dataset_slice("/image", &[100..200, 50..150])?;

// Memory-mapped I/O for large files
let f = oxih5::open_mmap("large_file.h5")?;
```
Highlights
- Reads real HDF5 files: superblock v0/v2/v3, object headers v1/v2, old- and new-style groups, and contiguous/compact/chunked layouts, all from raw bytes.
- Four chunk index types: B-tree v1, B-tree v2, extensible array, and fixed array.
- Full filter pipeline: deflate, shuffle, fletcher32, szip, nbit, and scaleoffset.
- All 11 datatype classes, among them integers, floats (including f16), strings, bitfields, opaque, compound, reference, enum, variable-length, and array, in attributes as well as datasets.
- Memory-mapped I/O: `open_mmap(path)` pages in only the regions you touch, so opening a 1 GB file costs little until you actually read from it.
- Dataset utilities: `Dataset::slice(&ranges)` for multi-dimensional sub-regions, `Dataset::reshape(&shape)` for zero-copy shape reinterpretation, and lazy per-type iterators (`iter_f32`, `iter_f64`, `iter_i32`, `iter_u8`, `iter_f16`, and the rest of the integer widths).
- NetCDF-4 reader: `oxinetcdf` reads NetCDF-4 conventions (`NcFile`, `NcVariable`, `NcDimension`) atop the same Pure-Rust core, replacing `netcdf-sys` on the read path.
- A minimal write path: `FileWriter` produces valid HDF5 files readable by h5py and libhdf5 for flat contiguous datasets (float32, float64, int32, uint8), with `oxinetcdf`’s `NcFileWriter` emitting NetCDF-4-compliant files with full DIMENSION_SCALE encoding.
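The idea behind the lazy per-type iterators can be sketched with std alone: decode each element from the raw bytes only when the consumer asks for it, honoring the stored byte order. The `Endian` enum and `iter_f32_lazy` function below are hypothetical stand-ins for illustration; they are not the crate's `ByteOrder` type or its `iter_f32` method, whose exact signatures are not shown here.

```rust
/// Byte order of the stored elements (a stand-in, not the crate's `ByteOrder`).
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical helper: lazily decode `f32` values from raw dataset bytes.
/// No `Vec` is allocated; each value is decoded when the iterator is advanced.
/// Trailing bytes that do not form a whole `f32` are ignored.
fn iter_f32_lazy<'a>(raw: &'a [u8], order: Endian) -> impl Iterator<Item = f32> + 'a {
    raw.chunks_exact(4).map(move |c| {
        let bytes = [c[0], c[1], c[2], c[3]];
        match order {
            Endian::Little => f32::from_le_bytes(bytes),
            Endian::Big => f32::from_be_bytes(bytes),
        }
    })
}
```

Because the result is an ordinary iterator, a reduction such as `iter_f32_lazy(raw, Endian::Little).sum::<f32>()` walks the bytes once without materializing the dataset.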
Tips
- Use `open_mmap` for large files. `oxih5::open_mmap(path)` (or `File::open_mmap(path)`) lets the OS demand-page the file, so you only pay for the bytes you actually read. This is the one place OxiH5 uses `unsafe`, for the mmap call itself, and it is documented as such; everything else stays in safe Rust.
- Enable the `ndarray` feature to get `Dataset::to_array_f32`, `to_array_f64`, and `to_array_i32`, which hand you back an `ndarray::ArrayD<T>` ready for numerical work.
- Enable the `parallel` feature to decompress chunked datasets concurrently across cores via Rayon, which is handy for large, heavily-filtered arrays.
- Enable the `szip` feature when you need SZIP/AEC-compressed data (filter id 4); it pulls in `oxiarc-szip`. The other five filters work out of the box.
- Reach for the lazy iterators (`iter_f32` and friends) when you want to stream a dataset element by element instead of materializing the whole thing into a `Vec`.
- Slice before you read with `dataset_slice("/path", &[r0, r1, ...])` to pull a hyperslab out of a large dataset without loading the rest.
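For readers unfamiliar with hyperslabs, the last tip amounts to copying a rectangular sub-region out of row-major storage. The sketch below shows the index arithmetic for the 2-D case on an in-memory buffer. `slice_2d` is a hypothetical helper, not the OxiH5 API, and a real reader would apply the same arithmetic to file offsets or chunks rather than to a fully loaded array.

```rust
use std::ops::Range;

/// Hypothetical helper (not the OxiH5 API): copy the sub-region
/// `rows` x `cols` out of a row-major 2-D buffer of the given shape.
/// Returns `None` if the shape does not match the buffer or a range is
/// out of bounds.
fn slice_2d<T: Copy>(
    data: &[T],
    shape: [usize; 2],
    rows: Range<usize>,
    cols: Range<usize>,
) -> Option<Vec<T>> {
    let [n_rows, n_cols] = shape;
    if data.len() != n_rows.checked_mul(n_cols)?
        || rows.start > rows.end
        || rows.end > n_rows
        || cols.start > cols.end
        || cols.end > n_cols
    {
        return None;
    }
    let mut out = Vec::with_capacity(rows.len() * cols.len());
    for r in rows {
        // Row `r` starts at element r * n_cols in row-major order.
        let row_start = r * n_cols;
        out.extend_from_slice(&data[row_start + cols.start..row_start + cols.end]);
    }
    Some(out)
}
```

On a 3 x 4 buffer holding 0 through 11, selecting rows `1..3` and columns `1..3` yields `[5, 6, 9, 10]`.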
Part of the COOLJAPAN ecosystem
OxiH5 belongs to NoFFI — the COOLJAPAN initiative to replace every C/C++/Fortran/-sys FFI dependency in the Rust ecosystem with a clean, memory-safe, 100% Pure Rust implementation. Each NoFFI project eliminates one specific native dependency; OxiH5’s job is to retire hdf5-sys / hdf5 (and netcdf-sys on the read path) so that reading scientific data no longer means linking a C library.
It sits naturally alongside the rest of the sovereign Rust stack — oxiarc-deflate and oxiarc-szip provide the Pure-Rust compression that HDF5’s filters depend on, and OxiH5 in turn becomes the data-loading layer for numerical and ML work across the COOLJAPAN ecosystem. Default features are 100% Pure Rust, no system libraries, no build-time C toolchain, and WASM-friendly.
Repository: https://github.com/cool-japan/oxih5
Star the repo ⭐ if you want to open HDF5 files in Rust without ever touching libhdf5 again.
Pure Rust scientific data — sovereign, safe, and FFI-free.
— KitaSan at COOLJAPAN OÜ June 20, 2026