COOLJAPAN

OxiH5 0.1.3 — A Pure Rust HDF5 Reader, No libhdf5 in Sight

OxiH5 is the COOLJAPAN Pure-Rust HDF5 reader. It parses real HDF5 files written by h5py and libhdf5 from raw bytes, with no hdf5-sys, no C libhdf5, and no unsafe in production paths. It covers all 11 datatype classes, B-tree v1/v2 and extensible/fixed-array chunk indices, the deflate/shuffle/fletcher32/szip/nbit/scaleoffset filters, mmap, and a NetCDF-4 reader on top, and it is part of the NoFFI sovereign Rust stack.

release oxih5 pure-rust cooljapan noffi hdf5 scientific-data file-format io

Decades of scientific data live inside HDF5 files — and until now, opening one in Rust meant linking a C library.

Today we’re releasing OxiH5 0.1.3, the COOLJAPAN Pure-Rust HDF5 reader: a library that parses real HDF5 files (the kind written by h5py and libhdf5) directly from their bytes, using nothing but safe, pure-Rust byte parsing.

No libhdf5. No FFI. No -sys crates. Just clean, memory-safe Rust that reads the binary HDF5 format byte by byte — #![forbid(unsafe_code)] on the core, a single static binary, no system libraries, and no C toolchain at build time.

The problem

HDF5 is the lingua franca of scientific computing. Climate models, particle-physics datasets, machine-learning checkpoints, satellite imagery, genomics — a staggering amount of the world’s structured numerical data sits in .h5 and NetCDF-4 files. But the format has historically had exactly one canonical implementation: the C library, libhdf5.

If you wanted to read an HDF5 file in Rust, you reached for hdf5-sys or hdf5, which meant linking libhdf5 (and often netcdf-sys on top of it). That drags in a C dependency with its own build system, version-matching headaches, a native toolchain requirement, and an FFI boundary where memory-safety guarantees stop at the door. It does not cross-compile cleanly, it does not target WASM, and it cannot produce a single self-contained binary.

OxiH5 takes the other road. The HDF5 file format is a published specification, so OxiH5 implements a parser for it from scratch — the superblock, object headers, message lists, the various B-tree and array indices, heaps, filter pipelines, and every datatype class — entirely in safe Rust. The result reads files produced by real-world libhdf5 and h5py, with zero C anywhere in the default build.

A word on scope, because honesty matters: OxiH5 0.1.3 is primarily a reader. The read path is broad and battle-tested; there is also a minimal write path (more on that below), but the heart of this release — and where the coverage is deep — is reading.

What we built

OxiH5 is a four-crate workspace (~20.4k SLOC of Rust, with 459 unit and integration tests, all passing):

| Crate | Purpose |
| --- | --- |
| oxih5-core | Public types: Dataset, Dtype, ByteOrder, OxiH5Error, Attribute, FilterPipeline, Link, Group. Carries #![forbid(unsafe_code)]. |
| oxih5-format | The low-level binary parsers: superblock, object headers, messages, heaps, B-tree v1/v2, SNOD, fractal heap, extensible/fixed array indices, filters, global heap, chunked assembly. |
| oxih5 | The user-facing facade: open(), open_mmap(), read_dataset(), File, Group, FileWriter. |
| oxinetcdf | A Pure-Rust NetCDF-4 conventions reader/writer atop OxiH5: NcFile, NcGroup, NcVariable, NcDimension, NcFileWriter. |

The parsing pipeline walks the file the same way libhdf5 does, one layer at a time:

1. Superblock. OxiH5 reads superblock v0 (libver='earliest') and v2/v3 (libver='latest') to find the root group.

2. Object headers. Both flavors are supported — v1 (message list with continuation blocks) and v2 (OHDR + OCHK, with creation-order, timestamps, and phase-change tracking).

3. Groups. Old-style groups (B-tree v1 + local heap + symbol-table nodes) and new-style groups (Link Info / Link messages, fractal heaps for large groups, and B-tree v2 name indices) both resolve correctly, so hierarchical traversal works regardless of how the file was written.

4. Data layouts. Contiguous, compact (inline), and chunked datasets are supported. For chunked data, OxiH5 reads four chunk-index types: B-tree v1, B-tree v2, the extensible array index, and the fixed array index.

5. Filters. The chunked filter pipeline decodes Deflate/gzip (id 1, via oxiarc-deflate), Shuffle (id 2), Fletcher32 (id 3), SZIP/AEC (id 4, via oxiarc-szip behind the szip feature), Nbit (id 5, integer bit-packing), and Scaleoffset (id 6, integer precision reduction).

6. Datatypes. All eleven HDF5 datatype classes are parsed, including fixed-point integers (i8 through u64, LE/BE), floats (f16/f32/f64, LE/BE), fixed-length strings (ASCII/UTF-8), bitfields, opaque blobs, compound types (named fields at offsets), references (object/region), enumerations (base type + member table), variable-length sequences (backed by the global heap), and arrays (base type + dimensions). Attributes (message type 0x000C, versions 1, 2, and 3) support every one of those classes too.
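Step 1 is a good place to see what "parsing from raw bytes" means in practice. The sketch below locates the HDF5 format signature (checked at offset 0 and then at 512, 1024, 2048, and so on, because a file may start with a user block) and pulls the root group's object header address out of a version 2 or 3 superblock. It is a hypothetical illustration written against the published file format specification, not OxiH5's code: `find_superblock`, `read_le`, and `SuperblockInfo` are invented names, and the sketch ignores v0 superblocks and does not verify the trailing checksum.

```rust
/// The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE: [u8; 8] = [0x89, b'H', b'D', b'F', b'\r', b'\n', 0x1a, b'\n'];

/// Looks for the signature at offset 0, then 512, 1024, 2048, ...
fn find_superblock(file: &[u8]) -> Option<usize> {
    let mut offset = 0usize;
    loop {
        let end = offset.checked_add(HDF5_SIGNATURE.len())?;
        if end > file.len() {
            return None; // ran past the end of the buffer
        }
        if file[offset..end] == HDF5_SIGNATURE {
            return Some(offset);
        }
        offset = if offset == 0 { 512 } else { offset.checked_mul(2)? };
    }
}

/// Reads a bounds-checked little-endian unsigned integer `size` bytes wide (1..=8).
fn read_le(buf: &[u8], pos: usize, size: usize) -> Option<u64> {
    if size == 0 || size > 8 {
        return None;
    }
    let bytes = buf.get(pos..pos.checked_add(size)?)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

#[derive(Debug, PartialEq)]
struct SuperblockInfo {
    version: u8,
    size_of_offsets: u8,
    /// Root group object header address, relative to the base address.
    root_object_header: u64,
}

/// Handles superblock versions 2 and 3 only. After the signature come
/// version (1), size of offsets (1), size of lengths (1), consistency flags (1),
/// then four addresses: base, superblock extension, end of file, root object header.
fn parse_superblock_v2_v3(file: &[u8]) -> Option<SuperblockInfo> {
    let start = find_superblock(file)?;
    let version = *file.get(start + 8)?;
    if version != 2 && version != 3 {
        return None;
    }
    let size_of_offsets = *file.get(start + 9)?;
    let width = usize::from(size_of_offsets);
    let root_pos = start + 12 + 3 * width; // skip the first three addresses
    let root_object_header = read_le(file, root_pos, width)?;
    Some(SuperblockInfo { version, size_of_offsets, root_object_header })
}

fn main() {
    // Synthetic 48-byte v2 superblock with 8-byte offsets and lengths.
    let mut buf = vec![0u8; 48];
    buf[..8].copy_from_slice(&HDF5_SIGNATURE);
    buf[8] = 2; // superblock version
    buf[9] = 8; // size of offsets
    buf[10] = 8; // size of lengths
    buf[36..44].copy_from_slice(&0x30u64.to_le_bytes()); // root object header address
    println!("{:?}", parse_superblock_v2_v3(&buf));
}
```

Every read goes through `get`, so a truncated or corrupt file yields `None` instead of a panic or an out-of-bounds access, which is the property that lets a parser like this stay free of `unsafe`.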
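The version 2 object headers in step 2 start with the `OHDR` signature, a version byte, and a flags byte that decides which optional fields follow. A minimal, hypothetical decoder for that prefix might look like the sketch below; the names `OhdrFlags`, `parse_ohdr_prefix`, and `chunk0_size` are illustrative rather than OxiH5's API, and the sketch stops at the "Size of Chunk #0" field without walking messages or checking the checksum.

```rust
/// Flags from a version 2 object header prefix ("OHDR", version 2, flags).
#[derive(Debug, PartialEq)]
struct OhdrFlags {
    /// Bits 0-1: width of the "Size of Chunk #0" field (1, 2, 4 or 8 bytes).
    chunk0_size_width: usize,
    /// Bit 2: attribute creation order is tracked.
    attr_order_tracked: bool,
    /// Bit 3: attribute creation order is indexed.
    attr_order_indexed: bool,
    /// Bit 4: non-default attribute phase-change values are stored.
    phase_change_stored: bool,
    /// Bit 5: access, modification, change and birth times are stored.
    times_stored: bool,
}

fn parse_ohdr_prefix(buf: &[u8]) -> Option<OhdrFlags> {
    if !buf.starts_with(b"OHDR") || *buf.get(4)? != 2 {
        return None;
    }
    let flags = *buf.get(5)?;
    Some(OhdrFlags {
        chunk0_size_width: 1usize << (flags & 0b11),
        attr_order_tracked: (flags & 0b0000_0100) != 0,
        attr_order_indexed: (flags & 0b0000_1000) != 0,
        phase_change_stored: (flags & 0b0001_0000) != 0,
        times_stored: (flags & 0b0010_0000) != 0,
    })
}

/// Returns the size of the first header chunk, skipping the optional
/// timestamps (4 x 4 bytes) and phase-change counts (2 x 2 bytes).
fn chunk0_size(buf: &[u8]) -> Option<u64> {
    let f = parse_ohdr_prefix(buf)?;
    let mut pos = 6; // signature (4) + version (1) + flags (1)
    if f.times_stored {
        pos += 16;
    }
    if f.phase_change_stored {
        pos += 4;
    }
    let bytes = buf.get(pos..pos + f.chunk0_size_width)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

fn main() {
    // Synthetic prefix: timestamps present, 2-byte chunk size of 0x0123.
    let mut buf = b"OHDR".to_vec();
    buf.push(2);
    buf.push(0b0010_0001);
    buf.extend_from_slice(&[0u8; 16]);
    buf.extend_from_slice(&0x0123u16.to_le_bytes());
    println!("{:?} chunk0 = {:?}", parse_ohdr_prefix(&buf), chunk0_size(&buf));
}
```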
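For the old-style groups in step 3, link names live in a local heap: a `HEAP` header points at a data segment, and each symbol-table entry stores the byte offset of its null-terminated name inside that segment. The sketch below assumes the sizes of lengths and offsets have already been read from the superblock and are passed in; `LocalHeap`, `parse_local_heap`, and `heap_name` are hypothetical helpers, not part of OxiH5.

```rust
#[derive(Debug, PartialEq)]
struct LocalHeap {
    data_size: u64,
    free_list_offset: u64,
    data_address: u64,
}

/// Reads a bounds-checked little-endian unsigned integer `size` bytes wide (1..=8).
fn read_le(buf: &[u8], pos: usize, size: usize) -> Option<u64> {
    if size == 0 || size > 8 {
        return None;
    }
    let bytes = buf.get(pos..pos.checked_add(size)?)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Parses a local heap header: "HEAP", version 0, three reserved bytes,
/// data segment size and free-list offset (each `size_of_lengths` wide),
/// then the data segment address (`size_of_offsets` wide).
fn parse_local_heap(buf: &[u8], size_of_lengths: usize, size_of_offsets: usize) -> Option<LocalHeap> {
    if !buf.starts_with(b"HEAP") || *buf.get(4)? != 0 {
        return None;
    }
    let mut pos = 8; // signature (4) + version (1) + reserved (3)
    let data_size = read_le(buf, pos, size_of_lengths)?;
    pos += size_of_lengths;
    let free_list_offset = read_le(buf, pos, size_of_lengths)?;
    pos += size_of_lengths;
    let data_address = read_le(buf, pos, size_of_offsets)?;
    Some(LocalHeap { data_size, free_list_offset, data_address })
}

/// Reads the null-terminated link name stored at `offset` in the data segment.
fn heap_name(data_segment: &[u8], offset: usize) -> Option<&str> {
    let tail = data_segment.get(offset..)?;
    let len = tail.iter().position(|&b| b == 0)?; // the name must be terminated
    std::str::from_utf8(&tail[..len]).ok()
}

fn main() {
    // Synthetic data segment: names at offsets 8 and 24.
    let data = b"\0\0\0\0\0\0\0\0temperature\0\0\0\0\0pressure\0\0\0\0\0\0\0\0";
    println!("{:?} {:?}", heap_name(data, 8), heap_name(data, 24));
}
```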
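Step 6 begins with the datatype message header: the first byte packs the class (low four bits) and the version (high four bits), followed by a 24-bit class-specific bit field and a 4-byte element size. As a hedged sketch (the enum and function names are made up for illustration and are not OxiH5's `Dtype` API), decoding that header and the fixed-point byte-order and sign bits could look like this:

```rust
/// The eleven HDF5 datatype classes, in the order of their class codes 0..=10.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DtypeClass {
    FixedPoint,
    FloatingPoint,
    Time,
    String,
    Bitfield,
    Opaque,
    Compound,
    Reference,
    Enumerated,
    VariableLength,
    Array,
}

impl DtypeClass {
    fn from_code(code: u8) -> Option<DtypeClass> {
        const ALL: [DtypeClass; 11] = [
            DtypeClass::FixedPoint, DtypeClass::FloatingPoint, DtypeClass::Time,
            DtypeClass::String, DtypeClass::Bitfield, DtypeClass::Opaque,
            DtypeClass::Compound, DtypeClass::Reference, DtypeClass::Enumerated,
            DtypeClass::VariableLength, DtypeClass::Array,
        ];
        ALL.get(usize::from(code)).copied() // codes 11..=15 are invalid
    }
}

#[derive(Debug, PartialEq)]
struct DtypeHeader {
    class: DtypeClass,
    version: u8,
    class_bits: u32, // 24-bit class-specific bit field
    size: u32,       // element size in bytes
}

fn parse_dtype_header(buf: &[u8]) -> Option<DtypeHeader> {
    let b = buf.get(0..8)?;
    let class = DtypeClass::from_code(b[0] & 0x0f)?; // low nibble
    let version = b[0] >> 4; // high nibble
    let class_bits = u32::from(b[1]) | (u32::from(b[2]) << 8) | (u32::from(b[3]) << 16);
    let size = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Some(DtypeHeader { class, version, class_bits, size })
}

/// Fixed-point only: bit 0 is byte order (set = big-endian), bit 3 is signedness.
fn fixed_point_traits(h: &DtypeHeader) -> Option<(bool, bool)> {
    if h.class != DtypeClass::FixedPoint {
        return None;
    }
    let big_endian = (h.class_bits & 0b0001) != 0;
    let signed = (h.class_bits & 0b1000) != 0;
    Some((big_endian, signed))
}

fn main() {
    // Header bytes for a little-endian signed 32-bit integer (class 0, version 1).
    let h = parse_dtype_header(&[0x10, 0x08, 0x00, 0x00, 4, 0, 0, 0]).unwrap();
    println!("{:?} -> (big_endian, signed) = {:?}", h, fixed_point_traits(&h));
}
```

The remaining class properties (float exponent and mantissa layout, compound members, enum tables, and so on) follow this 8-byte header, so a reader can dispatch on `class` before parsing further.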

In keeping with COOLJAPAN policy, DEFLATE goes through oxiarc-deflate and SZIP through oxiarc-szip — never flate2, miniz, or zlib-ng. HDF5 FFI crates are banned workspace-wide via deny.toml.
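Not every filter needs a compression crate. The Shuffle filter (id 2) only reorders bytes: on write it groups byte 0 of every element together, then byte 1, and so on, which usually makes the data compress better; on read the bytes are interleaved back. Here is a minimal sketch of both directions, assuming the element size comes from the filter's parameters; `shuffle` and `unshuffle` are illustrative names, not OxiH5 functions. Bytes past the last whole element are copied through unchanged.

```rust
/// Groups byte k of every element together, for k = 0..elem_size.
fn shuffle(input: &[u8], elem_size: usize) -> Vec<u8> {
    if elem_size <= 1 || input.len() < elem_size {
        return input.to_vec(); // nothing to reorder
    }
    let n = input.len() / elem_size; // number of whole elements
    let mut out = vec![0u8; input.len()];
    for byte in 0..elem_size {
        for i in 0..n {
            out[byte * n + i] = input[i * elem_size + byte];
        }
    }
    let tail = n * elem_size; // leftover bytes are copied verbatim
    out[tail..].copy_from_slice(&input[tail..]);
    out
}

/// Inverse of `shuffle`: what a reader applies when filter id 2 is in the pipeline.
fn unshuffle(input: &[u8], elem_size: usize) -> Vec<u8> {
    if elem_size <= 1 || input.len() < elem_size {
        return input.to_vec();
    }
    let n = input.len() / elem_size;
    let mut out = vec![0u8; input.len()];
    for byte in 0..elem_size {
        for i in 0..n {
            out[i * elem_size + byte] = input[byte * n + i];
        }
    }
    let tail = n * elem_size;
    out[tail..].copy_from_slice(&input[tail..]);
    out
}

fn main() {
    // Three little-endian u32 values: 1, 2, 3.
    let raw: Vec<u8> = [1u32, 2, 3].iter().flat_map(|v| v.to_le_bytes()).collect();
    let shuffled = shuffle(&raw, 4);
    println!("{shuffled:?}"); // prints [1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    assert_eq!(unshuffle(&shuffled, 4), raw);
}
```

On read, filters run in the reverse of the order they were applied on write, so a chunk written with shuffle then deflate is inflated first and unshuffled second.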

Getting Started

```shell
cargo add oxih5
```

A minimal read, straight from the README:

```rust
use oxih5::{open, read_dataset};

// `?` boxes the library's error type; this assumes OxiH5Error implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One-shot convenience
    let ds = read_dataset("data.h5", "/temperature")?;
    let values: Vec<f32> = ds.as_f32()?;
    println!("shape: {:?}, {} elements", ds.shape, ds.len());

    // File handle (for multiple datasets)
    let f = open("data.h5")?;
    for name in f.dataset_names()? {
        println!("{name}");
    }
    let ds = f.dataset("/pressure")?;
    let values: Vec<f64> = ds.as_f64()?;

    // Hierarchical groups
    let grp = f.group("/sensors/imu")?;
    let names = grp.datasets()?;
    let ds = grp.dataset("accel_x")?;

    // Multi-dimensional sub-region extraction
    let region = f.dataset_slice("/image", &[100..200, 50..150])?;

    // Memory-mapped I/O for large files
    let f = oxih5::open_mmap("large_file.h5")?;

    Ok(())
}
```
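A call like `dataset_slice` only needs the chunks that overlap the requested region, and for a chunked dataset those can be found with one integer division per dimension. The sketch below is a hypothetical illustration of that arithmetic, not OxiH5's implementation: `chunks_for_region` is an invented name, and it returns each overlapping chunk's origin in element coordinates in row-major order.

```rust
use std::ops::Range;

/// Returns the origin (in element coordinates) of every chunk that
/// intersects `region`, in row-major order. Chunk dimensions must be non-zero.
fn chunks_for_region(region: &[Range<u64>], chunk_dims: &[u64]) -> Vec<Vec<u64>> {
    assert_eq!(region.len(), chunk_dims.len(), "rank mismatch");
    assert!(chunk_dims.iter().all(|&c| c > 0), "chunk dimensions must be non-zero");

    // Per dimension: the half-open range of chunk indices the region touches.
    let per_dim: Vec<Range<u64>> = region
        .iter()
        .zip(chunk_dims)
        .map(|(r, &c)| {
            if r.start >= r.end {
                0..0
            } else {
                (r.start / c)..((r.end - 1) / c + 1)
            }
        })
        .collect();

    let mut out: Vec<Vec<u64>> = Vec::new();
    if per_dim.iter().any(|r| r.start >= r.end) {
        return out; // an empty selection touches no chunks
    }

    // Odometer-style walk over the Cartesian product of chunk indices.
    let mut idx: Vec<u64> = per_dim.iter().map(|r| r.start).collect();
    loop {
        out.push(idx.iter().zip(chunk_dims).map(|(&i, &c)| i * c).collect());
        let mut d = idx.len();
        loop {
            if d == 0 {
                return out; // every dimension wrapped: done
            }
            d -= 1;
            idx[d] += 1;
            if idx[d] < per_dim[d].end {
                break; // advanced this dimension; emit the next chunk
            }
            idx[d] = per_dim[d].start; // wrap and carry into the previous dimension
        }
    }
}

fn main() {
    // The region from the example above, assuming 64 x 64 chunks.
    for origin in chunks_for_region(&[100..200, 50..150], &[64, 64]) {
        println!("{origin:?}");
    }
}
```

With 64 by 64 chunks (an assumed chunk shape), rows 100..200 touch the chunk rows starting at 64, 128, and 192, and columns 50..150 touch the chunk columns starting at 0, 64, and 128, so nine chunks need to be read and decoded.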


Part of the COOLJAPAN ecosystem

OxiH5 belongs to NoFFI — the COOLJAPAN initiative to replace every C/C++/Fortran/-sys FFI dependency in the Rust ecosystem with a clean, memory-safe, 100% Pure Rust implementation. Each NoFFI project eliminates one specific native dependency; OxiH5’s job is to retire hdf5-sys / hdf5 (and netcdf-sys on the read path) so that reading scientific data no longer means linking a C library.

It sits naturally alongside the rest of the sovereign Rust stack: oxiarc-deflate and oxiarc-szip provide the Pure-Rust compression that HDF5’s filters depend on, and OxiH5 in turn becomes the data-loading layer for numerical and ML work across the COOLJAPAN ecosystem. The default features are 100% Pure Rust, need no system libraries or build-time C toolchain, and are WASM-friendly.

Repository: https://github.com/cool-japan/oxih5

Star the repo ⭐ if you want to open HDF5 files in Rust without ever touching libhdf5 again.

Pure Rust scientific data — sovereign, safe, and FFI-free.

KitaSan at COOLJAPAN OÜ June 20, 2026
