Distributed 3D FFT now scales across many ranks — and the default build stays warning-free on the latest stable Rust.
Today we released OxiFFT 0.3.2 — a hardening release that finishes multi-rank 3D pencil FFT execution and gates AVX-512 codelets behind a default-off feature so OxiFFT keeps building cleanly on rustc 1.95 stable.
No C. No Fortran. No FFTW. No FFI. OxiFFT is Pure Rust to the core: with default features it is 100% Rust, compiles to a single static binary or to WASM, and displaces FFTW3 and rustfft for in-process transforms (and FFTW-MPI for distributed ones). The spectral backbone for the SciRS2 signal and audio stack carries no native baggage.
Why 0.3.2 matters
This is a maintenance and hardening release, focused on finishing distributed 3D FFT at scale and keeping the default build pristine on the newest stable toolchain.
- Multi-rank 3D pencil FFT is complete: plan_3d_pencil now runs full forward and inverse pencil decomposition across multiple MPI ranks.
- The default build is warning-free on rustc 1.95: AVX-512 codelets moved behind a new opt-in feature, so the stable toolchain never trips over unstable target features.
- ND-plan error handling is hardened: plan_nd error paths are expanded, and the row/column pool error handling in the pencil path was refactored.
- The GPU backend is refreshed: the optional OxiCUDA dependency tracks the latest 0.1.8 release.
Technical Deep Dive
Multi-rank 3D pencil FFT
plan_3d_pencil now supports multi-rank MPI execution with full forward/inverse pencil decomposition. A slab decomposition splits the volume along a single axis, so the usable rank count is capped at that axis's length (N ranks for an N³ grid). A pencil decomposition splits two axes instead, and each rank owns a “pencil” that spans the full length of the remaining axis. 3D transforms therefore scale to far higher rank counts (up to N² for an N³ grid) before communication dominates.
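The following std-only sketch makes the slab-versus-pencil trade-off concrete. It shows how an N0 × N1 × N2 grid might be block-distributed as x-pencils over a P1 × P2 process grid. The helper names (block_range, x_pencil_extent) and the row-major rank layout are illustrative assumptions, not OxiFFT's internal API. plan_3d_pencil may lay out ranks and remainders differently.

```rust
use std::ops::Range;

/// Hypothetical helper: split `n` items into `parts` contiguous blocks and
/// return block `idx`. The first `n % parts` blocks get one extra item, so
/// block sizes differ by at most one.
fn block_range(n: usize, parts: usize, idx: usize) -> Range<usize> {
    assert!(parts > 0 && idx < parts, "invalid block index");
    let base = n / parts;
    let extra = n % parts;
    let start = idx * base + idx.min(extra);
    let len = base + usize::from(idx < extra);
    start..start + len
}

/// Hypothetical helper: local index ranges of an x-pencil. The full x axis
/// stays on-rank, y is split across `p1` rows and z across `p2` columns.
/// Ranks are assumed to be laid out row-major: rank = row * p2 + col.
fn x_pencil_extent(dims: [usize; 3], (p1, p2): (usize, usize), rank: usize) -> [Range<usize>; 3] {
    assert!(rank < p1 * p2, "rank outside the process grid");
    let (row, col) = (rank / p2, rank % p2);
    [
        0..dims[0],
        block_range(dims[1], p1, row),
        block_range(dims[2], p2, col),
    ]
}

fn main() {
    let n = 256;
    // Slabs split one axis (at most n busy ranks); pencils split two (up to n * n).
    println!("slab max ranks: {}, pencil max ranks: {}", n, n * n);

    let dims = [n, n, n];
    let grid = (4, 8); // 32 ranks on an illustrative 4 x 8 process grid
    for rank in [0, 31] {
        let ext = x_pencil_extent(dims, grid, rank);
        let local: usize = ext.iter().map(|r| r.len()).product();
        println!("rank {rank}: {ext:?} -> {local} points");
    }
}
```

In this example, every rank on the 4 × 8 grid holds a 256 × 64 × 32 pencil (524,288 points). The full x axis is local to each rank, so the first 1D pass needs no communication.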
Alongside the executor work, plan_nd error handling for ND FFT plans was expanded, and the error handling for the row and column pools used in the pencil path was refactored into cleaner, more predictable paths. The relevant code lives in oxifft/src/mpi/plans/plan_3d_pencil.rs and oxifft/src/mpi/plans/plan_nd.rs.
AVX-512 feature gate
rustc 1.95 stable treats #[target_feature(enable = "avx512*")] as unstable (rust-lang/rust#44839). To keep the default build clean, AVX-512 codelets and their runtime dispatchers are now gated behind a new default-off avx512 feature.
Without --features avx512, builds simply fall through to the existing AVX-2 / SSE / scalar dispatch paths — so the default build stays warning-free on stable, with no loss of the fast path that most CPUs actually use. There is no API or ABI change when the feature is enabled; both oxifft and oxifft-codegen-impl were updated in concert. The gating touches oxifft/src/dft/codelets/simd/mod.rs and oxifft-codegen-impl/src/gen_simd/avx512.rs.
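The fall-through described above follows a common Rust pattern. Compile-time cfg gates decide which SIMD tiers exist, and runtime CPU detection with the standard library's is_x86_feature_detected! macro picks among them. This sketch is illustrative only: SimdLevel and detect_simd_level are hypothetical, not OxiFFT's dispatcher. It also assumes a Cargo.toml that declares an empty `avx512 = []` feature. Without that declaration, Cargo's unexpected_cfgs lint will warn.

```rust
/// SIMD tiers a dispatcher might choose between (hypothetical, not OxiFFT's type).
#[allow(dead_code)] // some tiers are never constructed on non-x86_64 targets
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdLevel {
    /// Compiled in only when the assumed `avx512` Cargo feature is enabled.
    #[cfg(feature = "avx512")]
    Avx512,
    Avx2,
    Sse2,
    Scalar,
}

/// Pick the best tier available at runtime. With the `avx512` feature off,
/// the AVX-512 branch is removed at compile time, so no AVX-512 target
/// features are referenced and detection falls through to AVX2 / SSE2.
fn detect_simd_level() -> SimdLevel {
    #[cfg(target_arch = "x86_64")]
    {
        #[cfg(feature = "avx512")]
        {
            if is_x86_feature_detected!("avx512f") {
                return SimdLevel::Avx512;
            }
        }
        if is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if is_x86_feature_detected!("sse2") {
            return SimdLevel::Sse2;
        }
    }
    // Non-x86_64 targets (and CPUs without SSE2) take the portable path.
    SimdLevel::Scalar
}

fn main() {
    println!("dispatching to {:?}", detect_simd_level());
}
```

Gating the enum variant as well as the detection branch means the default build never mentions AVX-512. Enabling the feature only adds a tier and does not change any signature.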
GPU backend refresh
The optional OxiCUDA GPU backend was bumped 0.1.4 → 0.1.8 across four incremental updates (0.1.5, 0.1.6, 0.1.7, 0.1.8). This is a straightforward dependency refresh of the GPU stack — no changes to the OxiFFT API surface.
Getting Started
```shell
cargo add oxifft
```
A minimal forward FFT:
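```rust
use oxifft::{Complex, Direction, Flags, Plan};
fn main() {
    // Plan a 1024-point forward DFT with the MEASURE planner flag.
    let plan = Plan::dft_1d(1024, Direction::Forward, Flags::MEASURE)
        .expect("1024-pt plan");
    // Constant input: all of the energy should land in the DC bin, output[0].
    let input = vec![Complex::new(1.0_f64, 0.0); 1024];
    let mut output = vec![Complex::new(0.0_f64, 0.0); 1024];
    plan.execute(&input, &mut output);
}
```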
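A textbook O(N²) DFT is a convenient std-only reference for what the minimal example should produce. This sketch assumes that, as in FFTW, OxiFFT's forward transform is unnormalized and uses the e^(-2πijk/N) kernel. The dft_naive helper is hypothetical and is intended only for cross-checking. For the all-ones input, the DC bin (index 0) should equal the input length, 1024, and every other bin should be zero up to rounding. This holds whichever sign convention applies.

```rust
use std::f64::consts::PI;

/// Hypothetical reference: textbook O(n^2) forward DFT on (re, im) pairs,
/// X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), with no normalization.
fn dft_naive(input: &[(f64, f64)]) -> Vec<(f64, f64)> {
    let n = input.len();
    (0..n)
        .map(|k| {
            input
                .iter()
                .enumerate()
                .fold((0.0_f64, 0.0_f64), |(acc_re, acc_im), (j, &(re, im))| {
                    // Reduce j*k modulo n first to keep the angle small and accurate.
                    let angle = -2.0 * PI * ((j * k) % n) as f64 / n as f64;
                    let (s, c) = angle.sin_cos();
                    // Complex multiply: (re + i*im) * (c + i*s).
                    (acc_re + re * c - im * s, acc_im + re * s + im * c)
                })
        })
        .collect()
}

fn main() {
    // Same input as the OxiFFT example: 1024 samples of 1 + 0i.
    let input = vec![(1.0, 0.0); 1024];
    let output = dft_naive(&input);
    println!("DC bin: {:?}", output[0]);
    let max_other = output[1..]
        .iter()
        .map(|&(re, im)| re.hypot(im))
        .fold(0.0_f64, f64::max);
    println!("largest non-DC magnitude: {max_other:e}");
}
```

Comparing OxiFFT's output element-wise against this reference, using a small tolerance, is a quick sanity check. It is worth running when you first wire the library into a new pipeline.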
AVX-512 is opt-in. On a toolchain that allows AVX-512 target features (for example, nightly), enable it explicitly:
```shell
cargo add oxifft --features avx512
```
What’s New in 0.3.2
- Added: Multi-rank 3D pencil FFT execution (plan_3d_pencil now supports multi-rank MPI execution with full forward/inverse pencil decomposition), plus expanded plan_nd error handling for ND FFT plans.
- Changed: AVX-512 codelets and dispatchers are gated behind a new default-off avx512 feature, because rustc 1.95 stable treats #[target_feature(enable = "avx512*")] as unstable (rust-lang/rust#44839). Without the feature, builds use the existing AVX-2 / SSE / scalar paths. There is no API or ABI change when the feature is enabled; oxifft and oxifft-codegen-impl were updated together.
- Changed: Refactored error handling for the row and column pools in multi-rank 3D pencil FFT execution.
- Dependencies: oxicuda (optional GPU backend) bumped 0.1.4 → 0.1.8.
Tips
- Most users need to do nothing. The default build needs no new flags and is warning-free on rustc 1.95; the AVX-2 / SSE / scalar dispatch already gives you the fast path on common hardware.
- Treat avx512 as opt-in. Enable it only on a toolchain that allows AVX-512 target features, and benchmark before committing: AVX-2 remains the default fast path, and it is fast.
- Prefer pencil over slab for large 3D transforms. When spreading a 3D FFT across many MPI ranks, reach for plan_3d_pencil; the per-rank pencil keeps you scaling well past the point where a slab decomposition saturates.
- Refresh the GPU stack. If you use the GPU feature, move oxicuda to 0.1.8 to match this release.
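For the pencil-over-slab tip, a rough feasibility check can help. The sketch below tests whether a slab layout fits a given rank count. It also searches for the most nearly square pencil process grid P = P1 × P2. The helpers slab_fits and pencil_grid are hypothetical, not OxiFFT APIs. Splitting slabs along the first axis, and requiring each factor to be no larger than the shortest axis, are conservative simplifications. plan_3d_pencil may apply different constraints.

```rust
/// Hypothetical helper: true if one slab per rank fits along the first axis.
fn slab_fits(dims: [usize; 3], ranks: usize) -> bool {
    ranks >= 1 && ranks <= dims[0]
}

/// Hypothetical helper: the most nearly square p1 x p2 process grid for
/// `ranks`, or None if no factorization fits. As a conservative
/// simplification, both factors must not exceed the shortest axis.
fn pencil_grid(dims: [usize; 3], ranks: usize) -> Option<(usize, usize)> {
    let shortest = dims[0].min(dims[1]).min(dims[2]);
    if ranks == 0 {
        return None;
    }
    // Integer floor(sqrt(ranks)), computed without floating point.
    let mut p1 = 1;
    while (p1 + 1) * (p1 + 1) <= ranks {
        p1 += 1;
    }
    // Walking p1 downward, the first divisor yields the smallest possible p2;
    // if that p2 is too large, every other factorization is too.
    while ranks % p1 != 0 {
        p1 -= 1;
    }
    let p2 = ranks / p1;
    (p2 <= shortest).then_some((p1, p2))
}

fn main() {
    let dims = [256, 256, 256];
    for ranks in [64, 1024, 4096] {
        println!(
            "{ranks:>5} ranks: slab fits = {}, pencil grid = {:?}",
            slab_fits(dims, ranks),
            pencil_grid(dims, ranks)
        );
    }
}
```

On a 256³ grid, 1024 ranks already exceed what slabs can use, while a 32 × 32 pencil grid fits comfortably. A prime rank count such as 65,537 has no usable factorization at all.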
The foundation
OxiFFT is the spectral layer of the COOLJAPAN ecosystem. By late May 2026 it sits beside mature siblings — SciRS2, NumRS2, OxiBLAS, OxiCUDA (its GPU backend), ToRSh, OxiWhisper, SkleaRS, TenfloweRS, TrustformeRS, and OxiPhysics — every transform staying Pure Rust from a single laptop to an MPI cluster. The OxiCUDA 0.1.8 dependency in this release is exactly that integration in practice: an optional, Pure-Rust-by-default GPU path that you bring in only when you want it.
Repository: https://github.com/cool-japan/oxifft
Star the repo if Pure Rust spectral computing belongs in your stack — and tell us how OxiFFT scales on your cluster.
Pure Rust spectral computing — fast, safe, and sovereign, from a laptop to an MPI cluster.
— KitaSan at COOLJAPAN OÜ May 22, 2026