The two superpowers that made FFTW legendary — minimum-multiply mixed-radix and plan-time auto-tuning — are now Pure Rust.
Today we released OxiFFT 0.3.1 — a focused feature release that teaches the planner to factor “ugly” composite sizes with Winograd butterflies and to profile candidate algorithms at runtime, recording the results as portable wisdom.
No C. No Fortran. No FFTW. No FFI. OxiFFT is a Pure Rust FFT/DFT library whose default features are 100% Rust — it compiles to a single static binary or to WASM with nothing to link against. As the rustfft replacement under the COOLJAPAN Pure Rust policy and a Pure Rust port of FFTW3, it is the spectral backbone for the SciRS2 signal and audio stack, and it displaces both FFTW3 and rustfft wherever they sit today.
Why 0.3.1 matters
For decades, two FFTW capabilities set it apart from naive radix-2 implementations: minimum-multiply mixed-radix transforms for sizes that do not factor neatly into powers of two, and the famous plan-MEASURE auto-tuning that records “wisdom” about the fastest algorithm for a given size and machine. With 0.3.1, both arrive in OxiFFT — in Pure Rust.
Concrete wins:
- Smooth-7 sizes stop paying the Bluestein tax. Sizes that factor into {2, 3, 4, 5, 7, 8, 16} — 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, 240, and friends — now run through a mixed-radix Cooley-Tukey path with a proper cost model instead of the chirp-convolution detour.
- Fewer multiplies. Winograd minimum-multiply radix-3/5/7 DIT butterflies do the work with the arithmetic FFTW made famous.
- The planner can learn your machine. Flags::MEASURE and Flags::PATIENT profile candidate algorithms at runtime and persist the winner.
- Wisdom is portable and compact: a binary format of 30-byte packed little-endian entries, plus a human-readable S-expression v2 format that stays backward-compatible with v1 files.
- No API churn. The improvements engage automatically for the affected sizes — existing code just gets faster.
Technical Deep Dive
Mixed-radix Cooley-Tukey
The headline addition is a mixed-radix Cooley-Tukey FFT for smooth-7 sizes, those that factor entirely into {2, 3, 4, 5, 7, 8, 16}. Previously, composite sizes that were not pure powers of two often fell back to Bluestein’s algorithm, which embeds the transform into a larger convolution with a chirp sequence. That works for any size, but it carries real overhead: the convolution is evaluated with FFTs of a padded length of at least 2N − 1, plus chirp multiplications before and after.
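To put a rough number on the Bluestein overhead, here is a back-of-the-envelope sketch (not OxiFFT code). It assumes a textbook Bluestein implementation that pads the convolution to the smallest power of two of at least 2N − 1 points, precomputes the chirp's own FFT, and therefore runs two length-M FFTs per execution. Each FFT is modeled as `len · log2(len)`; the helper names `bluestein_len` and `bluestein_overhead` are hypothetical, and real planners may pad to other smooth lengths.

```rust
/// Smallest power-of-two length M >= 2N - 1: a cyclic convolution of that
/// length reproduces the linear convolution Bluestein needs without wrap-around.
fn bluestein_len(n: usize) -> usize {
    assert!(n >= 1, "transform size must be positive");
    (2 * n - 1).next_power_of_two()
}

/// Crude work ratio for N >= 2: two length-M FFTs (forward on the chirped input,
/// inverse after the pointwise product) versus one direct length-N FFT, each
/// modeled as len * log2(len). Chirp multiplications are ignored.
fn bluestein_overhead(n: usize) -> f64 {
    assert!(n >= 2, "log2(1) = 0 makes the ratio undefined");
    let m = bluestein_len(n) as f64;
    let n = n as f64;
    (2.0 * m * m.log2()) / (n * n.log2())
}

fn main() {
    for n in [6, 12, 96, 240] {
        println!(
            "N = {n:>3}: padded M = {:>4}, roughly {:.1}x the work",
            bluestein_len(n),
            bluestein_overhead(n)
        );
    }
}
```

For N = 240 the padded length is 512, so under this crude model the chirp detour costs several times the work of a direct 240-point transform before the chirp multiplications are even counted.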
For smooth-7 sizes, 0.3.1 replaces Bluestein with a direct mixed-radix decomposition built from Winograd minimum-multiply radix-3/5/7 DIT butterflies, selected by a proper cost model that counts multiplies rather than guessing. The result: sizes like 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, and 240 run with fewer arithmetic operations and none of the chirp-convolution bookkeeping.
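As an illustration of what “smooth-7” means in practice, the sketch below splits N into the radix set named above, pulling out larger power-of-two radices first so fewer passes are needed. The greedy ordering and the function name `smooth7_factors` are hypothetical; OxiFFT's planner uses a multiply-counting cost model, which this sketch does not reproduce. A `None` result marks a size that needs a different strategy.

```rust
/// Radices tried in order: large power-of-two radices first, then the odd
/// Winograd radices. This ordering is a simple greedy heuristic, not a cost model.
const RADICES: [usize; 7] = [16, 8, 4, 2, 3, 5, 7];

/// Factors n into the radix set, or returns None if n has a prime factor > 7.
fn smooth7_factors(mut n: usize) -> Option<Vec<usize>> {
    if n == 0 {
        return None;
    }
    let mut factors = Vec::new();
    for &r in RADICES.iter() {
        while n % r == 0 {
            factors.push(r);
            n /= r;
        }
    }
    // Anything left over is a prime factor outside {2, 3, 5, 7}.
    if n == 1 {
        Some(factors)
    } else {
        None
    }
}

fn main() {
    for n in [6, 14, 96, 112, 240, 22] {
        match smooth7_factors(n) {
            Some(f) => println!("N = {n:>3}: mixed-radix {f:?}"),
            None => println!("N = {n:>3}: not smooth-7"),
        }
    }
}
```

For 240 the greedy split is [16, 3, 5], the same factorization the Getting Started comment uses; 22 = 2 · 11 falls outside the set.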
The machinery lives in oxifft/src/dft/codelets/winograd.rs, oxifft/src/dft/codelets/winograd_constants.rs, oxifft/src/dft/codelets/winograd_pfa.rs, and oxifft/src/dft/codelets/twiddle_odd.rs.
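To make “minimum-multiply” concrete, here is a minimal sketch of a Winograd-style radix-3 butterfly; it is not the codelet in winograd.rs. It assumes the forward sign convention e^{-2πi·kn/3} and uses a small hand-rolled complex type so the example has no dependencies. Only two real constants appear (1/2 and √3/2), giving four real multiplies per butterfly, and a naive O(N²) DFT is included to check the result.

```rust
use std::f64::consts::PI;

/// Minimal complex type so the sketch has no dependencies.
#[derive(Clone, Copy, Debug)]
struct C {
    re: f64,
    im: f64,
}

impl C {
    fn new(re: f64, im: f64) -> Self {
        C { re, im }
    }
    fn plus(self, o: C) -> C {
        C::new(self.re + o.re, self.im + o.im)
    }
    fn minus(self, o: C) -> C {
        C::new(self.re - o.re, self.im - o.im)
    }
    /// Multiply by a real constant: 2 real multiplies.
    fn scale(self, s: f64) -> C {
        C::new(self.re * s, self.im * s)
    }
    /// Multiply by -i: (a + bi)(-i) = b - ai, a swap and sign flip, no multiplies.
    fn mul_neg_i(self) -> C {
        C::new(self.im, -self.re)
    }
}

/// Forward 3-point DFT using 4 real multiplies in total.
fn radix3_forward(x: [C; 3]) -> [C; 3] {
    let sin60 = 3.0_f64.sqrt() / 2.0; // sin(2π/3) = √3/2
    let t1 = x[1].plus(x[2]); // x1 + x2
    let t2 = x[1].minus(x[2]); // x1 - x2
    let y0 = x[0].plus(t1); // DC bin: additions only
    let m1 = x[0].minus(t1.scale(0.5)); // x0 - (x1 + x2)/2
    let m2 = t2.scale(sin60).mul_neg_i(); // -i·(√3/2)·(x1 - x2)
    [y0, m1.plus(m2), m1.minus(m2)]
}

/// Reference O(N^2) forward DFT for checking.
fn naive_dft(x: &[C]) -> Vec<C> {
    let n = x.len() as f64;
    (0..x.len())
        .map(|k| {
            x.iter().enumerate().fold(C::new(0.0, 0.0), |acc, (j, v)| {
                let (s, c) = (-2.0 * PI * (k * j) as f64 / n).sin_cos();
                acc.plus(C::new(v.re * c - v.im * s, v.re * s + v.im * c))
            })
        })
        .collect()
}

fn main() {
    let x = [C::new(1.0, 2.0), C::new(-0.5, 0.25), C::new(3.0, -1.0)];
    let fast = radix3_forward(x);
    let slow = naive_dft(&x);
    for (a, b) in fast.iter().zip(&slow) {
        assert!((a.re - b.re).abs() < 1e-12 && (a.im - b.im).abs() < 1e-12);
    }
    println!("{fast:?}");
}
```

The same pattern (sum and difference terms first, then a handful of real-constant scalings) is what keeps the radix-5 and radix-7 butterflies cheap as well.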
Auto-tuning and wisdom
The second pillar is FFTW-style auto-tuning. Flags::MEASURE and Flags::PATIENT now drive runtime profiling of candidate algorithms through auto_tune::tune_size<T> and tune_range<T> — the planner times the real contenders for a given size and keeps the fastest.
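The core idea behind measure-style planning fits in a few lines: run each candidate, time it, keep the fastest. The sketch below is a hypothetical helper (`pick_fastest`), not OxiFFT's `auto_tune::tune_size<T>`; it uses best-of-N wall-clock timing after one warm-up run, and two sorting routines stand in for competing FFT algorithms.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs every candidate once to warm up, then `reps` timed runs, and returns
/// the index of the candidate with the lowest best-of-`reps` time.
fn pick_fastest(candidates: &mut [Box<dyn FnMut()>], reps: usize) -> (usize, Duration) {
    assert!(!candidates.is_empty() && reps > 0);
    let mut best = (0, Duration::MAX);
    for (i, candidate) in candidates.iter_mut().enumerate() {
        let run: &mut dyn FnMut() = &mut **candidate;
        run(); // warm-up: caches, lazy allocations, branch predictors
        let mut fastest = Duration::MAX;
        for _ in 0..reps {
            let start = Instant::now();
            run();
            fastest = fastest.min(start.elapsed());
        }
        if fastest < best.1 {
            best = (i, fastest);
        }
    }
    best
}

fn main() {
    // Placeholder workloads standing in for competing FFT algorithms.
    let data: Vec<u64> = (0..50_000u64)
        .map(|i| i.wrapping_mul(6_364_136_223_846_793_005).rotate_left(17))
        .collect();
    let (d1, d2) = (data.clone(), data);

    let mut candidates: Vec<Box<dyn FnMut()>> = Vec::new();
    candidates.push(Box::new(move || {
        let mut v = d1.clone();
        v.sort();
        black_box(v);
    }));
    candidates.push(Box::new(move || {
        let mut v = d2.clone();
        v.sort_unstable();
        black_box(v);
    }));

    let (winner, time) = pick_fastest(&mut candidates, 5);
    println!("candidate {winner} wins at {time:?}");
}
```

Taking the minimum rather than the mean is a common choice for this kind of micro-benchmark: noise from the scheduler only ever adds time, so the fastest run is the best estimate of the algorithm's own cost.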
Those measurements become wisdom: a compact binary format of 30-byte packed little-endian entries that you can persist and reload. Wisdom format v2 adds a human-readable S-expression encoding with (mixed-radix-R1-R2-...) plan descriptions, and it reads v1 files without modification. Build-time profiling is opt-in via the OXIFFT_TUNE=1 environment variable, and a new oxifft_tune CLI binary handles offline profiling so you can tune once and ship the result.
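Only the shape of the v2 plan description, `(mixed-radix-R1-R2-...)`, is given here, so the sketch below handles just that fragment: it formats a radix list into the descriptor and parses it back, rejecting anything malformed. The function names are hypothetical, and the surrounding wisdom-entry syntax (and the 30-byte binary layout) is not modeled.

```rust
/// Formats radices as "(mixed-radix-R1-R2-...)".
fn format_mixed_radix(radices: &[usize]) -> String {
    let parts: Vec<String> = radices.iter().map(|r| r.to_string()).collect();
    format!("(mixed-radix-{})", parts.join("-"))
}

/// Parses "(mixed-radix-R1-R2-...)" back into radices; None on any malformed input.
fn parse_mixed_radix(desc: &str) -> Option<Vec<usize>> {
    let body = desc.trim().strip_prefix("(mixed-radix-")?.strip_suffix(')')?;
    body.split('-')
        // Every field must be an integer radix of at least 2.
        .map(|field| field.parse::<usize>().ok().filter(|&radix| radix >= 2))
        .collect()
}

fn main() {
    let desc = format_mixed_radix(&[16, 3, 5]);
    let radices = parse_mixed_radix(&desc).expect("well-formed descriptor");
    let n: usize = radices.iter().product();
    println!("{desc} -> {radices:?}, transform size {n}");
}
```

Because the descriptor spells out every radix, the transform size can be recovered as their product, which makes a simple consistency check when loading a stored entry.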
See oxifft/src/api/plan/auto_tune.rs, oxifft/src/bin/oxifft_tune.rs, and the chirp-z support in oxifft/src/chirp_z/.
Codegen and integration
On the code-generation side, oxifft-codegen gains a gen_any_codelet! proc-macro and a CodeletBuilder API that dispatches to the right strategy — direct codelets, Rader, MixedRadix, or Bluestein — for any user-specified N. With this addition the crate now exposes 11 proc-macros.
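The dispatch idea can be sketched as a size classifier. The rules below are assumptions for illustration: a hypothetical set of sizes with hand-written codelets, mixed-radix for other 7-smooth sizes, Rader for primes, and Bluestein for everything else. They are not `CodeletBuilder`'s actual logic, whose thresholds are not described here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Direct,
    MixedRadix,
    Rader,
    Bluestein,
}

/// Hypothetical sizes with hand-written straight-line codelets.
const DIRECT_SIZES: [usize; 8] = [1, 2, 3, 4, 5, 7, 8, 16];

/// True if n's only prime factors are 2, 3, 5, and 7.
fn is_smooth7(mut n: usize) -> bool {
    if n == 0 {
        return false;
    }
    for p in [2, 3, 5, 7] {
        while n % p == 0 {
            n /= p;
        }
    }
    n == 1
}

/// Trial division; fine for the modest sizes a planner sees.
fn is_prime(n: usize) -> bool {
    n >= 2 && (2..).take_while(|&d| d * d <= n).all(|d| n % d != 0)
}

fn choose_strategy(n: usize) -> Strategy {
    assert!(n >= 1, "transform size must be positive");
    if DIRECT_SIZES.contains(&n) {
        Strategy::Direct
    } else if is_smooth7(n) {
        Strategy::MixedRadix
    } else if is_prime(n) {
        Strategy::Rader
    } else {
        Strategy::Bluestein
    }
}

fn main() {
    for n in [8, 240, 17, 22] {
        println!("N = {n:>3} -> {:?}", choose_strategy(n));
    }
}
```

Under these rules a 7-smooth size never reaches Rader or Bluestein, which mirrors the release's point that smooth-7 sizes no longer pay the Bluestein tax.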
This release also lands an opt-in ndarray integration in oxifft/src/integrations/ndarray_ext.rs, so transforms compose with ndarray arrays. It is a separate integration module — enable it when you want it.
Getting Started
Add the crate:
```shell
cargo add oxifft
```
Then plan and execute a transform on a smooth-7 size, letting the planner measure as it goes:
```rust
use oxifft::{Complex, Direction, Flags, Plan};

fn main() {
    // A "smooth-7" size (240 = 16·3·5): now mixed-radix, not Bluestein.
    // Flags::MEASURE profiles candidate algorithms and records wisdom.
    let plan = Plan::dft_1d(240, Direction::Forward, Flags::MEASURE)
        .expect("240-pt plan");

    let input = vec![Complex::new(1.0_f64, 0.0); 240];
    let mut output = vec![Complex::new(0.0_f64, 0.0); 240];
    plan.execute(&input, &mut output);
}
```
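To profile offline and bake the wisdom into your build, run the oxifft_tune CLI, or opt in to build-time profiling by setting OXIFFT_TUNE=1. The command below sets the variable and runs the CLI:
```shell
OXIFFT_TUNE=1 cargo run --bin oxifft_tune
```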
What’s New in 0.3.1
- Mixed-radix Cooley-Tukey FFT for smooth-7 sizes factoring into {2, 3, 4, 5, 7, 8, 16} (6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, 240, and more), built from Winograd minimum-multiply radix-3/5/7 DIT butterflies and selected by a proper cost model; it replaces Bluestein for these sizes.
- Auto-tuning via Flags::MEASURE and Flags::PATIENT, with runtime profiling through auto_tune::tune_size<T> / tune_range<T>, a binary wisdom format of 30-byte packed little-endian entries, build-time profiling opt-in via OXIFFT_TUNE=1, and a new oxifft_tune CLI binary.
- gen_any_codelet! proc-macro and a CodeletBuilder API in oxifft-codegen that dispatches to direct codelets / Rader / MixedRadix / Bluestein for any N; the crate now has 11 proc-macros.
- Wisdom format v2: an S-expression format with (mixed-radix-R1-R2-...) encoding, backward-compatible with v1 files.
- 325 new tests bring the workspace to 1,554 passing tests, up from the 0.3.0 series.
Tips
- Tune your hot sizes. Use Flags::MEASURE for sizes on your critical path, or Flags::PATIENT for a more thorough search, then persist the resulting wisdom so you only pay the profiling cost once.
- Trust the new default path. Smooth-7 sizes like 240, 112, and 96 now avoid Bluestein automatically; there is no API change to make, and you simply get the faster route.
- Profile on the target CPU. Run oxifft_tune once on the machine you deploy to and ship the wisdom file alongside your binary:
```shell
OXIFFT_TUNE=1 cargo run --bin oxifft_tune
```
- Reach for gen_any_codelet! when you want a specialized codelet for an exotic N that the standard dispatch does not already cover.
- Enable the ndarray integration (integrations/ndarray_ext.rs) when your data already lives in ndarray arrays and you want FFTs to compose with it.
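The “tune once, reuse forever” pattern behind these tips can be sketched with the standard library alone. The plain-text cache below (one `<size> <plan-descriptor>` pair per line) and the helpers `load_cache`, `save_cache`, and `plan_for` are hypothetical; they stand in for OxiFFT's own wisdom import and export, whose API is not shown here.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Loads "<size> <descriptor>" lines; a missing or unreadable file yields an empty cache.
fn load_cache(path: &Path) -> HashMap<usize, String> {
    let Ok(text) = fs::read_to_string(path) else {
        return HashMap::new();
    };
    text.lines()
        .filter_map(|line| {
            let (size, desc) = line.split_once(' ')?;
            Some((size.parse::<usize>().ok()?, desc.to_string()))
        })
        .collect()
}

/// Writes the cache sorted by size so the file diffs cleanly.
fn save_cache(path: &Path, cache: &HashMap<usize, String>) -> io::Result<()> {
    let mut sizes: Vec<usize> = cache.keys().copied().collect();
    sizes.sort_unstable();
    let body: String = sizes.iter().map(|n| format!("{n} {}\n", cache[n])).collect();
    fs::write(path, body)
}

/// Returns the cached plan for `n`, calling the expensive `tune` only on a miss.
fn plan_for(
    n: usize,
    cache: &mut HashMap<usize, String>,
    tune: impl FnOnce(usize) -> String,
) -> String {
    cache.entry(n).or_insert_with(|| tune(n)).clone()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("plan_cache_demo.txt");
    let mut cache = load_cache(&path);
    // Stand-in for a slow MEASURE/PATIENT run.
    let plan = plan_for(240, &mut cache, |_| "(mixed-radix-16-3-5)".to_string());
    println!("N = 240 -> {plan}");
    save_cache(&path, &cache)
}
```

On the second run the file already holds the entry for 240, so the tuning closure is never called; that is the whole payoff of profiling on the target CPU once and shipping the result.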
The foundation
OxiFFT is the spectral layer of the COOLJAPAN ecosystem. By early May 2026 it sits beside mature siblings such as SciRS2, NumRS2, OxiBLAS, OxiCUDA, ToRSh, OxiWhisper, SkleaRS, TenfloweRS, and TrustformeRS — the same scientific and ML stack that depends on fast, correct transforms. The new ndarray bridge makes OxiFFT composable with the array layer of that stack, so spectral work slots in wherever the data already lives.
Repository: https://github.com/cool-japan/oxifft
Star the repo if you want FFTW’s classic strengths without FFTW’s C — and follow along as the planner keeps getting smarter. Pure Rust spectral computing — fast, safe, and self-tuning.
— KitaSan at COOLJAPAN OÜ May 2, 2026