OxiFFT 0.3.1 Released — Winograd Mixed-Radix Meets FFTW-Style Auto-Tuning

The two superpowers that made FFTW legendary, minimum-multiply mixed-radix transforms and plan-time auto-tuning, are now available in Pure Rust.

Today we released OxiFFT 0.3.1 — a focused feature release that teaches the planner to factor “ugly” composite sizes with Winograd butterflies and to profile candidate algorithms at runtime, recording the results as portable wisdom.

No C. No Fortran. No FFTW. No FFI. OxiFFT is a Pure Rust FFT/DFT library whose default features are 100% Rust — it compiles to a single static binary or to WASM with nothing to link against. As the rustfft replacement under the COOLJAPAN Pure Rust policy and a Pure Rust port of FFTW3, it is the spectral backbone for the SciRS2 signal and audio stack, and it displaces both FFTW3 and rustfft wherever they sit today.

Why 0.3.1 matters

For decades, two capabilities set FFTW apart from naive radix-2 implementations: minimum-multiply mixed-radix transforms for sizes that are not powers of two, and the famous FFTW_MEASURE planner mode, which times candidate algorithms and records “wisdom” about the fastest one for a given size and machine. With 0.3.1, both arrive in OxiFFT, in Pure Rust.

Concrete wins:

Smooth-7 sizes stop paying the Bluestein tax. Sizes that factor into {2, 3, 4, 5, 7, 8, 16} — 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, 240, and friends — now run through a mixed-radix Cooley-Tukey path with a proper cost model instead of the chirp-convolution detour.
Fewer multiplies. Winograd minimum-multiply radix-3/5/7 DIT (decimation-in-time) butterflies handle the odd factors, the same kind of arithmetic-minimizing kernel that FFTW made famous.
The planner can learn your machine. Flags::MEASURE and Flags::PATIENT profile candidate algorithms at runtime and persist the winner.
Wisdom is portable and compact. Entries are stored in a packed little-endian binary format at 30 bytes each, alongside a human-readable S-expression v2 format that stays backward-compatible with v1 files.
No API churn. The improvements engage automatically for the affected sizes — existing code just gets faster.

Technical Deep Dive

Mixed-radix Cooley-Tukey

The headline addition is a mixed-radix Cooley-Tukey FFT for smooth-7 sizes — those that factor entirely into {2, 3, 4, 5, 7, 8, 16}. Previously, composite sizes that were not pure powers of two often fell back to Bluestein’s algorithm, which embeds the transform into a larger convolution with a chirp sequence. That works for any size, but it carries real overhead.
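To make the contrast concrete, here is a minimal sketch in plain Rust. It is not OxiFFT's planner code: the helper names factor_radices and bluestein_conv_len are hypothetical, the greedy largest-radix-first order is an assumption (the release describes a cost model instead), and padding to the next power of two at or above 2N - 1 is the conventional Bluestein choice, assumed here rather than confirmed for OxiFFT.

```rust
/// Radices named in the release notes, tried largest first.
/// The greedy order is an assumption made for this sketch.
const RADICES: [usize; 7] = [16, 8, 7, 5, 4, 3, 2];

/// Hypothetical helper: split `n` into supported radices,
/// or return None when `n` has a prime factor larger than 7.
pub fn factor_radices(mut n: usize) -> Option<Vec<usize>> {
    if n < 2 {
        return None;
    }
    let mut out = Vec::new();
    for &r in RADICES.iter() {
        while n % r == 0 {
            out.push(r);
            n /= r;
        }
    }
    if n == 1 {
        Some(out)
    } else {
        None
    }
}

/// Hypothetical helper: the convolution length a conventional Bluestein
/// implementation pads to, i.e. the smallest power of two >= 2n - 1.
pub fn bluestein_conv_len(n: usize) -> usize {
    (2 * n).saturating_sub(1).max(1).next_power_of_two()
}
```

Under these assumptions, factor_radices(240) yields [16, 5, 3], while bluestein_conv_len(240) is 512: the chirp route would work on a padded convolution more than twice the size of the transform itself.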

For smooth-7 sizes, 0.3.1 replaces Bluestein with a direct mixed-radix decomposition built from Winograd minimum-multiply radix-3/5/7 DIT butterflies, selected by a proper cost model that counts multiplies rather than guessing. The result: sizes like 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, and 240 run with fewer arithmetic operations and none of the chirp-convolution bookkeeping.
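As an illustration of what "minimum-multiply" means, the following is a minimal sketch of a forward 3-point butterfly in Winograd form. It is written for this post and is not taken from winograd.rs; the names winograd3 and naive_dft are hypothetical, and complex numbers are plain (re, im) tuples so that no crate is needed. The butterfly uses two multiplications of a complex value by a real constant, and the reference DFT is included only to check it.

```rust
/// Complex number as (re, im); a tuple keeps the sketch dependency-free.
pub type C = (f64, f64);

fn add(a: C, b: C) -> C {
    (a.0 + b.0, a.1 + b.1)
}
fn sub(a: C, b: C) -> C {
    (a.0 - b.0, a.1 - b.1)
}
fn scale(k: f64, a: C) -> C {
    (k * a.0, k * a.1)
}

/// Forward 3-point DFT using two real-by-complex multiplications.
pub fn winograd3(x: [C; 3]) -> [C; 3] {
    let k = 3.0_f64.sqrt() / 2.0; // sin(2*pi/3)
    let t = add(x[1], x[2]);
    let y0 = add(x[0], t);
    // cos(2*pi/3) - 1 = -1.5, so s = x0 - t/2.
    let s = add(y0, scale(-1.5, t));
    let d = sub(x[1], x[2]);
    // m = -i * k * d, written out on the real and imaginary parts.
    let m = (k * d.1, -k * d.0);
    [y0, add(s, m), sub(s, m)]
}

/// Reference O(n^2) forward DFT, used only to check the butterfly.
pub fn naive_dft(x: &[C]) -> Vec<C> {
    let n = x.len();
    (0..n)
        .map(|j| {
            let mut acc: C = (0.0, 0.0);
            for (i, &(re, im)) in x.iter().enumerate() {
                let ang = -2.0 * std::f64::consts::PI * ((i * j) % n) as f64 / n as f64;
                let (s, c) = ang.sin_cos();
                // (re + i*im) * (c + i*s)
                acc = add(acc, (re * c - im * s, re * s + im * c));
            }
            acc
        })
        .collect()
}
```

Comparing winograd3(x) with naive_dft(&x) on a few inputs should agree to within floating-point rounding; the radix-5 and radix-7 butterflies follow the same idea with more constants.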

The machinery lives in oxifft/src/dft/codelets/winograd.rs, oxifft/src/dft/codelets/winograd_constants.rs, oxifft/src/dft/codelets/winograd_pfa.rs, and oxifft/src/dft/codelets/twiddle_odd.rs.

Auto-tuning and wisdom

The second pillar is FFTW-style auto-tuning. Flags::MEASURE and Flags::PATIENT now drive runtime profiling of candidate algorithms through auto_tune::tune_size<T> and tune_range<T> — the planner times the real contenders for a given size and keeps the fastest.
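The idea behind measure-style planning can be sketched in a few lines of standard-library Rust. This is a conceptual illustration only, not the auto_tune implementation: the Candidate struct, pick_fastest, and the two toy "algorithms" are hypothetical stand-ins, and the real planner presumably times FFT kernels rather than sums.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a candidate algorithm: a name plus a function to time.
pub struct Candidate {
    pub name: &'static str,
    pub run: fn(&[f64]) -> f64,
}

/// Run each candidate `reps` times on `input`, keep its best (minimum) time,
/// and return the name and time of the fastest candidate overall.
pub fn pick_fastest(
    cands: &[Candidate],
    input: &[f64],
    reps: usize,
) -> Option<(&'static str, Duration)> {
    let mut best: Option<(&'static str, Duration)> = None;
    for c in cands {
        let mut fastest = Duration::MAX;
        for _ in 0..reps.max(1) {
            let start = Instant::now();
            // black_box keeps the optimizer from deleting the work being timed.
            std::hint::black_box((c.run)(std::hint::black_box(input)));
            fastest = fastest.min(start.elapsed());
        }
        if best.map_or(true, |(_, t)| fastest < t) {
            best = Some((c.name, fastest));
        }
    }
    best
}

/// Two toy "algorithms" that compute the same sum in different orders.
pub fn sum_forward(x: &[f64]) -> f64 {
    x.iter().sum()
}
pub fn sum_reverse(x: &[f64]) -> f64 {
    x.iter().rev().sum()
}
```

Taking the minimum over several repetitions, rather than the mean, is a common way to reduce noise from scheduling and cache warm-up. Which candidate wins depends on the machine, which is exactly why the result is worth recording.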

Those measurements become wisdom: a compact binary format of 30-byte packed little-endian entries that you can persist and reload. Wisdom format v2 adds a human-readable S-expression encoding with (mixed-radix-R1-R2-...) plan descriptions, and it reads v1 files without modification. Build-time profiling is opt-in via the OXIFFT_TUNE=1 environment variable, and a new oxifft_tune CLI binary handles offline profiling so you can tune once and ship the result.
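A rough sketch of both encodings follows. The (mixed-radix-R1-R2-...) descriptor shape comes from the release notes, but everything else is an assumption made for illustration: the function names, the WisdomEntry struct, and above all the field layout of the 30-byte record, which the notes do not specify. Treat it as one plausible way to pack 30 little-endian bytes, not as OxiFFT's actual on-disk format.

```rust
use std::convert::TryInto;

/// Render radices in the "(mixed-radix-R1-R2-...)" form named in the release notes.
pub fn plan_descriptor(radices: &[usize]) -> String {
    let parts: Vec<String> = radices.iter().map(|r| r.to_string()).collect();
    format!("(mixed-radix-{})", parts.join("-"))
}

/// Parse a descriptor back into radices; returns None on malformed input.
pub fn parse_descriptor(s: &str) -> Option<Vec<usize>> {
    let body = s.strip_prefix("(mixed-radix-")?.strip_suffix(')')?;
    body.split('-').map(|p| p.parse::<usize>().ok()).collect()
}

/// HYPOTHETICAL 30-byte layout (the real field layout is not documented here):
/// size u64 | algorithm id u16 | up to 8 radices as u8 | best time in ns u64 | flags u32.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WisdomEntry {
    pub size: u64,
    pub algo: u16,
    pub radices: [u8; 8],
    pub nanos: u64,
    pub flags: u32,
}

impl WisdomEntry {
    /// Pack the fields into 30 bytes, each integer in little-endian order.
    pub fn to_bytes(&self) -> [u8; 30] {
        let mut b = [0u8; 30];
        b[0..8].copy_from_slice(&self.size.to_le_bytes());
        b[8..10].copy_from_slice(&self.algo.to_le_bytes());
        b[10..18].copy_from_slice(&self.radices);
        b[18..26].copy_from_slice(&self.nanos.to_le_bytes());
        b[26..30].copy_from_slice(&self.flags.to_le_bytes());
        b
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(b: &[u8; 30]) -> Self {
        let mut radices = [0u8; 8];
        radices.copy_from_slice(&b[10..18]);
        Self {
            size: u64::from_le_bytes(b[0..8].try_into().unwrap()),
            algo: u16::from_le_bytes(b[8..10].try_into().unwrap()),
            radices,
            nanos: u64::from_le_bytes(b[18..26].try_into().unwrap()),
            flags: u32::from_le_bytes(b[26..30].try_into().unwrap()),
        }
    }
}
```

Fixing the byte order explicitly is what makes such a record portable: a file written on one machine reads back identically on another, regardless of native endianness.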

See oxifft/src/api/plan/auto_tune.rs, oxifft/src/bin/oxifft_tune.rs, and the chirp-z support in oxifft/src/chirp_z/.

Codegen and integration

On the code-generation side, oxifft-codegen gains a gen_any_codelet! proc-macro and a CodeletBuilder API that dispatches to the right strategy — direct codelets, Rader, MixedRadix, or Bluestein — for any user-specified N. With this addition the crate now exposes 11 proc-macros.
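The dispatch idea can be sketched as a small decision function. The four strategy names come from the paragraph above; the enum, the choose_strategy function, the ordering of the checks, and the cutoff of 8 for direct codelets are all assumptions for illustration and are not the CodeletBuilder API.

```rust
/// The four strategies named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Direct,
    MixedRadix,
    Rader,
    Bluestein,
}

/// Trial-division primality test; adequate for a sketch.
fn is_prime(n: usize) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// True when every prime factor of `n` is at most 7.
fn is_smooth7(mut n: usize) -> bool {
    if n == 0 {
        return false;
    }
    for &p in &[2usize, 3, 5, 7] {
        while n % p == 0 {
            n /= p;
        }
    }
    n == 1
}

/// Hypothetical decision rule; the cutoff and the ordering are assumptions.
pub fn choose_strategy(n: usize) -> Strategy {
    const DIRECT_MAX: usize = 8; // assumed cutoff for hand-unrolled codelets
    if n <= DIRECT_MAX {
        Strategy::Direct
    } else if is_smooth7(n) {
        Strategy::MixedRadix
    } else if is_prime(n) {
        Strategy::Rader
    } else {
        Strategy::Bluestein
    }
}
```

With this rule, 240 maps to MixedRadix, the prime 13 maps to Rader, and 22 (which contains the factor 11) falls through to Bluestein.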

This release also lands an opt-in ndarray integration in oxifft/src/integrations/ndarray_ext.rs, so transforms compose with ndarray arrays. It is a separate integration module — enable it when you want it.

Getting Started

Add the crate:

cargo add oxifft

Then plan and execute a transform on a smooth-7 size, letting the planner measure as it goes:

use oxifft::{Complex, Direction, Flags, Plan};

fn main() {
    // A "smooth-7" size (240 = 16·3·5): now mixed-radix, not Bluestein.
    // Flags::MEASURE profiles candidate algorithms and records wisdom.
    let plan = Plan::dft_1d(240, Direction::Forward, Flags::MEASURE)
        .expect("240-pt plan");
    // Constant input of 240 ones; the output buffer starts zeroed.
    let input = vec![Complex::new(1.0_f64, 0.0); 240];
    let mut output = vec![Complex::new(0.0_f64, 0.0); 240];
    plan.execute(&input, &mut output);
}

To profile offline and bake the wisdom into your build, run the CLI (or set OXIFFT_TUNE=1 at build time):

OXIFFT_TUNE=1 cargo run --bin oxifft_tune

What’s New in 0.3.1

Mixed-radix Cooley-Tukey FFT for smooth-7 sizes factoring into {2, 3, 4, 5, 7, 8, 16} — 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, 240, and more — using Winograd minimum-multiply radix-3/5/7 DIT butterflies, replacing Bluestein for these sizes with a proper cost model.
Auto-tuning via Flags::MEASURE and Flags::PATIENT, with runtime profiling through auto_tune::tune_size<T> / tune_range<T>, a packed little-endian binary wisdom format with 30-byte entries, build-time profiling opt-in via OXIFFT_TUNE=1, and a new oxifft_tune CLI binary.
gen_any_codelet! proc-macro and a CodeletBuilder API in oxifft-codegen that dispatches to direct codelets / Rader / MixedRadix / Bluestein for any N; the crate now has 11 proc-macros.
Wisdom format v2 — an S-expression format with (mixed-radix-R1-R2-...) encoding, backward-compatible with v1 files.
325 new tests since the 0.3.0 series bring the workspace to 1,554 passing.

Tips

Tune your hot sizes. Use Flags::MEASURE for sizes on your critical path, or Flags::PATIENT for a more thorough search, then persist the resulting wisdom so you only pay the profiling cost once.
Trust the new default path. Smooth-7 sizes like 240, 112, and 96 now avoid Bluestein automatically. There is no API change to make; you simply get the faster route.
Profile on the target CPU. Run oxifft_tune once on the machine you deploy to and ship the wisdom file alongside your binary:

OXIFFT_TUNE=1 cargo run --bin oxifft_tune

Reach for gen_any_codelet! when you want a specialized codelet for an exotic N that the standard dispatch does not already cover.
Enable the ndarray integration (integrations/ndarray_ext.rs) when your data already lives in ndarray arrays and you want FFTs to compose with it.

The foundation

OxiFFT is the spectral layer of the COOLJAPAN ecosystem. By early May 2026 it sits beside mature siblings such as SciRS2, NumRS2, OxiBLAS, OxiCUDA, ToRSh, OxiWhisper, SkleaRS, TenfloweRS, and TrustformeRS — the same scientific and ML stack that depends on fast, correct transforms. The new ndarray bridge makes OxiFFT composable with the array layer of that stack, so spectral work slots in wherever the data already lives.

Repository: https://github.com/cool-japan/oxifft

Star the repo if you want FFTW’s classic strengths without FFTW’s C — and follow along as the planner keeps getting smarter. Pure Rust spectral computing — fast, safe, and self-tuning.

— KitaSan at COOLJAPAN OÜ May 2, 2026