OxiFFT 0.3.1 Released — Winograd Mixed-Radix Meets FFTW-Style Auto-Tuning

Pure Rust FFT and the rustfft replacement: OxiFFT 0.3.1 adds Winograd mixed-radix Cooley-Tukey for smooth-7 sizes (retiring Bluestein for them), FFTW-style MEASURE/PATIENT auto-tuning with a binary wisdom format and an oxifft_tune CLI, plus an opt-in ndarray integration.

The two superpowers that made FFTW legendary, minimum-multiply mixed-radix transforms and plan-time auto-tuning, are now available in Pure Rust.

Today we released OxiFFT 0.3.1 — a focused feature release that teaches the planner to factor “ugly” composite sizes with Winograd butterflies and to profile candidate algorithms at runtime, recording the results as portable wisdom.

No C. No Fortran. No FFTW. No FFI. OxiFFT is a Pure Rust FFT/DFT library whose default features are 100% Rust — it compiles to a single static binary or to WASM with nothing to link against. As the rustfft replacement under the COOLJAPAN Pure Rust policy and a Pure Rust port of FFTW3, it is the spectral backbone for the SciRS2 signal and audio stack, and it displaces both FFTW3 and rustfft wherever they sit today.

Why 0.3.1 matters

For decades, two FFTW capabilities set it apart from naive radix-2 implementations: minimum-multiply mixed-radix transforms for sizes that do not factor neatly into powers of two, and the famous plan-MEASURE auto-tuning that records “wisdom” about the fastest algorithm for a given size and machine. With 0.3.1, both arrive in OxiFFT — in Pure Rust.

Concrete wins:

Technical Deep Dive

Mixed-radix Cooley-Tukey

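The headline addition is a mixed-radix Cooley-Tukey FFT for smooth-7 sizes: those whose only prime factors are 2, 3, 5, and 7, so they decompose entirely into radices from {2, 3, 4, 5, 7, 8, 16}. Previously, composite sizes that were not pure powers of two often fell back to Bluestein’s algorithm, which embeds the transform into a larger convolution with a chirp sequence. That works for any size, but it carries real overhead: the convolution must be at least 2N − 1 points long, so every transform pays for longer FFTs plus the chirp multiplications.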
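To make the smooth-7 definition concrete, here is a minimal sketch of a smoothness test and a greedy split into the radix set named above. The function names are hypothetical and the greedy ordering is only an illustration; it is not OxiFFT's planner, which the post says uses a cost model.

```rust
// Illustrative only: hypothetical helpers, not part of the OxiFFT API.

/// Returns true when `n` has no prime factor larger than 7.
fn is_smooth7(mut n: usize) -> bool {
    if n == 0 {
        return false;
    }
    for p in [2, 3, 5, 7] {
        while n % p == 0 {
            n /= p;
        }
    }
    n == 1
}

/// Greedily splits a 7-smooth `n` into radices, largest power of two first.
/// Returns `None` when `n` is not 7-smooth.
fn factor_into_radices(mut n: usize) -> Option<Vec<usize>> {
    if !is_smooth7(n) {
        return None;
    }
    let mut radices = Vec::new();
    for r in [16, 8, 4, 2, 7, 5, 3] {
        while n % r == 0 {
            radices.push(r);
            n /= r;
        }
    }
    Some(radices)
}

fn main() {
    for n in [240, 56, 96, 22] {
        println!("{n}: {:?}", factor_into_radices(n));
    }
}
```

A size such as 22 = 2·11 fails the test because of the factor 11, so it would still need Rader or Bluestein.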

For smooth-7 sizes, 0.3.1 replaces Bluestein with a direct mixed-radix decomposition built from Winograd minimum-multiply radix-3/5/7 DIT butterflies, selected by a proper cost model that counts multiplies rather than guessing. The result: sizes like 6, 10, 12, 14, 24, 28, 40, 56, 80, 96, 112, and 240 run with fewer arithmetic operations and none of the chirp-convolution bookkeeping.
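As an illustration of what "minimum-multiply" means, the sketch below shows a textbook Winograd-style radix-3 forward butterfly and checks it against a naive 3-point DFT. The `C` type and the function names are assumptions made to keep the example free of external crates; the actual codelets in winograd.rs may be organised differently.

```rust
use std::f64::consts::PI;

/// Minimal complex type so the sketch needs no external crates.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C {
    re: f64,
    im: f64,
}

impl C {
    fn add(self, o: C) -> C {
        C { re: self.re + o.re, im: self.im + o.im }
    }
    fn sub(self, o: C) -> C {
        C { re: self.re - o.re, im: self.im - o.im }
    }
    fn scale(self, k: f64) -> C {
        C { re: self.re * k, im: self.im * k }
    }
    /// Multiply by -i without any real multiplication: (a + bi)(-i) = b - ai.
    fn mul_neg_i(self) -> C {
        C { re: self.im, im: -self.re }
    }
}

/// Winograd-style 3-point forward DFT: two real-constant scalings, no twiddle products.
fn winograd_radix3(x: [C; 3]) -> [C; 3] {
    let half_sqrt3 = 3.0_f64.sqrt() / 2.0; // sin(2*pi/3)
    let t = x[1].add(x[2]);
    let y0 = x[0].add(t);
    let s = y0.add(t.scale(-1.5)); // equals x0 - t/2
    let m = x[1].sub(x[2]).scale(half_sqrt3).mul_neg_i();
    [y0, s.add(m), s.sub(m)]
}

/// Reference O(n^2) DFT used only to check the butterfly.
fn naive_dft3(x: [C; 3]) -> [C; 3] {
    let mut out = [C { re: 0.0, im: 0.0 }; 3];
    for k in 0..3 {
        for n in 0..3 {
            let (s, c) = (-2.0 * PI * (k * n) as f64 / 3.0).sin_cos();
            out[k].re += x[n].re * c - x[n].im * s;
            out[k].im += x[n].re * s + x[n].im * c;
        }
    }
    out
}

fn main() {
    let x = [
        C { re: 1.0, im: 2.0 },
        C { re: -0.5, im: 0.25 },
        C { re: 3.0, im: -1.0 },
    ];
    println!("winograd: {:?}", winograd_radix3(x));
    println!("naive:    {:?}", naive_dft3(x));
}
```

The butterfly spends its multiplications on two scalings of a complex value by a real constant (-1.5 and √3/2); everything else is additions and a swap for the multiplication by -i.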

The machinery lives in oxifft/src/dft/codelets/winograd.rs, oxifft/src/dft/codelets/winograd_constants.rs, oxifft/src/dft/codelets/winograd_pfa.rs, and oxifft/src/dft/codelets/twiddle_odd.rs.

Auto-tuning and wisdom

The second pillar is FFTW-style auto-tuning. Flags::MEASURE and Flags::PATIENT now drive runtime profiling of candidate algorithms through auto_tune::tune_size<T> and tune_range<T> — the planner times the real contenders for a given size and keeps the fastest.
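The idea of "time the contenders and keep the fastest" can be sketched in a few lines of standard-library Rust. Everything here (`best_time`, `pick_fastest`, the candidate closures) is hypothetical and stands in for real FFT algorithms; it does not reproduce the signatures of auto_tune::tune_size or tune_range.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `reps` times and returns the shortest observed duration.
fn best_time<F: FnMut()>(mut f: F, reps: usize) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}

/// Returns the index and best time of the fastest candidate, if any.
fn pick_fastest(
    candidates: &mut [(&'static str, Box<dyn FnMut()>)],
    reps: usize,
) -> Option<(usize, Duration)> {
    let mut best: Option<(usize, Duration)> = None;
    for (i, (_, f)) in candidates.iter_mut().enumerate() {
        let t = best_time(|| f(), reps);
        if best.map_or(true, |(_, b)| t < b) {
            best = Some((i, t));
        }
    }
    best
}

fn main() {
    // Stand-ins for real algorithms: one deliberately slow, one trivial.
    let mut candidates: Vec<(&'static str, Box<dyn FnMut()>)> = vec![
        ("slow", Box::new(|| std::thread::sleep(Duration::from_millis(2)))),
        ("fast", Box::new(|| {})),
    ];
    if let Some((i, t)) = pick_fastest(&mut candidates, 5) {
        println!("fastest: {} ({:?})", candidates[i].0, t);
    }
}
```

Taking the minimum over several repetitions is one common way to reduce timing noise; a real tuner would also need warm-up runs and realistic input buffers.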

Those measurements become wisdom: a compact binary format of 30-byte packed little-endian entries that you can persist and reload. Wisdom format v2 adds a human-readable S-expression encoding with (mixed-radix-R1-R2-...) plan descriptions, and it reads v1 files without modification. Build-time profiling is opt-in via the OXIFFT_TUNE=1 environment variable, and a new oxifft_tune CLI binary handles offline profiling so you can tune once and ship the result.
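The post does not spell out which fields make up a 30-byte entry, so the layout below is purely an assumption chosen to add up to 30 bytes. It is only meant to show what "packed little-endian" looks like in Rust, not to document OxiFFT's actual wisdom format.

```rust
use std::convert::TryInto;

const ENTRY_LEN: usize = 30;

/// HYPOTHETICAL layout (not OxiFFT's real one):
/// size u64 | algorithm u16 | flags u32 | nanos u64 | machine_tag u64 = 30 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WisdomEntry {
    size: u64,
    algorithm: u16,
    flags: u32,
    nanos: u64,
    machine_tag: u64,
}

impl WisdomEntry {
    /// Packs the fields back to back, little-endian, with no padding.
    fn to_bytes(&self) -> [u8; ENTRY_LEN] {
        let mut buf = [0u8; ENTRY_LEN];
        buf[0..8].copy_from_slice(&self.size.to_le_bytes());
        buf[8..10].copy_from_slice(&self.algorithm.to_le_bytes());
        buf[10..14].copy_from_slice(&self.flags.to_le_bytes());
        buf[14..22].copy_from_slice(&self.nanos.to_le_bytes());
        buf[22..30].copy_from_slice(&self.machine_tag.to_le_bytes());
        buf
    }

    /// Reads the same layout back.
    fn from_bytes(buf: &[u8; ENTRY_LEN]) -> Self {
        Self {
            size: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
            algorithm: u16::from_le_bytes(buf[8..10].try_into().unwrap()),
            flags: u32::from_le_bytes(buf[10..14].try_into().unwrap()),
            nanos: u64::from_le_bytes(buf[14..22].try_into().unwrap()),
            machine_tag: u64::from_le_bytes(buf[22..30].try_into().unwrap()),
        }
    }
}

fn main() {
    let entry = WisdomEntry {
        size: 240,
        algorithm: 2,
        flags: 1,
        nanos: 1_250,
        machine_tag: 0xDEAD_BEEF,
    };
    let bytes = entry.to_bytes();
    println!("{} bytes: {:02x?}", bytes.len(), bytes);
    assert_eq!(WisdomEntry::from_bytes(&bytes), entry);
}
```

Fixed-width little-endian fields keep every entry the same length, which makes a wisdom file seekable by index and portable across machines regardless of native byte order.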

See oxifft/src/api/plan/auto_tune.rs, oxifft/src/bin/oxifft_tune.rs, and the chirp-z support in oxifft/src/chirp_z/.

Codegen and integration

On the code-generation side, oxifft-codegen gains a gen_any_codelet! proc-macro and a CodeletBuilder API that dispatches to the right strategy — direct codelets, Rader, MixedRadix, or Bluestein — for any user-specified N. With this addition the crate now exposes 11 proc-macros.
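The dispatch described above can be pictured roughly as follows. The `Strategy` enum, the `choose_strategy` function, and the direct-codelet threshold of 16 are all assumptions for illustration; the real CodeletBuilder API and its selection rules are not shown in this post.

```rust
/// Strategy names taken from the post; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Direct,
    MixedRadix,
    Rader,
    Bluestein,
}

/// True when `n` has no prime factor larger than 7.
fn is_smooth7(mut n: usize) -> bool {
    if n == 0 {
        return false;
    }
    for p in [2, 3, 5, 7] {
        while n % p == 0 {
            n /= p;
        }
    }
    n == 1
}

/// Trial-division primality test, adequate for small illustrative sizes.
fn is_prime(n: usize) -> bool {
    if n < 2 {
        return false;
    }
    let mut i = 2;
    while i * i <= n {
        if n % i == 0 {
            return false;
        }
        i += 1;
    }
    true
}

/// Picks a strategy for size `n`. The threshold is an assumed value.
fn choose_strategy(n: usize) -> Strategy {
    const MAX_DIRECT: usize = 16; // assumption, not OxiFFT's real cutoff
    if n <= MAX_DIRECT {
        Strategy::Direct
    } else if is_smooth7(n) {
        Strategy::MixedRadix
    } else if is_prime(n) {
        Strategy::Rader
    } else {
        Strategy::Bluestein
    }
}

fn main() {
    for n in [8, 240, 17, 22] {
        println!("N = {n}: {:?}", choose_strategy(n));
    }
}
```

The ordering reflects the usual preference: small fixed-size codelets first, then a factorisation when one exists, then Rader for primes, with Bluestein as the fallback that handles any remaining size.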

This release also lands an opt-in ndarray integration in oxifft/src/integrations/ndarray_ext.rs, so transforms compose with ndarray arrays. It is a separate integration module — enable it when you want it.

Getting Started

Add the crate:

cargo add oxifft

Then plan and execute a transform on a smooth-7 size, letting the planner measure as it goes:

use oxifft::{Complex, Direction, Flags, Plan};

// A "smooth-7" size (240 = 16·3·5) — now mixed-radix, not Bluestein.
// Flags::MEASURE profiles candidate algorithms and records wisdom.
let plan = Plan::dft_1d(240, Direction::Forward, Flags::MEASURE)
    .expect("240-pt plan");
let input = vec![Complex::new(1.0_f64, 0.0); 240];
let mut output = vec![Complex::new(0.0_f64, 0.0); 240];
plan.execute(&input, &mut output);

To profile offline and bake the wisdom into your build, run the CLI (or set OXIFFT_TUNE=1 at build time):

OXIFFT_TUNE=1 cargo run --bin oxifft_tune

What’s New in 0.3.1

Tips

The foundation

OxiFFT is the spectral layer of the COOLJAPAN ecosystem. By early May 2026 it sits beside mature siblings such as SciRS2, NumRS2, OxiBLAS, OxiCUDA, ToRSh, OxiWhisper, SkleaRS, TenfloweRS, and TrustformeRS — the same scientific and ML stack that depends on fast, correct transforms. The new ndarray bridge makes OxiFFT composable with the array layer of that stack, so spectral work slots in wherever the data already lives.

Repository: https://github.com/cool-japan/oxifft

Star the repo if you want FFTW’s classic strengths without FFTW’s C — and follow along as the planner keeps getting smarter. Pure Rust spectral computing — fast, safe, and self-tuning.

KitaSan at COOLJAPAN OÜ May 2, 2026
