OxiMedia 0.1.8 Released — Auto-Captioning, Filmic Tone Mapping, and Zero-Copy CMAF

OxiMedia 0.1.8 — patent-free, memory-safe FFmpeg + OpenCV replacement in pure Rust. A feature-rich release: a Whisper-compatible AutoCaption ONNX pipeline, mmap-backed file repair, log-mel spectrograms, filmic ACES/Reinhard tone curves, and zero-copy CMAF segment output via bytes::Bytes. 109 crates, 100,278 tests, zero C/Fortran in default builds.

Tags: release, oximedia, ffmpeg, opencv, auto-captioning, onnx, hdr, tone-mapping, pure-rust, cmaf

The pure-Rust FFmpeg + OpenCV replacement just learned to caption video, tone-map like a film, and ship CMAF segments with zero copies.

Today we released OxiMedia 0.1.8 — the patent-free, memory-safe reconstruction of FFmpeg (multimedia processing) and OpenCV (computer vision), unified in a single pure-Rust framework.

No C. No C++. No FFmpeg binaries. No OpenCV Python bindings. No system libraries, no patent royalties. OxiMedia compiles to a single static binary (or to wasm32-unknown-unknown for the browser) and runs everywhere with one cargo add.

Why OxiMedia 0.1.8 is a game changer

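FFmpeg and OpenCV gave the world media and vision tooling, at the cost of C/C++ memory unsafety, patent-encumbered codecs, and build systems that demand a dozen system libraries. 0.1.8 keeps OxiMedia’s pure-Rust promise while adding capabilities that normally pull in heavyweight native runtimes: Whisper-compatible auto-captioning over ONNX, mmap-backed file repair, log-mel spectrograms, filmic ACES/Reinhard tone curves, and zero-copy CMAF segment output.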

At 0.1.8 the workspace is 109 crates and ~2.75M lines of pure Rust, with 100,278 tests passing (0 failures, 0 warnings, cargo nextest run --workspace --all-features).

Technical Deep Dive: how the new layers fit

  1. ML & Inference (oximedia-ml, oximedia-neural) — the AutoCaptionPipeline rides the typed-pipeline layer atop OxiONNX. oximedia-neural adds an onnx-gated OnnxBackend (load, run over a HashMap<String, Tensor> API). The companion oxionnx crate (bumped 0.1.2 → 0.1.3) adds SessionBuilder::with_provider_kinds() for typed runtime execution-provider selection and a ProviderKind::DirectMl variant, so the EP dispatch chain consults the provider priority list at runtime.

  2. Audio (oximedia-audio) — a compute_log_mel_spectrogram lands in the spectrum module: STFT → Hann window → MelScale filterbank → log, the standard front end for speech and music models (and the AutoCaption encoder).

  3. Color & HDR (oximedia-colormgmt, oximedia-hdr) — the new ToneCurve operators sit alongside a process-wide GamutConversionMatrix cache in oximedia-hdr (OnceLock<RwLock<HashMap<(ColorGamut, ColorGamut), [[f32;3];3]>>>) that eliminates redundant Bradford chromatic-adaptation and matrix-inverse work per gamut pair.

  4. Streaming & Repair (oximedia-stream, oximedia-repair) — zero-copy CMAF via bytes::Bytes, six new SpliceInfoSection encode→parse→re-encode roundtrip tests, and a real, dispatcher-driven repair engine backed by memory-mapped scanning.
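
To make the log-mel front end from the Audio layer concrete, here is a minimal std-only sketch of the same stages: framing, a Hann window applied to each frame, a spectrum, a triangular mel filterbank, then a log. It is not the oximedia-audio implementation. The function names, the naive O(N²) DFT, the HTK-style mel formula, the power spectrum, and the natural log are all assumptions chosen for readability.

```rust
use std::f32::consts::PI;

/// HTK-style Hz <-> mel conversion (an assumption; other mel variants exist).
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10.0_f32.powf(mel / 2595.0) - 1.0)
}

/// Periodic Hann window of length `n`.
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Power spectrum (bins 0..=N/2) of one frame via a naive DFT.
/// A real implementation would use an FFT; this keeps the sketch dependency-free.
fn power_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f32, 0.0_f32);
            for (t, &x) in frame.iter().enumerate() {
                // Reduce k*t modulo n so the phase argument stays small.
                let phase = -2.0 * PI * (k * t % n) as f32 / n as f32;
                re += x * phase.cos();
                im += x * phase.sin();
            }
            re * re + im * im
        })
        .collect()
}

/// Triangular filters with edges evenly spaced on the mel scale.
fn mel_filterbank(n_mels: usize, n_fft: usize, sample_rate: f32) -> Vec<Vec<f32>> {
    let mel_max = hz_to_mel(sample_rate / 2.0);
    let edges: Vec<f32> = (0..n_mels + 2)
        .map(|i| mel_to_hz(mel_max * i as f32 / (n_mels + 1) as f32))
        .collect();
    (0..n_mels)
        .map(|m| {
            let (lo, mid, hi) = (edges[m], edges[m + 1], edges[m + 2]);
            (0..=n_fft / 2)
                .map(|k| {
                    let f = k as f32 * sample_rate / n_fft as f32;
                    let rising = (f - lo) / (mid - lo);
                    let falling = (hi - f) / (hi - mid);
                    rising.min(falling).max(0.0)
                })
                .collect()
        })
        .collect()
}

/// Returns one row of `n_mels` log energies per frame.
/// `n_fft` and `hop` must be non-zero.
fn log_mel_spectrogram(
    samples: &[f32],
    sample_rate: f32,
    n_fft: usize,
    hop: usize,
    n_mels: usize,
) -> Vec<Vec<f32>> {
    let window = hann(n_fft);
    let bank = mel_filterbank(n_mels, n_fft, sample_rate);
    samples
        .windows(n_fft)
        .step_by(hop)
        .map(|frame| {
            let windowed: Vec<f32> = frame.iter().zip(&window).map(|(x, w)| x * w).collect();
            let power = power_spectrum(&windowed);
            bank.iter()
                .map(|filter| {
                    let energy: f32 = filter.iter().zip(&power).map(|(f, p)| f * p).sum();
                    energy.max(1e-10).ln() // floor avoids ln(0) on silent bands
                })
                .collect()
        })
        .collect()
}

fn main() {
    let sr = 8000.0_f32;
    // One 440 Hz test tone, 1024 samples long.
    let tone: Vec<f32> = (0..1024)
        .map(|i| (2.0 * PI * 440.0 * i as f32 / sr).sin())
        .collect();
    let mel = log_mel_spectrogram(&tone, sr, 256, 128, 20);
    println!("{} frames x {} mel bands", mel.len(), mel[0].len());
}
```

Speech models typically consume this matrix (frames by mel bands) directly, which is why the same front end can feed both the captioning encoder and music models.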
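
For readers new to tone mapping, the two curve families named in this release can be sketched in a few lines. This is not the ToneCurve API from oximedia-colormgmt, whose signatures are not shown in this post. It uses the textbook Reinhard operator and, as a stand-in for an ACES-style filmic curve, Krzysztof Narkowicz's well-known rational fit. The function names are hypothetical.

```rust
/// Reinhard global operator: compresses [0, inf) into [0, 1).
fn reinhard(x: f32) -> f32 {
    let x = x.max(0.0);
    x / (1.0 + x)
}

/// Narkowicz's rational approximation of the ACES filmic curve,
/// clamped to the displayable range [0, 1].
fn aces_filmic(x: f32) -> f32 {
    const A: f32 = 2.51;
    const B: f32 = 0.03;
    const C: f32 = 2.43;
    const D: f32 = 0.59;
    const E: f32 = 0.14;
    let x = x.max(0.0);
    ((x * (A * x + B)) / (x * (C * x + D) + E)).clamp(0.0, 1.0)
}

/// Applies a curve to each channel of a linear-light RGB pixel.
fn tone_map_rgb(rgb: [f32; 3], curve: fn(f32) -> f32) -> [f32; 3] {
    rgb.map(curve)
}

fn main() {
    // Scene-referred luminance values, from black to a strong highlight.
    for &l in &[0.0_f32, 0.18, 1.0, 4.0, 16.0] {
        println!(
            "{l:>5.2} -> reinhard {:.3}, aces {:.3}",
            reinhard(l),
            aces_filmic(l)
        );
    }
    println!("{:?}", tone_map_rgb([0.0, 1.0, 3.0], reinhard));
}
```

Reinhard approaches white asymptotically and never clips, while the filmic fit has a toe and a shoulder and reaches full white at a finite input, which gives highlights more contrast.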
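
The process-wide cache described in the Color & HDR layer follows a common std-only pattern, sketched below under stated assumptions. Only the container type comes from this post. The ColorGamut variants, the derive_matrix stand-in (which returns an identity matrix instead of doing real Bradford adaptation), and the DERIVATIONS counter are hypothetical and exist only to show that each gamut pair is derived once.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{OnceLock, RwLock};

/// Hypothetical gamut identifiers; the real enum may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ColorGamut {
    Bt709,
    DisplayP3,
    Bt2020,
}

type Mat3 = [[f32; 3]; 3];

/// Lazily initialised, process-wide cache keyed by (source, destination).
static CACHE: OnceLock<RwLock<HashMap<(ColorGamut, ColorGamut), Mat3>>> = OnceLock::new();

/// Counts how often the expensive path runs (for demonstration only).
static DERIVATIONS: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for the expensive work (chromatic adaptation, matrix inverse).
/// It returns the identity matrix; it is NOT a real gamut conversion.
fn derive_matrix(_src: ColorGamut, _dst: ColorGamut) -> Mat3 {
    DERIVATIONS.fetch_add(1, Ordering::Relaxed);
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
}

fn gamut_matrix(src: ColorGamut, dst: ColorGamut) -> Mat3 {
    let cache = CACHE.get_or_init(|| RwLock::new(HashMap::new()));

    // Fast path: many readers can hold the lock at once.
    {
        let map = cache.read().expect("gamut cache lock poisoned");
        if let Some(m) = map.get(&(src, dst)) {
            return *m;
        }
    } // read guard dropped here, before the write lock is taken

    // Slow path: `entry` re-checks under the write lock, so two threads that
    // both missed above still derive the matrix only once.
    let mut map = cache.write().expect("gamut cache lock poisoned");
    *map.entry((src, dst))
        .or_insert_with(|| derive_matrix(src, dst))
}

fn main() {
    let first = gamut_matrix(ColorGamut::Bt709, ColorGamut::Bt2020);
    let second = gamut_matrix(ColorGamut::Bt709, ColorGamut::Bt2020); // cache hit
    let _other = gamut_matrix(ColorGamut::DisplayP3, ColorGamut::Bt2020);
    assert_eq!(first, second);
    println!("derivations run: {}", DERIVATIONS.load(Ordering::Relaxed));
}
```

Because a 3×3 f32 matrix is Copy, lookups return it by value and the lock is held only for the duration of the hash lookup.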

Workspace guarantees hold: unsafe_code = "deny", single-binary deployment, WASM + WebGPU support, and zero C/Fortran in default features. All inference is opt-in — the default oximedia build links zero ONNX symbols.
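
Opt-in inference of this kind is usually expressed with Cargo features and conditional compilation. The sketch below shows the general mechanism only. The feature name onnx, the backend labels, and the function name are assumptions, not OxiMedia's actual layout.

```rust
/// Lists the backends compiled into this build.
/// The "onnx" entry exists only when the crate is built with the
/// (assumed) `onnx` Cargo feature; otherwise that line is not compiled at all.
#[allow(unused_mut)]
fn compiled_backends() -> Vec<&'static str> {
    let mut backends = vec!["native"];
    #[cfg(feature = "onnx")]
    backends.push("onnx");
    backends
}

fn main() {
    println!("backends: {:?}", compiled_backends());
    if cfg!(feature = "onnx") {
        println!("ONNX inference is compiled in");
    } else {
        println!("default build: no ONNX code paths");
    }
}
```

Code behind a disabled feature never reaches the compiler's later stages, so it cannot contribute symbols or dependencies to the final binary.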

Getting Started

cargo add oximedia

Auto-captioning a clip with the Whisper-compatible pipeline (enable the auto-caption feature):

# Cargo.toml
[dependencies]
oximedia = { version = "0.1.8", features = ["ml", "auto-caption"] }

// src/main.rs
use oximedia::ml::pipelines::{AutoCaptionPipeline, AutoCaptionConfig};
use oximedia::ml::DeviceType;

fn main() -> oximedia::Result<()> {
    let pipeline = AutoCaptionPipeline::load(AutoCaptionConfig::default(), DeviceType::auto())?;

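    // Placeholder input: replace with decoded audio samples from your clip.
    // The f32 sample type is an assumption; check the crate docs for the exact input type.
    let audio_samples: Vec<f32> = Vec::new();

    // Greedy-decode a transcript directly from decoded audio samples.
    let transcript = pipeline.caption(&audio_samples)?;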
    println!("{transcript}");

    Ok(())
}
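
The comment above mentions greedy decoding. As a rough illustration of that strategy, and not of the pipeline's internals, the sketch below picks the highest-scoring token at every step and stops at an end-of-sequence id. The step closure standing in for the model, the token ids, and the function names are all hypothetical.

```rust
/// Index of the largest logit, or None for an empty slice.
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Greedy decoding: at each step ask the model for next-token logits given
/// the tokens so far, keep the single best token, and stop at `eos` or
/// after `max_len` tokens.
fn greedy_decode<F>(mut step: F, eos: usize, max_len: usize) -> Vec<usize>
where
    F: FnMut(&[usize]) -> Vec<f32>,
{
    let mut tokens = Vec::new();
    while tokens.len() < max_len {
        let logits = step(&tokens);
        match argmax(&logits) {
            Some(token) if token != eos => tokens.push(token),
            _ => break, // end-of-sequence or no logits
        }
    }
    tokens
}

fn main() {
    // Toy "model" over a 4-token vocabulary: it scripts tokens 2, 0, 1 and
    // then the end-of-sequence id 3, regardless of the audio.
    let script = [2_usize, 0, 1, 3];
    let toy_model = |prefix: &[usize]| {
        let mut logits = vec![0.0_f32; 4];
        logits[script[prefix.len().min(script.len() - 1)]] = 1.0;
        logits
    };
    let tokens = greedy_decode(toy_model, 3, 16);
    println!("{tokens:?}");
}
```

Greedy decoding is fast and deterministic, though it can miss transcripts that a beam search would find, since it never revisits an earlier choice.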

What’s New in 0.1.8

Tips

This is the foundation

OxiMedia is the pure-Rust media and computer-vision layer of the COOLJAPAN ecosystem, and 0.1.8 leans on its siblings directly, for example the companion oxionnx crate behind the inference layer.

Repository: https://github.com/cool-japan/oximedia

Star the repo if you’re tired of FFmpeg/OpenCV build hell and patent worries.

Pure Rust media and computer vision is here — fast, safe, patent-free, and sovereign.

KitaSan at COOLJAPAN OÜ, June 4, 2026
