The pure-Rust FFmpeg + OpenCV replacement just learned to caption video, tone-map like a film, and ship CMAF segments with zero copies.
Today we released OxiMedia 0.1.8 — the patent-free, memory-safe reconstruction of FFmpeg (multimedia processing) and OpenCV (computer vision), unified in a single pure-Rust framework.
No C. No C++. No FFmpeg binaries. No OpenCV Python bindings. No system libraries, no patent royalties. OxiMedia compiles to a single static binary (or to wasm32-unknown-unknown for the browser) and runs everywhere with one cargo add.
Why OxiMedia 0.1.8 is a game changer
FFmpeg and OpenCV gave the world media and vision tooling — at the cost of C/C++ memory unsafety, patent-encumbered codecs, and build systems that demand a dozen system libraries. 0.1.8 keeps OxiMedia’s pure-Rust promise while adding capabilities that normally pull in heavyweight native runtimes:
- Speech-to-captions, in pure Rust. The new AutoCaptionPipeline in oximedia-ml is a Whisper-compatible encoder + decoder ONNX inference pipeline with greedy decode (encode_audio, step_decode, and a one-call caption entry point), all running on the pure-Rust OxiONNX runtime and gated behind the auto-caption feature.
- Filmic tone mapping. oximedia-colormgmt gains a ToneCurve enum with ReinhardSimple, ReinhardExtended { l_white }, FilmicHable (Hable/Uncharted2), and AcesFitted (Narkowicz rational) operators: the curves real colorists reach for.
- Zero-copy CMAF. oximedia-stream migrates CmafChunk.data from Vec<u8> to bytes::Bytes, and write_cmaf_segment now returns Vec<Bytes> for scatter-gather segment output with no buffer copies.
- Real file repair. oximedia-repair ships an mmap-backed deep_scan (memmap2, with a streaming fallback below 4 MiB), an mtime-aware detection_cache, and a full fix_issue dispatcher wired to conceal, partial, container_migrate, and codec_probe.
- Duplicate resolution that touches the filesystem. oximedia-dedup adds MergeExecutor, AppliedAction, and MergeReport: real symlink, hardlink, delete, and dry-run modes with safety precondition checks.
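To make the four tone-curve operators concrete, here is a minimal std-only sketch of the published formulas they are named after. The variant names mirror the list above, but `ToneCurveSketch`, its `apply` method, and the choice to omit an exposure bias are assumptions for illustration, not the oximedia-colormgmt API.

```rust
/// Hypothetical stand-in for the tone-curve operators named above.
/// Variant names follow the post; everything else is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToneCurveSketch {
    ReinhardSimple,
    ReinhardExtended { l_white: f32 },
    FilmicHable,
    AcesFitted,
}

/// Hable's (Uncharted 2) curve before white-point normalisation,
/// using the constants from his published write-up.
fn hable_partial(x: f32) -> f32 {
    const A: f32 = 0.15; // shoulder strength
    const B: f32 = 0.50; // linear strength
    const C: f32 = 0.10; // linear angle
    const D: f32 = 0.20; // toe strength
    const E: f32 = 0.02; // toe numerator
    const F: f32 = 0.30; // toe denominator
    ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F
}

impl ToneCurveSketch {
    /// Maps a scene-linear value (negative inputs are clamped to 0)
    /// to display range. No exposure bias is applied in this sketch.
    pub fn apply(self, x: f32) -> f32 {
        let x = x.max(0.0);
        match self {
            // x / (1 + x): approaches 1 asymptotically.
            Self::ReinhardSimple => x / (1.0 + x),
            // Reaches exactly 1.0 at x == l_white.
            Self::ReinhardExtended { l_white } => {
                x * (1.0 + x / (l_white * l_white)) / (1.0 + x)
            }
            // Normalised so the linear white point (11.2) maps to 1.0.
            Self::FilmicHable => {
                const LINEAR_WHITE: f32 = 11.2;
                hable_partial(x) / hable_partial(LINEAR_WHITE)
            }
            // Narkowicz's rational fit to the ACES curve.
            Self::AcesFitted => {
                let num = x * (2.51 * x + 0.03);
                let den = x * (2.43 * x + 0.59) + 0.14;
                (num / den).clamp(0.0, 1.0)
            }
        }
    }
}
```

In this sketch, `ReinhardExtended { l_white: 4.0 }` sends a linear value of 4.0 to exactly 1.0, which is the "control where highlights roll off" behaviour the parameter exists for.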
At 0.1.8 the workspace is 109 crates and ~2.75M lines of pure Rust, with 100,278 tests passing (0 failures, 0 warnings, cargo nextest run --workspace --all-features).
Technical Deep Dive: how the new layers fit
- ML & Inference (oximedia-ml, oximedia-neural): the AutoCaptionPipeline rides the typed-pipeline layer atop OxiONNX. oximedia-neural adds an onnx-gated OnnxBackend (load, run over a HashMap<String, Tensor> API). The companion oxionnx crate (bumped 0.1.2 → 0.1.3) adds SessionBuilder::with_provider_kinds() for typed runtime execution-provider selection and a ProviderKind::DirectMl variant, so the EP dispatch chain consults the provider priority list at runtime.
- Audio (oximedia-audio): compute_log_mel_spectrogram lands in the spectrum module, chaining STFT → Hann window → MelScale filterbank → log, the standard front end for speech and music models (and the AutoCaption encoder).
- Color & HDR (oximedia-colormgmt, oximedia-hdr): the new ToneCurve operators sit alongside a process-wide GamutConversionMatrix cache in oximedia-hdr (OnceLock<RwLock<HashMap<(ColorGamut, ColorGamut), [[f32;3];3]>>>) that eliminates redundant Bradford chromatic-adaptation and matrix-inverse work per gamut pair.
- Streaming & Repair (oximedia-stream, oximedia-repair): zero-copy CMAF via bytes::Bytes, six new SpliceInfoSection encode→parse→re-encode roundtrip tests, and a real, dispatcher-driven repair engine backed by memory-mapped scanning.
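The cache type quoted in the Color & HDR item is a common std-only memoisation pattern, and a small sketch shows why it removes repeated work. The `Gamut` enum, the `cached_matrix` helper, and passing the matrix computation as a closure are all hypothetical names chosen for illustration; only the `OnceLock<RwLock<HashMap<..>>>` shape comes from the release notes.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Hypothetical gamut tag; the real ColorGamut type may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Gamut {
    Rec709,
    Rec2020,
}

type Matrix3 = [[f32; 3]; 3];

/// Process-wide cache, created lazily on first use.
static CACHE: OnceLock<RwLock<HashMap<(Gamut, Gamut), Matrix3>>> = OnceLock::new();

/// Returns the matrix for the ordered pair `(from, to)`.
/// `compute` runs only when the pair is not cached yet.
pub fn cached_matrix(from: Gamut, to: Gamut, compute: impl FnOnce() -> Matrix3) -> Matrix3 {
    let cache = CACHE.get_or_init(|| RwLock::new(HashMap::new()));

    // Fast path: a shared read lock, so concurrent readers do not block each other.
    if let Some(m) = cache.read().unwrap().get(&(from, to)) {
        return *m;
    }

    // Slow path: compute without holding any lock, then insert.
    // If another thread inserted first, keep its value.
    let m = compute();
    let mut map = cache.write().unwrap();
    let stored = *map.entry((from, to)).or_insert(m);
    stored
}
```

Note that the key is an ordered pair, so `(Rec709, Rec2020)` and `(Rec2020, Rec709)` are separate entries in this sketch, as you would expect for a matrix and its inverse.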
Workspace guarantees hold: unsafe_code = "deny", single-binary deployment, WASM + WebGPU support, and zero C/Fortran in default features. All inference is opt-in — the default oximedia build links zero ONNX symbols.
Getting Started
cargo add oximedia
Auto-captioning a clip with the Whisper-compatible pipeline (enable the auto-caption feature):
[dependencies]
oximedia = { version = "0.1.8", features = ["ml", "auto-caption"] }
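use oximedia::ml::pipelines::{AutoCaptionPipeline, AutoCaptionConfig};
use oximedia::ml::DeviceType;

fn main() -> oximedia::Result<()> {
    // Load the encoder + decoder models on the best available device.
    let pipeline = AutoCaptionPipeline::load(AutoCaptionConfig::default(), DeviceType::auto())?;

    // Placeholder input: assumed here to be decoded PCM samples as f32.
    // Replace it with audio decoded from your clip.
    let audio_samples: Vec<f32> = Vec::new();

    // Greedy-decode a transcript directly from decoded audio samples.
    let transcript = pipeline.caption(&audio_samples)?;
    println!("{transcript}");
    Ok(())
}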
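For readers new to the term, the greedy decode behind a call like caption can be pictured as a simple loop: ask the decoder for next-token logits, take the arg-max, and stop at an end-of-sequence token. The sketch below is generic and std-only; `greedy_decode`, its `step` closure, and the token ids are hypothetical and do not reflect the actual signatures of encode_audio or step_decode.

```rust
/// Index of the largest logit; ties resolve to the first maximum.
fn argmax(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best
}

/// Greedy decoding: repeatedly ask `step` for next-token logits given the
/// tokens so far, append the arg-max token, and stop once `eos` is emitted
/// or the sequence reaches `max_len` tokens.
///
/// `step` is a hypothetical stand-in for one decoder forward pass.
pub fn greedy_decode<F>(mut step: F, start: u32, eos: u32, max_len: usize) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    let mut tokens = vec![start];
    while tokens.len() < max_len {
        let logits = step(&tokens);
        if logits.is_empty() {
            break; // nothing to choose from
        }
        let next = argmax(&logits) as u32;
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    tokens
}
```

Greedy decoding trades some transcript quality for speed and determinism compared with beam search, which is a reasonable default for an offline captioning path.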
What’s New in 0.1.8
- oximedia-ml: AutoCaptionPipeline (Whisper-compatible encoder + decoder ONNX inference, greedy decode) with AutoCaptionConfig, encode_audio, step_decode, and caption, behind the auto-caption feature.
- oximedia-neural: onnx feature gate and a new OnnxBackend (load, run with a HashMap<String, Tensor> API) backed by OxiONNX.
- oximedia-audio: compute_log_mel_spectrogram (STFT → Hann → MelScale → log) added to the spectrum module.
- oximedia-colormgmt: ToneCurve enum with ReinhardSimple, ReinhardExtended { l_white }, FilmicHable (Hable/Uncharted2), and AcesFitted (Narkowicz rational).
- oximedia-stream: CmafChunk.data moved to bytes::Bytes; write_cmaf_segment returns Vec<Bytes> for zero-copy segments; six new SpliceInfoSection roundtrip tests.
- oximedia-hdr: process-wide GamutConversionMatrix cache eliminating redundant Bradford CAT + matrix-inverse work per gamut pair.
- oximedia-repair: mmap-backed deep_scan (≥4 MiB threshold, streaming fallback), mtime-aware detection_cache, and a full fix_issue dispatcher (conceal, partial, container_migrate, codec_probe); the orphaned repair_engine.rs stub was deleted.
- oximedia-dedup: MergeExecutor, AppliedAction, and MergeReport for real filesystem duplicate resolution (symlink, hardlink, delete, dry-run) with safety checks.
- oxionnx 0.1.2 → 0.1.3: SessionBuilder::with_provider_kinds() typed EP selection, ProviderKind::DirectMl (behind directml), runtime provider-priority dispatch; CoreML example clippy warnings resolved.
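The STFT → Hann → MelScale → log chain listed for oximedia-audio is easier to follow with the stages written out. The following is a rough std-only sketch, not the library's implementation: it assumes the HTK mel formula, a naive O(n²) DFT instead of an FFT, a power spectrum, natural log, and no padding or centring, and the function names are invented for the example.

```rust
use std::f32::consts::PI;

/// Hz to mel, HTK formula (an assumption; other mel scales exist).
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10f32.powf(mel / 2595.0) - 1.0)
}

/// Periodic Hann window of length `n`.
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Power spectrum of one frame via a naive DFT, bins 0..=n/2.
/// Fine for small frames; a real implementation would use an FFT.
fn power_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let phase = -2.0 * PI * (k * t) as f32 / n as f32;
                re += x * phase.cos();
                im += x * phase.sin();
            }
            re * re + im * im
        })
        .collect()
}

/// Triangular mel filterbank: `n_mels` rows of `n_fft / 2 + 1` weights.
fn mel_filterbank(n_mels: usize, n_fft: usize, sample_rate: f32) -> Vec<Vec<f32>> {
    let n_bins = n_fft / 2 + 1;
    let mel_max = hz_to_mel(sample_rate / 2.0);
    // n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    let edges: Vec<f32> = (0..n_mels + 2)
        .map(|i| mel_to_hz(mel_max * i as f32 / (n_mels + 1) as f32))
        .collect();
    (0..n_mels)
        .map(|m| {
            let (lo, mid, hi) = (edges[m], edges[m + 1], edges[m + 2]);
            (0..n_bins)
                .map(|k| {
                    let f = k as f32 * sample_rate / n_fft as f32;
                    let rising = (f - lo) / (mid - lo);
                    let falling = (hi - f) / (hi - mid);
                    rising.min(falling).max(0.0)
                })
                .collect()
        })
        .collect()
}

/// Framing + Hann window + power spectrum + mel filterbank + log.
/// Returns `[frame][mel_band]`. `n_fft` and `hop` must be non-zero.
pub fn log_mel_sketch(
    samples: &[f32],
    sample_rate: f32,
    n_fft: usize,
    hop: usize,
    n_mels: usize,
) -> Vec<Vec<f32>> {
    let window = hann(n_fft);
    let bank = mel_filterbank(n_mels, n_fft, sample_rate);
    samples
        .windows(n_fft)
        .step_by(hop)
        .map(|frame| {
            let windowed: Vec<f32> = frame.iter().zip(&window).map(|(x, w)| x * w).collect();
            let power = power_spectrum(&windowed);
            bank.iter()
                .map(|filt| {
                    let energy: f32 = filt.iter().zip(&power).map(|(w, p)| w * p).sum();
                    (energy + 1e-10).ln() // small floor avoids ln(0) on silence
                })
                .collect()
        })
        .collect()
}
```

Feeding a pure 1 kHz tone sampled at 8 kHz through `log_mel_sketch(&samples, 8000.0, 64, 32, 8)` concentrates energy in the mel band whose triangle covers 1 kHz, which is a quick sanity check for any front end of this shape.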
Tips
- Caption offline, on CPU. The AutoCaptionPipeline runs entirely on the pure-Rust OxiONNX runtime: no cloud API, no C++ ort. DeviceType::auto() probes CUDA → DirectML → WebGPU → CPU once and memoises; the CPU path is always available.
- Pick the right tone curve. For HDR-to-SDR rendering, reach for AcesFitted or FilmicHable rather than a naive clamp, and use ReinhardExtended { l_white } when you want to control where highlights roll off.
- Cache gamut conversions for free. The new GamutConversionMatrix cache makes repeated conversions between the same (ColorGamut, ColorGamut) pair effectively free after the first call; no API change is required, just reuse the conversion path.
- Reuse Bytes, don’t copy. With write_cmaf_segment now returning Vec<Bytes>, you can scatter-gather segments straight to a socket or storage backend without an intermediate copy: keep the Bytes handles instead of collecting into a fresh Vec<u8>.
- Repair big files efficiently. deep_scan memory-maps anything ≥4 MiB and falls back to streaming below that, and the detection_cache short-circuits unchanged files by mtime, so re-scanning a large library is cheap.
- Select execution providers explicitly. When you need deterministic backend choice, use SessionBuilder::with_provider_kinds() (oxionnx 0.1.3) to pin the EP priority list instead of relying solely on auto-probe.
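The "reuse, don’t copy" tip comes down to handing around reference-counted buffers instead of concatenating them. As a std-only illustration, the sketch below uses `Arc<[u8]>` as a stand-in for bytes::Bytes; `SharedChunk`, `build_segment`, and `write_parts` are hypothetical names, and the real write_cmaf_segment signature is not shown here.

```rust
use std::io::{self, Write};
use std::sync::Arc;

/// Stand-in for `bytes::Bytes`: an immutable buffer whose clones
/// share a single allocation.
pub type SharedChunk = Arc<[u8]>;

/// Hypothetical segment builder: returns the parts as separate shared
/// buffers instead of concatenating them into one `Vec<u8>`.
pub fn build_segment(header: &SharedChunk, payloads: &[SharedChunk]) -> Vec<SharedChunk> {
    let mut parts = Vec::with_capacity(payloads.len() + 1);
    parts.push(Arc::clone(header)); // bumps a refcount, copies no bytes
    parts.extend(payloads.iter().cloned());
    parts
}

/// Writes each part in order. No intermediate concatenated buffer is
/// built; the only copy is the one the writer itself performs.
pub fn write_parts<W: Write>(out: &mut W, parts: &[SharedChunk]) -> io::Result<usize> {
    let mut written = 0;
    for part in parts {
        out.write_all(part)?;
        written += part.len();
    }
    Ok(written)
}
```

On a socket you might go one step further and pass all parts to a single vectored write, but the per-part loop is enough to show that nothing forces a merge into one `Vec<u8>` first.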
This is the foundation
OxiMedia is the pure-Rust media and computer-vision layer of the COOLJAPAN ecosystem, and 0.1.8 leans on its siblings directly:
- OxiONNX — the pure-Rust ONNX runtime (now 0.1.3) powering AutoCaption and every opt-in ML pipeline without any C++ dependency.
- OxiFFT — the spectral engine behind the log-mel front end, audio analysis, MIR, and alignment.
- SciRS2 — scientific-computing primitives (linalg, random, SIMD) for the signal and CV paths.
- OxiARC — pure-Rust compression (deflate, lz4, zstd) for containers and archival.
Repository: https://github.com/cool-japan/oximedia
Star the repo if you’re tired of FFmpeg/OpenCV build hell and patent worries.
Pure Rust media and computer vision is here — fast, safe, patent-free, and sovereign.
— KitaSan at COOLJAPAN OÜ June 4, 2026