OxiMedia 0.1.5 Released — Sovereign ML Pipelines on Pure-Rust OxiONNX

Media decoding, computer vision, and now machine-learning inference — all in one pure-Rust framework, with the ML layer entirely opt-in and zero ONNX symbols in a default build.

Today we released OxiMedia 0.1.5, which brings sovereign ML pipelines to OxiMedia by layering a typed, Pure-Rust inference stack atop the OxiONNX runtime, so you can classify scenes and detect shot boundaries without ever linking a C++ ML runtime.

No C. No C++. No FFmpeg binaries. No OpenCV. No ONNX Runtime C++. ML inference in OxiMedia is pure-Rust, powered by OxiONNX, and it is off by default — a default oximedia build links exactly zero ONNX symbols and stays C/Fortran-free. When you do opt in, CPU inference is fully pure-Rust; GPU backends are additive feature gates. The result still compiles to a single static binary (or to WASM), with zero unsafe in the ML layer.

Why OxiMedia 0.1.5 is a game changer

Bolting machine learning onto a media pipeline has historically meant dragging in ONNX Runtime — a heavy C++ dependency with its own toolchain, its own build headaches, and its own supply-chain surface. The moment you wanted to classify a frame or detect a cut, your “pure” project stopped being pure.

OxiMedia 0.1.5 removes that compromise. The new oximedia-ml crate is a typed, Pure-Rust ML layer built on the OxiONNX runtime (oxionnx 0.1.2). It classifies scenes and detects shot boundaries with no C++ runtime anywhere in the build. Because every piece is gated behind feature flags, the default build stays lean and C/Fortran-free — you only pay for inference when you ask for it. CPU inference is fully pure-Rust via OxiONNX, and GPU backends (CUDA, WebGPU, DirectML) are purely additive.

This release also ships a credibility win that has nothing to do with new features: a codec decoder honesty pass. Several decoders (AV1, VP9, VP8, Theora, Vorbis, AVIF) previously carried “Stable”/“Complete” labels even though they parse bitstreams without yet fully reconstructing pixel or sample data end-to-end. They are now accurately relabelled Bitstream-parsing. There is no source behaviour change — the decoders parse exactly as before — only honest documentation, backed by a new per-decoder status report in docs/codec_status.md.

Technical Deep Dive

1. The oximedia-ml core. At the heart of the new crate sits a small set of typed building blocks: OnnxModel (a thin wrapper over an OxiONNX Session), ModelCache (a concurrent Arc<Mutex<_>> map with optional LRU capacity), and the TypedPipeline trait (with Input/Output associated types and a process() method). Device selection runs through DeviceType with a DeviceType::auto() runtime probe spanning Cpu, Cuda, WebGpu, DirectMl, and CoreMl. An ImagePreprocessor handles ImageNet mean/std normalization, NCHW/NHWC layouts, and letterbox/resize-to-fit, while postprocess helpers cover softmax, sigmoid, argmax, and top_k. A ModelZoo registry scaffold rounds it out.
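To make the typed-pipeline idea concrete, here is a minimal, self-contained sketch of a trait with Input/Output associated types and a process() method, wired to a softmax and top-k postprocess. It is an illustration under assumptions, not the oximedia-ml API: TypedPipelineSketch, ToyClassifier, and the free functions softmax and top_k are hypothetical names, and the toy pipeline takes raw logits instead of running a model.

```rust
// Illustrative only: these names are hypothetical and do not mirror the
// real oximedia-ml signatures.

/// A pipeline whose input and output types are fixed at compile time.
pub trait TypedPipelineSketch {
    type Input;
    type Output;
    fn process(&self, input: Self::Input) -> Self::Output;
}

/// Numerically stable softmax: subtract the max logit before exponentiating.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Return the `k` highest-scoring (index, score) pairs, best first.
pub fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = scores.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    indexed
}

/// Toy stand-in for a classifier: it receives logits rather than an image.
pub struct ToyClassifier {
    pub top_k: usize,
}

impl TypedPipelineSketch for ToyClassifier {
    type Input = Vec<f32>;
    type Output = Vec<(usize, f32)>;

    fn process(&self, logits: Vec<f32>) -> Self::Output {
        top_k(&softmax(&logits), self.top_k)
    }
}
```

Because Input and Output are associated types, passing the wrong kind of value to process() is a compile error rather than a runtime shape mismatch, which is the main appeal of a typed pipeline layer.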

2. The shipped pipelines. SceneClassifier is a Places365/ImageNet-style typed pipeline on OxiONNX: ImageNet-normalized 224x224 NCHW input, a configurable top_k, and softmax → top-K postprocessing, with from_model, from_path, and with_top_k constructors. ShotBoundaryDetector is TransNetV2-compatible: it feeds a rolling window of 48x27 NCHW frames to the model, reads the many-hot output for hard and soft cuts, and returns a Vec<ShotBoundary> in which each ShotBoundary carries a frame_index, a confidence, and a kind of Hard or SoftCut. Window length and threshold are configurable. Behind the all-pipelines feature you also get AestheticScorer (NIMA, 224x224 → AestheticScore), ObjectDetector (YOLOv8, 640x640 → Vec<Detection> with NMS, 80 COCO classes), and FaceEmbedder (ArcFace, 112x112 face → 512-dim FaceEmbedding).
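The thresholding step that turns per-frame scores into a list of shot boundaries can be sketched as follows. This is a simplified illustration, not the shipped detector: the type and function names are hypothetical, the two input slices are assumed to hold per-frame probabilities for hard and soft cuts, and the rule for merging adjacent hits is an assumption made for the example.

```rust
// Hypothetical types for illustration; the real oximedia-ml definitions may differ.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BoundaryKind {
    Hard,
    SoftCut,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ShotBoundary {
    pub frame_index: usize,
    pub confidence: f32,
    pub kind: BoundaryKind,
}

/// Turn two per-frame score tracks into a list of boundaries.
///
/// `hard[i]` and `soft[i]` are assumed to be probabilities in [0, 1] for
/// frame `i`. A frame is a hit when either score reaches `threshold`, and
/// the hard-cut track takes priority. Consecutive hits of the same kind
/// collapse into one boundary placed at their strongest frame.
pub fn detect_boundaries(hard: &[f32], soft: &[f32], threshold: f32) -> Vec<ShotBoundary> {
    let mut out: Vec<ShotBoundary> = Vec::new();
    let mut prev_hit: Option<usize> = None;
    for (i, (&h, &s)) in hard.iter().zip(soft.iter()).enumerate() {
        let (confidence, kind) = if h >= threshold {
            (h, BoundaryKind::Hard)
        } else if s >= threshold {
            (s, BoundaryKind::SoftCut)
        } else {
            continue;
        };
        let adjacent = prev_hit.map_or(false, |p| p + 1 == i);
        prev_hit = Some(i);
        let same_run = adjacent && out.last().map_or(false, |b| b.kind == kind);
        if same_run {
            // Keep only the strongest frame of a run of adjacent hits.
            if let Some(last) = out.last_mut() {
                if confidence > last.confidence {
                    last.frame_index = i;
                    last.confidence = confidence;
                }
            }
        } else {
            out.push(ShotBoundary { frame_index: i, confidence, kind });
        }
    }
    out
}
```

Raising the threshold trades recall for precision: fewer frames qualify as hits, so weak dissolves are the first boundaries to disappear.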

3. Facade feature gating. The oximedia facade exposes a new oximedia::ml module that re-exports oximedia-ml behind features = ["ml"], with sub-features ml-scene-classifier, ml-shot-boundary, and ml-onnx for selective inclusion. The oximedia-ml crate itself gates on onnx, cuda, webgpu, directml, scene-classifier, shot-boundary, and all-pipelines. The default build still links no ONNX symbols; the full feature now also enables ml, ml-scene-classifier, and ml-shot-boundary.
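For readers less familiar with Cargo features, the mechanism behind a build that links no ML code unless asked can be sketched with plain cfg attributes. The feature name "ml" follows the facade feature described above; the module and function names are hypothetical, and the real crate layout may differ.

```rust
// Sketch of compile-time gating. The feature name "ml" follows the facade
// feature described above; the module and function names are hypothetical.

#[cfg(feature = "ml")]
pub mod ml_sketch {
    /// Compiled only when the crate is built with `--features ml`.
    pub fn backend_name() -> &'static str {
        "onnx"
    }
}

/// Reports at runtime whether the ML layer was compiled in.
pub fn ml_enabled() -> bool {
    cfg!(feature = "ml")
}

/// Variant that exists only in builds with the feature enabled.
#[cfg(feature = "ml")]
pub fn describe_build() -> &'static str {
    "ML layer compiled in"
}

/// Variant that exists only in builds without the feature.
#[cfg(not(feature = "ml"))]
pub fn describe_build() -> &'static str {
    "ML layer not compiled in"
}
```

Code under #[cfg(feature = "ml")] is removed before type checking when the feature is off, so neither the gated module nor anything it depends on reaches the linker in a default build.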

4. Python and decoder transparency. A new oximedia.ml PyO3 submodule (gated on oximedia-py/ml) exposes the full typed pipeline stack to Python with numpy I/O — (H,W,3) uint8 for image pipelines and (N,H,W,3) uint8 for the shot-boundary window — including MlDeviceType, OnnxModel, MlModelZoo, SceneClassifier, ShotBoundaryDetector, AestheticScorer, ObjectDetector, and FaceEmbedder. Separately, decoders now sort into a four-tier taxonomy — Verified / Functional / Bitstream-parsing / Experimental — documented in the top-level README, oximedia-codec/README.md, and docs/codec_status.md.
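The (H,W,3) uint8 layout accepted from numpy is interleaved, while the image pipelines describe ImageNet-normalized NCHW input, so some conversion has to happen in between. The sketch below shows one way to do that conversion in plain Rust. It is not the ImagePreprocessor implementation: the function name is hypothetical, resizing is left out, and the constants are the commonly used ImageNet per-channel mean and standard deviation.

```rust
// Commonly used ImageNet statistics for RGB channels scaled to [0, 1].
pub const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
pub const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Convert interleaved RGB8 pixels with shape (H, W, 3) into planar f32
/// data with shape (1, 3, H, W), applying per-channel normalization.
///
/// Returns `None` when the buffer length does not equal `height * width * 3`.
/// The function name is hypothetical and resizing is out of scope here.
pub fn hwc_u8_to_nchw_f32(pixels: &[u8], height: usize, width: usize) -> Option<Vec<f32>> {
    let plane = height.checked_mul(width)?;
    if pixels.len() != plane.checked_mul(3)? {
        return None;
    }
    let mut out = vec![0.0f32; plane * 3];
    for (i, rgb) in pixels.chunks_exact(3).enumerate() {
        for c in 0..3 {
            // Scale to [0, 1], then normalize with the channel statistics.
            let scaled = f32::from(rgb[c]) / 255.0;
            // Channel `c` occupies its own contiguous plane in NCHW order.
            out[c * plane + i] = (scaled - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
        }
    }
    Some(out)
}
```

For an NHWC model the only change would be the output index, which becomes i * 3 + c instead of c * plane + i.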

Getting Started

Base install:

cargo add oximedia

Enable the ML layer with the features you need:

[dependencies]
oximedia = { version = "0.1.5", features = ["ml", "ml-scene-classifier", "ml-onnx"] }

Then run a typed scene-classification pipeline on OxiONNX:

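use oximedia::ml::pipelines::{SceneClassifier, SceneImage};
use oximedia::ml::{DeviceType, TypedPipeline};

// Load the model and let the runtime pick CPU or an enabled GPU backend.
let classifier = SceneClassifier::load("places365.onnx", DeviceType::auto())?;
// rgb_bytes: the caller's RGB pixel buffer for a 224x224 frame.
let image = SceneImage::new(rgb_bytes, 224, 224)?;
// Print each returned prediction with its score.
for pred in classifier.run(image)? {
    println!("class {} -> {:.3}", pred.class_index, pred.score);
}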

DeviceType::auto() probes for the best available backend at runtime, so the same code path runs on CPU (fully pure-Rust) or on a GPU backend you opted into.
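Conceptually, a runtime probe of this kind walks a preference list and falls back to CPU when nothing else is usable. The sketch below illustrates that pattern only. The Device enum, the auto_select function, and the preference order are assumptions made for the example; the source does not state the order that DeviceType::auto() actually uses.

```rust
/// Hypothetical mirror of the device variants listed in the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda,
    WebGpu,
    DirectMl,
    CoreMl,
}

/// Pick the first accelerator the probe reports as usable, else CPU.
///
/// `is_available` stands in for whatever runtime check a real backend
/// performs (driver present, feature compiled in, and so on). The
/// preference order below is an assumption for illustration.
pub fn auto_select(is_available: impl Fn(Device) -> bool) -> Device {
    const PREFERENCE: [Device; 4] = [
        Device::Cuda,
        Device::DirectMl,
        Device::CoreMl,
        Device::WebGpu,
    ];
    PREFERENCE
        .iter()
        .copied()
        .find(|&d| is_available(d))
        .unwrap_or(Device::Cpu)
}
```

Keeping CPU as the unconditional fallback is what lets the same calling code run on a machine with no GPU backend compiled in.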

What’s New in 0.1.5

Pure-Rust ONNX inference via OxiONNX — the new oximedia-ml crate wraps oxionnx 0.1.2 (plus oxionnx-core, oxionnx-gpu, oxionnx-directml) as optional deps. Typed pipelines with zero-cost defaults: no ONNX symbols are linked unless the onnx feature is enabled.
Core types — OnnxModel, ModelCache (concurrent, optional LRU), the TypedPipeline trait, DeviceType::auto(), ImagePreprocessor (ImageNet normalization, NCHW/NHWC, letterbox), postprocess helpers (softmax/sigmoid/argmax/top_k), and a ModelZoo registry scaffold.
SceneClassifier pipeline — Places365/ImageNet-style typed pipeline, configurable top_k, ImageNet-normalized 224x224 NCHW, softmax → top-K.
ShotBoundaryDetector pipeline — TransNetV2-compatible 48x27 NCHW rolling window, many-hot hard/soft-cut output, configurable window length and threshold.
Full pipeline set behind all-pipelines — SceneClassifier, ShotBoundaryDetector, AestheticScorer (NIMA), ObjectDetector (YOLOv8, 80 COCO classes, with NMS), and FaceEmbedder (ArcFace, 512-dim).
Facade integration + feature gates — oximedia::ml behind features = ["ml"], with ml-scene-classifier, ml-shot-boundary, and ml-onnx for selective inclusion; the full feature now picks up the ML pipelines. The default build stays symbol-free.
Python oximedia.ml submodule — PyO3 bindings for the typed ML stack (gated on oximedia-py/ml), with numpy I/O and the full pipeline set including a heuristic() fallback on ShotBoundaryDetector. Default pip install oximedia stays lean.
Example — examples/ml_scene_classify.rs demonstrates end-to-end scene classification via the typed pipeline (gated by ml + ml-scene-classifier).
55+ new tests across oximedia-ml (model-cache concurrency, LRU eviction, preprocessing, pipeline contracts, synthetic tensor fixtures), plus a comprehensive ML guide in docs/ml_guide.md and a README “Sovereign ML Pipelines” section.
Codec decoder honesty pass — a four-tier taxonomy (Verified / Functional / Bitstream-parsing / Experimental); AV1, VP9, VP8, Theora, Vorbis, and AVIF are now accurately labelled Bitstream-parsing (no behaviour change), documented in docs/codec_status.md. examples/decode_video.rs was rewritten to reflect the real status matrix.
More docs and harnesses — an #[ignore]’d AV1 real-bitstream integration test reading OXIMEDIA_AV1_FIXTURE, a TODO.md “Codec Implementation Roadmap”, plus docs/rate_control.md, docs/simd_dispatch.md, and docs/wave5_deltas.md.
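The ModelCache entry in the list above (a concurrent Arc<Mutex<_>> map with optional LRU capacity, exercised by the new concurrency and eviction tests) can be pictured with a small standalone sketch. This is not the crate's implementation: CacheSketch, get_or_load, and contains are hypothetical names, and the recency list is a simple Vec chosen for readability over speed.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Inner<V> {
    map: HashMap<String, Arc<V>>,
    order: Vec<String>, // least recently used key first
    capacity: Option<usize>,
}

impl<V> Inner<V> {
    /// Move `key` to the most-recently-used end of the recency list.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            let k = self.order.remove(pos);
            self.order.push(k);
        }
    }
}

/// Hypothetical cache sketch: cloning shares the same underlying map.
pub struct CacheSketch<V> {
    inner: Arc<Mutex<Inner<V>>>,
}

impl<V> Clone for CacheSketch<V> {
    fn clone(&self) -> Self {
        CacheSketch { inner: Arc::clone(&self.inner) }
    }
}

impl<V> CacheSketch<V> {
    /// `capacity = None` means unbounded; `Some(n)` keeps at most `n` entries.
    pub fn new(capacity: Option<usize>) -> Self {
        let inner = Inner { map: HashMap::new(), order: Vec::new(), capacity };
        CacheSketch { inner: Arc::new(Mutex::new(inner)) }
    }

    /// Return the cached value for `key`, calling `load` only on a miss.
    pub fn get_or_load(&self, key: &str, load: impl FnOnce() -> V) -> Arc<V> {
        let mut inner = self.inner.lock().expect("cache mutex poisoned");
        if let Some(hit) = inner.map.get(key).cloned() {
            inner.touch(key);
            return hit;
        }
        let value = Arc::new(load());
        inner.map.insert(key.to_string(), Arc::clone(&value));
        inner.order.push(key.to_string());
        let capacity = inner.capacity;
        if let Some(cap) = capacity {
            // Evict least recently used entries until we fit the capacity.
            while inner.order.len() > cap {
                let evicted = inner.order.remove(0);
                inner.map.remove(&evicted);
            }
        }
        value
    }

    pub fn contains(&self, key: &str) -> bool {
        self.inner.lock().expect("cache mutex poisoned").map.contains_key(key)
    }
}
```

Handing out Arc<V> means an evicted model stays alive for any caller still holding it, so eviction never invalidates an inference that is already in flight.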

This release spans 108 crates and roughly 2,677,000 SLoC of Rust, with 81,383 tests passing (0 failures, 245 skipped) under cargo nextest run --workspace --all-features and zero clippy warnings. It is licensed under Apache-2.0 with an MSRV of Rust 1.85. All 108 crates are Stable.

Tips

ML is opt-in. Enable features = ["ml", "ml-scene-classifier", "ml-onnx"] only when you need inference; otherwise the build links no ONNX symbols and stays C/Fortran-free.
Let the runtime pick the device. Use DeviceType::auto() to select CPU or GPU at runtime instead of hard-coding a backend; CPU is fully pure-Rust.
Pull GPU backends only when you have a GPU. The cuda, webgpu, and directml feature gates on oximedia-ml are additive — add them deliberately, since they widen your dependency surface.
From Python, opt in too. Build the bindings with the oximedia-py/ml feature enabled, then import oximedia.ml to reach the typed pipelines with numpy I/O.
Check decoder status before relying on it. Consult docs/codec_status.md to see which decoders are Verified versus Bitstream-parsing — the latter parse the bitstream but do not yet reconstruct pixels/samples end-to-end.
Note on the older runtime. oximedia-neural still ships its pre-existing homegrown ONNX-style runtime alongside the new OxiONNX-backed oximedia-ml pipelines; consolidation onto a single ML stack is planned for the future.

Part of the COOLJAPAN ecosystem

OxiMedia 0.1.5 stands on Pure-Rust COOLJAPAN foundations: OxiONNX (oxionnx) for sovereign ONNX inference, SciRS2 (scirs2-core) for tensor and signal math, OxiFFT (oxifft) for transforms, and OxiARC (oxiarc-archive) for compression. Every one of these is a real dependency — no C, no C++, no Fortran in the default build.

Repository: https://github.com/cool-japan/oximedia

Star the repo if a single pure-Rust framework for media, computer vision, and ML — with inference you can switch off entirely — is something you want to see thrive.

Pure Rust media, computer vision, and ML is here — fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ April 21, 2026