Media decoding, computer vision, and now machine-learning inference — all in one pure-Rust framework, with the ML layer entirely opt-in and zero ONNX symbols in a default build.
Today we released OxiMedia 0.1.5. It brings sovereign ML pipelines to the framework by layering a typed, pure-Rust inference stack on top of the OxiONNX runtime, so you can classify scenes and detect shot boundaries without ever linking a C++ ML runtime.
No C. No C++. No FFmpeg binaries. No OpenCV. No ONNX Runtime C++. ML inference in OxiMedia is powered by OxiONNX and is off by default: a default oximedia build links exactly zero ONNX symbols and stays C/Fortran-free. When you opt in, CPU inference is fully pure-Rust, and GPU backends are added through separate, additive feature gates. The result still compiles to a single static binary (or to WASM), with zero unsafe code in the ML layer.
Why OxiMedia 0.1.5 is a game changer
Bolting machine learning onto a media pipeline has historically meant dragging in ONNX Runtime — a heavy C++ dependency with its own toolchain, its own build headaches, and its own supply-chain surface. The moment you wanted to classify a frame or detect a cut, your “pure” project stopped being pure.
OxiMedia 0.1.5 removes that compromise. The new oximedia-ml crate is a typed, Pure-Rust ML layer built on the OxiONNX runtime (oxionnx 0.1.2). It classifies scenes and detects shot boundaries with no C++ runtime anywhere in the build. Because every piece is gated behind feature flags, the default build stays lean and C/Fortran-free — you only pay for inference when you ask for it. CPU inference is fully pure-Rust via OxiONNX, and GPU backends (CUDA, WebGPU, DirectML) are purely additive.
This release also ships a credibility win that has nothing to do with new features: a codec decoder honesty pass. Several decoders (AV1, VP9, VP8, Theora, Vorbis, AVIF) previously carried “Stable”/“Complete” labels even though they parse bitstreams without yet fully reconstructing pixel or sample data end-to-end. They are now accurately relabelled Bitstream-parsing. There is no source behaviour change — the decoders parse exactly as before — only honest documentation, backed by a new per-decoder status report in docs/codec_status.md.
Technical Deep Dive
1. The oximedia-ml core. At the heart of the new crate sits a small set of typed building blocks: OnnxModel (a thin wrapper over an OxiONNX Session), ModelCache (a concurrent Arc<Mutex<_>> map with optional LRU capacity), and the TypedPipeline trait (with Input/Output associated types and a process() method). Device selection runs through DeviceType, whose DeviceType::auto() runtime probe spans Cpu, Cuda, WebGpu, DirectMl, and CoreMl. An ImagePreprocessor handles ImageNet mean/std normalization, NCHW/NHWC layouts, and letterbox/resize-to-fit, while postprocess helpers cover softmax, sigmoid, argmax, and top_k. A ModelZoo registry scaffold rounds it out.
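ModelCache is described as a concurrent Arc<Mutex<_>> map with optional LRU capacity. The sketch below shows one way such a cache can work using only the standard library. It is not oximedia-ml's implementation: LruCache, get_or_insert_with, and the logical-clock eviction are illustrative choices, and a String stands in for a loaded OnnxModel.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for ModelCache: path -> shared model, behind one Mutex,
/// with an optional LRU capacity (None means unbounded).
pub struct LruCache<V> {
    inner: Mutex<Inner<V>>,
    capacity: Option<usize>,
}

struct Inner<V> {
    map: HashMap<String, (Arc<V>, u64)>, // value plus the tick of its last use
    tick: u64,                           // logical clock, bumped on every access
}

impl<V> LruCache<V> {
    pub fn new(capacity: Option<usize>) -> Self {
        let inner = Inner { map: HashMap::new(), tick: 0 };
        Self { inner: Mutex::new(inner), capacity }
    }

    /// Return the cached value for `key`, calling `load` only on a miss.
    /// The lock is held while loading, so two threads never load the same
    /// model twice; the trade-off is that loads are serialised.
    pub fn get_or_insert_with(&self, key: &str, load: impl FnOnce() -> V) -> Arc<V> {
        let mut inner = self.inner.lock().expect("cache mutex poisoned");
        inner.tick += 1;
        let now = inner.tick;
        if let Some(entry) = inner.map.get_mut(key) {
            entry.1 = now; // hit: mark as most recently used
            return Arc::clone(&entry.0);
        }
        // Miss at capacity: evict the entry with the oldest last-use tick.
        if let Some(cap) = self.capacity.filter(|&c| c > 0) {
            if inner.map.len() >= cap {
                let oldest = inner
                    .map
                    .iter()
                    .min_by_key(|(_, (_, last))| *last)
                    .map(|(k, _)| k.clone());
                if let Some(k) = oldest {
                    inner.map.remove(&k);
                }
            }
        }
        let value = Arc::new(load());
        inner.map.insert(key.to_string(), (Arc::clone(&value), now));
        value
    }

    pub fn len(&self) -> usize {
        self.inner.lock().expect("cache mutex poisoned").map.len()
    }
}

fn main() {
    // Share one cache across threads; the String stands in for a loaded model.
    let cache = Arc::new(LruCache::<String>::new(Some(2)));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let cache = Arc::clone(&cache);
            std::thread::spawn(move || {
                let path = if i % 2 == 0 { "scene.onnx" } else { "shots.onnx" };
                cache.get_or_insert_with(path, || format!("loaded {path}"))
            })
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
    println!("cached models: {}", cache.len());
}
```

Holding the lock during load keeps the sketch simple and guarantees that each model is loaded once. A production cache might release the lock while loading and coordinate duplicate work instead.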
2. The shipped pipelines. SceneClassifier is a Places365/ImageNet-style typed pipeline on OxiONNX: ImageNet-normalized 224x224 NCHW input, a configurable top_k, and softmax → top-K postprocessing, with from_model, from_path, and with_top_k constructors. ShotBoundaryDetector is TransNetV2-compatible: a 48x27 NCHW rolling window of frames feeds a many-hot output for hard and soft cuts, returning a Vec<ShotBoundary { frame_index, confidence, kind: Hard | SoftCut }> with configurable window length and threshold. Behind the all-pipelines facade you also get AestheticScorer (NIMA, 224x224 → AestheticScore), ObjectDetector (YOLOv8, 640x640 → Vec<Detection> with NMS, 80 COCO classes), and FaceEmbedder (ArcFace, 112x112 face → 512-dim FaceEmbedding).
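To make the ShotBoundaryDetector's output contract concrete, here is a minimal sketch of the final thresholding step. It assumes the model has already produced one hard-cut score and one soft-transition score per frame. The struct and enum mirror the names above but are local definitions, and boundaries_from_scores is a hypothetical helper, not part of the crate's API.

```rust
/// Local mirror of the boundary types described above; not oximedia-ml's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BoundaryKind {
    Hard,
    SoftCut,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShotBoundary {
    pub frame_index: usize,
    pub confidence: f32,
    pub kind: BoundaryKind,
}

/// Turn per-frame scores into boundaries. Assumes one score per frame for hard
/// cuts and one for soft (gradual) transitions. A frame is a boundary when its
/// stronger score reaches `threshold`, and that head decides the kind.
pub fn boundaries_from_scores(hard: &[f32], soft: &[f32], threshold: f32) -> Vec<ShotBoundary> {
    hard.iter()
        .zip(soft)
        .enumerate()
        .filter_map(|(frame_index, (&h, &s))| {
            let (confidence, kind) = if h >= s {
                (h, BoundaryKind::Hard)
            } else {
                (s, BoundaryKind::SoftCut)
            };
            (confidence >= threshold).then_some(ShotBoundary { frame_index, confidence, kind })
        })
        .collect()
}

fn main() {
    // Hypothetical scores for five frames, e.g. gathered from successive rolling windows.
    let hard: [f32; 5] = [0.02, 0.91, 0.10, 0.05, 0.03];
    let soft: [f32; 5] = [0.01, 0.20, 0.15, 0.64, 0.08];
    for b in boundaries_from_scores(&hard, &soft, 0.5) {
        println!("{:?} at frame {} ({:.2})", b.kind, b.frame_index, b.confidence);
    }
}
```

Raising the threshold trades recall for precision. A real detector may also merge adjacent above-threshold frames into a single boundary, which this sketch does not attempt.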
3. Facade feature gating. The oximedia facade exposes a new oximedia::ml module that re-exports oximedia-ml behind features = ["ml"], with sub-features ml-scene-classifier, ml-shot-boundary, and ml-onnx for selective inclusion. The oximedia-ml crate itself defines the feature gates onnx, cuda, webgpu, directml, scene-classifier, shot-boundary, and all-pipelines. The default build still links no ONNX symbols, and the full feature now also enables ml, ml-scene-classifier, and ml-shot-boundary.
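Feature gating works at compile time, so the same pattern applies in your own code. The sketch below assumes your crate forwards features with the facade's names, for example ml = ["oximedia/ml"] in its [features] table (standard Cargo forwarding syntax). The helper names are hypothetical. In a default build every cfg! check is false, and only the cfg(not(feature = "ml")) variant of ml_status is compiled.

```rust
/// Report which ML-related Cargo features this binary was compiled with.
/// Assumes your own crate declares features with these names and forwards
/// them to the facade (e.g. `ml = ["oximedia/ml"]`); names are hypothetical.
fn compiled_ml_features() -> Vec<&'static str> {
    let flags = [
        ("ml", cfg!(feature = "ml")),
        ("ml-scene-classifier", cfg!(feature = "ml-scene-classifier")),
        ("ml-shot-boundary", cfg!(feature = "ml-shot-boundary")),
        ("ml-onnx", cfg!(feature = "ml-onnx")),
    ];
    flags.iter().filter(|(_, on)| *on).map(|(name, _)| *name).collect()
}

// cfg attributes remove code entirely: only one of these functions is compiled.
#[cfg(feature = "ml")]
fn ml_status() -> &'static str {
    "ML layer compiled in"
}

#[cfg(not(feature = "ml"))]
fn ml_status() -> &'static str {
    "ML layer compiled out (default build)"
}

fn main() {
    println!("{}; enabled: {:?}", ml_status(), compiled_ml_features());
}
```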
4. Python and decoder transparency. A new oximedia.ml PyO3 submodule (gated on oximedia-py/ml) exposes the full typed pipeline stack to Python with numpy I/O — (H,W,3) uint8 for image pipelines and (N,H,W,3) uint8 for the shot-boundary window — including MlDeviceType, OnnxModel, MlModelZoo, SceneClassifier, ShotBoundaryDetector, AestheticScorer, ObjectDetector, and FaceEmbedder. Separately, decoders now sort into a four-tier taxonomy — Verified / Functional / Bitstream-parsing / Experimental — documented in the top-level README, oximedia-codec/README.md, and docs/codec_status.md.
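The Python bindings accept interleaved (H,W,3) uint8 arrays, while the image pipelines consume ImageNet-normalized NCHW tensors, so some step has to bridge the two layouts. The following is a minimal sketch of that conversion. It assumes the standard ImageNet channel statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) and does no resizing or letterboxing. hwc_u8_to_nchw_imagenet is a hypothetical name, not ImagePreprocessor's API.

```rust
/// Standard ImageNet per-channel statistics for RGB values scaled to [0, 1].
const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Convert an interleaved (H, W, 3) u8 RGB buffer into a planar NCHW f32
/// tensor with N = 1, normalising each channel as (x / 255 - mean) / std.
fn hwc_u8_to_nchw_imagenet(pixels: &[u8], height: usize, width: usize) -> Vec<f32> {
    assert_eq!(pixels.len(), height * width * 3, "expected an (H, W, 3) buffer");
    let plane = height * width;
    let mut out = vec![0.0f32; 3 * plane];
    for (i, rgb) in pixels.chunks_exact(3).enumerate() {
        // i = y * width + x is this pixel's offset within each channel plane.
        for (c, &byte) in rgb.iter().enumerate() {
            let v = f32::from(byte) / 255.0;
            out[c * plane + i] = (v - MEAN[c]) / STD[c];
        }
    }
    out
}

fn main() {
    // A 2x2 test image, row-major: red, green, blue and white pixels.
    let pixels: [u8; 12] = [255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255];
    let tensor = hwc_u8_to_nchw_imagenet(&pixels, 2, 2);
    // Logical shape is [1, 3, 2, 2]: all R values first, then G, then B.
    println!("R plane: {:?}", &tensor[0..4]);
}
```

Swapping the index expression to i * 3 + c would produce NHWC instead. That is the only difference between the two layouts the preprocessor supports.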
Getting Started
Base install:
cargo add oximedia
Enable the ML layer with the features you need:
[dependencies]
oximedia = { version = "0.1.5", features = ["ml", "ml-scene-classifier", "ml-onnx"] }
Then run a typed scene-classification pipeline on OxiONNX:
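use oximedia::ml::pipelines::{SceneClassifier, SceneImage};
use oximedia::ml::{DeviceType, TypedPipeline};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder input: one 224x224 RGB frame; substitute real decoded pixels.
    let rgb_bytes = vec![0u8; 224 * 224 * 3];
    // DeviceType::auto() picks CPU or an enabled GPU backend at runtime.
    let classifier = SceneClassifier::load("places365.onnx", DeviceType::auto())?;
    let image = SceneImage::new(rgb_bytes, 224, 224)?;
    for pred in classifier.run(image)? {
        println!("class {} -> {:.3}", pred.class_index, pred.score);
    }
    Ok(())
}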
DeviceType::auto() probes for the best available backend at runtime, so the same code path runs on CPU (fully pure-Rust) or on a GPU backend you opted into.
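A runtime probe like DeviceType::auto() comes down to walking a preference list and keeping the first backend that is actually usable, with the CPU as the guaranteed fallback. The sketch below assumes a fixed preference order and an injected availability check. The Backend enum mirrors the variant names listed in the core overview, but the order and auto_select are illustrative, not oximedia-ml's actual probing logic.

```rust
/// Local mirror of the backend names DeviceType covers; not the crate's enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    DirectMl,
    CoreMl,
    WebGpu,
    Cpu,
}

/// Return the first backend in `preference` that `is_available` accepts,
/// falling back to the CPU, which needs no driver and is always usable.
fn auto_select(preference: &[Backend], is_available: impl Fn(Backend) -> bool) -> Backend {
    preference
        .iter()
        .copied()
        .find(|&b| b == Backend::Cpu || is_available(b))
        .unwrap_or(Backend::Cpu)
}

fn main() {
    // Hypothetical preference order: GPU APIs first, CPU last.
    let order = [Backend::Cuda, Backend::DirectMl, Backend::CoreMl, Backend::WebGpu, Backend::Cpu];
    // Simulate a machine where only the WebGPU backend was compiled in and found.
    let chosen = auto_select(&order, |b| b == Backend::WebGpu);
    println!("selected backend: {chosen:?}");
}
```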
What’s New in 0.1.5
- Pure-Rust ONNX inference via OxiONNX: the new oximedia-ml crate wraps oxionnx 0.1.2 (plus oxionnx-core, oxionnx-gpu, oxionnx-directml) as optional deps. Typed pipelines with zero-cost defaults: no ONNX symbols are linked unless the onnx feature is enabled.
- Core types: OnnxModel, ModelCache (concurrent, optional LRU), the TypedPipeline trait, DeviceType::auto(), ImagePreprocessor (ImageNet normalization, NCHW/NHWC, letterbox), postprocess helpers (softmax/sigmoid/argmax/top_k), and a ModelZoo registry scaffold.
- SceneClassifier pipeline: Places365/ImageNet-style typed pipeline, configurable top_k, ImageNet-normalized 224x224 NCHW, softmax → top-K.
- ShotBoundaryDetector pipeline: TransNetV2-compatible 48x27 NCHW rolling window, many-hot hard/soft-cut output, configurable window length and threshold.
- Full pipeline set behind all-pipelines: SceneClassifier, ShotBoundaryDetector, AestheticScorer (NIMA), ObjectDetector (YOLOv8, 80 COCO classes, with NMS), and FaceEmbedder (ArcFace, 512-dim).
- Facade integration + feature gates: oximedia::ml behind features = ["ml"], with ml-scene-classifier, ml-shot-boundary, and ml-onnx for selective inclusion; the full feature now picks up the ML pipelines. The default build stays symbol-free.
- Python oximedia.ml submodule: PyO3 bindings for the typed ML stack (gated on oximedia-py/ml), with numpy I/O and the full pipeline set, including a heuristic() fallback on ShotBoundaryDetector. Default pip install oximedia stays lean.
- Example: examples/ml_scene_classify.rs demonstrates end-to-end scene classification via the typed pipeline (gated by ml + ml-scene-classifier).
- 55+ new tests across oximedia-ml (model-cache concurrency, LRU eviction, preprocessing, pipeline contracts, synthetic tensor fixtures), plus a comprehensive ML guide in docs/ml_guide.md and a README “Sovereign ML Pipelines” section.
- Codec decoder honesty pass: a four-tier taxonomy (Verified / Functional / Bitstream-parsing / Experimental); AV1, VP9, VP8, Theora, Vorbis, and AVIF are now accurately labelled Bitstream-parsing (no behaviour change), documented in docs/codec_status.md. examples/decode_video.rs was rewritten to reflect the real status matrix.
- More docs and harnesses: an #[ignore]’d AV1 real-bitstream integration test reading OXIMEDIA_AV1_FIXTURE, a TODO.md “Codec Implementation Roadmap”, plus docs/rate_control.md, docs/simd_dispatch.md, and docs/wave5_deltas.md.
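The softmax → top-K postprocessing used by SceneClassifier fits in a few lines. This is a sketch of the standard technique, using free functions named after the helpers above. Their real signatures in oximedia-ml may differ.

```rust
/// Numerically stable softmax: subtract the max logit before exponentiating
/// so that large logits cannot overflow exp().
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Indices and scores of the `k` largest probabilities, highest first.
fn top_k(probs: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    // total_cmp is a total order over f32 (NaN included), so sorting never panics.
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    indexed
}

fn main() {
    // Toy logits for four classes.
    let probs = softmax(&[2.0, 0.5, 3.1, -1.0]);
    for (class_index, score) in top_k(&probs, 2) {
        println!("class {class_index} -> {score:.3}");
    }
}
```

A full sort is O(n log n). For Places365's 365 classes that cost is negligible, though select_nth_unstable_by would avoid sorting the tail for much larger label sets.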
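This release spans 108 crates and roughly 2,677,000 SLoC of Rust, with 81,383 tests passing (0 failures, 245 skipped) under cargo nextest run --workspace --all-features and zero clippy warnings. It is licensed Apache-2.0 with an MSRV of Rust 1.85. All 108 crates are marked Stable at the crate level; individual decoder maturity is tracked separately in docs/codec_status.md.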
Tips
- ML is opt-in. Enable features = ["ml", "ml-scene-classifier", "ml-onnx"] only when you need inference; otherwise the build stays symbol-free and C/Fortran-free.
- Let the runtime pick the device. Use DeviceType::auto() to select CPU or GPU at runtime instead of hard-coding a backend; CPU is fully pure-Rust.
- Pull GPU backends only when you have a GPU. The cuda, webgpu, and directml feature gates on oximedia-ml are additive. Add them deliberately, since they widen your dependency surface.
- From Python, opt in too. Install with the ml feature, then import oximedia.ml to reach the typed pipelines with numpy I/O.
- Check decoder status before relying on it. Consult docs/codec_status.md to see which decoders are Verified versus Bitstream-parsing; the latter parse the bitstream but do not yet reconstruct pixels/samples end-to-end.
- Note on the older runtime. oximedia-neural still ships its pre-existing homegrown ONNX-style runtime alongside the new OxiONNX-backed oximedia-ml pipelines; consolidation onto a single ML stack is planned.
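One way to act on the decoder-status tip is to make the tier an explicit check in your own code. In this sketch DecoderTier is a local enum, and ordering the tiers as listed (Verified highest, Experimental lowest) is an assumption. Read each codec's actual tier from docs/codec_status.md rather than hard-coding it.

```rust
/// Local mirror of the four documented decoder tiers (not an OxiMedia type).
/// Declaration order drives the derived Ord: Experimental < ... < Verified.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DecoderTier {
    Experimental,
    BitstreamParsing,
    Functional,
    Verified,
}

/// Refuse to rely on a decoder whose tier is below the caller's minimum.
fn require_tier(codec: &str, tier: DecoderTier, minimum: DecoderTier) -> Result<(), String> {
    if tier >= minimum {
        Ok(())
    } else {
        Err(format!("{codec} is {tier:?}; need at least {minimum:?}"))
    }
}

fn main() {
    // AV1 is labelled Bitstream-parsing in 0.1.5, so a pipeline that needs
    // reconstructed pixels should not depend on it yet.
    match require_tier("AV1", DecoderTier::BitstreamParsing, DecoderTier::Functional) {
        Ok(()) => println!("AV1 decoder meets the bar"),
        Err(why) => println!("skipping: {why}"),
    }
}
```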
Part of the COOLJAPAN ecosystem
OxiMedia 0.1.5 stands on Pure-Rust COOLJAPAN foundations: OxiONNX (oxionnx) for sovereign ONNX inference, SciRS2 (scirs2-core) for tensor and signal math, OxiFFT (oxifft) for transforms, and OxiARC (oxiarc-archive) for compression. Every one of these is a real dependency — no C, no C++, no Fortran in the default build.
Repository: https://github.com/cool-japan/oximedia
Star the repo if a single pure-Rust framework for media, computer vision, and ML — with inference you can switch off entirely — is something you want to see thrive.
Pure Rust media, computer vision, and ML is here — fast, safe, and sovereign.
— KitaSan at COOLJAPAN OÜ April 21, 2026