The Gaussian avatar core just grew an entire studio around it.
Today we released OxiGAF 0.1.1 — a sweeping feature release that turns last winter’s monocular-video Gaussian-avatar reconstructor into a full pure-Rust avatar studio, with rigging, expressions, a production render pipeline, an extended diffusion sampler suite, and a dramatically larger CLI.
No C. No C++. No Fortran. No PyTorch, no CUDA-only kernels you can’t read, no Python you have to ship a virtualenv for. OxiGAF reconstructs an animatable 3D Gaussian avatar from a single casual video and compiles to a single static binary — the GPU paths run through wgpu on Metal, Vulkan, and DirectX alike, and the whole thing stays inside the COOLJAPAN ecosystem (oxiarc-archive instead of zip, ToRSh tensors via the bridge, no openblas).
Why OxiGAF 0.1.1 is a game changer
The 0.1.0 release proved the hard part: a fully differentiable Gaussian-Splatting rasterizer, multi-view diffusion, and FLAME binding that trains end-to-end. But a research core is not a studio. If you wanted to actually use an avatar — rig it, drive it with speech, light it, denoise it, export it to glTF — you were on your own.
0.1.1 closes that gap on every front:
- It rigs and animates. A new avatar rig, gaze controller, and head tracker sit on top of a full expression system: FACS action-unit coefficients, expression transfer and clustering, emotion recognition, and phoneme-driven animation. The face is no longer a static reconstruction — it performs.
- It renders like a real engine. A complete post-processing stack lands: SSAO, bloom, depth-of-field, motion blur, HDR tone mapping, color grading, TAA, film grain, chromatic aberration, and lens distortion — plus volumetric rendering, a render graph, stereo (side-by-side / top-bottom) output, and BVH/LOD spatial acceleration.
- It samples better. The diffusion side gains a real sampler suite — DDPM, adaptive sampling, consistency models, flow matching, guidance rescaling — alongside LoRA and ControlNet adapters, KV-cache and fused attention, and SDEdit-style in-context image editing.
- It trains seriously. Curriculum learning, progressive training, few-shot adaptation, MAML meta-learning, and continual learning join gradient surgery, OHEM, EMA, SWA, and mixed precision.
- It ships a toolbox. The CLI explodes from six subcommands into a full kit: PLY/glTF/mesh/point-cloud/video/animation export, a scene analyser and model inspector, scene merging and streaming, a live dashboard, and HTML experiment reports.
All of it under the same discipline as 0.1.0: 12,537 tests passing, zero unwrap(), every file under 2,000 lines, and platform-specific GPU dependencies feature-gated so the default build stays 100% Pure Rust.
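To make the feature-gating idea concrete, here is a minimal sketch of how a build can report which optional backend it was compiled with. The helper name compiled_backend is hypothetical and not part of OxiGAF; the feature names cuda and metal are assumed to match the Cargo features named in the build tips later in this post.

```rust
/// Report which optional GPU backend feature this build was compiled with.
/// Hypothetical helper for illustration; the feature names are assumptions.
fn compiled_backend() -> &'static str {
    // `cfg!` is evaluated at compile time, so a default build contains
    // no reference to the platform-specific backends at all.
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "metal") {
        "metal"
    } else {
        "default (pure Rust)"
    }
}
```

Built without any extra features, this returns the default branch; enabling a feature at build time changes the answer without touching the calling code.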
Technical Deep Dive: the four layers that grew
OxiGAF’s workspace is a clean stack of crates, and 0.1.1 deepened each one.
- oxigaf-flame: the head model became a face engine. Beyond the original Linear Blend Skinning and safetensors I/O, it now carries a mesh-processing suite (repair, smoothing, Loop and Catmull-Clark subdivision, morphing), geometry tools (geodesic distance, spectral analysis, statistical shape models, symmetry detection), a UV & texture pipeline (parameterisation, texture baking, face atlas, albedo, SH lighting), and the rig/expression system that drives all of it.
- oxigaf-render: from rasterizer to render pipeline. The differentiable 3D Gaussian Splatting rasterizer (still with its full backward pass) is now wrapped in a render graph: post-processing, volumetric ray-marching, scene/image compositing, silhouette extraction, background synthesis, MIP splatting, interactive Gaussian picking, and GPU tooling (profiler, debug readback, render metrics).
- oxigaf-diffusion: controllable multi-view generation. The multi-view U-Net with cross-view attention gains the sampler suite, identity/avatar conditioning, classifier guidance, ControlNet and LoRA adapters, latent-space tooling (blending, interpolation, latent walks, DDIM inversion), and CLIP scoring for evaluation.
- oxigaf-trainer + oxigaf-cli: the operational layer. The trainer adds advanced loss functions (adaptive, contrastive, multi-resolution), the learning regimes above, and full session management (checkpoint manager, profiler integration, callbacks). The CLI turns that into a usable workflow with export, analysis, visualisation, and reporting. A new unified pipeline module in the oxigaf meta-crate orchestrates training → rendering → export end to end.
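Linear Blend Skinning, the deformation model oxigaf-flame has carried since 0.1.0, can be sketched in a few lines. This is a generic illustration of the technique with hypothetical names (Affine, transform_point, skin_vertex), not the crate's actual API: each vertex is moved by every influencing bone, and the results are blended by the skinning weights.

```rust
/// A 3x4 affine bone transform in row-major order: [R | t].
/// Hypothetical type for illustration only.
type Affine = [[f32; 4]; 3];

/// Apply one affine transform to a point.
fn transform_point(m: &Affine, p: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for r in 0..3 {
        out[r] = m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3];
    }
    out
}

/// Linear blend skinning for one vertex: the weighted sum of the rest-pose
/// vertex transformed by each bone. Weights are expected to sum to 1.
fn skin_vertex(rest: [f32; 3], bones: &[Affine], weights: &[f32]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for (bone, &w) in bones.iter().zip(weights) {
        let p = transform_point(bone, rest);
        for k in 0..3 {
            out[k] += w * p[k];
        }
    }
    out
}
```

For example, a vertex at (1, 1, 1) influenced equally by an identity bone and a bone shifted +2 along x lands at (2, 1, 1), halfway between the two transformed positions.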
Under the hood, 0.1.1 also moved the whole stack forward: nalgebra 0.34→0.35, glam 0.32→0.33, the candle family 0.9→0.10, safetensors 0.7→0.8, wgpu 28→29, and the ToRSh tensor bridge (torsh-core/torsh-tensor/torsh-nn) 0.1.0→0.1.2, with kiddo 5 added for spatial queries.
Getting Started
cargo add oxigaf
A minimal multi-view generation pass with the new sampler defaults:
use oxigaf_diffusion::{MultiViewDiffusionPipeline, DiffusionConfig};
use candle_core::Device;
use std::path::Path;

fn main() -> anyhow::Result<()> {
    // Configure multi-view generation with classifier-free guidance
    let config = DiffusionConfig {
        num_views: 4,
        guidance_scale: 7.5,
        num_inference_steps: 50,
        ..Default::default()
    };

    // Load the complete pipeline (Metal / Vulkan / CUDA via wgpu + candle).
    // Note: `cuda_if_available` selects CUDA when present and otherwise falls back to CPU.
    let device = Device::cuda_if_available(0)?;
    let pipeline = MultiViewDiffusionPipeline::load(config, Path::new("weights/"), &device)?;

    // Generate consistent views from one frame, conditioned on camera poses.
    // `input_image` and `camera_poses` must be prepared by the caller; their
    // construction is not shown here.
    let output = pipeline.generate(&input_image, &camera_poses)?;
    Ok(())
}
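The guidance_scale field in the config above controls classifier-free guidance. As a rough sketch of the standard formula, and not a description of OxiGAF's internals, the unconditional and conditional noise predictions are combined as eps = eps_uncond + s * (eps_cond - eps_uncond). The helper name apply_cfg and the flat f32 slices are assumptions made for illustration.

```rust
/// Combine unconditional and conditional noise predictions with
/// classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond).
/// Hypothetical helper operating on flat slices for illustration.
fn apply_cfg(eps_uncond: &[f32], eps_cond: &[f32], guidance_scale: f32) -> Vec<f32> {
    assert_eq!(
        eps_uncond.len(),
        eps_cond.len(),
        "predictions must have equal length"
    );
    eps_uncond
        .iter()
        .zip(eps_cond)
        .map(|(&u, &c)| u + guidance_scale * (c - u))
        .collect()
}
```

A scale of 1.0 reproduces the conditional prediction, 0.0 ignores the conditioning entirely, and larger values such as 7.5 push the sample further toward the condition.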
Process a captured video as a FLAME sequence with automatic LRU caching:
use oxigaf_flame::FlameSequence;
use std::path::Path;

fn main() -> anyhow::Result<()> {
    let mut sequence = FlameSequence::from_json(Path::new("sequence.json"))?;
    let frame_42 = sequence.get_frame(42)?; // cached on access
    let interpolated = sequence.interpolate(42.5)?; // sub-frame interpolation
    Ok(())
}
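What sub-frame interpolation might look like underneath: a minimal sketch, assuming per-frame parameters are plain coefficient vectors and that blending is linear. FrameParams and interpolate_frames are hypothetical names; the real FlameSequence may handle rotational parameters differently (for example with spherical interpolation).

```rust
/// Hypothetical per-frame parameters (e.g. expression coefficients).
type FrameParams = Vec<f32>;

/// Linearly interpolate between the two frames that bracket the fractional
/// index `t`. Returns None if `t` is negative, not finite, or past the end.
fn interpolate_frames(frames: &[FrameParams], t: f32) -> Option<FrameParams> {
    if frames.is_empty() || !t.is_finite() || t < 0.0 {
        return None;
    }
    let lo = t.floor() as usize;
    let hi = t.ceil() as usize;
    if hi >= frames.len() {
        return None;
    }
    // Fractional position between the two bracketing frames, in [0, 1).
    let alpha = t - t.floor();
    let blended: FrameParams = frames[lo]
        .iter()
        .zip(&frames[hi])
        .map(|(&a, &b)| a + alpha * (b - a))
        .collect();
    Some(blended)
}
```

An integer index returns that frame unchanged, so the same function can serve both exact and sub-frame lookups.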
Or drive the whole thing from the CLI:
# Explore the expanded toolset
cargo run -p oxigaf-cli -- --help
# Train, then export the result to glTF
oxigaf train --config experiment.toml
oxigaf export --format gltf --input checkpoint.safetensors --output avatar.gltf
What’s New in 0.1.1
- Rigging & expressions: AvatarRig, GazeController, HeadTracker; FACS AU coefficients, expression transfer/clustering, emotion recognition, and phoneme-driven animation.
- Mesh & geometry suite: repair, smoothing, Loop/Catmull-Clark subdivision, morphing, geodesic distance, spectral analysis, statistical shape models, symmetry detection.
- UV & texture pipeline: UV parameterisation, texture baking, face atlas, albedo maps, SH lighting.
- Render pipeline: SSAO, bloom, DoF, motion blur, HDR tone mapping, color grading, TAA, film grain, chromatic aberration, lens distortion; volumetric rendering, render graph, stereo output, BVH/LOD acceleration.
- Diffusion suite: DDPM/adaptive/consistency/flow-matching samplers, guidance rescaling, LoRA & ControlNet adapters, KV cache, fused attention, SDEdit-style editing, DDIM inversion, CLIP scoring.
- Training regimes: curriculum learning, progressive training, few-shot adaptation, MAML meta-learning, continual learning; gradient surgery, OHEM, EMA, SWA, mixed precision.
- CLI toolset: PLY/glTF/mesh/point-cloud/video/animation export; scene analyser, model inspector, diff tool, quality checker; scene merging/streaming, live dashboard, HTML experiment reports.
- Unified pipeline: new pipeline module plus end_to_end_pipeline, checkpoint_lifecycle, and custom_loss examples.
- Dependency refresh: nalgebra 0.35, glam 0.33, candle 0.10, safetensors 0.8, wgpu 29, ToRSh bridge 0.1.2, oxiarc-archive 0.3.3, plus kiddo 5.
Tips
- cargo add oxigaf gives you everything. The meta-crate re-exports all sub-crates and ships a prelude; reach for oxigaf_diffusion, oxigaf_render, etc. directly only when you want a thinner dependency.
- Use the new pipeline module instead of wiring stages by hand. It orchestrates training → rendering → export; the end_to_end_pipeline example is the fastest way to see the full flow.
- Keep --all-features off on macOS. It tries to enable cuda (Linux/Windows only). On Apple Silicon build with --features "metal,simd,parallel,flash_attention"; on a CUDA box swap metal for cuda.
- Drive expressions with phoneme-based animation for lip-sync from audio, or feed FACS AU coefficients directly when you already have action-unit tracks.
- Reach for the new samplers for speed. Consistency models and flow matching cut inference steps dramatically versus plain DDPM when you need previews.
- Adapt to a new identity without full retraining via the few-shot / LoRA adapter paths, then fine-tune with curriculum learning for quality.
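To see why flow matching can get away with few steps, it helps to look at the sampler itself: generation amounts to integrating a velocity field from t = 0 to t = 1. The following is a generic fixed-step Euler sketch, not OxiGAF's sampler; euler_sample and the closure-based velocity field are assumptions made for illustration, standing in for a learned network.

```rust
/// Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler.
/// `velocity` stands in for a learned flow-matching network (hypothetical).
fn euler_sample<F>(x0: &[f32], steps: usize, velocity: F) -> Vec<f32>
where
    F: Fn(&[f32], f32) -> Vec<f32>,
{
    assert!(steps > 0, "need at least one step");
    let dt = 1.0 / steps as f32;
    let mut x = x0.to_vec();
    for i in 0..steps {
        let t = i as f32 * dt;
        let v = velocity(&x, t);
        // One Euler update: move along the current velocity for dt.
        for (xi, &vi) in x.iter_mut().zip(&v) {
            *xi += dt * vi;
        }
    }
    x
}
```

When the velocity is constant along the path, as in an idealised straight-line flow, one step and fifty steps reach the same endpoint. Learned flows are only approximately straight, so step count still trades off against quality.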
This is the foundation
OxiGAF 0.1.1 slots into a COOLJAPAN ecosystem that has grown up around it. It pairs naturally with oxihuman (parametric body generation) for a full body-plus-photoreal-face digital human, pipes reconstructed avatars through oximedia for real-time video, leans on ToRSh for tensor interop via oxigaf-bridge, and stays Pure Rust on the same foundations as SciRS2, OxiBLAS, and OxiFFT.
Repository: https://github.com/cool-japan/oxigaf
Star the repo if you want digital humans that are fast, safe, and truly sovereign.
Pure Rust Gaussian avatars are here — and now they come with a studio.
— KitaSan at COOLJAPAN OÜ June 19, 2026