Kizzasi 0.2.1 Released — Trainable SSMs, New Architectures, and Python Bindings

The first Kizzasi could predict. This one can learn — and ship.

Today we released Kizzasi 0.2.1 — a large jump from the 0.1.0 debut that turns the Pure-Rust Autoregressive General-Purpose Signal Predictor from an inference engine into a full train-and-deploy stack for continuous signals.

No C. No C++. No Fortran. No Python interpreter required to run a model, no llama.cpp build, no CUDA toolkit pinned to a driver. Kizzasi is Rust end to end: it compiles to a single static binary (or WASM), and now also to a no_std core for microcontrollers and a pip-installable Python wheel for the data-science crowd. One codebase, every deployment target.

Why Kizzasi 0.2.1 is a game changer

0.1.0 shipped a broad SSM inference stack — Mamba/RWKV/S4D, tokenizers, constraints, world I/O. The thing it could not do was learn. You brought weights; Kizzasi ran them. 0.2.x closes that gap and then some.

The headline of this release is full backpropagation through the SSM recurrence (backprop_ssm.rs), with gradient checkpointing for memory-efficient training, LoRA adapters for cheap fine-tuning, and curriculum learning with progressive difficulty. You can now train a Kizzasi model from scratch, or adapt a pretrained one, entirely in Rust.
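To make "backpropagation through the SSM recurrence" concrete, here is a minimal sketch in plain Python. It is not the code in backprop_ssm.rs: it assumes a scalar linear SSM (h_t = a*h_{t-1} + b*x_t, y_t = c*h_t) with a squared-error loss, and the function names are hypothetical. The hand-written reverse pass is compared against a finite-difference estimate.

```python
# Illustrative sketch only: a scalar linear SSM, not Kizzasi's implementation.
# Recurrence: h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
# Loss:       L = 0.5 * sum_t (y_t - target_t)^2


def forward(a, b, c, xs):
    """Run the recurrence; return all hidden states (h_0 = 0 included) and outputs."""
    hs = [0.0]
    for x in xs:
        hs.append(a * hs[-1] + b * x)
    ys = [c * h for h in hs[1:]]
    return hs, ys


def loss(a, b, c, xs, targets):
    _, ys = forward(a, b, c, xs)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(ys, targets))


def backward(a, b, c, xs, targets):
    """Backpropagation through time: walk the recurrence in reverse."""
    hs, ys = forward(a, b, c, xs)
    grad_a = grad_b = grad_c = 0.0
    dh_next = 0.0  # dL/dh_{t+1}, carried backwards one step at a time
    for t in range(len(xs), 0, -1):
        dy = ys[t - 1] - targets[t - 1]   # dL/dy_t
        grad_c += dy * hs[t]              # from y_t = c * h_t
        dh = dy * c + dh_next * a         # direct path plus path through h_{t+1}
        grad_a += dh * hs[t - 1]          # from h_t = a * h_{t-1} + b * x_t
        grad_b += dh * xs[t - 1]
        dh_next = dh
    return grad_a, grad_b, grad_c


xs = [1.0, -0.5, 0.25, 2.0]
targets = [0.5, 0.0, -0.2, 1.0]
a, b, c = 0.9, 0.5, 1.2

grad_a, grad_b, grad_c = backward(a, b, c, xs, targets)

# Central finite difference on `a` as an independent check of the reverse pass.
eps = 1e-6
fd_a = (loss(a + eps, b, c, xs, targets) - loss(a - eps, b, c, xs, targets)) / (2 * eps)
print("analytic dL/da:", grad_a, "finite-difference dL/da:", fd_a)
```

The reverse loop needs every h_t from the forward pass, which is the memory cost that gradient checkpointing trades for recomputation: store only some states and rebuild the rest when the backward pass reaches them.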

On top of that, the model zoo, along with the kernels and decoding strategies around it, grew well past the original architectures:

RWKV v5 and v7 with data-dependent time decay.
Neural ODE continuous-time models.
Spiking neural networks — a neuromorphic SSM for event-driven, ultra-low-power inference.
Flash Linear Attention kernels.
Speculative decoding for faster generation.
Multi-modal fusion across audio + vision + control.
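Two items in the list above, data-dependent time decay and linear attention, share one idea: a fixed-size state updated once per step. The sketch below is a generic decayed linear-attention recurrence in plain Python, not RWKV's exact formulation and not Kizzasi's kernels. The per-step decays in `ws` are passed in as given; in a data-dependent model they would be computed from the input. It checks that the O(1)-state recurrent form matches the quadratic "attend over all past steps" form.

```python
# Illustrative sketch: decayed linear attention, recurrent form vs. explicit sum.
# State update:  S_t = w_t * S_{t-1} + outer(k_t, v_t)
# Output:        o_t = q_t^T S_t


def recurrent(qs, ks, vs, ws):
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # size does not grow with sequence length
    outs = []
    for q, k, v, w in zip(qs, ks, vs, ws):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = w * state[i][j] + k[i] * v[j]
        outs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outs


def explicit_sum(qs, ks, vs, ws):
    """Same outputs, computed by revisiting every past step (quadratic in length)."""
    outs = []
    for t in range(len(qs)):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            decay = 1.0
            for r in range(s + 1, t + 1):  # decay accumulated between step s and step t
                decay *= ws[r]
            score = decay * sum(qi * ki for qi, ki in zip(qs[t], ks[s]))
            for j in range(len(out)):
                out[j] += score * vs[s][j]
        outs.append(out)
    return outs


qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ks = [[0.2, 0.1], [0.4, -0.3], [1.0, 0.5]]
vs = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [2.0, 0.5, -0.5]]
ws = [0.9, 0.5, 0.8]  # stand-in per-step decays

out_recurrent = recurrent(qs, ks, vs, ws)
out_explicit = explicit_sum(qs, ks, vs, ws)
max_diff = max(
    abs(x - y)
    for row_r, row_e in zip(out_recurrent, out_explicit)
    for x, y in zip(row_r, row_e)
)
print("max difference between the two forms:", max_diff)
```

The recurrent form is what makes single-step streaming prediction cheap; optimized kernels typically process chunks of steps at once but compute the same quantity.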

And the deployment story is now complete: Python bindings via PyO3/maturin, no_std embedded inference, WASM with a browser demo, gRPC and REST inference servers, a GGUF loader with full dequantization, a HuggingFace Hub client for model download, ONNX export, model pruning, and neural architecture search (NAS) for model selection.

These are not aspirational bullet points. The workspace is now roughly 123,000 lines of Rust across 351 source files, with 2,235 tests passing (up from 397 at 0.1.0), zero clippy warnings, and 100% Pure Rust in the default build.

Technical Deep Dive: what grew

The crate layout from 0.1.0 held up — and gained two new members.

kizzasi-model — now trainable, and much bigger. Beyond Mamba/Mamba2/S4D/Transformer, it adds RWKV v5/v7, Neural ODE, and spiking models, plus the full training machinery: backprop through the recurrence, gradient checkpointing, LoRA, curriculum learning, NAS, and pruning. JSON weight I/O (save/load_weights_json) and a NameRemapper for HuggingFace key translation make round-tripping weights painless, and the new GGUF loader (gguf.rs + gguf_dequant.rs) reads quantized checkpoints with full dequantization.
kizzasi-inference — now a server. Speculative decoding for throughput, multi-modal fusion, and first-class gRPC and REST inference servers with distributed prediction and load balancing, so a trained model is one serve away from a network endpoint.
kizzasi-python — a real Python package. A thin PyO3 layer exposes Config and Predictor to NumPy users, published to PyPI via maturin as kizzasi. It builds on scirs2-numpy for zero-copy array interop.
kizzasi-embedded — no_std inference. A minimal SSM inference path that does not assume the standard library is available, aimed at edge devices and microcontrollers where the full stack won’t fit.
kizzasi-io and kizzasi-logic carried forward the world connectors and the neuro-symbolic constraint layer — the guarantees that keep a predicted torque or velocity inside physical and safety bounds remain the heart of the system.
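For readers wondering what "dequantization" involves for a GGUF checkpoint, the sketch below decodes the Q8_0 block layout: each block stores one float16 scale followed by 32 signed 8-bit values, and every weight is scale times its quantized value. This is an illustration in plain Python, not the code in gguf_dequant.rs; other GGUF quantization types use different block layouts, and the function name here is hypothetical.

```python
# Illustrative sketch: dequantizing GGUF Q8_0 blocks with the standard library.
import struct

BLOCK_FORMAT = "<e32b"                       # little-endian float16 scale + 32 int8 values
BLOCK_BYTES = struct.calcsize(BLOCK_FORMAT)  # 34 bytes per block


def dequantize_q8_0(raw):
    """Expand a buffer of Q8_0 blocks into a flat list of floats."""
    if len(raw) % BLOCK_BYTES != 0:
        raise ValueError("buffer is not a whole number of Q8_0 blocks")
    weights = []
    for offset in range(0, len(raw), BLOCK_BYTES):
        scale, *quants = struct.unpack_from(BLOCK_FORMAT, raw, offset)
        weights.extend(scale * q for q in quants)
    return weights


# Build one synthetic block: scale 0.5, quantized values -16..15.
block = struct.pack(BLOCK_FORMAT, 0.5, *range(-16, 16))
weights = dequantize_q8_0(block)
print(weights[:4])  # [-8.0, -7.5, -7.0, -6.5]
```

A real loader also has to parse the GGUF header, metadata, and tensor table before it reaches these blocks; the sketch covers only the per-block arithmetic.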

Under the hood, the dependency stack moved up with the ecosystem: SciRS2 is now on the 0.4 line (scirs2-core, scirs2-signal, scirs2-fft, scirs2-series), Oxicode 0.2 handles serialization, and FFTs run on OxiFFT 0.3 in place of the previous rustfft, keeping the math layer Pure Rust top to bottom.

Getting Started

In Rust, nothing changed for the basics — add the crate:

cargo add kizzasi

use kizzasi::prelude::*;

fn main() -> KizzasiResult<()> {
    let config = KizzasiConfig::new()
        .model_type(ModelType::Mamba2)
        .input_dim(3)
        .output_dim(3)
        .hidden_dim(256)
        .state_dim(16)
        .num_layers(4)
        .context_window(8192);

    let mut predictor = Kizzasi::new(config)?;

    let input = array![0.1, 0.2, 0.3];
    let output = predictor.step(&input)?;

    println!("Predicted: {:?}", output);
    Ok(())
}

New in 0.2.x: drive it from Python. Install the wheel and predict against NumPy arrays:

pip install kizzasi

import numpy as np
import kizzasi

# Configure a signal predictor (S4D, RWKV, or Transformer).
cfg = kizzasi.Config(
    input_dim=8,
    output_dim=8,
    hidden_dim=64,
    num_layers=2,
    model_type="s4d",
)

predictor = kizzasi.Predictor(cfg)

# Single-step prediction (O(1) per step for SSMs).
x = np.random.randn(8).astype(np.float32)
y = predictor.step(x)

# Roll the model forward 100 steps.
ys = predictor.predict_n(x, n_steps=100)  # shape: (100, 8)

predictor.reset()

What’s New in 0.2.1

New architectures: RWKV v5 and v7 (data-dependent time decay), Neural ODE continuous-time models, spiking neuromorphic SSM, Flash Linear Attention, speculative decoding, and audio+vision+control multi-modal fusion.
Training & optimization: full backpropagation through the SSM recurrence, gradient checkpointing, LoRA adapters, curriculum learning, architecture search (NAS), model pruning, and ONNX export.
Deployment & integration: Python bindings (PyO3/maturin), no_std embedded support, WASM with a browser demo, Docker and Kubernetes manifests, gRPC and REST inference servers, a HuggingFace Hub client, a GGUF loader with full dequantization, and distributed prediction with load balancing.
Signal processing: cepstral analysis and pitch detection, time-frequency analysis (Gabor, S-transform, Wigner-Ville), ML-based denoising and anomaly detection, and advanced resampling (Farrow, time-varying, arbitrary SRC).
Weights & I/O: JSON weight I/O for all model types, a NameRemapper for HuggingFace key translation, and factory injection wired across every architecture.
Docs & benchmarks: mathematical formulations for every SSM architecture, an architecture-comparison benchmark suite (5 models × 4 dims), a fine-tuning workflow example, and a performance-tuning guide.
Engineering: every source file kept under 2,000 lines, zero clippy warnings, and the version stepped to 0.2.1 for clean dependency resolution.
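The LoRA adapters listed under training are cheap for a simple reason. A minimal sketch of the standard LoRA formulation follows, in plain Python; it is not Kizzasi's adapter API, and all names are hypothetical. A frozen weight matrix W gets a trainable low-rank update (alpha / r) * B * A, where A is r-by-in and B is out-by-r. With B initialized to zero the adapted layer starts out identical to the base layer, and the update can be merged into W once training is done.

```python
# Illustrative sketch of a LoRA-adapted linear layer (no bias), standard library only.


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, alpha, x):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # project down to `rank` dims, then back up
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def merge(W, A, B, alpha):
    """Fold the adapter into the base weights for adapter-free inference."""
    rank = len(A)
    return [
        [W[i][j] + (alpha / rank) * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]  # frozen base weights, 2 x 3
A = [[0.1, 0.2, -0.3]]                      # rank-1 down-projection, 1 x 3
B = [[0.4], [-0.6]]                         # rank-1 up-projection, 2 x 1
alpha = 2.0
x = [1.0, 2.0, 3.0]

y_adapter = lora_forward(W, A, B, alpha, x)
y_merged = matvec(merge(W, A, B, alpha), x)
print("adapter output:", y_adapter)
print("merged output: ", y_merged)

full_params = len(W) * len(W[0])
lora_params = len(A) * len(A[0]) + len(B) * len(B[0])
print("trainable parameters, full vs. LoRA:", full_params, "vs.", lora_params)
```

At this toy size the savings are invisible, but for a d-by-d layer the adapter trains 2*r*d values instead of d*d, which is where the "fraction of the cost" comes from.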

Tips

Fine-tune, don’t retrain. Reach for the new LoRA adapters to adapt a pretrained model to your domain at a fraction of the cost, and pair them with gradient checkpointing when memory is tight.
Bring real checkpoints. The GGUF loader (with full dequantization) plus the HuggingFace Hub client mean you can pull a quantized model from the Hub and run it in Rust without a conversion dance.
Go to the edge. Build against kizzasi-embedded for no_std targets, or compile to WASM (there’s a browser demo) when you need prediction in the page rather than a server round-trip.
Serve it. A trained model can go straight behind the built-in gRPC or REST inference server; turn on distributed prediction with load balancing when one node isn’t enough.
Match the architecture to the constraint. Spiking SSMs for event-driven, ultra-low-power sensing; Neural ODE for genuinely continuous-time dynamics; RWKV v7 for lightweight long-context streams; Mamba2 as the balanced default.
Keep predictions legal. The neuro-symbolic constraint layer is unchanged in spirit — wrap any model in a GuardrailSet, or fold constraints into training via ConstraintAwareLoss / LagrangianRelaxation, so learned behavior stays inside physical and safety bounds.
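The last tip names two places a constraint can act: on the output at inference time, and inside the loss during training. The sketch below shows a generic version of each in plain Python. It does not use or reproduce GuardrailSet, ConstraintAwareLoss, or LagrangianRelaxation, whose APIs are not shown in this post. It assumes simple per-dimension box bounds, and every function name is hypothetical.

```python
# Illustrative sketch: box constraints applied at inference and during training.


def project_to_bounds(pred, lower, upper):
    """Inference-time guard: clamp each predicted value into [lower, upper]."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(pred, lower, upper)]


def violation(pred, lower, upper):
    """Total amount by which the prediction leaves the allowed box (0 if inside)."""
    return sum(max(0.0, lo - p) + max(0.0, p - hi) for p, lo, hi in zip(pred, lower, upper))


def penalized_loss(pred, target, lower, upper, multiplier):
    """Training-time objective: squared error plus a weighted constraint violation."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return mse + multiplier * violation(pred, lower, upper)


def update_multiplier(multiplier, current_violation, step_size):
    """Dual ascent: raise the multiplier while the constraint is violated, never below 0."""
    return max(0.0, multiplier + step_size * current_violation)


lower = [-1.0, -1.0, -1.0]   # e.g. torque limits, in whatever unit the signal uses
upper = [1.0, 1.0, 1.0]
raw_prediction = [1.5, -3.0, 0.2]

safe_prediction = project_to_bounds(raw_prediction, lower, upper)
print("projected prediction:", safe_prediction)  # [1.0, -1.0, 0.2]
print("violation before projection:", violation(raw_prediction, lower, upper))  # 2.5
```

Projection guarantees the bound at run time regardless of what the model learned, while the penalty and multiplier update push the model toward predictions that need less correction in the first place.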

This is the foundation

Kizzasi 0.2.1 sits in a Pure-Rust ecosystem that has filled out considerably since January. Its math and signal layers ride SciRS2 0.4 with OxiFFT for transforms and Oxicode for serialization, while the Python wheel leans on scirs2-numpy. The neuro-symbolic constraints continue to build on TensorLogic. It now shares the deep-learning neighborhood with ToRSh, TensFloweRS, TrustformeRS, and SkleaRS, and slots into the broader stack alongside OxiLLaMa, OxiWhisper, OxiONNX, and VoiRS — a coherent, sovereign alternative to the PyTorch/CUDA/GGML world for signals that aren’t text.

Repository: https://github.com/cool-japan/kizzasi

Star the repo if a trainable, deployable, neuro-symbolic signal predictor in Pure Rust is something you want in your stack. Pure Rust signal prediction is here — fast, safe, and sovereign.

— KitaSan at COOLJAPAN OÜ, April 27, 2026