2 posts
OxiLLaMa is a Pure Rust LLM inference engine, the sovereign alternative to llama.cpp. Version 0.1.1 ships a tiled FlashAttention CPU kernel, true continuous batching with zero padding waste, fused dequant+GEMM (~12% Q4_K_M decode gain), 5 new architectures (including DBRX, Grok-1, Mamba-2, and DeepSeek-V3), and GPU coverage extended to 10 quantization types.
A SciPy-compatible scientific computing and AI framework in 100% Pure Rust: 2.91M SLoC, 29 crates, 25,800+ tests. Features include Flash Attention 2, LoRA/DoRA/GPTQ, ONNX export, GPU PDE/FFT/SpMV, Temporal GNNs, NeRF/instant-NGP, a WebGPU backend, Delta Lake / Kafka I/O, and more. 10–100× faster, with zero system dependencies. It is the sovereign scientific computing and AI foundation for the entire COOLJAPAN ecosystem (now 21M+ SLoC total).