COOLJAPAN

Posts tagged #flash-attention

2 posts

Apr 24, 2026 · 8 min

OxiLLaMa 0.1.1 Released — FlashAttention, True Continuous Batching, and 5 New Architectures in Pure Rust

OxiLLaMa is a Pure Rust LLM inference engine, the sovereign alternative to llama.cpp. Version 0.1.1 ships a tiled FlashAttention CPU kernel, true continuous batching with zero padding waste, fused dequant+GEMM (about a 12% decode gain on Q4_K_M), 5 new architectures (DBRX, Grok-1, Mamba-2, DeepSeek-V3, and more), and GPU coverage extended to 10 quantization types.

#release #oxillama #llm-inference

Mar 26, 2026 · 4 min

SciRS2 0.4.0 Released — Pure Rust SciPy Replacement Now at 2.91 Million SLoC

SciPy-compatible scientific computing and AI framework in 100% Pure Rust. 2.91M SLoC, 29 crates, 25,800+ tests. Flash Attention 2, LoRA/DoRA/GPTQ, ONNX export, GPU PDE/FFT/SpMV, Temporal GNNs, NeRF/instant-NGP, WebGPU backend, Delta Lake / Kafka I/O and more. 10–100× faster, zero system deps. The sovereign scientific computing and AI foundation for the entire COOLJAPAN ecosystem (now 21M+ SLoC total).

#release #scirs2 #scientific-computing