
MeCrab 0.1.0 Released — A Pure Rust MeCab, Japanese Morphological Analysis Without the C Toolchain

MeCrab is a pure Rust morphological analyzer compatible with MeCab IPADIC dictionaries. The 0.1.0 debut brings memory-mapped zero-copy dictionary loading, a SIMD-accelerated Viterbi lattice, runtime word addition, and the KizaMe CLI — Japanese tokenization in a single safe binary.

Tags: release, mecrab, rust, nlp, japanese, mecab, morphological-analysis, tokenizer

MeCab is the backbone of Japanese NLP. It is also a C++ library you have to build, link, and trust. MeCrab is a pure Rust replacement — and today it ships its first release.

Version 0.1.0 is a high-performance, thread-safe morphological analyzer, written entirely in Rust, that reads existing MeCab dictionaries in the IPADIC format.

For anyone who has wired Japanese text into a pipeline, the shape of the problem is familiar: you reach for MeCab, and you inherit a C++ build, a system library to locate, and FFI bindings that fight you on every platform. MeCrab takes the dictionaries you already have and reads them from pure Rust. The result is the same morphological analysis — segmentation into morphemes, part-of-speech tagging, readings — with none of the native-toolchain friction.

No C. No C++. No libmecab to find on the system. MeCrab is pure Rust: it parses your IPADIC dictionary directly, runs the Viterbi lattice in safe code, and compiles to a single static binary (and to WASM, and to a Python extension). Thread safety is not a runtime promise bolted on afterward — it falls out of Rust’s ownership model, so concurrent analysis is safe by construction.
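
To make the lattice step concrete, here is a minimal sketch of a MeCab-style Viterbi search over a toy dictionary. It is not MeCrab's code: the names `Entry`, `Node`, `viterbi`, `toy_lexicon`, and `TOY_CONN`, the single context ID per word, and every cost below are invented for illustration. A real IPADIC dictionary carries separate left and right context IDs and a far larger connection matrix, and candidate lookup would go through a trie rather than a linear scan.

```rust
/// One dictionary entry. `ctx` stands in for MeCab's left/right context IDs
/// (collapsed into a single ID to keep the sketch short).
struct Entry {
    surface: &'static str,
    ctx: usize,
    cost: i32,
}

/// A lattice node: one dictionary entry placed at one position in the text.
struct Node {
    entry: usize,        // index into the lexicon
    total: i32,          // cheapest cost from the sentence start through this node
    prev: Option<usize>, // best predecessor node, None if it follows the sentence start
}

const BOS: usize = 0; // context ID used for both sentence start and sentence end

/// Toy data (assumption): context 1 = noun, context 2 = suffix.
fn toy_lexicon() -> Vec<Entry> {
    vec![
        Entry { surface: "東", ctx: 1, cost: 5 },
        Entry { surface: "京", ctx: 1, cost: 5 },
        Entry { surface: "東京", ctx: 1, cost: 3 },
        Entry { surface: "京都", ctx: 1, cost: 3 },
        Entry { surface: "都", ctx: 2, cost: 2 },
    ]
}

/// TOY_CONN[left][right]: cost of placing `right` directly after `left`.
const TOY_CONN: [[i32; 3]; 3] = [
    [0, 1, 9], // start -> end, start -> noun, start -> suffix
    [1, 2, 0], // noun -> end, noun -> noun, noun -> suffix
    [0, 3, 9], // suffix -> end, suffix -> noun, suffix -> suffix
];

/// Returns the minimum-cost segmentation, or None if the lexicon cannot cover `text`.
fn viterbi(text: &str, lexicon: &[Entry], conn: &[[i32; 3]; 3]) -> Option<Vec<&'static str>> {
    if text.is_empty() {
        return Some(Vec::new());
    }
    let mut nodes: Vec<Node> = Vec::new();
    // ends_at[b] lists the nodes whose surface ends at byte offset b.
    let mut ends_at: Vec<Vec<usize>> = vec![Vec::new(); text.len() + 1];

    for start in 0..text.len() {
        // Only char boundaries that some earlier word reaches can start a word.
        if !text.is_char_boundary(start) || (start > 0 && ends_at[start].is_empty()) {
            continue;
        }
        for (id, e) in lexicon.iter().enumerate() {
            if !text[start..].starts_with(e.surface) {
                continue;
            }
            // Cheapest way to arrive at this entry: word cost plus connection cost.
            let best = if start == 0 {
                Some((conn[BOS][e.ctx] + e.cost, None))
            } else {
                ends_at[start]
                    .iter()
                    .map(|&p| {
                        let left_ctx = lexicon[nodes[p].entry].ctx;
                        (nodes[p].total + conn[left_ctx][e.ctx] + e.cost, Some(p))
                    })
                    .min_by_key(|&(cost, _)| cost)
            };
            let Some((total, prev)) = best else { continue };
            nodes.push(Node { entry: id, total, prev });
            ends_at[start + e.surface.len()].push(nodes.len() - 1);
        }
    }

    // Close the path at the sentence end, then walk the back-pointers.
    let mut cur = ends_at[text.len()]
        .iter()
        .copied()
        .min_by_key(|&p| nodes[p].total + conn[lexicon[nodes[p].entry].ctx][BOS])?;
    let mut path = vec![lexicon[nodes[cur].entry].surface];
    while let Some(p) = nodes[cur].prev {
        path.push(lexicon[nodes[p].entry].surface);
        cur = p;
    }
    path.reverse();
    Some(path)
}
```

With these toy costs, `viterbi("東京都", &toy_lexicon(), &TOY_CONN)` returns `Some(vec!["東京", "都"])`: the 東京 + 都 path costs 6 in total, while 東 + 京都 costs 12. Text the toy lexicon cannot cover, such as "大阪", yields `None`.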

Why MeCrab matters

MeCab is excellent and battle-tested. The pain was never the algorithm; it was everything around it. MeCrab keeps the algorithm and removes the friction.

This is a 0.1.0 — an early but solid first release. The core is real and tested: roughly 11,000 lines of Rust, 174 passing tests, 4 fuzz targets, and zero clippy warnings.

Technical Deep Dive: the workspace

MeCrab is organized as a focused Cargo workspace, so you take only the weight you need.

The split matters: the runtime analyzer that most users embed depends on a minimal set of crates (memmap2, byteorder, encoding_rs, yada for the double-array trie), while the corpus-building machinery lives behind a feature so it never bloats a simple tokenization dependency.
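
For readers wondering what the trie is for: a morphological analyzer needs, at each position in the text, every dictionary word that starts there. That operation is usually called a common-prefix search. The sketch below shows the idea with a plain `HashMap`-based trie from the standard library. It is a stand-in, not a double-array trie and not the yada API; `Trie`, `TrieNode`, and the `u32` payload are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    /// Hypothetical payload, e.g. an index into a separate table of word entries.
    value: Option<u32>,
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Inserts `key`, creating intermediate nodes as needed.
    fn insert(&mut self, key: &str, value: u32) {
        let mut node = &mut self.root;
        for ch in key.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.value = Some(value);
    }

    /// Returns (matched byte length, value) for every stored key that is a
    /// prefix of `text`, shortest match first.
    fn common_prefix_search(&self, text: &str) -> Vec<(usize, u32)> {
        let mut hits = Vec::new();
        let mut node = &self.root;
        for (offset, ch) in text.char_indices() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => break, // no stored key continues with this character
            }
            if let Some(value) = node.value {
                hits.push((offset + ch.len_utf8(), value));
            }
        }
        hits
    }
}
```

After inserting "東", "東京", and "東京都" with values 0, 1, and 2, searching "東京都庁" yields `[(3, 0), (6, 1), (9, 2)]`, since each of these characters is three bytes in UTF-8. A double-array trie answers the same query from flat arrays, which is what makes it a good fit for a memory-mapped dictionary file.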

Getting Started

Install the KizaMe CLI:

cargo install kizame

Initialize a dictionary (it will locate a system IPADIC), then parse:

# Find and register a system IPADIC dictionary
kizame dict init

# Parse the classic ambiguous sentence
echo "すもももももももものうち" | kizame

# Space-separated tokens (wakati)
echo "日本語の形態素解析" | kizame -w

# JSON output
echo "東京都" | kizame -O json

Or embed the analyzer in Rust:

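# Cargo.toml
[dependencies]
mecrab = "0.1"

// src/main.rs
use mecrab::MeCrab;

// Boxing the error assumes MeCrab's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create the analyzer and parse the classic ambiguous sentence
    let mecrab = MeCrab::new()?;
    let result = mecrab.parse("すもももももももものうち")?;
    println!("{}", result);

    // Add a domain-specific word at runtime, without restarting
    mecrab.add_word("ChatGPT", "チャットジーピーティー", "チャットジーピーティー", 5000);

    Ok(())
}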
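
Note that `add_word` is called on a binding that is not declared `mut`, so the analyzer has to manage its own synchronization internally. How MeCrab does this is not covered here. One common Rust pattern that gives the same shape is a user dictionary behind an `RwLock`, shared across threads with `Arc`. The sketch below uses only the standard library; `ToyAnalyzer`, `knows`, and `count_known` are hypothetical stand-ins, not MeCrab types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for an analyzer with a runtime-extensible user dictionary.
struct ToyAnalyzer {
    user_words: RwLock<HashMap<String, i32>>,
}

impl ToyAnalyzer {
    fn new() -> Self {
        ToyAnalyzer { user_words: RwLock::new(HashMap::new()) }
    }

    /// Takes `&self`: the write lock provides exclusive access for the update.
    fn add_word(&self, surface: &str, cost: i32) {
        self.user_words
            .write()
            .expect("lock poisoned")
            .insert(surface.to_string(), cost);
    }

    /// Any number of readers may hold the read lock at the same time.
    fn knows(&self, surface: &str) -> bool {
        self.user_words
            .read()
            .expect("lock poisoned")
            .contains_key(surface)
    }
}

/// Looks up each word on its own thread and counts how many are known.
fn count_known(analyzer: &Arc<ToyAnalyzer>, words: &[&str]) -> usize {
    let handles: Vec<_> = words
        .iter()
        .map(|w| {
            let analyzer = Arc::clone(analyzer);
            let word = w.to_string();
            thread::spawn(move || analyzer.knows(&word))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .filter(|&known| known)
        .count()
}
```

The compiler enforces the sharing rules: if `ToyAnalyzer` held a plain `HashMap` without the lock, `add_word` could not take `&self`, and the program would fail to compile instead of racing at runtime.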

What’s inside

Tips

This is the foundation

MeCrab is the first natural-language tool in the COOLJAPAN ecosystem — a Pure Rust stack that already includes OxiBLAS and OxiCode for numerics and serialization, alongside the broader scientific-computing work. Japanese morphological analysis is a load-bearing primitive for everything downstream — search, embeddings, language modeling — and getting it into safe, dependency-light Rust is the groundwork that the rest of that work can stand on. This 0.1.0 is the start of that line.

Repository: https://github.com/cool-japan/mecrab

Star the repo if you have ever lost an afternoon to building MeCab from source. Pure Rust Japanese NLP is here — fast, safe, and free of the C toolchain.

KitaSan at COOLJAPAN OÜ, January 6, 2026
