COOLJAPAN
← All posts

phop 0.1.0 Released — Differentiable Symbolic Discovery That Finds (and Proves) the Equation

phop 0.1.0 — a pure-Rust differentiable symbolic-discovery engine that learns the topology and constants of closed-form laws by gradient descent over trees of a single operator, eml(x,y)=exp(x)−ln(y), then certifies and SMT/Lean-proves them. Runs the whole pipeline in the browser via WASM.

release phop rust pure-rust symbolic-regression autograd wasm gpu smt

A symbolic-regression engine that never hands you an unverifiable string — it hands you a differentiable EML tree it can canonicalize, certify, and machine-prove, in pure Rust, all the way into your browser.

Today we released phop 0.1.0 — a pure-Rust differentiable symbolic-discovery engine that learns both the topology (the shape of the expression tree) and the numeric constants of closed-form laws by gradient descent. Its name is the Thai word พบ, “to discover, to encounter,” and its tagline is the whole thesis: Discover the equation. One operator at a time.

The trick that makes gradient descent over expression structure possible is a single operator. Every internal node of every candidate tree is the same binary operation — the EML operator,

eml(x, y) = exp(x) − ln(y)

which (Odrzywołek 2026, arXiv:2603.21852) is functionally complete for the elementary functions once you add the constant 1. Because every node is identical, an entire population of candidate expressions evaluates inside one automatic-differentiation graph, and the only categorical choice left — which source feeds each leaf — is relaxed with Gumbel-Softmax and descended end-to-end. Conventional symbolic regression, with its heterogeneous operator sets and discrete tree search, simply cannot ride a gradient into the topology. phop can.

This is the first public release — Apache-2.0, a six-crate Cargo workspace: phop-core (the engine), phop-cli (phop discover), phop-py (PyO3), phop-wasm (WebAssembly), phop-bench, and phop-examples — over 10,000 lines of pure Rust (~11,800 with docs). It is built on the SciRS2 ecosystem (scirs2-autograd, scirs2-core, scirs2-optimize, scirs2-symbolic) and the oxieml EML library. No C. No FFI. cargo nextest --workspace --all-features green, zero clippy/rustdoc warnings.

Why phop 0.1.0 is a game changer

Symbolic regression promises the dream of automated science: hand it data, get back the equation. But the established tools — PySR, SymbolicRegression.jl, AI-Feynman, the late Eureqa — all reach that dream through the same narrow door, and it is a door that closes off everything you actually want to do next.

The incumbent pain:

Then: phop 0.1.0 ends all of that.

Technical Deep Dive: one operator, one autograd graph

Under the hood, phop 0.1.0 is four cooperating layers.

Layer A — the tensorized forest. An EML tree evaluates as a single batched autograd graph with guarded exp/ln (clamped arguments, NaN-safe) that match oxieml numerically while staying finite on the worst inputs — exp overflow, ln of a non-positive number — so the gradient never sees an infinity.

Layer B — differentiable topology and depth. Over a complete-tree skeleton, each leaf holds Gumbel-Softmax logits across its possible sources — the input variables plus a learnable constant. The temperature anneals from exploratory to near-discrete, then argmax-hardens into a concrete tree. Depth is learnable too: every non-root node carries an expand/terminate gate σ(z_n). Because eml has no identity or skip operation, a node does not zero out DARTS-style — it terminates into a leaf, collapsing the subtree below it.

Layer C — the Pareto front. Candidates are scored on (accuracy, complexity), and the non-dominated front is returned — so you see the trade-off rather than a single forced choice. Solution::predict(x) evaluates any recovered law on fresh data.

Layer D — distillation and verification. Constants are sharpened by Levenberg–Marquardt and snapped to named constants (π, e, √2, …) whenever that doesn’t worsen the fit. The tree then renders to canonical LaTeX — via oxieml lowering and an optional egraph equality-saturation pass through scirs2-symbolic (e.g. ln(exp(x)) → x) — as well as to Rust, NumPy, and SymPy. Finally the verification ladder runs: interval certificates → SMT → Lean.

Beyond the four layers, phop ships a meta-search: discover_auto (--method auto) runs every EML-tree search plus the rich-leaf affine engine and oxieml’s own GA/beam/MCTS, then merges their Pareto fronts. The rich-leaf affine engine (discover_affine, with Linear and LogLinear leaves) recovers product, power, and ratio laws directly; DataSet::to_dimensionless performs Buckingham-π dimensional reduction before the search; and discover_ode recovers the right-hand side of an ODE.

Getting Started

phop 0.1.0 installs on four surfaces — the Rust engine, the CLI, Python, and the browser.

The Rust library (the engine):

cargo add phop-core
use phop_core::{Config, DataSet, Discoverer};
use scirs2_core::ndarray::{Array1, Array2};

// y = exp(x)
let xs: Vec<f64> = (0..50).map(|i| i as f64 * 0.08).collect();
let x = Array2::from_shape_vec((xs.len(), 1), xs.clone()).unwrap();
let y = Array1::from(xs.iter().map(|v| v.exp()).collect::<Vec<_>>());
let ds = DataSet::from_arrays(x, y).unwrap();

let front = Discoverer::new(Config::default().max_depth(2)).fit(&ds).unwrap();
let best = front.best().unwrap();
println!("{}  (mse = {:.2e})", best.latex(), best.mse);
// => e^{x_{0}}  (mse = 0.00e0)

The CLI:

cargo install phop-cli
phop discover data.csv --top-k 5 --format latex
# methods: enumerate | gumbel | gated | gated-warm | auto | rich
# formats: table | latex | rust | json

Python (PyPI):

pip install phop
import numpy as np, phop
x = np.linspace(0, 4, 50).reshape(-1, 1)
y = np.exp(x[:, 0])
res = phop.Discoverer(max_depth=2).fit(x, y)
print(res.top_latex(5))

The browser, via npm:

npm install @cooljapan/phop-wasm

The WASM build runs the whole discover → canonicalize → certify → SMT-prove pipeline client-side (discover_and_verify), in ~0.9 MB gzipped, with no backend.

What’s New in 0.1.0

Discovery — a tensorized guarded EML forest; an enumerate → Adam → LM → named-constant snapping → Pareto → code-distillation pipeline that is deterministic per seed; Gumbel-Softmax differentiable leaf topology; gated differentiable depth with warm-start (recovers exp(exp(x))); structural penalties for complexity, sparsity, and parsimony; a rich-leaf affine engine (Linear/LogLinear, AI-Feynman-style rational-exponent and named-constant snapping); and the merged discover_auto_all meta-search.

GPU — CUDA (gpu-cuda, oxicuda 0.3) and Apple Metal (gpu-metal, oxicuda-metal 0.3) at CUDA parity, plus a portable WGSL gpu-wgpu forward; gpu_backend() selects CUDA → wgpu → CPU.

Verification — certified roots and sound range enclosures; OxiZ-backed SMT proofs (smt); an OxiLean proof-carrying tier (lean).

Analysis — Buckingham-π dimensional analysis; ODE right-hand-side discovery (discover_ode); NSGA-III multi-objective ranking; e-graph canonicalization (egraph); a TensorLogic neuro-symbolic bridge (tensorlogic); robust losses (MSE / Huber / Trimmed); and distillation to LaTeX, Rust, NumPy, and SymPy.

Interfaces — a CLI (discover + predict); Python (PyO3, GIL released, .pyi stubs); WASM (discover_json / discover_and_verify, npm @cooljapan/phop-wasm); a criterion bench harness; worked examples (Kepler, Planck, Michaelis–Menten, Black–Scholes, exponential growth); and mdBook docs.

Fixed — resolved an oxieml 0.1.2 EML→LRA SMT-bridge unsoundness (a spurious Unsat) upstream in oxieml 0.1.3; Unsat is now reported only for genuinely infeasible constraints.

Tips

  1. Read the whole Pareto front, not just front.best(). Discovery is Pareto-first: a simpler, slightly-worse law is often the true one. Inspect the (accuracy, complexity) front before you commit to a single equation.
  2. Reach for --method gated-warm (or auto) when structure nests. Cold differentiable search can miss deep nesting; warm-starting the gated search from the enumerate solution recovers laws like exp(exp(x)).
  3. Turn the proofs on. Build with the smt feature and call Solution::prove_lower_bound / prove_upper_bound / prove_positive / prove_no_root / prove_equivalent for an OxiZ-backed Verdict; use certified_root / certified_range for interval certificates. Your laws then ship with machine-checkable guarantees.
  4. Let it snap to named constants. LM polish plus named-constant snapping rewrites a fitted 3.14159… to π (and e, √2, …) whenever that doesn’t worsen the fit — so the output is a clean symbolic law, not a pile of decimals.
  5. Reduce dimensions first. When your inputs carry units, run DataSet::to_dimensionless to form Buckingham-π groups and shrink the search space before discovery.
  6. Pick your accelerator. Use gpu-cuda / gpu-metal for a GPU-resident fit (~9.4 ms/epoch at 1M rows on an RTX 3060) or the portable gpu-wgpu; gpu_backend() auto-selects CUDA → wgpu → CPU, so the same code runs anywhere.

This is the foundation

phop is not a foundation others build on — it is a consumer of one, a showcase that wires the COOLJAPAN verification and autograd layers into a single discover-and-prove pipeline:

Every layer is pure Rust — no C, no Fortran, no FFI — so a law discovered by phop is differentiable, certifiable, and provable on one sovereign stack, from the autograd tape to the Lean kernel.

Repository: https://github.com/cool-japan/phop

Star the repo if equation discovery that returns a proof instead of a string — pure Rust, no C, no FFI, and runnable in the browser — is what you’ve been waiting for.

The era of symbolic regression that hands you an unverifiable string is over. Pure-Rust equation discovery — differentiable, provable, and sovereign — is here.

KitaSan at COOLJAPAN OÜ June 25, 2026

↑ Back to all posts