A symbolic-regression engine that never hands you an unverifiable string — it hands you a differentiable EML tree it can canonicalize, certify, and machine-prove, in pure Rust, all the way into your browser.
Today we released phop 0.1.0 — a pure-Rust differentiable symbolic-discovery engine that learns both the topology (the shape of the expression tree) and the numeric constants of closed-form laws by gradient descent. Its name is the Thai word พบ, “to discover, to encounter,” and its tagline is the whole thesis: Discover the equation. One operator at a time.
The trick that makes gradient descent over expression structure possible is a single operator. Every internal node of every candidate tree is the same binary operation — the EML operator,
eml(x, y) = exp(x) − ln(y)
which (Odrzywołek 2026, arXiv:2603.21852) is functionally complete for the elementary functions once you add the constant 1. Because every node is identical, an entire population of candidate expressions evaluates inside one automatic-differentiation graph, and the only categorical choice left — which source feeds each leaf — is relaxed with Gumbel-Softmax and descended end-to-end. Conventional symbolic regression, with its heterogeneous operator sets and discrete tree search, simply cannot ride a gradient into the topology. phop can.
This is the first public release — Apache-2.0, a six-crate Cargo workspace: phop-core (the engine), phop-cli (phop discover), phop-py (PyO3), phop-wasm (WebAssembly), phop-bench, and phop-examples — over 10,000 lines of pure Rust (~11,800 with docs). It is built on the SciRS2 ecosystem (scirs2-autograd, scirs2-core, scirs2-optimize, scirs2-symbolic) and the oxieml EML library. No C. No FFI. cargo nextest --workspace --all-features green, zero clippy/rustdoc warnings.
Why phop 0.1.0 is a game changer
Symbolic regression promises the dream of automated science: hand it data, get back the equation. But the established tools — PySR, SymbolicRegression.jl, AI-Feynman, the late Eureqa — all reach that dream through the same narrow door, and it is a door that closes off everything you actually want to do next.
The incumbent pain:
- Discrete structure search. Evolutionary tree mutation is not differentiable — you cannot ride a gradient into the topology, so structure is found by trial, mutation, and luck.
- Heterogeneous operator sets. Mixing
+,×,sin,exp, … makes the search space discrete and the evaluation graph non-uniform — there is no single tensor to differentiate. - The output is a string. A formula printed as text can’t be directly canonicalized, bounded, or proven — you get an answer you then have to re-derive trust in by hand.
- Python/Julia runtimes. No sovereign single-binary deploy, no in-browser story — the discovery engine lives wherever its interpreter lives.
Then: phop 0.1.0 ends all of that.
- One operator, one autograd graph. A single EML node type makes the topology itself differentiable (
discover_gumbel) — and so is depth:discover_gatedgives every node a learnable expand/terminate gate, so tree shape is learned under a depth curriculum.exp(x),exp(x) − ln(y), andexp(exp(x))are recovered exactly; warm-started gated search (discover_gated_warm) recovers nested structure likeexp(exp(x))that the cold search misses. - Laws that ship with proofs. A discovered law is a live
oxiemlEML tree, not a string — so the entire cool-japan verification ladder applies directly: interval-certified roots (Newton/Krawczyk) and sound range enclosures; SMT proofs (thesmtfeature) of bounds, sign, no-root, and equivalence via OxiZ, the pure-Rust Z3 replacement, with every counterexample re-verified by exact forward evaluation; and a Lean proof-carrying tier (theleanfeature) that machine-checks the EML rewrite∀x, eml x 1 = exp xagainst a postulated theory via OxiLean, the pure-Rust Lean kernel. - Pareto-first. phop returns the non-dominated (accuracy, complexity) front, not a single answer — a simpler, slightly-worse law is often the true one — plus NSGA-III ranking over accuracy, complexity, interpretability, and elegance.
- Runs everywhere. CLI, Python (PyO3), GPU, and the entire discover→prove pipeline client-side in the browser via WebAssembly — one ~0.9 MB (gzipped) WASM module, no server, no install.
- GPU-resident. CUDA (
gpu-cuda, oxicuda 0.3) and Apple Metal (gpu-metal, oxicuda-metal 0.3) backends with a GPU-resident constant fit plus Gumbel topology and gradient-checked reverse-mode analytic gradients — ~9.4 ms/epoch at 1M rows on an RTX 3060 (forward matches CPU to < 1e-3, ~8× faster at ≥ 100k rows). A portablegpu-wgpuWGSL forward covers WebGPU/Metal/Vulkan/DX12;gpu_backend()auto-selects CUDA → wgpu → CPU.
Technical Deep Dive: one operator, one autograd graph
Under the hood, phop 0.1.0 is four cooperating layers.
Layer A — the tensorized forest. An EML tree evaluates as a single batched autograd graph with guarded exp/ln (clamped arguments, NaN-safe) that match oxieml numerically while staying finite on the worst inputs — exp overflow, ln of a non-positive number — so the gradient never sees an infinity.
Layer B — differentiable topology and depth. Over a complete-tree skeleton, each leaf holds Gumbel-Softmax logits across its possible sources — the input variables plus a learnable constant. The temperature anneals from exploratory to near-discrete, then argmax-hardens into a concrete tree. Depth is learnable too: every non-root node carries an expand/terminate gate σ(z_n). Because eml has no identity or skip operation, a node does not zero out DARTS-style — it terminates into a leaf, collapsing the subtree below it.
Layer C — the Pareto front. Candidates are scored on (accuracy, complexity), and the non-dominated front is returned — so you see the trade-off rather than a single forced choice. Solution::predict(x) evaluates any recovered law on fresh data.
Layer D — distillation and verification. Constants are sharpened by Levenberg–Marquardt and snapped to named constants (π, e, √2, …) whenever that doesn’t worsen the fit. The tree then renders to canonical LaTeX — via oxieml lowering and an optional egraph equality-saturation pass through scirs2-symbolic (e.g. ln(exp(x)) → x) — as well as to Rust, NumPy, and SymPy. Finally the verification ladder runs: interval certificates → SMT → Lean.
Beyond the four layers, phop ships a meta-search: discover_auto (--method auto) runs every EML-tree search plus the rich-leaf affine engine and oxieml’s own GA/beam/MCTS, then merges their Pareto fronts. The rich-leaf affine engine (discover_affine, with Linear and LogLinear leaves) recovers product, power, and ratio laws directly; DataSet::to_dimensionless performs Buckingham-π dimensional reduction before the search; and discover_ode recovers the right-hand side of an ODE.
Getting Started
phop 0.1.0 installs on four surfaces — the Rust engine, the CLI, Python, and the browser.
The Rust library (the engine):
cargo add phop-core
use phop_core::{Config, DataSet, Discoverer};
use scirs2_core::ndarray::{Array1, Array2};
// y = exp(x)
let xs: Vec<f64> = (0..50).map(|i| i as f64 * 0.08).collect();
let x = Array2::from_shape_vec((xs.len(), 1), xs.clone()).unwrap();
let y = Array1::from(xs.iter().map(|v| v.exp()).collect::<Vec<_>>());
let ds = DataSet::from_arrays(x, y).unwrap();
let front = Discoverer::new(Config::default().max_depth(2)).fit(&ds).unwrap();
let best = front.best().unwrap();
println!("{} (mse = {:.2e})", best.latex(), best.mse);
// => e^{x_{0}} (mse = 0.00e0)
The CLI:
cargo install phop-cli
phop discover data.csv --top-k 5 --format latex
# methods: enumerate | gumbel | gated | gated-warm | auto | rich
# formats: table | latex | rust | json
Python (PyPI):
pip install phop
import numpy as np, phop
x = np.linspace(0, 4, 50).reshape(-1, 1)
y = np.exp(x[:, 0])
res = phop.Discoverer(max_depth=2).fit(x, y)
print(res.top_latex(5))
The browser, via npm:
npm install @cooljapan/phop-wasm
The WASM build runs the whole discover → canonicalize → certify → SMT-prove pipeline client-side (discover_and_verify), in ~0.9 MB gzipped, with no backend.
What’s New in 0.1.0
Discovery — a tensorized guarded EML forest; an enumerate → Adam → LM → named-constant snapping → Pareto → code-distillation pipeline that is deterministic per seed; Gumbel-Softmax differentiable leaf topology; gated differentiable depth with warm-start (recovers exp(exp(x))); structural penalties for complexity, sparsity, and parsimony; a rich-leaf affine engine (Linear/LogLinear, AI-Feynman-style rational-exponent and named-constant snapping); and the merged discover_auto_all meta-search.
GPU — CUDA (gpu-cuda, oxicuda 0.3) and Apple Metal (gpu-metal, oxicuda-metal 0.3) at CUDA parity, plus a portable WGSL gpu-wgpu forward; gpu_backend() selects CUDA → wgpu → CPU.
Verification — certified roots and sound range enclosures; OxiZ-backed SMT proofs (smt); an OxiLean proof-carrying tier (lean).
Analysis — Buckingham-π dimensional analysis; ODE right-hand-side discovery (discover_ode); NSGA-III multi-objective ranking; e-graph canonicalization (egraph); a TensorLogic neuro-symbolic bridge (tensorlogic); robust losses (MSE / Huber / Trimmed); and distillation to LaTeX, Rust, NumPy, and SymPy.
Interfaces — a CLI (discover + predict); Python (PyO3, GIL released, .pyi stubs); WASM (discover_json / discover_and_verify, npm @cooljapan/phop-wasm); a criterion bench harness; worked examples (Kepler, Planck, Michaelis–Menten, Black–Scholes, exponential growth); and mdBook docs.
Fixed — resolved an oxieml 0.1.2 EML→LRA SMT-bridge unsoundness (a spurious Unsat) upstream in oxieml 0.1.3; Unsat is now reported only for genuinely infeasible constraints.
Tips
- Read the whole Pareto front, not just
front.best(). Discovery is Pareto-first: a simpler, slightly-worse law is often the true one. Inspect the (accuracy, complexity) front before you commit to a single equation. - Reach for
--method gated-warm(orauto) when structure nests. Cold differentiable search can miss deep nesting; warm-starting the gated search from the enumerate solution recovers laws likeexp(exp(x)). - Turn the proofs on. Build with the
smtfeature and callSolution::prove_lower_bound/prove_upper_bound/prove_positive/prove_no_root/prove_equivalentfor an OxiZ-backedVerdict; usecertified_root/certified_rangefor interval certificates. Your laws then ship with machine-checkable guarantees. - Let it snap to named constants. LM polish plus named-constant snapping rewrites a fitted
3.14159…toπ(ande,√2, …) whenever that doesn’t worsen the fit — so the output is a clean symbolic law, not a pile of decimals. - Reduce dimensions first. When your inputs carry units, run
DataSet::to_dimensionlessto form Buckingham-π groups and shrink the search space before discovery. - Pick your accelerator. Use
gpu-cuda/gpu-metalfor a GPU-resident fit (~9.4 ms/epoch at 1M rows on an RTX 3060) or the portablegpu-wgpu;gpu_backend()auto-selects CUDA → wgpu → CPU, so the same code runs anywhere.
This is the foundation
phop is not a foundation others build on — it is a consumer of one, a showcase that wires the COOLJAPAN verification and autograd layers into a single discover-and-prove pipeline:
- SciRS2 — autograd (
scirs2-autograd), optimization (LM / L-BFGS / NSGA-III), and e-graphs (scirs2-symbolic). - oxieml — the canonical EML-tree CAS a discovered law lives in.
- OxiZ — the pure-Rust Z3-replacement SMT solver behind the
smtproofs. - OxiLean — the pure-Rust Lean kernel behind the
leanproof-carrying tier. - OxiCUDA — the CUDA and Apple-Metal GPU backends.
- TensorLogic — the neuro-symbolic IR a discovered law maps into.
Every layer is pure Rust — no C, no Fortran, no FFI — so a law discovered by phop is differentiable, certifiable, and provable on one sovereign stack, from the autograd tape to the Lean kernel.
Repository: https://github.com/cool-japan/phop
Star the repo if equation discovery that returns a proof instead of a string — pure Rust, no C, no FFI, and runnable in the browser — is what you’ve been waiting for.
The era of symbolic regression that hands you an unverifiable string is over. Pure-Rust equation discovery — differentiable, provable, and sovereign — is here.
— KitaSan at COOLJAPAN OÜ June 25, 2026