phop 0.1.0 Released — Differentiable Symbolic Discovery That Finds (and Proves) the Equation

A symbolic-regression engine that never hands you an unverifiable string — it hands you a differentiable EML tree it can canonicalize, certify, and machine-prove, in pure Rust, all the way into your browser.

Today we released phop 0.1.0 — a pure-Rust differentiable symbolic-discovery engine that learns both the topology (the shape of the expression tree) and the numeric constants of closed-form laws by gradient descent. Its name is the Thai word พบ, “to discover, to encounter,” and its tagline is the whole thesis: Discover the equation. One operator at a time.

The trick that makes gradient descent over expression structure possible is a single operator. Every internal node of every candidate tree is the same binary operation — the EML operator,

eml(x, y) = exp(x) − ln(y)

which (Odrzywołek 2026, arXiv:2603.21852) is functionally complete for the elementary functions once you add the constant 1. Because every node is identical, an entire population of candidate expressions evaluates inside one automatic-differentiation graph, and the only categorical choice left — which source feeds each leaf — is relaxed with Gumbel-Softmax and descended end-to-end. Conventional symbolic regression, with its heterogeneous operator sets and discrete tree search, simply cannot ride a gradient into the topology. phop can.

This is the first public release — Apache-2.0, a six-crate Cargo workspace: phop-core (the engine), phop-cli (phop discover), phop-py (PyO3), phop-wasm (WebAssembly), phop-bench, and phop-examples — over 10,000 lines of pure Rust (~11,800 with docs). It is built on the SciRS2 ecosystem (scirs2-autograd, scirs2-core, scirs2-optimize, scirs2-symbolic) and the oxieml EML library. No C. No FFI. cargo nextest --workspace --all-features green, zero clippy/rustdoc warnings.

Why phop 0.1.0 is a game changer

Symbolic regression promises the dream of automated science: hand it data, get back the equation. But the established tools — PySR, SymbolicRegression.jl, AI-Feynman, the late Eureqa — all reach that dream through the same narrow door, and it is a door that closes off everything you actually want to do next.

The incumbent pain:

Discrete structure search. Evolutionary tree mutation is not differentiable — you cannot ride a gradient into the topology, so structure is found by trial, mutation, and luck.
Heterogeneous operator sets. Mixing +, ×, sin, exp, … makes the search space discrete and the evaluation graph non-uniform — there is no single tensor to differentiate.
The output is a string. A formula printed as text can’t be directly canonicalized, bounded, or proven — you get an answer you then have to re-derive trust in by hand.
Python/Julia runtimes. No sovereign single-binary deploy, no in-browser story — the discovery engine lives wherever its interpreter lives.

Then: phop 0.1.0 ends all of that.

One operator, one autograd graph. A single EML node type makes the topology itself differentiable (discover_gumbel) — and so is depth: discover_gated gives every node a learnable expand/terminate gate, so tree shape is learned under a depth curriculum. exp(x), exp(x) − ln(y), and exp(exp(x)) are recovered exactly; warm-started gated search (discover_gated_warm) recovers nested structure like exp(exp(x)) that the cold search misses.
Laws that ship with proofs. A discovered law is a live oxieml EML tree, not a string — so the entire cool-japan verification ladder applies directly: interval-certified roots (Newton/Krawczyk) and sound range enclosures; SMT proofs (the smt feature) of bounds, sign, no-root, and equivalence via OxiZ, the pure-Rust Z3 replacement, with every counterexample re-verified by exact forward evaluation; and a Lean proof-carrying tier (the lean feature) that machine-checks the EML rewrite ∀x, eml x 1 = exp x against a postulated theory via OxiLean, the pure-Rust Lean kernel.
Pareto-first. phop returns the non-dominated (accuracy, complexity) front, not a single answer — a simpler, slightly-worse law is often the true one — plus NSGA-III ranking over accuracy, complexity, interpretability, and elegance.
Runs everywhere. CLI, Python (PyO3), GPU, and the entire discover→prove pipeline client-side in the browser via WebAssembly — one ~0.9 MB (gzipped) WASM module, no server, no install.
GPU-resident. CUDA (gpu-cuda, oxicuda 0.3) and Apple Metal (gpu-metal, oxicuda-metal 0.3) backends with a GPU-resident constant fit plus Gumbel topology and gradient-checked reverse-mode analytic gradients — ~9.4 ms/epoch at 1M rows on an RTX 3060 (forward matches CPU to < 1e-3, ~8× faster at ≥ 100k rows). A portable gpu-wgpu WGSL forward covers WebGPU/Metal/Vulkan/DX12; gpu_backend() auto-selects CUDA → wgpu → CPU.

Technical Deep Dive: one operator, one autograd graph

Under the hood, phop 0.1.0 is four cooperating layers.

Layer A — the tensorized forest. An EML tree evaluates as a single batched autograd graph with guarded exp/ln (clamped arguments, NaN-safe) that match oxieml numerically while staying finite on the worst inputs — exp overflow, ln of a non-positive number — so the gradient never sees an infinity.

Layer B — differentiable topology and depth. Over a complete-tree skeleton, each leaf holds Gumbel-Softmax logits across its possible sources — the input variables plus a learnable constant. The temperature anneals from exploratory to near-discrete, then argmax-hardens into a concrete tree. Depth is learnable too: every non-root node carries an expand/terminate gate σ(z_n). Because eml has no identity or skip operation, a node does not zero out DARTS-style — it terminates into a leaf, collapsing the subtree below it.

Layer C — the Pareto front. Candidates are scored on (accuracy, complexity), and the non-dominated front is returned — so you see the trade-off rather than a single forced choice. Solution::predict(x) evaluates any recovered law on fresh data.

Layer D — distillation and verification. Constants are sharpened by Levenberg–Marquardt and snapped to named constants (π, e, √2, …) whenever that doesn’t worsen the fit. The tree then renders to canonical LaTeX — via oxieml lowering and an optional egraph equality-saturation pass through scirs2-symbolic (e.g. ln(exp(x)) → x) — as well as to Rust, NumPy, and SymPy. Finally the verification ladder runs: interval certificates → SMT → Lean.

Beyond the four layers, phop ships a meta-search: discover_auto (--method auto) runs every EML-tree search plus the rich-leaf affine engine and oxieml’s own GA/beam/MCTS, then merges their Pareto fronts. The rich-leaf affine engine (discover_affine, with Linear and LogLinear leaves) recovers product, power, and ratio laws directly; DataSet::to_dimensionless performs Buckingham-π dimensional reduction before the search; and discover_ode recovers the right-hand side of an ODE.

Getting Started

phop 0.1.0 installs on four surfaces — the Rust engine, the CLI, Python, and the browser.

The Rust library (the engine):

cargo add phop-core

use phop_core::{Config, DataSet, Discoverer};
use scirs2_core::ndarray::{Array1, Array2};

// y = exp(x)
let xs: Vec<f64> = (0..50).map(|i| i as f64 * 0.08).collect();
let x = Array2::from_shape_vec((xs.len(), 1), xs.clone()).unwrap();
let y = Array1::from(xs.iter().map(|v| v.exp()).collect::<Vec<_>>());
let ds = DataSet::from_arrays(x, y).unwrap();

let front = Discoverer::new(Config::default().max_depth(2)).fit(&ds).unwrap();
let best = front.best().unwrap();
println!("{}  (mse = {:.2e})", best.latex(), best.mse);
// => e^{x_{0}}  (mse = 0.00e0)

The CLI:

cargo install phop-cli
phop discover data.csv --top-k 5 --format latex
# methods: enumerate | gumbel | gated | gated-warm | auto | rich
# formats: table | latex | rust | json

Python (PyPI):

pip install phop

import numpy as np, phop
x = np.linspace(0, 4, 50).reshape(-1, 1)
y = np.exp(x[:, 0])
res = phop.Discoverer(max_depth=2).fit(x, y)
print(res.top_latex(5))

The browser, via npm:

npm install @cooljapan/phop-wasm

The WASM build runs the whole discover → canonicalize → certify → SMT-prove pipeline client-side (discover_and_verify), in ~0.9 MB gzipped, with no backend.

What’s New in 0.1.0

Discovery — a tensorized guarded EML forest; an enumerate → Adam → LM → named-constant snapping → Pareto → code-distillation pipeline that is deterministic per seed; Gumbel-Softmax differentiable leaf topology; gated differentiable depth with warm-start (recovers exp(exp(x))); structural penalties for complexity, sparsity, and parsimony; a rich-leaf affine engine (Linear/LogLinear, AI-Feynman-style rational-exponent and named-constant snapping); and the merged discover_auto_all meta-search.

GPU — CUDA (gpu-cuda, oxicuda 0.3) and Apple Metal (gpu-metal, oxicuda-metal 0.3) at CUDA parity, plus a portable WGSL gpu-wgpu forward; gpu_backend() selects CUDA → wgpu → CPU.

Verification — certified roots and sound range enclosures; OxiZ-backed SMT proofs (smt); an OxiLean proof-carrying tier (lean).

Analysis — Buckingham-π dimensional analysis; ODE right-hand-side discovery (discover_ode); NSGA-III multi-objective ranking; e-graph canonicalization (egraph); a TensorLogic neuro-symbolic bridge (tensorlogic); robust losses (MSE / Huber / Trimmed); and distillation to LaTeX, Rust, NumPy, and SymPy.

Interfaces — a CLI (discover + predict); Python (PyO3, GIL released, .pyi stubs); WASM (discover_json / discover_and_verify, npm @cooljapan/phop-wasm); a criterion bench harness; worked examples (Kepler, Planck, Michaelis–Menten, Black–Scholes, exponential growth); and mdBook docs.

Fixed — resolved an oxieml 0.1.2 EML→LRA SMT-bridge unsoundness (a spurious Unsat) upstream in oxieml 0.1.3; Unsat is now reported only for genuinely infeasible constraints.

Tips

Read the whole Pareto front, not just front.best(). Discovery is Pareto-first: a simpler, slightly-worse law is often the true one. Inspect the (accuracy, complexity) front before you commit to a single equation.
Reach for --method gated-warm (or auto) when structure nests. Cold differentiable search can miss deep nesting; warm-starting the gated search from the enumerate solution recovers laws like exp(exp(x)).
Turn the proofs on. Build with the smt feature and call Solution::prove_lower_bound / prove_upper_bound / prove_positive / prove_no_root / prove_equivalent for an OxiZ-backed Verdict; use certified_root / certified_range for interval certificates. Your laws then ship with machine-checkable guarantees.
Let it snap to named constants. LM polish plus named-constant snapping rewrites a fitted 3.14159… to π (and e, √2, …) whenever that doesn’t worsen the fit — so the output is a clean symbolic law, not a pile of decimals.
Reduce dimensions first. When your inputs carry units, run DataSet::to_dimensionless to form Buckingham-π groups and shrink the search space before discovery.
Pick your accelerator. Use gpu-cuda / gpu-metal for a GPU-resident fit (~9.4 ms/epoch at 1M rows on an RTX 3060) or the portable gpu-wgpu; gpu_backend() auto-selects CUDA → wgpu → CPU, so the same code runs anywhere.

This is the foundation

phop is not a foundation others build on — it is a consumer of one, a showcase that wires the COOLJAPAN verification and autograd layers into a single discover-and-prove pipeline:

SciRS2 — autograd (scirs2-autograd), optimization (LM / L-BFGS / NSGA-III), and e-graphs (scirs2-symbolic).
oxieml — the canonical EML-tree CAS a discovered law lives in.
OxiZ — the pure-Rust Z3-replacement SMT solver behind the smt proofs.
OxiLean — the pure-Rust Lean kernel behind the lean proof-carrying tier.
OxiCUDA — the CUDA and Apple-Metal GPU backends.
TensorLogic — the neuro-symbolic IR a discovered law maps into.

Every layer is pure Rust — no C, no Fortran, no FFI — so a law discovered by phop is differentiable, certifiable, and provable on one sovereign stack, from the autograd tape to the Lean kernel.

Repository: https://github.com/cool-japan/phop

Star the repo if equation discovery that returns a proof instead of a string — pure Rust, no C, no FFI, and runnable in the browser — is what you’ve been waiting for.

The era of symbolic regression that hands you an unverifiable string is over. Pure-Rust equation discovery — differentiable, provable, and sovereign — is here.

— KitaSan at COOLJAPAN OÜ June 25, 2026