Are you an LLM? Read llms.txt for a summary of the docs, or llms-full.txt for the full context.
Skip to content

TestGen

TestGen proves that every implementation behaves identically across every platform. One formalized workflow corpus runs across a matrix of tiers — 6 surfaces × 2 backends = 12 tiers — and each tier must conform to a platform-free reference tier. There is zero per-platform test duplication: adapters supply thin shims, and a single tier-agnostic runner drives all workflows through all pools.

How it works

  1. Adapter registry. Each platform registers a TestAdapter per (surface, backend) pair; a factory turns declarative capability data into runnable matrix cells.
  2. Static validation. Before any tier runs, workflows are checked app-agnostically for hard violations (lookup-before-use, no-self-send, capture-before-ref) — author-time errors that block the whole matrix.
  3. Tier-agnostic execution. One runWorkflow walks a JSON workflow through any pool — onboarding, step execution, variable interpolation, settling — branching zero times on platform.
  4. Deterministic settling. After each step, the runner awaits frontier coverage (ingested ≥ written per enclave) so a see assertion runs at quiescence — no timing constants, no flaky cross-user races.
  5. Conformance witness. For a fixed app and seed, the reference tier and a candidate tier must produce identical observable trajectories and verdicts; any divergence is a hard failure — this catches tiers that pass for the wrong reason.
  6. Meta-fuzz. Random valid app manifests (built from the same primitives real apps use) are generated by seeded PRNG, surfacing SDK regressions at far greater coverage than hand-written tests.
  7. Nuke protocol. The first assertion failure SIGKILLs the whole process group; layered time budgets (per step / workflow / app / sweep) make hangs impossible. Results are deterministic by construction.

What makes it sound

This is the operational discharge of the matrix theorem: for every workflow and every tier that accepts it, the tier's realization is observationally equal to the reference. Settling is proven from sequential consistency on totally-ordered logs; conformance-fuzz proves a tier is a faithful realization rather than merely green; the nuke protocol makes the whole suite reproducible and hang-proof. If TestGen passes on all tiers, the implementation is proven sound for that spec revision.

See also