The assessment engine uses the same modular composition architecture published in our research. Three independently trained specialist modules are automatically selected and blended for each evaluation.
Routing weights are learned, not hand-tuned. Each module is frozen after training, so adding or removing one does not alter the others.
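As a rough illustration of the blending step, here is a minimal sketch that assumes the router emits one score per module and that outputs are combined as a softmax-weighted sum. The module names, the toy module functions, and the `blend` helper are hypothetical stand-ins, not the engine's actual code, and the router scores are passed in by hand rather than learned.

```python
import math
from typing import Callable, Dict, List

# A "module" here is any frozen function from an input vector to an output vector.
Module = Callable[[List[float]], List[float]]

# Hypothetical toy specialists; the real modules are trained networks.
TOY_MODULES: Dict[str, Module] = {
    "a": lambda x: [2.0 * v for v in x],
    "b": lambda x: [v + 1.0 for v in x],
    "c": lambda x: [-v for v in x],
}


def softmax(scores: List[float]) -> List[float]:
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend(modules: Dict[str, Module],
          router_scores: Dict[str, float],
          x: List[float]) -> List[float]:
    """Blend module outputs using softmax-normalized router scores.

    Assumption: scores would come from a learned router; here they are
    supplied directly so the sketch stays self-contained.
    """
    names = sorted(modules)
    weights = softmax([router_scores[n] for n in names])
    outputs = [modules[n](x) for n in names]
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(dim)]


# Usage: blend all three modules, then drop one and blend the remaining two.
full = blend(TOY_MODULES, {"a": 0.0, "b": 0.0, "c": 0.0}, [1.0, 2.0])
subset = {name: TOY_MODULES[name] for name in ("a", "b")}
reduced = blend(subset, {"a": 0.0, "b": 0.0}, [1.0, 2.0])
```

In this sketch, dropping module `c` only renormalizes the weights over the remaining modules. The functions for `a` and `b` are never modified, which is the property that freezing is meant to guarantee.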
Test prompts are generated from Lean 4 formal specifications, the same machine-verified definitions shown on the verification page. Prompts regenerate weekly while the properties they test stay fixed, so a model cannot pass by memorizing a static test set.
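One way to picture "prompts change, properties stay fixed" is a generator seeded by the calendar week. The sketch below is an assumption about how such a scheme could work, not the engine's actual pipeline: the property names, the prompt template, and the `week_seed` and `weekly_prompts` helpers are all hypothetical, and the real prompts are derived from the Lean 4 specifications rather than from a string template.

```python
import datetime
import hashlib
import random
from typing import List, Tuple

# Hypothetical placeholders for properties defined in the Lean 4 specifications.
PROPERTIES = ("commutativity", "idempotence", "monotonicity")


def week_seed(day: datetime.date) -> int:
    """Derive a deterministic seed from the ISO year and week of `day`."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    key = f"{iso[0]}-W{iso[1]:02d}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def weekly_prompts(day: datetime.date,
                   per_property: int = 2) -> List[Tuple[str, str]]:
    """Return (property, prompt) pairs for the week containing `day`.

    The property list never changes; only the sampled inputs do.
    """
    rng = random.Random(week_seed(day))
    prompts = []
    for prop in PROPERTIES:
        for _ in range(per_property):
            a, b = rng.randint(1, 999), rng.randint(1, 999)
            prompts.append((prop, f"Check {prop} for the inputs {a} and {b}."))
    return prompts


# Usage: two dates in the same ISO week share a prompt set.
monday = weekly_prompts(datetime.date(2024, 1, 1))
sunday = weekly_prompts(datetime.date(2024, 1, 7))
next_week = weekly_prompts(datetime.date(2024, 1, 8))
```

Seeding by week keeps each week's set reproducible for auditing, while the fixed `PROPERTIES` tuple means results from different weeks measure the same things.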
Total VRAM: 1.4 GB. Runs on consumer hardware. Zero API dependencies. Zero vendor conflicts.
The assessment engine is itself evidence that modular AI composition works in production.