Vague claim: "This model is robust."
Formal standard: For semantically equivalent prompts, the model must preserve answer class at least 30% of the time. Statement ID: FS-ROB-001
Measured result: 41% answer class preservation across tested paraphrase pairs.
Report outcome: PASS — the implementation satisfied this robustness standard under the tested conditions.
Lean 4 verifies that the formal statement and pass/fail logic are exactly what the report says they are. The test run determines whether the implementation met that standard.