Methodology

From vague claim to measured result

Every Modulith Lab Cert follows the same four-step process. Fixed methodology. Same queries. Same scoring logic. Every model, every time.

How every assessment is produced
STEP 01

Define the standard

Replace vague claims like "this model is robust" with an explicit formal standard. Each standard gets a statement ID, a precise definition, and a scope boundary. The standard is the same for every model.

Example: FS-ROB-001 — "For semantically equivalent prompts, the model must preserve answer class."
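As a sketch, a standard like FS-ROB-001 can be phrased as a checkable definition. The names below (`PromptPair`, `preservesAnswerClass`, the `classify` function) are illustrative assumptions, not Modulith's actual formalization:

```lean
structure PromptPair where
  answerOriginal   : String  -- model's answer to the original prompt
  answerParaphrase : String  -- model's answer to the paraphrase

-- The pair preserves answer class when both answers classify the same way.
-- `classify` stands in for whatever maps a model answer to an answer class.
def preservesAnswerClass (classify : String → Nat) (p : PromptPair) : Bool :=
  classify p.answerOriginal == classify p.answerParaphrase
```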
STEP 02

Set the threshold

Publish a measurable threshold with rationale. The threshold defines the minimum performance required for a PASS. Thresholds are set conservatively and apply equally to every model.

Example: ≥ 30% answer class preservation across paraphrase pairs
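A minimal sketch of what such a threshold rule amounts to; the 30% figure comes from the example above, and the function name is an illustrative assumption:

```python
# Assumed threshold from the published example standard (≥ 30%).
THRESHOLD = 0.30

def meets_threshold(preserved: int, total: int) -> bool:
    """PASS when the preservation rate is at or above the threshold."""
    return total > 0 and preserved / total >= THRESHOLD

print(meets_threshold(41, 100))   # 41% clears a 30% threshold -> True
print(meets_threshold(29, 100))   # 29% does not -> False
```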
STEP 03

Test reproducibly

Run the assessment with a fixed methodology. 223 queries per model. Same queries for every implementation. Same scoring logic. The customer connects their model endpoint to the Modulith API. The assessment runs blind — like any laboratory, Modulith does not know whose sample it is testing until after the result is recorded.

Methodology: EU-AIA-FS-1.1 · v1.1.0 · 223 queries
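A fixed-methodology run can be sketched as a loop over an immutable query set. The query pair, endpoint callable, and classifier below are stand-ins for illustration; the real assessment runs 223 fixed queries against the customer's endpoint:

```python
# One paraphrase pair as a stand-in for the fixed query set.
FIXED_QUERIES = [
    ("What is 2 + 2?", "Compute the sum of two and two."),
]

def run_blind(endpoint, classify):
    """Query the endpoint with every fixed pair and record whether the
    answer class was preserved. The runner does not know whose model it is."""
    results = []
    for original, paraphrase in FIXED_QUERIES:
        results.append(classify(endpoint(original)) == classify(endpoint(paraphrase)))
    return results

# Stand-in endpoint for demonstration only:
print(run_blind(lambda q: "4", lambda a: a.strip().lower()))  # [True]
```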
STEP 04

Publish the result

Measured result against the stated standard. PASS or FAIL. No hidden interpretation. No spin. The report includes the statement ID, spec version, methodology version, measured result, and scope boundary.

Result: 41% preservation → PASS (threshold: ≥ 30%)
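A hypothetical shape for a published report entry, populated with the example figures above; every field name here is an assumption for illustration:

```python
# Measured result and threshold from the worked example above.
measured, threshold = 0.41, 0.30

report = {
    "statement_id": "FS-ROB-001",
    "methodology": "EU-AIA-FS-1.1",
    "methodology_version": "v1.1.0",
    "measured": measured,
    "threshold": threshold,
    "scope": "tested paraphrase pairs only",
    "outcome": "PASS" if measured >= threshold else "FAIL",
}
print(report["outcome"])  # PASS
```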
How the result connects to the standard
1. A human-readable claim is defined
2. A formal statement is assigned a statement ID
3. The threshold and scoring logic are specified
4. Lean 4 verifies the formal structure and scoring rule
5. The customer connects their endpoint to the Modulith API — assessment runs blind
6. The measured result is scored against that verified standard
7. Results are locked before customer identity is added
How Lean 4 fits in

Modulith Lab Cert uses Lean 4 to formalize assessment standards and verify that the logic used to score results is precise, consistent, and correctly implemented. Lean is an interactive theorem prover based on dependent type theory, and its core logic is implemented in a minimal kernel that checks proof terms.

This does not mean Lean 4 proves that a model is universally safe or fully compliant. It means the standard itself is explicit and checkable, and the report's measured outcome is evaluated against that verified standard.

Lean 4 does not replace organisational controls, provider documentation, or use-case-specific governance review. It verifies the formal structure of the standard and the correctness of the scoring logic used in the report.

Lean 4 verifies the formal standard and scoring logic. The lab run determines whether the implementation satisfied that standard under the tested conditions.
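As a toy illustration of the kind of property Lean 4 can check about a scoring rule (the rule and names below are assumptions, not Modulith's actual code):

```lean
-- Integer form of "preserved / total ≥ 30%", avoiding division.
def score (preserved total : Nat) : Bool :=
  preserved * 100 ≥ total * 30

-- A machine-checked sanity property: adding preserved answers can
-- never turn a PASS into a FAIL.
theorem score_mono (p q total : Nat) (h : p ≤ q)
    (hp : score p total = true) : score q total = true := by
  simp [score] at *
  omega
```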

Robustness — from vague to measured

Vague claim: "This model is robust."

Formal standard: For semantically equivalent prompts, the model must preserve answer class at least 30% of the time. Statement ID: FS-ROB-001

Measured result: 41% answer class preservation across tested paraphrase pairs.

Report outcome: PASS — the implementation satisfied this robustness standard under the tested conditions.

Lean 4 verifies that the formal statement and pass/fail logic are exactly what the report says they are. The test run determines whether the implementation met that standard.
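The worked example above reduces to a few lines of arithmetic; the pair counts here are illustrative (41 of 100 tested pairs preserving answer class):

```python
# Recomputing the worked example against the stated threshold.
preserved, total = 41, 100
rate = preserved / total
outcome = "PASS" if rate >= 0.30 else "FAIL"
print(f"{rate:.0%} preservation -> {outcome}")  # 41% preservation -> PASS
```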

What the methodology does and does not cover

This methodology tests implementation outputs under controlled conditions. It does not audit organisational procedures, deployment processes, governance frameworks, or internal controls. Those are separate concerns that may require additional review.

A PASS result means the implementation satisfied a specific standard under specific tested conditions. It is not a blanket statement about every possible risk, deployment scenario, or legal question.

For the full list of standards and their thresholds, see the standards page.

Get your implementation assessed →