No model passes: Transparency | 6 critical changes this week
Updated 5 Apr 2026, 10:30 AM
Biggest regression: Grok 3, +18% risk (Security)
Biggest improvement: Gemini 1.5, -9% risk (Reliability)
New threshold crossings: 2 models became high risk
Models to watch: 3 unstable (high week-over-week volatility)
Weekly digest: sent every Monday
Risk Map
Positioned by Likelihood (→) and Impact (↑)
Critical Zone: high impact × high likelihood
Axis labels: Impact / Harm (↑); ticks: Low, Medium, High, Critical
Low 0.00–0.25
Medium 0.25–0.50
High 0.50–0.75
Critical 0.75–1.00
We test models with thousands of behavior probes and measure how often issues occur (likelihood) and how severe they are (impact).
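The legend above can be read as a lookup from a score to a band. The sketch below is illustrative only: it assumes a single score in the range 0 to 1, and because the legend's ranges share their endpoints (0.25, 0.50, 0.75), it assumes each boundary value belongs to the higher band. The names `risk_band`, `BAND_EDGES`, and `BAND_NAMES` are hypothetical and do not come from the dashboard.

```python
from bisect import bisect_right

# Band edges copied from the legend. Treating each edge as the start of
# the higher band is an assumption; the legend does not say which band
# owns a boundary value.
BAND_EDGES = [0.25, 0.50, 0.75]
BAND_NAMES = ["Low", "Medium", "High", "Critical"]


def risk_band(score: float) -> str:
    """Return the legend band for a score between 0 and 1 inclusive."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # bisect_right counts how many edges are <= score, which is the
    # index of the band the score falls into.
    return BAND_NAMES[bisect_right(BAND_EDGES, score)]
```

Under these assumptions, `risk_band(0.10)` gives "Low" and `risk_band(0.80)` gives "Critical". The same function could be applied separately to a likelihood value and an impact value if both use the same scale, which the page does not confirm.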
1. Claude Haiku 4.5 (Anthropic): Critical
2. Grok 4.1 (xAI): Critical
3. GPT-4o (OpenAI): High
Overall: 🏆 Gemini 3.1 Flash
Most improved: Gemini 3.1 Flash, -9%
Strongest security: 🛡 Mistral Large
Most stable: Gemma 2