#ModelEval

Tech Chillitechchiili
2025-06-06

🧠 Can AI models tell when they’re being evaluated?

New research says yes — often.
→ Gemini 2.5 Pro: AUC 0.95
→ Claude 3.7 Sonnet: 93% accuracy on test purpose
→ GPT-4.1: 55% on open-ended detection

Models pick up on red-teaming cues, prompt style, & synthetic data.

⚠️ Implication: If models behave differently when tested, benchmarks might overstate real-world safety.

Client Info

Server: https://mastodon.social
Version: 2025.04
Repository: https://github.com/cyevgeniy/lmst