10/15/2025
Serious Data From Testing LLMs
A data-driven analysis of LLM performance on a simple retrieval task, highlighting the need for evidence-based AI testing.