James Bach • 10/6/2025

Seriously Testing LLMs

The article discusses the significant difficulties in testing Generative AI and LLMs, highlighting their inherent 'sortaness' and the high cost of responsible testing. It argues that AI testing is akin to platform or cybersecurity testing, with unbounded regression problems and unreliable assumptions. The authors critique superficial AI demos and advocate for smarter, more rigorous testing methodologies.

#software testing #generative ai #ai testing