Rajesh P • 3/31/2026

Building a Prompt Evaluation System with Spring AI & Claude: Part 2

This is the second part of a series on building a prompt evaluation system with Spring AI and Claude. It walks through the architecture and component flow of a Spring Boot command-line runner that generates a dataset of Java functions, runs an explainer prompt against each one, grades the responses with an LLM-as-judge, and writes a formatted Excel report containing the scores and the judge's reasoning. Prompt quality is measured on four dimensions: accuracy, simplicity, completeness, and conciseness. The key design principle is to hold the dataset and the grader constant and change only the prompt, so that a difference in scores between runs can be attributed to the prompt and the prompt can be improved iteratively. The tech stack is Java 17, Spring Boot 3.2.5, Spring AI 1.0.0-M6, Claude Haiku, and Apache POI 5.2.5.
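The flow described above can be sketched in plain Java. This is a minimal illustration, not the article's implementation: the class name `PromptEvalSketch`, the `Grade` record, the `{code}` placeholder, and the 1-10 score scale are all assumptions made for the example, and the model and judge are passed in as ordinary functions so that no Spring AI or Claude call signatures are implied.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the evaluation loop; names are illustrative only.
class PromptEvalSketch {

    /** One graded response: the four dimensions from the article plus the judge's reasoning.
     *  The integer 1-10 scale is an assumption, not something the article states. */
    public record Grade(int accuracy, int simplicity, int completeness,
                        int conciseness, String reasoning) {
        public double overall() {
            return (accuracy + simplicity + completeness + conciseness) / 4.0;
        }
    }

    /**
     * Runs one prompt template over a fixed dataset with a fixed judge and
     * returns the mean overall score. Only promptTemplate should change
     * between runs; dataset and judge stay the same.
     */
    public static double evaluate(String promptTemplate,
                                  List<String> dataset,
                                  UnaryOperator<String> model,
                                  BiFunction<String, String, Grade> judge) {
        if (dataset.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (String javaFunction : dataset) {
            // Fill the (assumed) {code} placeholder with the function under test.
            String prompt = promptTemplate.replace("{code}", javaFunction);
            String explanation = model.apply(prompt);
            // The judge sees the original code and the model's explanation.
            total += judge.apply(javaFunction, explanation).overall();
        }
        return total / dataset.size();
    }

    public static void main(String[] args) {
        // Fixed dataset: in the real system this is generated once and reused.
        List<String> dataset = List.of(
                "int add(int a, int b) { return a + b; }",
                "boolean isEven(int n) { return n % 2 == 0; }");

        // Stubs standing in for the explainer model and the LLM-as-judge.
        UnaryOperator<String> model = prompt -> "Stub explanation for: " + prompt;
        BiFunction<String, String, Grade> judge =
                (code, explanation) -> new Grade(8, 6, 10, 4, "stub reasoning");

        // Two prompt variants scored against the same dataset and judge.
        for (String template : List.of("Explain this code: {code}",
                                       "Explain this code to a beginner: {code}")) {
            double score = evaluate(template, dataset, model, judge);
            System.out.println(template + " -> " + score);
        }
    }
}
```

Because the judge here is a constant stub, both prompt variants receive the same score; with a real model and judge behind the two function parameters, the same loop would show how the score moves as the prompt changes.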

#Java #Claude #Spring AI