Shreya Shankar 4/8/2024

Comparing LLMs on "Real-World" Retrieval

The article describes the author's personal evaluation of eight instruction-tuned LLMs (including GPT-4, Claude, Gemini, and open-source models) on a custom "real-world" retrieval task. Using roughly 85 doctor-patient transcripts, the author tests each model on three questions of varying difficulty, moving beyond standard benchmarks to assess reasoning over unstructured data that is likely absent from training sets.
