Jeremy Howard 2/10/2025

TIL: Masked Language Models Are Surprisingly Capable Zero-Shot Learners

The article introduces ModernBERT-Large-Instruct, an instruction-tuned encoder model that uses its Masked Language Modeling head to perform classification and multiple-choice tasks zero-shot. It details how this approach outperforms other small models on benchmarks like MMLU-Pro and matches traditional fine-tuning, all with a simple training recipe and easy-to-use code on HuggingFace.
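The core idea — treating classification as filling in a `[MASK]` token and comparing the MLM head's logits for each allowed answer token — can be sketched in a few lines. This is an illustrative toy, not the article's actual code: real usage would load ModernBERT-Large-Instruct from HuggingFace and read logits at the `[MASK]` position; the token ids and logit values below are hypothetical.

```python
# Minimal sketch of zero-shot multiple choice via an MLM head:
# score each candidate answer's token at the [MASK] position
# and pick the highest-scoring one.

def pick_answer(mask_logits, choice_token_ids):
    """Return the index of the choice whose token gets the highest
    logit at the [MASK] position."""
    scores = [mask_logits[tid] for tid in choice_token_ids]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with made-up logits; token id 7 (say, answer "B") wins.
logits = {3: 0.1, 7: 2.5, 9: -0.4, 11: 0.8}  # hypothetical values
print(pick_answer(logits, [3, 7, 9, 11]))  # → 1
```

In practice the candidate tokens would be the answer letters (A/B/C/D) or label words, so the model never generates free text; it only ranks a fixed set of tokens.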
