Jeremy Howard 2/10/2025

TIL: Masked Language Models Are Surprisingly Capable Zero-Shot Learners


The article introduces ModernBERT-Large-Instruct, an instruction-tuned encoder model that uses its Masked Language Modeling head to perform classification and multiple-choice tasks zero-shot. It details how this approach outperforms other small models on benchmarks like MMLU-Pro and matches the performance of traditional fine-tuned classifiers, all with a simple training recipe and easy-to-use code on HuggingFace.
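The core idea can be sketched as follows: a multiple-choice question is written as a cloze prompt whose answer slot is the encoder's [MASK] token, and the MLM head's probabilities over the option letters at that position select the answer. A minimal sketch, assuming a hypothetical prompt template (the exact wording used by ModernBERT-Large-Instruct may differ); in practice the mask probabilities would come from the model's MLM head, e.g. via a HuggingFace fill-mask pipeline:

```python
def build_mcq_prompt(question: str, options: list[str]) -> str:
    """Format a multiple-choice question as a single-token cloze.

    Hypothetical template: the released model may use different wording.
    """
    letters = "ABCDEFGH"
    lines = [f"Question: {question}"] + [
        f"{letters[i]}. {opt}" for i, opt in enumerate(options)
    ]
    lines.append("Answer: [MASK]")
    return "\n".join(lines)


def pick_answer(mask_probs: dict[str, float], n_options: int) -> str:
    """Choose the option letter with the highest probability at [MASK].

    `mask_probs` maps vocabulary tokens to probabilities; here it is a
    stand-in for the MLM head's output at the masked position, restricted
    to the option letters.
    """
    letters = "ABCDEFGH"[:n_options]
    return max(letters, key=lambda letter: mask_probs.get(letter, 0.0))


prompt = build_mcq_prompt("What is 2 + 2?", ["3", "4", "5"])
# Hypothetical MLM-head output at the [MASK] position:
answer = pick_answer({"A": 0.05, "B": 0.90, "C": 0.05}, n_options=3)
```

Because the answer is read off a single masked token, no task-specific classification head is needed, which is what lets the same model handle arbitrary label sets zero-shot.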


