Philipp Schmid 2/16/2023

Fine-tune FLAN-T5 XL/XXL using DeepSpeed and Hugging Face Transformers


This article provides a detailed tutorial on fine-tuning the large FLAN-T5 XL (3B) and XXL (11B) language models. It shows how to use DeepSpeed ZeRO for memory optimization and model parallelism across multiple GPUs with the Hugging Face Transformers library, applied to a summarization task on the CNN/DailyMail dataset.
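As a rough illustration of the workflow described, here is a minimal sketch of fine-tuning FLAN-T5 on CNN/DailyMail with DeepSpeed via the Transformers `Seq2SeqTrainer`. This is not the article's exact script: the hyperparameters, sequence lengths, and the `ds_flan_t5_z3_config.json` ZeRO stage-3 config filename are assumptions for illustration.

```python
# Minimal sketch: fine-tune FLAN-T5 with DeepSpeed ZeRO via the HF Trainer.
# Assumed placeholders: hyperparameters and "ds_flan_t5_z3_config.json"
# (supply your own ZeRO-3 config file).
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "google/flan-t5-xl"  # or "google/flan-t5-xxl" for the 11B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dataset = load_dataset("cnn_dailymail", "3.0.0")

def preprocess(batch):
    # Prefix each article with a task instruction, as is common for FLAN-T5.
    inputs = tokenizer(
        ["summarize: " + doc for doc in batch["article"]],
        max_length=512, truncation=True,
    )
    labels = tokenizer(batch["highlights"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=["article", "highlights", "id"]
)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-xl-cnn",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=3,
    bf16=True,  # T5 variants are usually trained in bf16 rather than fp16
    deepspeed="ds_flan_t5_z3_config.json",  # assumed ZeRO stage-3 config
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    # Pads inputs and labels dynamically per batch
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

A script like this would be started with the DeepSpeed launcher rather than plain `python`, e.g. `deepspeed --num_gpus=8 train.py`, so that ZeRO can shard the optimizer states, gradients, and parameters across all available GPUs.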


