Sebastian Raschka 7/1/2023

Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch


This article details nine cumulative techniques for reducing memory consumption when training PyTorch models such as Vision Transformers and LLMs. It covers methods including mixed-precision training, gradient accumulation, and parameter offloading, and uses the Lightning Fabric library to simplify their implementation and make training feasible on consumer hardware.
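
To give a flavor of the approach, below is a minimal sketch of the first of those techniques, mixed-precision training, expressed through the Fabric API. The tiny linear model and synthetic dataset are stand-ins for illustration only, not the Vision Transformer or LLM setup used in the article itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

# bf16 mixed precision roughly halves activation memory versus float32;
# "16-mixed" (float16) is the alternative on GPUs without bfloat16 support.
fabric = Fabric(accelerator="auto", devices=1, precision="bf16-mixed")
fabric.launch()

model = nn.Linear(128, 10)  # stand-in for a ViT or LLM
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Fabric moves model and optimizer to the device and wires up autocast.
model, optimizer = fabric.setup(model, optimizer)

# Synthetic data standing in for a real dataset.
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
loader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=32))

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    fabric.backward(loss)  # replaces loss.backward(); handles precision plumbing
    optimizer.step()
```

The other techniques in the article follow the same pattern: most are enabled by changing the `Fabric` constructor arguments or a few lines of the training loop, rather than restructuring the model code.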
