Optimizing LLMs From a Dataset Perspective
Strategies for improving LLM performance through dataset-centric fine-tuning, focusing on instruction datasets rather than model architecture changes.
SebastianRaschka.com is the personal blog of Sebastian Raschka, PhD, an LLM research engineer whose work bridges academia and industry in AI and machine learning. On his blog and in his notes section, he publishes in-depth, well-documented articles on topics such as LLMs (large language models), reasoning models, machine learning in Python, neural networks, data science workflows, and deep learning architectures. Recent posts explore advanced themes such as “reasoning LLMs”, comparisons of modern open-weight transformer architectures, and guides to building, training, and analyzing neural networks and their internals.
110 articles from this blog
A guide to participating in the NeurIPS 2023 LLM Efficiency Challenge, focusing on efficient fine-tuning of large language models on a single GPU.
Techniques to reduce memory usage by up to 20x when training LLMs and Vision Transformers in PyTorch.
A guide to efficiently finetuning Falcon LLMs using parameter-efficient methods like LoRA and Adapters to reduce compute time and cost.
Exploring mixed-precision techniques to speed up large language model training and inference by up to 3x without losing accuracy.
Learn about Low-Rank Adaptation (LoRA), a parameter-efficient method for finetuning large language models with reduced computational costs.
A guide to parameter-efficient finetuning methods for large language models, covering techniques like prefix tuning and LLaMA-Adapter.
A guide to finetuning large language models on a single GPU using gradient accumulation to overcome memory limitations.
A guide on managing the flood of AI and machine learning research, covering tools and strategies for prioritizing papers and news.
Learn techniques to speed up PyTorch model training by 8x with PyTorch Lightning while maintaining model accuracy.
A tutorial on coding self-attention, multi-head attention, causal attention, and cross-attention in LLMs using Python and PyTorch.
A technical guide to coding the self-attention mechanism from scratch, as used in transformers and large language models.
A curated reading list of key academic papers for understanding the development and architecture of large language models and transformers.
An overview of four different methods for detecting AI-generated text, including OpenAI's AI Classifier, DetectGPT, GPTZero, and watermarking.
A comparison of AutoAugment, RandAugment, AugMix, and TrivialAugment image augmentation methods in PyTorch for reducing overfitting.
An analysis of the limitations of AI chatbots like ChatGPT in providing accurate technical answers, and a discussion of the need for curated data and human experts.
Learn how to train an XGBoost classifier using cloud GPUs without managing infrastructure via the Lightning AI framework.
A curated list of the top 10 open-source machine learning and AI projects released or updated in 2022, including PyTorch 2.0 and scikit-learn 1.2.
A review of the top 10 most influential machine learning papers from 2022, including ConvNeXt and MaxViT, with technical analysis.
The author announces the launch of 'Ahead of AI', a monthly newsletter covering AI trends, educational content, and personal updates on machine learning projects.