NLP articles

1/17/2025 • EN

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

A step-by-step guide to implementing from scratch the Byte Pair Encoding (BPE) tokenizer used in models like GPT and Llama.

algorithm Byte Pair Encoding llm NLP Tokenizer

Sebastian Raschka

1/17/2025 • EN

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

A step-by-step educational guide to building a Byte Pair Encoding (BPE) tokenizer from scratch, as used in models like GPT and Llama.

algorithm BPE llm NLP Tokenization

Sebastian Raschka

12/19/2024 • EN

Finally, a Replacement for BERT: Introducing ModernBERT

Introducing ModernBERT, a new family of state-of-the-art encoder models designed as a faster, more efficient replacement for the widely-used BERT.

BERT Masked Language Model ModernBERT NLP Transformers

Jeremy Howard

5/21/2023 • EN

Some Intuition on Attention and the Transformer

Explains the intuition behind the Attention mechanism and Transformer architecture, focusing on solving issues in machine translation and language modeling.

Attention Mechanism Deep Learning llm NLP Transformer

Eugene Yan

4/30/2023 • EN

Interacting with LLMs with Minimal Chat

Explores user interfaces for LLMs that minimize text chat, using clicks and user context for more intuitive interactions.

llm NLP recommendation systems ui-design user experience

Eugene Yan

3/27/2023 • EN

Replacing an A/B Test with GPT

Explores using GPT-3 text embeddings and a simple classifier to predict the winner of a headline A/B test, potentially replacing traditional testing.

ab testing GPT-3 llm Machine Learning NLP

Will Kurt

1/19/2022 • EN

Financial Text Summarization with Hugging Face Transformers, Keras and Amazon SageMaker

A tutorial on fine-tuning a Hugging Face Transformer model for financial text summarization using Keras and Amazon SageMaker.

Amazon SageMaker Hugging Face Transformers Keras NLP Text Summarization

Philipp Schmid

12/29/2021 • EN

Workshop: Enterprise-Scale NLP with Hugging Face and Amazon SageMaker

A workshop series on using Hugging Face Transformers with Amazon SageMaker for enterprise-scale NLP, covering training, deployment, and MLOps.

Amazon SageMaker Hugging Face MLOps NLP Transformers

Philipp Schmid

11/11/2021 • EN

A remote guide to re:Invent 2021 machine learning sessions

A guide to attending AWS re:Invent 2021 machine learning and NLP sessions remotely, featuring keynotes and top session recommendations.

Amazon SageMaker AWS Machine Learning NLP re:Invent

Philipp Schmid

3/21/2021 • EN

Reducing Toxicity in Language Models

Explores the challenge of defining and reducing toxic content in large language models, discussing categorization and safety methods.

AI Safety Bias Language Models NLP Toxicity

Lilian Weng

12/17/2020 • EN

Interfaces for Explaining Transformer Language Models

Explores interactive methods for interpreting transformer language models, focusing on input saliency and neuron activation analysis.

Interpretability Language Models Neural Networks NLP Transformer

Jay Alammar

8/30/2020 • EN

How Reading Papers Helps You Be a More Effective Data Scientist

Explains how regularly reading academic papers improves data science skills, offering practical advice on selection and application.

Data Science Machine Learning NLP Research Papers Transfer Learning

Eugene Yan

6/30/2020 • EN

Serverless BERT with HuggingFace and AWS Lambda

A tutorial on deploying a BERT question-answering model in a serverless environment using HuggingFace Transformers and AWS Lambda.

AWS Lambda BERT HuggingFace NLP serverless

Philipp Schmid

5/22/2020 • EN

BERT Text Classification in a different language

A tutorial on building a non-English text classification model using BERT and Simple Transformers, demonstrated with German tweets.

BERT Multilingual Models NLP Text Classification Transformers

Philipp Schmid

4/7/2020 • EN

The Transformer Family

An updated overview of the Transformer model family, covering improvements for longer attention spans, efficiency, and new architectures since 2020.

Attention Mechanism Machine Learning Neural Networks NLP Transformer

Lilian Weng

2/14/2020 • EN

Tools and Frameworks

A curated list of open-source and free tools for data annotation across computer vision, NLP, audio, and other domains, including image and video labeling.

computer vision Data Annotation Machine Learning NLP open source

Igor Susmelj

11/28/2019 • EN

The Accessibility of GPT-2 - Text Generation and Fine-tuning

A tutorial on using HuggingFace's API to access and fine-tune OpenAI's GPT-2 model for text generation.

Fine Tuning GPT-2 NLP Text Generation Transformers

Yoel Zeldes

9/27/2018 • EN

How to write a racist AI in R without really trying

A tutorial replicating a Python experiment on creating a biased AI sentiment classifier, but using R, GloVe embeddings, and glmnet for logistic regression.

GloVe NLP R Sentiment Analysis Word Embeddings

Thomas Lumley