Inference Optimization articles

2/7/2025 • EN

Notes on ‘AI Engineering’ chapter 9: Inference Optimisation

Summary of key concepts for optimizing AI inference performance, covering bottlenecks, metrics, and deployment patterns from Chip Huyen's book.

Hardware Optimization Inference Optimization LLM Machine Learning Model Optimization

Alex Strick van Linschoten

1/10/2023 • EN

Large Transformer Model Inference Optimization

Explores techniques to optimize inference speed and memory usage for large transformer models, including distillation, pruning, and quantization.

Attention Mechanism Inference Optimization KV Cache Model Compression Transformer Models

Lilian Weng

8/16/2022 • EN

Accelerate BERT inference with DeepSpeed-Inference on GPUs

Learn to optimize BERT and RoBERTa models for faster GPU inference using DeepSpeed-Inference, reducing latency from 30 ms to 10 ms.

BERT DeepSpeed Inference GPU Inference Optimization Transformers

Philipp Schmid

3/16/2022 • EN

Speed up BERT inference with Hugging Face Transformers and AWS Inferentia

A tutorial on accelerating BERT model inference using Hugging Face Transformers and AWS Inferentia chips for cost-effective, high-performance deployment.

Amazon SageMaker AWS Inferentia BERT Hugging Face Transformers Inference Optimization

Philipp Schmid

2/22/2022 • EN

Multi-Container Endpoints with Hugging Face Transformers and Amazon SageMaker

Guide to deploying multiple Hugging Face Transformer models as a cost-optimized Multi-Container Endpoint using Amazon SageMaker.

Amazon SageMaker Hugging Face Transformers Inference Optimization Machine Learning Deployment Multi Container Endpoint

Philipp Schmid
