Model Deployment articles

1/26/2025 • EN

Create a PyTorch Docker image ready for production

A tutorial on creating a production-ready Docker image for PyTorch models using TorchServe, including model archiving and dependency management.

Docker MLOps Model Deployment PyTorch TorchServe

Riccardo Padovani

10/17/2024 • EN

Deploy Llama 3.2 Vision on Amazon SageMaker

A technical guide on deploying Meta's Llama 3.2 Vision model on Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Hugging Face large language models Llama 3.2 Model Deployment

Philipp Schmid

9/24/2024 • EN

Evaluate open LLMs with Vertex AI and Gemini

A technical guide on using Google's Vertex AI Gen AI Evaluation Service with Gemini to evaluate open LLMs such as Llama 3.1.

Gemini LLM Evaluation Model Deployment Open Source Llms Vertex AI

Philipp Schmid

4/18/2024 • EN

Deploy Llama 3 on Amazon SageMaker

A technical guide on deploying Meta's Llama 3 70B model on Amazon SageMaker using the Hugging Face LLM DLC and Text Generation Inference.

Amazon Sagemaker Hugging Face large language models Llama 3 Model Deployment

Philipp Schmid

1/25/2024 • EN

Running a local LLM with Ollama

A guide on running a Large Language Model (LLM) locally using Ollama for privacy and offline use, covering setup and performance tips.

llm Local AI Model Deployment Ollama privacy

Jan Ouwens

11/14/2023 • EN

Deploy Llama 2 7B on AWS inferentia2 with Amazon SageMaker

A tutorial on deploying Meta's Llama 2 7B model on AWS Inferentia2 using Amazon SageMaker and the optimum-neuron library.

Amazon Sagemaker AWS Inferentia2 Llama 2 Model Deployment Optimum Neuron

Philipp Schmid

11/7/2023 • EN

Deploy Stable Diffusion XL on AWS inferentia2 with Amazon SageMaker

A tutorial on deploying Stable Diffusion XL for accelerated inference using AWS Inferentia2 and Amazon SageMaker.

Amazon Sagemaker AWS Inferentia2 Deep Learning Inference Model Deployment stable diffusion

Philipp Schmid

10/12/2023 • EN

Deploy Idefics 9B and 80B on Amazon SageMaker

A technical guide on deploying Hugging Face's IDEFICS visual language models (9B & 80B parameters) to Amazon SageMaker using the LLM DLC.

Amazon Sagemaker Idefics large language models Model Deployment Multimodal AI

Philipp Schmid

9/26/2023 • EN

Llama 2 on Amazon SageMaker a Benchmark

A benchmark analysis of deploying Meta's Llama 2 models on Amazon SageMaker using Hugging Face's LLM Inference Container, evaluating cost, latency, and throughput.

Amazon Sagemaker benchmark large language models Llama 2 Model Deployment

Philipp Schmid

6/7/2023 • EN

Deploy Falcon 7B and 40B on Amazon SageMaker

A technical guide on deploying the open-source Falcon 7B and 40B large language models to Amazon SageMaker using the Hugging Face LLM Inference Container.

Amazon Sagemaker Falcon 40b Hugging Face LLM Inference Model Deployment

Philipp Schmid

4/13/2023 • EN

Train and Deploy BLOOM with Amazon SageMaker and PEFT

A technical guide on fine-tuning the BLOOMZ language model using PEFT and LoRA techniques, then deploying it on Amazon SageMaker.

Amazon Sagemaker Bloom Lora Model Deployment Peft

Philipp Schmid

3/20/2023 • EN

Deploy FLAN-UL2 20B on Amazon SageMaker

A technical guide on deploying Google's FLAN-UL2 20B large language model for real-time inference using Amazon SageMaker and Hugging Face.

Amazon Sagemaker Hugging Face Inference Machine Learning Model Deployment

Philipp Schmid

2/8/2023 • EN

Deploy FLAN-T5 XXL on Amazon SageMaker

A technical guide on deploying the FLAN-T5-XXL large language model for real-time inference using Amazon SageMaker and Hugging Face.

Amazon Sagemaker Flant5 Hugging Face Inference Model Deployment

Philipp Schmid

11/1/2022 • EN

Stable Diffusion on Amazon SageMaker

A technical guide on deploying the Stable Diffusion text-to-image model to Amazon SageMaker for real-time inference using the Hugging Face Diffusers library.

Amazon Sagemaker Hugging Face Machine Learning Model Deployment stable diffusion

Philipp Schmid