LLM Deployment articles

7/14/2025 • EN

Running Open-Weight LLMs on AKS with KAITO: A Summary of Model Families

A guide to deploying and comparing open-weight LLM families (DeepSeek, Falcon, Llama, etc.) using the KAITO operator on Azure Kubernetes Service (AKS).

AI Inference AKS Kubernetes LLM Deployment Model Families

Roy Kim

12/3/2024 • EN

Deploy QwQ-32B-Preview the best open Reasoning Model on AWS with Hugging Face

A technical guide on deploying the QwQ-32B-Preview open-source reasoning model on Amazon SageMaker using Hugging Face's tools.

Amazon SageMaker AWS Hugging Face LLM Deployment Text Generation Inference

Philipp Schmid

8/5/2024 • EN

Deploy open LLMs with Terraform and Amazon SageMaker

A guide to deploying open-source LLMs like Llama 3 to Amazon SageMaker using Terraform for Infrastructure as Code.

Amazon SageMaker Infrastructure as Code LLM Deployment Machine Learning Terraform

Philipp Schmid

6/18/2024 • EN

Deploy Mixtral 8x7B on AWS Inferentia2 with Hugging Face Optimum

A technical guide on deploying the Mixtral 8x7B LLM on AWS Inferentia2 using Hugging Face Optimum and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum LLM Deployment Mixtral 8x7B

Philipp Schmid

5/23/2024 • EN

Deploy Llama 3 70B on AWS Inferentia2 with Hugging Face Optimum

A technical guide on deploying Meta's Llama 3 70B Instruct model on AWS Inferentia2 using Hugging Face Optimum and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum LLM Deployment Meta Llama 3

Philipp Schmid

5/2/2024 • EN

Deploy open LLMs with vLLM on Hugging Face Inference Endpoints

A tutorial on deploying open-source large language models (LLMs) like Llama 3 using the vLLM framework on Hugging Face Inference Endpoints.

Hugging Face Inference Endpoints Large Language Models LLM Deployment vLLM

Philipp Schmid

3/26/2024 • EN

Deploy Llama 2 70B on AWS Inferentia2 with Hugging Face Optimum

A technical guide on deploying Meta's Llama 2 70B large language model on AWS Inferentia2 hardware using Hugging Face Optimum and SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum Llama 2 LLM Deployment

Philipp Schmid

12/12/2023 • EN

Deploy Mixtral 8x7B on Amazon SageMaker

A technical guide on deploying the Mixtral 8x7B open-source LLM from Mistral AI to Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Hugging Face LLM Deployment Mixture of Experts Text Generation Inference

Philipp Schmid

9/7/2023 • EN

Deploy Falcon 180B on Amazon SageMaker

A technical guide on deploying the Falcon 180B open-source large language model to Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Falcon 180B Hugging Face LLM Deployment Text Generation Inference

Philipp Schmid

8/7/2023 • EN

Deploy Llama 2 7B/13B/70B on Amazon SageMaker

A technical guide on deploying Meta's Llama 2 large language models (7B, 13B, 70B) on Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Hugging Face Llama 2 LLM Deployment Text Generation Inference

Philipp Schmid

7/4/2023 • EN

Deploy LLMs with Hugging Face Inference Endpoints

A guide to deploying open-source Large Language Models (LLMs) like Falcon using Hugging Face's managed Inference Endpoints service.

API Hugging Face Inference Endpoints LLM Deployment Machine Learning

Philipp Schmid

6/20/2023 • EN

Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker

A technical guide on deploying open-source Large Language Models (LLMs) from Amazon S3 to Amazon SageMaker using Hugging Face's LLM Inference Container within a VPC.

Amazon SageMaker AWS VPC Hugging Face LLM Deployment Model Inference

Philipp Schmid