Text Generation Inference articles

12/3/2024 • EN

A technical guide on deploying the QwQ-32B-Preview open-source reasoning model on AWS SageMaker using Hugging Face's tools.

Amazon SageMaker AWS Hugging Face LLM Deployment Text Generation Inference

9/19/2024 • EN

A guide to evaluating Large Language Models (LLMs) using the Evaluation Harness framework and optimized serving tools like Hugging Face TGI and vLLM.

Benchmarking Evaluation Harness LLM Evaluation Text Generation Inference vLLM

1/11/2024 • EN

Guide to scaling LLM inference on Amazon SageMaker using new multi-replica endpoints for improved throughput and cost efficiency.

Amazon SageMaker Hugging Face LLM Inference Multi Replica Endpoints Text Generation Inference

12/12/2023 • EN

A technical guide on deploying the Mixtral 8x7B open-source LLM from Mistral AI to Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Hugging Face LLM Deployment Mixture Of Experts Text Generation Inference

9/7/2023 • EN

A technical guide on deploying the Falcon 180B open-source large language model to Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Falcon 180B Hugging Face LLM Deployment Text Generation Inference

8/7/2023 • EN

A technical guide on deploying Meta's Llama 2 large language models (7B, 13B, 70B) on Amazon SageMaker using the Hugging Face LLM DLC.

Amazon SageMaker Hugging Face Llama 2 LLM Deployment Text Generation Inference

5/31/2023 • EN

Guide to deploying open-source LLMs like BLOOM and Open Assistant to Amazon SageMaker using Hugging Face's new LLM Inference Container.

Amazon SageMaker Hugging Face Large Language Models LLM Inference Text Generation Inference
