AWS Inferentia2 articles

6/18/2024 • EN

A technical guide on deploying the Mixtral 8x7B LLM on AWS Inferentia2 using Hugging Face Optimum and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum LLM Deployment Mixtral 8x7B

5/23/2024 • EN

A technical guide on deploying Meta's Llama 3 70B Instruct model on AWS Inferentia2 using Hugging Face Optimum and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum LLM Deployment Meta Llama 3

3/26/2024 • EN

A technical guide on deploying Meta's Llama 2 70B large language model on AWS Inferentia2 using Hugging Face Optimum and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Hugging Face Optimum Llama 2 LLM Deployment

11/21/2023 • EN

Tutorial on deploying embedding models using AWS Inferentia2 and Amazon SageMaker for accelerated inference performance.

Amazon SageMaker aws AWS Inferentia2 Embeddings Model Optimum Neuron

11/14/2023 • EN

A tutorial on deploying Meta's Llama 2 7B model on AWS Inferentia2 using Amazon SageMaker and the optimum-neuron library.

Amazon SageMaker AWS Inferentia2 Llama 2 Model Deployment Optimum Neuron

11/7/2023 • EN

A tutorial on deploying Stable Diffusion XL for accelerated inference using AWS Inferentia2 and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 Deep Learning Inference Model Deployment Stable Diffusion

6/28/2023 • EN

A tutorial on optimizing and deploying a BERT model for low-latency inference using AWS Inferentia2 accelerators and Amazon SageMaker.

Amazon SageMaker AWS Inferentia2 BERT Machine Learning Deployment Model Optimization
