Weights & Biases LLM-Evaluator Hackathon - Hackathon Judge
Author judges a Weights & Biases hackathon focused on building LLM evaluation tools, discussing key considerations and project highlights.
Author judges a Weights & Biases hackathon focused on building LLM evaluation tools, discussing key considerations and project highlights.
A survey of using LLMs as evaluators (LLM-as-Judge) for assessing AI model outputs, covering techniques, use cases, and critiques.
A technical review of the latest pre-training and post-training methodologies used in state-of-the-art large language models (LLMs) like Qwen 2 and Llama 3.1.
A guide to simplifying LLM evaluation workflows using clear metrics, chain-of-thought, and few-shot prompts, inspired by real-world examples.
Introduces MixEval, a cost-effective LLM benchmark with high correlation to Chatbot Arena, for evaluating open-source language models.
HeavyIQ is an AI-powered English-to-SQL interface from HEAVY.AI, using a fine-tuned LLM to query and visualize massive datasets like flight records.
Explains core prompting fundamentals for effective LLM use, including mental models, role assignment, and practical workflow with examples.
A practical guide sharing lessons learned from a year of building real-world applications with Large Language Models (LLMs).
A tutorial on deploying open-source large language models (LLMs) like Llama 3 using the vLLM framework on Hugging Face Inference Endpoints.
A technical guide on fine-tuning the Llama 3 70B model using PyTorch FSDP and Q-Lora for efficient training on limited GPU hardware.
Explores methods for using and finetuning pretrained large language models, including feature-based approaches and parameter updates.
A technical guide on deploying Meta's Llama 3 70B model on Amazon SageMaker using the Hugging Face LLM DLC and Text Generation Inference.
Argues that the term 'Open Source' is misleading for LLMs and proposes the new term 'PALE LLMs' (Publicly Available, Locally Executable).
Explores the balanced use of AI coding tools like GitHub Copilot, discussing benefits, risks of hallucinations, and best practices for developers.
A technical paper exploring the causes, measurement, and mitigation strategies for hallucinations in Large Language Models (LLMs).
A simple explanation of Retrieval-Augmented Generation (RAG), covering its core components: LLMs, context, and vector databases.
Explores adversarial attacks and jailbreak prompts that can make large language models produce unsafe or undesired outputs, bypassing safety measures.
Explores building enterprise applications using Azure OpenAI and Microsoft's data platform for secure, integrated AI solutions.
A technical guide on deploying Hugging Face's IDEFICS visual language models (9B & 80B parameters) to Amazon SageMaker using the LLM DLC.
A benchmark analysis of deploying Meta's Llama 2 models on Amazon SageMaker using Hugging Face's LLM Inference Container, evaluating cost, latency, and throughput.