Simon Willison 3/19/2026

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally


The article details Dan Woods's experiment running the Qwen3.5-397B-A17B mixture-of-experts model on a 48GB MacBook Pro M3 Max, streaming model weights from SSD using techniques from Apple's "LLM in a Flash" paper, with AI-assisted coding (Claude) driving much of the optimization. The project combines quantization with expert routing to reach usable inference speeds, and discusses the trade-offs between model size, speed, and output quality.
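The core idea behind streaming MoE weights from SSD is that only a few experts are active per token, so most expert weights can stay on disk and be paged in on demand. A minimal sketch of that pattern using a memory-mapped file (toy shapes and the `experts.bin` filename are illustrative assumptions, not the article's actual implementation):

```python
import numpy as np

# Hypothetical shapes: a MoE layer with many experts, only a few
# active per token, so most expert weights can stay on SSD.
NUM_EXPERTS = 128
EXPERT_ROWS, EXPERT_COLS = 256, 512  # toy sizes for illustration

# Create a toy weight file standing in for a checkpoint on SSD.
weights = np.random.rand(NUM_EXPERTS, EXPERT_ROWS, EXPERT_COLS).astype(np.float16)
weights.tofile("experts.bin")

# Memory-map the file: the OS pages in only the experts we touch,
# which is the essence of streaming weights from flash.
mm = np.memmap("experts.bin", dtype=np.float16, mode="r",
               shape=(NUM_EXPERTS, EXPERT_ROWS, EXPERT_COLS))

def run_expert(expert_id: int, x: np.ndarray) -> np.ndarray:
    """Load one expert's weights on demand and apply it."""
    w = np.asarray(mm[expert_id])  # faults in just this expert's pages
    return x @ w

x = np.random.rand(EXPERT_ROWS).astype(np.float16)
y = run_expert(7, x)
print(y.shape)  # (512,)
```

In the real project the router would pick which `expert_id`s to run per token; the win is that a 397B-parameter checkpoint never needs to fit in 48GB of RAM at once.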

