Simon Willison 3/19/2026

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally


The article details an experiment by Dan Woods to run the Qwen3.5-397B-A17B MoE model on a 48 GB MacBook Pro M3 Max. It applies Apple's "LLM in a Flash" paper for efficient inference directly from flash storage, using AI-assisted coding (Claude) for optimization, quantization, and performance tuning to reach several tokens per second.
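The core idea in the "LLM in a Flash" paper is to leave most model weights on flash storage and page in only the parameters needed for the current token, which is what makes a sparse MoE model attractive here: each token activates only a small fraction of the experts. A minimal, illustrative sketch of on-demand weight reads via `mmap` (the file layout and names are invented for this example, not taken from the article's code):

```python
# Sketch of on-demand weight loading: keep a weight matrix on "flash"
# and read only the rows a token needs, letting the OS page cache do
# the work. Layout and helper names are hypothetical, for illustration.
import mmap
import os
import struct
import tempfile

FLOAT = struct.Struct("<f")  # little-endian float32
ROWS, COLS = 1000, 8

# Simulate a weight file on disk: ROWS x COLS float32 values.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    for i in range(ROWS * COLS):
        f.write(FLOAT.pack(float(i)))

def load_row(mm: mmap.mmap, row: int) -> list[float]:
    """Read one row of weights on demand; only its pages are touched."""
    start = row * COLS * FLOAT.size
    raw = mm[start:start + COLS * FLOAT.size]
    return [FLOAT.unpack_from(raw, j * FLOAT.size)[0] for j in range(COLS)]

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Only rows 0 and 999 are read; the rest of the file stays on disk.
    row0 = load_row(mm, 0)
    row_last = load_row(mm, ROWS - 1)
    mm.close()

print(row0[0], row_last[-1])  # 0.0 and 7999.0
```

A real implementation adds the paper's other ingredients on top of this primitive: predicting which experts or neurons will fire, batching contiguous reads, and caching hot weights in RAM.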
