AGI, ASI, A*I – Do we have all we need to get there?
Leading AI researchers debate whether current scaling and innovations are sufficient to achieve Artificial General Intelligence (AGI).
A visual essay explaining LLM internals like tokenization, embeddings, and transformer architecture in an accessible way.
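As a bare-bones illustration of the first two of those internals, the sketch below maps words to integer ids and looks up an embedding vector for each. It is a toy, not the essay's own example: real LLMs use subword tokenizers (BPE or similar) and large learned embedding tables, and every name here is invented for the illustration.

```python
import numpy as np

# Toy vocabulary and embedding table; real LLMs use subword tokenizers and learned tables.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))   # (vocab_size, d_model)

def tokenize(text):
    """Whitespace 'tokenizer': map each word to an integer id, unknown words to <unk>."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

ids = tokenize("The cat sat")
vectors = embedding_table[ids]        # (3, 8): one vector per token, fed into the transformer
print(ids, vectors.shape)
```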
Analysis of OpenAI's new gpt-oss models, tracing architectural changes since GPT-2 and examining optimizations such as MXFP4 quantization and Mixture-of-Experts.
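For readers unfamiliar with Mixture-of-Experts routing, the plain-NumPy sketch below shows the core idea: a router scores every expert per token, and only the top-k experts are evaluated and mixed. This is an illustrative simplification, not the gpt-oss implementation; names like `moe_layer` and `gate_w` are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x       : (tokens, d_model) token representations
    gate_w  : (d_model, n_experts) router weights
    experts : list of callables, each mapping (d_model,) -> (d_model,)
    """
    probs = softmax(x @ gate_w)                       # (tokens, n_experts) routing probabilities
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[::-1][:top_k]      # indices of the chosen experts
        weights = probs[t, top] / probs[t, top].sum() # renormalise over the chosen few
        for w, e in zip(weights, top):
            out[t] += w * experts[e](x[t])            # only top_k experts run per token
    return out

# Toy usage: 4 tokens, d_model=8, 4 random linear "experts".
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)) / np.sqrt(d): v @ W for _ in range(n_experts)]
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(x, gate_w, experts).shape)   # (4, 8)
```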
A technical deep dive into DeepSeek's Multi-Head Latent Attention mechanism, covering its mathematics and implementation in Julia.
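As a rough illustration of the idea behind Multi-Head Latent Attention (the article itself works in Julia), the Python sketch below compresses the hidden state into a small latent vector that stands in for the KV cache; keys and values are reconstructed from it by up-projections. It is a single-head simplification that omits the decoupled RoPE path and other details of DeepSeek's design; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mla_single_head(x, W_q, W_dkv, W_uk, W_uv):
    """Simplified single-head latent attention.

    x     : (seq, d_model) hidden states
    W_q   : (d_model, d_head) query projection
    W_dkv : (d_model, d_latent) shared down-projection -- only this latent is cached
    W_uk  : (d_latent, d_head) key up-projection
    W_uv  : (d_latent, d_head) value up-projection
    """
    q = x @ W_q                        # (seq, d_head)
    c_kv = x @ W_dkv                   # (seq, d_latent): the compressed KV cache
    k = c_kv @ W_uk                    # keys reconstructed from the latent
    v = c_kv @ W_uv                    # values reconstructed from the latent
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v         # (seq, d_head)

rng = np.random.default_rng(0)
seq, d_model, d_latent, d_head = 5, 16, 4, 8
out = mla_single_head(
    rng.normal(size=(seq, d_model)),
    rng.normal(size=(d_model, d_head)),
    rng.normal(size=(d_model, d_latent)),
    rng.normal(size=(d_latent, d_head)),
    rng.normal(size=(d_latent, d_head)),
)
print(out.shape)  # (5, 8); only the (seq, d_latent) latent needs to be cached
```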
A 3-hour coding workshop video covering the implementation, training, and use of Large Language Models (LLMs) from scratch.
Announcing the release of the 'transformer' R package on CRAN, implementing a full transformer architecture for AI/ML development.
An updated, comprehensive overview of the Transformer architecture and its many recent improvements, including detailed notation and attention mechanisms.
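To make the notation in such overviews concrete, here is a minimal single-head, pre-norm transformer block in NumPy: self-attention followed by a position-wise MLP, each wrapped in a residual connection. It is a deliberately stripped-down sketch of the textbook architecture, not any specific model; the ReLU MLP, single head, and parameter names are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One pre-norm block: self-attention + MLP, each with a residual connection."""
    # Single-head scaled dot-product self-attention over the normalised input.
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    x = x + attn @ Wo
    # Position-wise feed-forward network.
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0.0) @ W2   # ReLU MLP
    return x

rng = np.random.default_rng(0)
seq, d, d_ff = 6, 16, 32
params = [rng.normal(size=s) / np.sqrt(s[0]) for s in
          [(d, d), (d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
print(transformer_block(rng.normal(size=(seq, d)), *params).shape)  # (6, 16)
```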
Explains how retrieval-augmented language models like RETRO achieve GPT-3 performance with far fewer parameters by querying external knowledge.
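RETRO itself fuses retrieved chunks into the decoder through chunked cross-attention; as a much simpler illustration of why retrieval lets a smaller model punch above its parameter count, the sketch below finds nearest-neighbour passages by embedding similarity and prepends them to the prompt. The embedder is a toy stand-in and every name here is invented for the example, not RETRO's actual mechanism.

```python
import numpy as np

def embed(text, dim=64):
    """Toy stand-in for a sentence embedder: a deterministic pseudo-random unit vector per text."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieve(query, corpus, k=2):
    """Return the k corpus passages closest to the query in embedding space."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: -float(embed(doc) @ q))[:k]

def build_augmented_prompt(query, corpus, k=2):
    """Prepend retrieved passages so the language model can lean on external knowledge."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RETRO retrieves text chunks from a large external database at inference time.",
    "Retrieval lets a model store facts externally instead of in its weights.",
    "Tokenization splits text into subword units.",
]
print(build_augmented_prompt("How do retrieval-augmented models stay small?", corpus))
```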