NLP for Supervised Learning - A Brief Survey
A chronological survey of key NLP models and techniques for supervised learning, from early RNNs to modern transformers like BERT and T5.
Eugene Yan is a Principal Applied Scientist at Amazon, building AI-powered recommendation systems and experiences. He shares insights on RecSys, LLMs, and applied machine learning, while mentoring and investing in ML startups.
185 articles from this blog
Argues that data scientists should own the entire process from problem identification to solution deployment for greater impact and efficiency.
A tutorial on extending a FastAPI web app with HTML forms to add checkbox functionality and a file download button.
A graduate's detailed FAQ about Georgia Tech's Online Master's in Computer Science (OMSCS), covering costs, admissions, courses, and career impact.
A tutorial on building a web application with HTML forms and templates using the FastAPI framework and Jinja templating engine.
Explains the importance of post-project follow-up in data science, focusing on code cleanup, Jupyter notebook version control issues, and documentation.
A data scientist shares practical habits and workflows for executing successful data science projects, focusing on research, experimentation, and team alignment.
A tutorial on automating GitHub profile README updates using Python and GitHub Actions to display recent blog posts.
Notes from Spark+AI Summit 2020 covering application-specific talks on ML frameworks, data engineering, feature stores, and data quality from companies like Airbnb and Netflix.
Summary of key application-agnostic talks from Spark+AI Summit 2020, focusing on scaling and optimizing deep learning models.
Answers common questions about data science in business, covering requirements, model interpretability, web scraping, and team roles.
A guide to setting up a Python project with automated testing, linting, and type-checking to improve code quality and team collaboration.
Explains why Apache Airflow jobs appear to run a day late due to its scheduling logic, contrasting it with cron jobs.
A data scientist shares three essential pre-project tasks—the one-pager, time-box, and breakdown—to avoid common pitfalls and ensure project success.
A data scientist shares how adopting Scrum, despite initial resistance, improved project management and delivery for data science teams.
A guide to best practices for monitoring, maintaining, and managing machine learning models and data pipelines in a production environment.
Explores six unexpected challenges that arise after deploying machine learning models in production, from data schema changes to organizational issues.
Summarizes key writing advice from David Perell and Sahil Lavingia, emphasizing its importance for data scientists and tech professionals.
A data scientist analyzes why a simple 'wish list notification' feature won a major hackathon over more complex, high-tech ideas.
Explores the importance of serendipity over just accuracy in recommendation systems, discussing metrics, user engagement, and business benefits.