Web scraping Articles

Page 1 of 4 (69 articles)

I'm swearing off APIs entirely

1/26/2026 • EN

A developer explains why they are giving up on building apps that rely on external APIs due to access issues, ethical concerns, and platform risks.

API Access Data Dependency oauth side projects web scraping

shot-scraper 1.9

12/29/2025 • EN

shot-scraper 1.9 CLI tool released, featuring a new -x option to extract page resources and accessibility command fixes.

cli Digital Forensics Playwright Shot Scraper web scraping

Millions of Locations for Thousands of Brands

12/8/2025 • EN

Analyzing All The Places' open-source location data project, detailing the technical setup and process for downloading and examining millions of brand locations.

data analysis Duckdb github Python web scraping

Mark Litwintschik

Inside Claude Code's Web Tools: WebFetch vs WebSearch

10/6/2025 • EN

A technical analysis of Claude Code's WebFetch and WebSearch tools, detailing their internal architecture and processing pipelines.

api design Claude Code LLM Agents web scraping Web Tools

Mikhail Shilkov

Get xkcd Cartoons at 2x Resolution

9/27/2025 • EN

Discover an undocumented trick to get xkcd comics at double resolution using a simple URL modification and a Python script to check availability.

Image Resolution Python srcset URL Manipulation web scraping

“We’re Walling Off The Open Internet To Stop AI”

9/13/2025 • EN

Discusses the trend of websites walling off content from AI bots, arguing it undermines open internet principles and may concentrate power.

ai ethics Content Protection Open Internet Tech Policy web scraping

goHardDrive Leaked Personal Data for Thousands of Customers

7/2/2025 • EN

A security researcher discovers goHardDrive exposed thousands of customer records via an insecure RMA status check form with no authentication.

API Security Data Breach Information Disclosure privacy web scraping

Poisoning Well

3/31/2025 • EN

Explores the ethics of LLM training data and proposes a technical method to poison AI crawlers using nofollow links.

ai ethics Data Poisoning llm robots.txt web scraping

Heydon Pickering

Building a Personal Content Recommendation System, Part Two: Data Processing and Cleaning

3/26/2025 • EN

Part two of building a personal recommendation system, covering data collection from Pocket and content extraction using the Jina Reader API.

Data Cleaning data processing github recommendation systems web scraping

Please stop externalizing your costs directly into my face

3/17/2025 • EN

A developer's frustration with aggressive LLM crawlers causing outages and consuming resources, detailing past abuse like crypto mining and Go module mirror issues.

Abuse Mitigation GIT Hosting LLM Crawlers robots.txt web scraping

What is an LLMs.txt File?

2/28/2025 • EN

Explains the LLMs.txt file, a new standard for providing context and metadata to Large Language Models to improve accuracy and reduce hallucinations.

AI Agents context llm metadata web scraping

Testing browser-use, a scriptable AI browser agent

2/5/2025 • EN

A guide to using browser-use, a scriptable AI agent built with Playwright and LLMs to automate repetitive browser tasks.

AI Agent automation Langchain Playwright web scraping

Using Bing Search to ground LLM responses

1/19/2025 • EN

Explores using Bing Search API to ground LLM responses for website assistants, comparing custom implementation with Azure AI Agent Service.

API Integration Azure AI Search Bing Search LLM Grounding web scraping

Creating fancy interactive tables using Internet data with rvest and reactable

9/1/2024 • EN

A technical tutorial on creating interactive data tables by web-scraping with R's rvest package and styling with reactable.

data visualization R Programming Reactable Rvest web scraping

If your website uses Cloudflare, you can now easily block AI bots

7/6/2024 • EN

Cloudflare now offers a simple setting to block AI bots from scraping your website, available even on free plans.

AI Bots Bot Management cloudflare security web scraping

We need an evolved robots.txt and regulations to enforce it

6/22/2024 • EN

Argues for an evolved robots.txt standard with AI-specific rules and regulations to enforce them, citing Perplexity AI's violations.

ai ethics Data Privacy Regulations robots.txt web scraping

Installing Playwright on Heroku for Programmatic Node.js Browser Automation

5/31/2024 • EN

A guide to installing and configuring Playwright for browser automation on Heroku using Node.js, including dependency management and code structure.

Browser Automation Heroku Node.js Playwright web scraping

Fun With Scrapy Link Validation on CI

1/6/2024 • EN

How to automatically check internal links on a static site using Scrapy and GitHub Actions for continuous integration.

continuous integration Github Actions Link Validation Scrapy web scraping

A deluge of data

8/22/2023 • EN

A technical analysis of UK rainfall data, covering data scraping, visualization, and processing using Python and APIs.

api data visualization Met Office Rainfall Data web scraping

Popular Airline Passenger Routes Refresh

7/19/2023 • EN

A technical walkthrough of scraping and visualizing global airline passenger route data using Python, DuckDB, and QGIS.

data visualization Duckdb Python web scraping Git

Mark Litwintschik
