Blog

Thoughts on web scraping, automation, data engineering, and building production-grade systems.

Building Scalable Web Scrapers with Playwright and Scrapy

Learn how to combine Playwright's browser automation with Scrapy's powerful crawling framework for enterprise-grade data extraction.

A deep dive into building robust OCR systems using Tesseract, Google Cloud Vision, and AWS Textract with practical optimization tips.

Understanding rate limiting, CAPTCHAs, and anti-bot systems while maintaining ethical scraping practices.

Master the art of transforming messy scraped data into clean, analysis-ready datasets using Python and Pandas.

Techniques for discovering and utilizing hidden APIs in web and mobile applications for efficient data extraction.

A comprehensive comparison of the two leading browser automation tools for web scraping and testing.