What I owned
The entire stack: Next.js 16 App Router for the public site (sitemap, robots, and SEO landing pages, including a how-to-download-images-from-website hub); Playwright for headless extraction, covering lazy-loaded, srcset, CSS background, and JS-injected images; cheerio as an HTML-parsing fallback; sharp for format conversion; jszip and archiver for output bundling; Tiptap for rich-text DMCA submissions; and Prisma 6 over Postgres for extraction history and DMCA case management.
Constraints
- Capture all image sources: img src, srcset, picture, CSS background, JS-injected
- Respect robots.txt and rate-limit per target host
- DMCA workflow must be defensible — structured submissions, response logging
- Public SEO surface — sitemap, robots, landing pages — for organic acquisition
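The robots.txt and per-host rate-limit constraint can be sketched in TypeScript as below. This is a minimal illustration under stated assumptions, not the project's code: parseRobots, isAllowed, and HostRateLimiter are hypothetical names; the matcher follows RFC 9309's longest-match rule (Allow wins ties) with "*" and "$" wildcards; and user-agent selection is simplified to a case-insensitive substring check. Fetching and caching robots.txt, and Crawl-delay, are left out.

```typescript
// Hypothetical helpers: a simplified RFC 9309 robots.txt matcher and a
// per-host scheduler. Not the project's implementation.

type Rule = { allow: boolean; pattern: string };
type Group = { agents: string[]; rules: Rule[] };

// Return the Allow/Disallow rules that apply to `agent`, falling back to "*".
function parseRobots(txt: string, agent: string): Rule[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const raw of txt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim();
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow allows everything, so it adds no rule.
      if ((key === "allow" || key === "disallow") && current !== null && value !== "") {
        current.rules.push({ allow: key === "allow", pattern: value });
      }
      lastWasAgent = false;
    }
  }
  const name = agent.toLowerCase();
  const specific = groups.filter((g) => g.agents.some((a) => a !== "*" && name.includes(a)));
  const chosen = specific.length > 0 ? specific : groups.filter((g) => g.agents.includes("*"));
  return chosen.flatMap((g) => g.rules);
}

// Translate a robots.txt path pattern ("*" wildcard, "$" end anchor) to a RegExp.
function ruleRegex(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Longest matching pattern wins; on a tie Allow wins; no match means allowed.
function isAllowed(rules: Rule[], path: string): boolean {
  let best: Rule | null = null;
  for (const r of rules) {
    if (!ruleRegex(r.pattern).test(path)) continue;
    if (
      best === null ||
      r.pattern.length > best.pattern.length ||
      (r.pattern.length === best.pattern.length && r.allow)
    ) {
      best = r;
    }
  }
  return best === null ? true : best.allow;
}

// Spaces request starts to the same host at least `minIntervalMs` apart.
class HostRateLimiter {
  private readonly minIntervalMs: number;
  private readonly nextSlot = new Map<string, number>();

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Reserve the next free slot for the URL's host; returns the delay in ms
  // the caller should wait (from `now`) before sending the request.
  reserve(url: string, now: number = Date.now()): number {
    const host = new URL(url).host;
    const slot = Math.max(now, this.nextSlot.get(host) ?? 0);
    this.nextSlot.set(host, slot + this.minIntervalMs);
    return slot - now;
  }
}
```

Calling reserve() before each request and sleeping for the returned delay spaces requests to one host without slowing down a job that spans several hosts.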
Process
- 01 Extraction engine: a Playwright session scrolls the page, waits for lazy-loaded images, then walks the DOM for every image source. Cheerio handles non-JS pages as a fallback.
- 02 Format pipeline: optional WebP/PNG/JPEG conversion via sharp, with metadata preservation and file naming.
- 03 Output packaging: ZIP via jszip, or a streaming archiver for large jobs.
- 04 Public SEO surface: long-form how-to landing pages (e.g. /how-to-download-images-from-website), sitemap.ts, and robots.ts.
- 05 DMCA workflow: a structured Tiptap-based submission form, case tracking, and response logging.
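The scroll-and-wait loop in the extraction engine can be sketched independently of the browser. This assumes the stop condition is "the page height has stopped growing and the viewport has reached the bottom"; the ScrollTarget interface and scrollUntilStable are hypothetical. Against Playwright, getHeight and scrollTo would wrap page.evaluate(...) calls and wait would wrap page.waitForTimeout(...).

```typescript
// Hypothetical abstraction over a browser page so the loop can run without a browser.
interface ScrollTarget {
  getHeight(): Promise<number>; // e.g. document.documentElement.scrollHeight
  scrollTo(y: number): Promise<void>; // e.g. window.scrollTo(0, y)
  wait(ms: number): Promise<void>; // give lazy loaders time to fire
}

// Scroll one viewport at a time until the height has been stable at the bottom
// for `stableRounds` consecutive checks, or `maxSteps` is reached. Returns steps taken.
async function scrollUntilStable(
  page: ScrollTarget,
  viewportHeight: number,
  opts: { settleMs?: number; stableRounds?: number; maxSteps?: number } = {},
): Promise<number> {
  const settleMs = opts.settleMs ?? 500;
  const stableRounds = opts.stableRounds ?? 2;
  const maxSteps = opts.maxSteps ?? 200;
  let y = 0;
  let lastHeight = await page.getHeight();
  let stable = 0;
  let steps = 0;
  while (steps < maxSteps && stable < stableRounds) {
    y += viewportHeight;
    await page.scrollTo(y);
    await page.wait(settleMs);
    steps++;
    const height = await page.getHeight();
    if (height > lastHeight) {
      // New content appeared below; keep scrolling.
      stable = 0;
      lastHeight = height;
    } else if (y >= height) {
      // At the bottom and nothing new loaded this round.
      stable++;
    }
  }
  return steps;
}
```

Keeping the browser behind a small interface lets the termination logic be tested with a fake page, and maxSteps bounds infinite-scroll pages that never stop growing.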
Approach
Image extraction looks easy until you account for srcset, picture elements, CSS backgrounds, and JS-injected images. Building on Playwright rather than cheerio alone was the decision that made the product complete: a static HTML parser only sees the markup the server sends, so it misses images that scripts inject or lazy-load after render. Pairing the tool with a public SEO surface from day one (a proper sitemap, robots rules, and long-form how-to pages) sets up organic acquisition before launch.
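Two of the cases named above, srcset and CSS backgrounds, come down to string parsing once the DOM walk has collected the raw values (for example, inside page.evaluate, reading srcset from img and picture > source elements and getComputedStyle(el).backgroundImage). The helpers below are hypothetical sketches: parseSrcset follows the HTML spec's rule that a candidate URL is a run of non-whitespace characters, so commas inside a URL survive, but it skips descriptor validation; cssBackgroundUrls only handles url(...) tokens.

```typescript
// Hypothetical helpers for turning raw srcset / background-image strings
// (collected during the DOM walk) into absolute, de-duplicated URLs.

// Simplified from the HTML spec's srcset algorithm: a URL is a run of
// non-whitespace characters, trailing commas end a candidate, and
// descriptors ("2x", "640w") are skipped up to the next top-level comma.
function parseSrcset(srcset: string): string[] {
  const urls: string[] = [];
  const n = srcset.length;
  let i = 0;
  while (i < n) {
    // Skip whitespace and separating commas before a candidate.
    while (i < n && /[\s,]/.test(srcset.charAt(i))) i++;
    if (i >= n) break;
    const start = i;
    while (i < n && !/\s/.test(srcset.charAt(i))) i++;
    let url = srcset.slice(start, i);
    if (url.endsWith(",")) {
      url = url.replace(/,+$/, "");
    } else {
      // Skip descriptors, ignoring commas nested inside parentheses.
      let depth = 0;
      while (i < n && !(srcset.charAt(i) === "," && depth === 0)) {
        if (srcset.charAt(i) === "(") depth++;
        else if (srcset.charAt(i) === ")") depth = Math.max(0, depth - 1);
        i++;
      }
    }
    if (url !== "") urls.push(url);
  }
  return urls;
}

// Pull url(...) values out of a computed background-image such as
// 'url("a.png"), linear-gradient(red, blue), url(b.webp)'.
function cssBackgroundUrls(value: string): string[] {
  const out: string[] = [];
  for (const m of value.matchAll(/url\(\s*(['"]?)(.*?)\1\s*\)/g)) {
    if (m[2]) out.push(m[2]);
  }
  return out;
}

// Resolve against the page URL and drop duplicates, preserving first-seen order.
function resolveAll(urls: string[], base: string): string[] {
  const seen = new Set<string>();
  for (const u of urls) {
    try {
      seen.add(new URL(u, base).href);
    } catch {
      // Skip values that are not valid URLs.
    }
  }
  return [...seen];
}
```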
Deliverables
- Next.js 16 SaaS frontend with public marketing surface
- Playwright-based extraction engine handling all image sources
- Format conversion and ZIP packaging
- Tools and extract pages for paste-and-go workflows
- DMCA submission and case-management system
- SEO landing pages for organic acquisition
- Prisma 6 + Postgres data layer for jobs and DMCA cases
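For the format conversion and ZIP packaging deliverable, entry names need to be filesystem-safe, unique, and consistent with the output format when an image is converted. The namer below is a hypothetical scheme, not the project's: it takes the last URL path segment, replaces unsafe characters, swaps the extension when converting, and appends -2, -3, and so on for collisions, comparing names case-insensitively because many filesystems are case-insensitive. The returned names could be passed to jszip's zip.file(name, data) or archiver's archive.append(data, { name }).

```typescript
type OutputFormat = "webp" | "png" | "jpeg";

// Split the URL's last path segment into a sanitized stem and lowercase extension.
function baseName(imageUrl: string): { stem: string; ext: string } {
  let segment = new URL(imageUrl).pathname.split("/").pop() ?? "";
  try {
    segment = decodeURIComponent(segment);
  } catch {
    // Malformed percent-encoding: keep the raw segment.
  }
  // Keep letters, digits, dot, dash, underscore; strip leading dots/underscores.
  const safe = segment.replace(/[^A-Za-z0-9._-]+/g, "_").replace(/^[._]+/, "");
  const dot = safe.lastIndexOf(".");
  if (dot <= 0) return { stem: safe || "image", ext: "" };
  return { stem: safe.slice(0, dot), ext: safe.slice(dot + 1).toLowerCase() };
}

// Hypothetical factory: each job gets its own namer so uniqueness is per archive.
function createEntryNamer(): (imageUrl: string, convertTo?: OutputFormat) => string {
  const used = new Set<string>();
  return (imageUrl, convertTo) => {
    const { stem, ext } = baseName(imageUrl);
    const extension = convertTo === "jpeg" ? "jpg" : convertTo ?? (ext || "bin");
    let name = `${stem}.${extension}`;
    for (let n = 2; used.has(name.toLowerCase()); n++) {
      name = `${stem}-${n}.${extension}`;
    }
    used.add(name.toLowerCase());
    return name;
  };
}
```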
Outcomes
- Extraction engine catches all image sources, including lazy-loaded and JS-injected images
- DMCA workflow is defensible, with a full audit trail of submissions and responses
- Public SEO surface in place from day one: sitemap, robots, and landing pages
- Format conversion pipeline supports WebP, PNG, and JPEG, with metadata preserved
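The DMCA audit-trail outcome rests on two properties: a case can only move through allowed states, and every move is appended to a log rather than overwriting history. An in-memory sketch follows; the status names, transition table, and function names are assumptions for illustration, not the product's actual workflow. In the Prisma/Postgres setup described above, audit entries would plausibly live in their own table and be written in the same transaction as the status change (for example with prisma.$transaction).

```typescript
// Hypothetical case lifecycle; statuses are illustrative only.
type CaseStatus =
  | "received"
  | "under_review"
  | "content_removed"
  | "rejected"
  | "counter_notice"
  | "closed";

const allowedTransitions: Record<CaseStatus, CaseStatus[]> = {
  received: ["under_review"],
  under_review: ["content_removed", "rejected"],
  content_removed: ["counter_notice", "closed"],
  counter_notice: ["closed"],
  rejected: ["closed"],
  closed: [],
};

interface AuditEntry {
  at: string;
  actor: string;
  from: CaseStatus | null;
  to: CaseStatus;
  note: string;
}

interface DmcaCase {
  id: string;
  status: CaseStatus;
  audit: AuditEntry[];
}

function openCase(id: string, actor: string, at: Date): DmcaCase {
  const entry: AuditEntry = { at: at.toISOString(), actor, from: null, to: "received", note: "submission received" };
  return { id, status: "received", audit: [entry] };
}

// Returns a new case object; earlier audit entries are never mutated or removed.
function transition(c: DmcaCase, to: CaseStatus, actor: string, note: string, at: Date): DmcaCase {
  if (!allowedTransitions[c.status].includes(to)) {
    throw new Error(`illegal transition ${c.status} -> ${to}`);
  }
  const entry: AuditEntry = { at: at.toISOString(), actor, from: c.status, to, note };
  return { ...c, status: to, audit: [...c.audit, entry] };
}
```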