Why Scrapefold?
Every scraping vendor has trade-offs. Scrapefold lets you switch between them with one line — and escalates from free local engines to paid APIs only as far as a site forces it.
- Try a new vendor: engines=("firecrawl",)
- Cascade on block pages: is_suspicious + ladder escalation
- Whole-site crawl: await crawl_site(root, opts)
- LLM-ready output: result.markdown always populated
16 engines, one interface
Local engines are free and fast; SaaS engines add premium proxies and stealth. The router picks the cheapest tier that works. Ratings: ★★★ excellent · ★★☆ good · ★☆☆ basic.
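The "cheapest tier that works" behaviour can be pictured as a ladder: try each engine in cost order and move up only when the response looks like a block page. The following is a minimal, self-contained sketch of that idea, not Scrapefold's actual implementation. The `Page` class, the `BLOCK_MARKERS` list, the status codes, the length threshold, and the two fake engines are all assumptions made for illustration; only the names `is_suspicious`, `requests`, and `firecrawl` come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical challenge-page phrases; the real heuristics are not documented here.
BLOCK_MARKERS = ("just a moment", "verify you are human", "access denied", "captcha")


@dataclass
class Page:
    engine: str   # name of the engine that produced this page
    status: int   # HTTP status code
    html: str     # raw response body


def is_suspicious(page: Page, min_length: int = 200) -> bool:
    """Treat blocking statuses, very short bodies, and challenge text as a block page."""
    if page.status in (403, 429, 503):
        return True
    body = page.html.lower()
    if len(body) < min_length:
        return True
    return any(marker in body for marker in BLOCK_MARKERS)


def scrape_with_ladder(url: str, ladder: Sequence[Callable[[str], Page]]) -> Optional[Page]:
    """Try engines cheapest-first and return the first page that does not look blocked."""
    for fetch in ladder:
        page = fetch(url)
        if not is_suspicious(page):
            return page
    return None  # every tier looked blocked


# Stand-in engines so the sketch runs without network access.
def fake_local(url: str) -> Page:
    return Page("requests", 403, "Access denied")


def fake_paid(url: str) -> Page:
    return Page("firecrawl", 200, "<html>" + "real content " * 50 + "</html>")


page = scrape_with_ladder("https://example.com", [fake_local, fake_paid])
print(page.engine if page else "all engines blocked")
```

Here the free engine is rejected because of its 403 status, so the ladder escalates once and stops at the first paid engine that returns a usable page.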
How to choose
Or skip the decision entirely — call scrape(url) and let the router pick.
- Static blog or documentation site: requests (zero deps, sub-second)
- JS-rendered SPA, no anti-bot: scrapling_fast (free) or Jina Reader (free tier)
- Cloudflare / Datadome / PerimeterX: scrapling_stealth (free) → Firecrawl / ScrapingBee (paid)
- Site that emits clean markdown via API: Jina Reader (direct markdown, no parsing)
- LinkedIn / niche social: Apify (LinkedIn), vendor-managed actors
- Structured fields straight from a page: ScraperAPI, whose AI Parser fills the json slot
- IP-geofenced targets: Oxylabs, residential pool + geo_location
- Need an MCP server for AI agents: scrapefold-mcp, built-in
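Part of the guide above can be read as a small decision table. The sketch below encodes three of its rows as a plain Python function, purely as an illustration of the selection logic. `SiteProfile` and `suggest_engines` are hypothetical names, and the lowercase identifiers `jina`, `scrapingbee`, and `oxylabs` are assumed spellings; only `requests`, `scrapling_fast`, `scrapling_stealth`, and `firecrawl` appear as identifiers in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical description of a target site."""
    needs_js: bool = False    # JS-rendered SPA
    anti_bot: bool = False    # Cloudflare / Datadome / PerimeterX
    geofenced: bool = False   # content depends on the visitor's IP location


def suggest_engines(profile: SiteProfile) -> Tuple[str, ...]:
    """Return engine names in the order they would be tried, cheapest first."""
    if profile.geofenced:
        return ("oxylabs",)
    if profile.anti_bot:
        # Free stealth engine first, then the paid vendors.
        return ("scrapling_stealth", "firecrawl", "scrapingbee")
    if profile.needs_js:
        return ("scrapling_fast", "jina")
    return ("requests",)


print(suggest_engines(SiteProfile()))
print(suggest_engines(SiteProfile(needs_js=True, anti_bot=True)))
```

The returned tuple has the same shape as the `engines=("firecrawl",)` value shown earlier. How such a tuple is passed to the library is not stated in the text, so treat that connection as an assumption.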
Quickstart
Install one extra per vendor, or scrapefold[all] for everything.
    import asyncio

    from scrapefold import scrape, crawl_site, ScrapeOptions


    async def main():
        # Single URL, auto-engine: the router picks the cheapest tier that works
        result = await scrape("https://example.com")
        print(result.markdown)  # always populated
        print(result.engine)    # which engine actually fetched it

        # Cloudflare-protected site: same call, router auto-escalates
        result = await scrape(
            "https://protected.example.com",
            opts=ScrapeOptions(render_js=True, stealth=True),
        )

        # Whole-site crawl with disk cache
        crawl = await crawl_site(
            "https://docs.example.com",
            opts=ScrapeOptions(max_pages=50, max_depth=3),
            output="site.md",
        )


    asyncio.run(main())

    # CLI
    $ scrapefold scrape https://example.com
    $ scrapefold crawl https://docs.example.com --max-pages 50 --output site.md
    $ scrapefold list-engines
Built by & ecosystem
Scrapefold, the scraping engine behind Datatera, is built and maintained by Mike Sadofyev (CEO, Datatera.ai) alongside a small ecosystem of AI-data tooling. Connect on LinkedIn, X, or GitHub.