Web Scraping API for AI Agents
Crawl, extract, and transform web data with a powerful API. Built for LLMs, RAG pipelines, and AI applications.
USE CASES
From web scraping to AI extraction, one API for all your data needs
Web Scraping
Convert any webpage to clean Markdown or HTML. Perfect for building knowledge bases, training data, and content aggregation.
Site Crawling
Crawl entire websites with depth control and path filtering. Ideal for indexing documentation, blogs, and e-commerce catalogs.
URL Discovery
Map all URLs on a website via sitemap parsing or link extraction. Great for SEO audits and competitive analysis.
AI Extraction
Extract structured data using LLMs with custom prompts and JSON schemas. Build product catalogs, contact lists, and more.
Web Search
Search the web and optionally scrape results. Power your RAG pipelines with fresh, relevant content from across the internet.
Screenshots
Capture webpage screenshots in PNG, JPEG, or WebP. Supports full-page, viewport, and custom-dimension captures.
SIMPLE API
Powerful APIs for scraping, crawling, and extracting web data.
# Scrape a webpage to Markdown
curl -X POST https://api.aiget.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "metadata"],
    "onlyMainContent": true
  }'
# Response:
{
  "success": true,
  "markdown": "# Example Domain\n\nThis domain is for...",
  "metadata": {
    "title": "Example Domain",
    "description": "Example Domain for illustrative examples"
  }
}
FEATURES
Everything you need for production-grade web scraping
HTML to Markdown
Convert HTML to clean GitHub-flavored Markdown with code blocks preserved.
Content Extraction
Extract main content using Readability, filtering ads and navigation.
Metadata Parsing
Extract Open Graph, Twitter Cards, favicon, and structured data.
Page Actions
Click, scroll, type, and wait before scraping for dynamic content.
Smart Caching
Configurable caching keyed by a SHA-256 hash of each request, so repeated requests return quickly.
Async Processing
BullMQ-powered queue with retries and exponential backoff.
Webhook Callbacks
Get notified when crawl and batch jobs complete via webhooks.
SSRF Protection
Built-in URL validation prevents internal network access.
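Three of the features above (SHA-256 request hashing, exponential backoff, and URL validation) can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the service's actual implementation: the function names cache_key, backoff_delay, and is_public_url are hypothetical, and the exact fields hashed, the retry timings, and the validation rules the API applies are not documented on this page.

```python
import hashlib
import ipaddress
import json
from urllib.parse import urlparse


def cache_key(request: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of the request."""
    # Sorting keys and fixing separators means logically equal requests
    # produce the same bytes, and therefore the same cache key.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The delay doubles on every attempt and is capped at `cap`.
    The base and cap values here are assumptions chosen for illustration.
    """
    return min(cap, base * 2 ** (attempt - 1))


def is_public_url(url: str) -> bool:
    """Rough SSRF pre-check: allow only http(s) URLs that do not point inward."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Not a literal IP. A production check would also resolve the
        # hostname and vet every resolved address; this sketch does not.
        return True
    # is_global is False for private, loopback, and link-local ranges.
    return ip.is_global
```

With these helpers, two requests that differ only in key order share one cache entry, retries wait 1, 2, 4, ... seconds up to the cap, and URLs such as http://127.0.0.1/ or http://169.254.169.254/ are rejected before any fetch is attempted.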
PRICING
Simple, transparent pricing that scales with you