Crawl
Request to crawl from a base URL and return a list of discovered URLs with their associated data. You can specify the crawl depth and limit the number of pages to crawl.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
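As a minimal sketch, the header could be built in Python as follows. The helper name `auth_headers` and the placeholder token are illustrative assumptions, not part of the API.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build an Authorization header of the form 'Bearer <token>'."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}


# "YOUR_TOKEN" is a placeholder; substitute your real auth token.
headers = auth_headers("YOUR_TOKEN")
```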
Body
The depth of the crawl. A depth of 1 means only the first level of links is scraped, such as https://example.com/page1 and https://example.com/page2.
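To illustrate what a depth limit means, here is a small breadth-first sketch over a toy in-memory link graph. It is not the service's implementation: the graph, the function name `urls_within_depth`, and the choice to leave the base URL out of the result are all assumptions made for the example.

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
LINKS = {
    "https://example.com": [
        "https://example.com/page1",
        "https://example.com/page2",
    ],
    "https://example.com/page1": ["https://example.com/page1/a"],
    "https://example.com/page2": [],
    "https://example.com/page1/a": [],
}


def urls_within_depth(base_url, max_depth, links=LINKS):
    """Return URLs reachable from base_url in at most max_depth link hops."""
    seen = {base_url}
    queue = deque([(base_url, 0)])
    found = []
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found
```

With this graph, a depth of 1 yields only `page1` and `page2`, while a depth of 2 also reaches `page1/a`.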
Comma-separated list of URLs to exclude from crawling
Regex pattern to exclude specific URLs (e.g., 'https://.*datafuel.dev/blog/.*' to exclude blog pages)
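The two exclusion options can be previewed locally before submitting a crawl. The sketch below assumes the regex is applied with search semantics and that the comma-separated list is matched exactly against each URL; the service may behave differently. The function name `filter_excluded` and the example.com pattern are hypothetical.

```python
import re


def filter_excluded(urls, exclude_urls="", exclude_pattern=None):
    """Drop URLs listed in exclude_urls or matching exclude_pattern."""
    # Split the comma-separated list and ignore empty entries.
    listed = {u.strip() for u in exclude_urls.split(",") if u.strip()}
    pattern = re.compile(exclude_pattern) if exclude_pattern else None
    kept = []
    for url in urls:
        if url in listed:
            continue
        if pattern is not None and pattern.search(url):
            continue
        kept.append(url)
    return kept


candidates = [
    "https://example.com/blog/post-1",
    "https://www.example.com/blog/",
    "https://example.com/docs",
    "https://example.com/pricing",
]
remaining = filter_excluded(
    candidates,
    exclude_urls="https://example.com/pricing",
    exclude_pattern=r"https://.*example\.com/blog/.*",
)
```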
Optional schema definition for structured data extraction. Format should follow OpenAI's function calling schema format (https://platform.openai.com/docs/guides/structured-outputs).
Example types:
- string: "type": "string"
- integer: "type": "integer"
- number: "type": "number"
- boolean: "type": "boolean"
- array: "type": "array", "items": {"type": "string"}
- object: "type": "object", "properties": {...}
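Putting the example types together, a schema might look like the sketch below, written as a Python dictionary and serialized to JSON. The property names (`title`, `word_count`, and so on) are hypothetical and chosen only to show each type once; which schema keywords the API accepts beyond those listed above is not confirmed here.

```python
import json

# Hypothetical extraction schema using each of the example types once.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "word_count": {"type": "integer"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "author": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    },
}

# Serialize for inclusion in a JSON request body.
schema_json = json.dumps(schema, indent=2)
```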
The maximum number of pages to scrape
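A request body combining these options could be assembled along the following lines. This is only a sketch: the field names (`url`, `depth`, `limit`, `exclude_urls`, `exclude_pattern`, `json_schema`) are assumptions, since this page does not list the actual parameter names, and the positive-integer checks are the sketch's own sanity checks rather than documented constraints. Check the real names before sending a request.

```python
import json


def build_crawl_payload(url, depth=1, limit=10, exclude_urls=None,
                        exclude_pattern=None, schema=None):
    """Assemble a JSON crawl request body (all field names are hypothetical)."""
    if depth < 1 or limit < 1:
        raise ValueError("depth and limit must be positive integers")
    payload = {"url": url, "depth": depth, "limit": limit}
    if exclude_urls:
        # The body expects a comma-separated list of URLs.
        payload["exclude_urls"] = ",".join(exclude_urls)
    if exclude_pattern:
        payload["exclude_pattern"] = exclude_pattern
    if schema is not None:
        payload["json_schema"] = schema
    return json.dumps(payload)


body = build_crawl_payload(
    "https://example.com",
    depth=1,
    limit=5,
    exclude_urls=["https://example.com/a", "https://example.com/b"],
)
```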
Response
The identifier for the scraping job