Crawl
Request to crawl from a base URL and return a list of discovered URLs with their associated data. You can specify the crawl depth and limit the number of pages to crawl.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
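A minimal sketch of building that header in Python. It assumes the token travels in the standard HTTP `Authorization` header, which is the usual carrier for bearer tokens; the helper name `build_auth_headers` and the environment variable `DATAFUEL_API_KEY` are hypothetical and not taken from this page.

```python
import os


def build_auth_headers(token: str) -> dict:
    """Return request headers carrying the bearer token."""
    if not token or token.strip() != token:
        raise ValueError("token must be non-empty with no surrounding whitespace")
    # Assumption: the standard Authorization header is used for bearer auth.
    return {"Authorization": f"Bearer {token}"}


# DATAFUEL_API_KEY is a hypothetical variable name; "example-token" is a placeholder.
headers = build_auth_headers(os.environ.get("DATAFUEL_API_KEY", "example-token"))
```

Reading the token from the environment keeps it out of source control; any secret store works equally well.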
Body
Optional schema definition for structured data extraction. Format should follow OpenAI's function calling schema format (https://platform.openai.com/docs/guides/structured-outputs).
Example types:
- string: "type": "string"
- integer: "type": "integer"
- number: "type": "number"
- boolean: "type": "boolean"
- array: "type": "array", "items": {"type": "string"}
- object: "type": "object", "properties": {...}
{
  "description": "Schema for capturing product information",
  "name": "Product Schema",
  "schema": {
    "properties": {
      "product_url": {
        "description": "The URL of the specific product",
        "type": "string"
      },
      "product_name": {
        "description": "The name of the specific product",
        "type": "string"
      },
      "price": {
        "description": "The price of the product",
        "type": "number"
      },
      "product_images": {
        "description": "List of product image URLs",
        "items": {
          "properties": {
            "url": {
              "description": "URL of the product image",
              "type": "string"
            }
          },
          "required": ["url"],
          "type": "object"
        },
        "type": "array"
      }
    },
    "required": [
      "product_url",
      "product_name",
      "price",
      "product_images"
    ],
    "type": "object"
  }
}
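To make the type list concrete, the sketch below checks a sample record against a trimmed copy of the product schema's inner "schema" object. It is a simplified checker written for illustration, covering only the six types listed above plus "properties", "items" and "required"; it is not a full JSON Schema validator and not the service's own validation. The function name `conforms` and the sample record are hypothetical.

```python
from typing import Any

# Maps the schema type names listed above to Python types.
PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(value: Any, schema: dict) -> bool:
    """Return True if value matches the (simplified) schema."""
    expected = schema.get("type")
    if expected is None:
        return True  # no type constraint given
    if expected not in PY_TYPES:
        raise ValueError(f"unsupported type: {expected}")
    # bool is a subclass of int in Python, so reject it for numeric types.
    if expected in ("integer", "number") and isinstance(value, bool):
        return False
    if not isinstance(value, PY_TYPES[expected]):
        return False
    if expected == "array" and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    if expected == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        properties = schema.get("properties", {})
        return all(
            conforms(value[key], sub) for key, sub in properties.items() if key in value
        )
    return True


# Trimmed copy of the inner "schema" object (descriptions omitted).
product_schema = {
    "type": "object",
    "properties": {
        "product_url": {"type": "string"},
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "product_images": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
    "required": ["product_url", "product_name", "price", "product_images"],
}

# Hypothetical record of the shape the schema describes.
record = {
    "product_url": "https://example.com/p/1",
    "product_name": "Example product",
    "price": 19.99,
    "product_images": [{"url": "https://example.com/p/1.png"}],
}

is_valid = conforms(record, product_schema)
```

A check like this can be handy for sanity-testing a schema against hand-written sample data before submitting it.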
The depth of the crawl. A depth of 1 means only the first level of links is scraped, such as https://example.com/page1 and https://example.com/page2.
1
The maximum number of pages to scrape
1
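The interaction between depth and the page limit can be sketched as a bounded breadth-first traversal over an in-memory link map. This illustrates the semantics described above and is not the service's actual crawler. The names `crawl_order`, `depth` and `limit` are hypothetical, and counting the base page toward the limit is an assumption.

```python
from collections import deque


def crawl_order(links: dict, base_url: str, depth: int, limit: int) -> list:
    """Breadth-first traversal bounded by link depth and total page count."""
    visited = [base_url]  # assumption: the base page counts toward the limit
    seen = {base_url}
    queue = deque([(base_url, 0)])
    while queue and len(visited) < limit:
        url, level = queue.popleft()
        if level >= depth:
            continue  # links found on this page would exceed the depth
        for target in links.get(url, []):
            if target in seen:
                continue
            if len(visited) >= limit:
                break
            seen.add(target)
            visited.append(target)
            queue.append((target, level + 1))
    return visited


# Toy link map standing in for real pages.
site = {
    "https://example.com": ["https://example.com/page1", "https://example.com/page2"],
    "https://example.com/page1": ["https://example.com/page1/a"],
}

first_level = crawl_order(site, "https://example.com", depth=1, limit=10)
```

With a depth of 1, `first_level` holds the base page plus page1 and page2; the nested page under page1 is only reached at a depth of 2.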
Regex pattern to exclude specific URLs (e.g., 'https://.*datafuel\.dev/blog/.*' to exclude blog pages)
"https://.*datafuel\\.dev/blog/.*"
Comma-separated list of URLs to exclude from crawling
"https://www.datafuel.dev/pricing,https://www.datafuel.dev/blog"
Response
The identifier for the scraping job
"f47ac10b-58cc-4372-a567-0e02b2c3d479"