POST
/
crawl
curl --request POST \
  --url https://api.datafuel.dev/crawl \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "<string>",
  "ai_prompt": "<string>",
  "json_schema": {
    "description": "Schema for capturing product information",
    "name": "Product Schema",
    "schema": {
      "properties": {
        "product_url": {
          "description": "The URL of the specific product",
          "type": "string"
        },
        "product_name": {
          "description": "The name of the specific product",
          "type": "string"
        },
        "price": {
          "description": "The price of the product",
          "type": "number"
        },
        "product_images": {
          "description": "List of product image URLs",
          "items": {
            "properties": {
              "url": {
                "description": "URL of the product image",
                "type": "string"
              }
            },
            "required": [
              "url"
            ],
            "type": "object"
          },
          "type": "array"
        }
      },
      "required": [
        "product_url",
        "product_name",
        "price",
        "product_images"
      ],
      "type": "object"
    }
  },
  "javascript_scenario": [
    {}
  ],
  "depth": 1,
  "limit": 1,
  "exclusion_pattern": "https://.*datafuel\\.dev/blog/.*",
  "excluded_links": "https://www.datafuel.dev/pricing,https://www.datafuel.dev/blog"
}'
{
  "job_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string
required
ai_prompt
string | null
json_schema
object | null

Optional schema definition for structured data extraction. Format should follow OpenAI's function calling schema format (https://platform.openai.com/docs/guides/structured-outputs).

Example types:

  • string: "type": "string"
  • integer: "type": "integer"
  • number: "type": "number"
  • boolean: "type": "boolean"
  • array: "type": "array", "items": {"type": "string"}
  • object: "type": "object", "properties": {...}
javascript_scenario
object[] | null
depth
integer
default:
1

The depth of the crawl 1 depth mean only the first level of links will be scraped like https://example.com/page1 and https://example.com/page2

limit
integer
default:
1

The maximum number of pages to scrape

exclusion_pattern
string
default:

Regex pattern to exclude specific URLs (e.g., 'https://.datafuel.dev/blog/.' to exclude blog pages)

Comma-separated list of URLs to exclude from crawling

Response

200
application/json
Successful Response
job_id
string
required

The identifier for the scraping job