POST /scrape
curl --request POST \
  --url https://api.datafuel.dev/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "<string>",
  "ai_prompt": "<string>",
  "json_schema": {
    "description": "Schema for capturing product information",
    "name": "Product Schema",
    "schema": {
      "properties": {
        "product_url": {
          "description": "The URL of the specific product",
          "type": "string"
        },
        "product_name": {
          "description": "The name of the specific product",
          "type": "string"
        },
        "price": {
          "description": "The price of the product",
          "type": "number"
        },
        "product_images": {
          "description": "List of product image URLs",
          "items": {
            "properties": {
              "url": {
                "description": "URL of the product image",
                "type": "string"
              }
            },
            "required": [
              "url"
            ],
            "type": "object"
          },
          "type": "array"
        }
      },
      "required": [
        "product_url",
        "product_name",
        "price",
        "product_images"
      ],
      "type": "object"
    }
  },
  "javascript_scenario": [
    {}
  ]
}'

200
{
  "job_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}

The scrape endpoint extracts data from a single URL per request. You can choose between two scraping modes:

  1. Basic Scraping: Extracts data from the provided URL without AI assistance.
  2. AI-Enhanced Scraping: Uses AI to process the scraped content with either:
    • A custom prompt to guide the extraction
    • A prompt combined with a JSON schema for structured and consistent output
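The modes above differ only in which optional body fields are set. The following is a minimal sketch of that mapping, not an official client: `build_scrape_body` is a hypothetical helper, the example URL is a placeholder, and leaving out the optional fields (rather than sending them as null) is an assumption based on the fields being listed as nullable.

```python
import json
from typing import Any, Dict, Optional


def build_scrape_body(
    url: str,
    ai_prompt: Optional[str] = None,
    json_schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a /scrape request body for basic or AI-enhanced scraping."""
    if json_schema is not None and ai_prompt is None:
        # Assumption: the docs describe a schema "combined with" a prompt,
        # so this sketch rejects a schema sent on its own.
        raise ValueError("json_schema is expected alongside ai_prompt")
    body: Dict[str, Any] = {"url": url}  # url is the only required field
    if ai_prompt is not None:
        body["ai_prompt"] = ai_prompt
    if json_schema is not None:
        body["json_schema"] = json_schema
    return body


# Basic scraping: only the URL is sent.
basic = build_scrape_body("https://example.com/product/1")

# AI-enhanced scraping with a custom prompt.
prompted = build_scrape_body(
    "https://example.com/product/1",
    ai_prompt="Extract the product name and price.",
)

print(json.dumps(basic))
print(json.dumps(prompted))
```

Adding a `json_schema` argument to the second call would give the prompt-plus-schema variant.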

Use Cases

  • Extract specific content from web pages
  • Transform unstructured web content into structured data
  • Ensure consistent data format using JSON schema validation

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
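As an illustration of the header format, here is a sketch that prepares (but does not send) an authenticated request using only the Python standard library. The endpoint URL comes from the curl example; `build_scrape_request` and the `DATAFUEL_API_TOKEN` environment variable name are assumptions made for this example.

```python
import json
import os
import urllib.request

API_URL = "https://api.datafuel.dev/scrape"  # taken from the curl example


def build_scrape_request(token: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the Bearer token; nothing is sent here."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# DATAFUEL_API_TOKEN is a hypothetical variable name; use whatever
# secret storage your project already has.
token = os.environ.get("DATAFUEL_API_TOKEN", "<token>")
request = build_scrape_request(token, {"url": "https://example.com"})
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call; reading the token from the environment keeps it out of source control.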

Body

application/json
url
string
required

ai_prompt
string | null

json_schema
object | null

Optional schema definition for structured data extraction. The schema should follow OpenAI's function calling schema format (https://platform.openai.com/docs/guides/structured-outputs).

Example types:

  • string: "type": "string"
  • integer: "type": "integer"
  • number: "type": "number"
  • boolean: "type": "boolean"
  • array: "type": "array", "items": {"type": "string"}
  • object: "type": "object", "properties": {...}

Example:
{
  "description": "Schema for capturing product information",
  "name": "Product Schema",
  "schema": {
    "properties": {
      "product_url": {
        "description": "The URL of the specific product",
        "type": "string"
      },
      "product_name": {
        "description": "The name of the specific product",
        "type": "string"
      },
      "price": {
        "description": "The price of the product",
        "type": "number"
      },
      "product_images": {
        "description": "List of product image URLs",
        "items": {
          "properties": {
            "url": {
              "description": "URL of the product image",
              "type": "string"
            }
          },
          "required": ["url"],
          "type": "object"
        },
        "type": "array"
      }
    },
    "required": [
      "product_url",
      "product_name",
      "price",
      "product_images"
    ],
    "type": "object"
  }
}
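A schema like the one above is easy to break by listing a name under "required" that has no entry under "properties". The sketch below is a local pre-flight check for that mistake; it is not something the API is documented to perform, and `missing_required` is a hypothetical helper. The schema is a shortened version of the product example.

```python
from typing import Any, Dict, List


def missing_required(schema: Dict[str, Any], path: str = "schema") -> List[str]:
    """Return paths of `required` names that have no entry in `properties`."""
    problems: List[str] = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in properties:
                problems.append(f"{path}.{name}")
        # Recurse into nested objects and arrays.
        for name, sub in properties.items():
            problems.extend(missing_required(sub, f"{path}.{name}"))
    elif schema.get("type") == "array":
        items = schema.get("items")
        if isinstance(items, dict):
            problems.extend(missing_required(items, f"{path}[]"))
    return problems


product_schema = {
    "name": "Product Schema",
    "description": "Schema for capturing product information",
    "schema": {
        "type": "object",
        "properties": {
            "product_name": {"type": "string"},
            "price": {"type": "number"},
            "product_images": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"url": {"type": "string"}},
                    "required": ["url"],
                },
            },
        },
        "required": ["product_name", "price", "product_images"],
    },
}

print(missing_required(product_schema["schema"]))  # prints []
```

An empty list means every required name is defined; otherwise each entry is a dotted path to the missing property.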

javascript_scenario
object[] | null

Response

200
application/json
Successful Response
job_id
string
required

The identifier for the scraping job

Example:

"f47ac10b-58cc-4372-a567-0e02b2c3d479"
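Since the response carries only the job identifier, client code mainly needs to read `job_id` and fail clearly if it is absent. A minimal sketch, with `parse_job_id` as a hypothetical helper and the documented example as sample input:

```python
import json


def parse_job_id(response_text: str) -> str:
    """Extract the required job_id string from a /scrape response body."""
    payload = json.loads(response_text)
    job_id = payload.get("job_id") if isinstance(payload, dict) else None
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("response is missing the required job_id string")
    return job_id


sample = '{"job_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"}'
job_id = parse_job_id(sample)
print(job_id)  # prints f47ac10b-58cc-4372-a567-0e02b2c3d479
```

The example value looks like a UUID, but the reference only guarantees a string, so the sketch does not enforce a UUID format.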