Scrape

curl --request POST \
  --url https://api.datafuel.dev/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "<string>",
  "ai_prompt": "<string>",
  "json_schema": {
    "description": "Schema for capturing product information",
    "name": "Product Schema",
    "schema": {
      "properties": {
        "product_url": {
          "description": "The URL of the specific product",
          "type": "string"
        },
        "product_name": {
          "description": "The name of the specific product",
          "type": "string"
        },
        "price": {
          "description": "The price of the product",
          "type": "number"
        },
        "product_images": {
          "description": "List of product image URLs",
          "items": {
            "properties": {
              "url": {
                "description": "URL of the product image",
                "type": "string"
              }
            },
            "required": [
              "url"
            ],
            "type": "object"
          },
          "type": "array"
        }
      },
      "required": [
        "product_url",
        "product_name",
        "price",
        "product_images"
      ],
      "type": "object"
    }
  },
  "javascript_scenario": [
    {}
  ]
}'

{
  "job_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}

POST /scrape

The scrape endpoint extracts data from a single URL per request. You can choose between two scraping modes:

- Basic Scraping: Extracts data from the provided URL without AI assistance.
- AI-Enhanced Scraping: Uses AI to process the scraped content with either:
  - A custom prompt to guide the extraction
  - A prompt combined with a JSON schema for structured and consistent output
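
The modes above differ only in which fields the request body carries. The following is a minimal sketch of that mapping, not the official client: `build_scrape_payload` is a hypothetical helper, and it assumes that omitting `ai_prompt` and `json_schema` selects basic scraping and that a schema is always sent together with a prompt.

```python
from typing import Any, Optional


def build_scrape_payload(
    url: str,
    ai_prompt: Optional[str] = None,
    json_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a JSON body for POST /scrape (hypothetical helper)."""
    if json_schema is not None and ai_prompt is None:
        # Assumption: the text pairs a schema with a prompt,
        # so this sketch rejects a schema on its own.
        raise ValueError("json_schema is expected to accompany ai_prompt")

    payload: dict[str, Any] = {"url": url}  # basic scraping: URL only (assumed)
    if ai_prompt is not None:
        payload["ai_prompt"] = ai_prompt  # AI-enhanced: custom prompt
    if json_schema is not None:
        payload["json_schema"] = json_schema  # AI-enhanced: prompt plus schema
    return payload


basic = build_scrape_payload("https://example.com/product/1")
prompted = build_scrape_payload(
    "https://example.com/product/1",
    ai_prompt="Extract the product name and price",
)
```

Here `basic` contains only the `url` field, while `prompted` adds `ai_prompt`; the example URL and prompt text are placeholders.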

Use Cases

- Extract specific content from web pages
- Transform unstructured web content into structured data
- Ensure consistent data format using JSON schema validation
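
As an illustration of the last use case, a client can re-check an extracted record against the same schema it sent. This is a minimal sketch, assuming the record is already a Python dict; `find_problems` is a hypothetical helper that covers only required fields and a few basic types, and it is not a full JSON Schema validator.

```python
from typing import Any

# JSON Schema type names mapped to Python types (only the subset used here).
_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}


def find_problems(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    # Every name listed under "required" must be present in the record.
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    # Present fields must match the declared basic type, when it is known.
    for name, spec in schema.get("properties", {}).items():
        expected = _TYPES.get(spec.get("type"))
        if name in record and expected and not isinstance(record[name], expected):
            problems.append(f"{name}: expected {spec['type']}")
    return problems


product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["product_name", "price"],
}

problems = find_problems(product_schema, {"product_name": "Widget", "price": "9.99"})
```

In this example `price` arrives as a string rather than a number, so `problems` holds one entry describing the mismatch.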

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
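
The header can be attached to a request as sketched below, using only Python's standard library. `make_scrape_request` and the `YOUR_TOKEN` placeholder are illustrative names, not part of the API; the URL and header names are taken from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.datafuel.dev/scrape"


def make_scrape_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to the scrape endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # Bearer <token>
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_scrape_request("YOUR_TOKEN", {"url": "https://example.com"})
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the token handling in one place and lets you inspect the headers before any network call is made.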

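Body

application/json

The request body is a JSON object. The example request uses the fields url, ai_prompt, json_schema, and javascript_scenario.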

Response

200 (application/json): Successful Response

The response is of type object.
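
A minimal sketch of reading that object, assuming it has the shape of the example response (a single `job_id` field); `parse_job_id` is a hypothetical helper, and any other response fields are not covered here.

```python
import json


def parse_job_id(response_body: str) -> str:
    """Extract job_id from the JSON text returned by the scrape endpoint."""
    data = json.loads(response_body)
    if not isinstance(data, dict) or "job_id" not in data:
        raise ValueError("response does not contain a job_id")
    return str(data["job_id"])


job_id = parse_job_id('{"job_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"}')
```

Raising on a missing `job_id` makes an unexpected response fail at the point of parsing rather than later, when the identifier is used.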
