Scrape API — HypeData Docs

01Overview

The Scrape API is the workhorse endpoint. Give it a URL, optionally describe how you want it fetched, and it returns:

The final HTML, after redirects and JS execution if requested.
A structured data object if you supplied an extract schema.
An optional screenshot (PNG, JPEG, or PDF).
An optional markdown rendering for LLM ingestion.
Telemetry — latency, proxy used, credits charged, retries, and a trace_id.

REST JSON Synchronous Idempotent on GET ~180 ms p50

02Endpoint

GET https://api.hypedata.io/v1/scrape

POST https://api.hypedata.io/v1/scrape

Use GET when all your parameters fit comfortably in a URL (under 2 KB total) — it's marginally faster and trivially idempotent. Use POST for everything else, especially when you're passing an extract schema, custom headers, a session blob, or a JS pre-script.

03Minimal request

curl https://api.hypedata.io/v1/scrape \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -G --data-urlencode "url=https://example.com"

const page = await hd.scrape({
  url: "https://example.com"
});

page = hd.scrape(url="https://example.com")

page, err := hd.Scrape(ctx, &hypedata.ScrapeReq{URL: "https://example.com"})

04Request parameters

Core

Name	Type	Description
urlrequired	string	Absolute URL to fetch. Must be `http://` or `https://`. Maximum 8 KB.
methodoptional	enum	`GET` (default) · `POST` · `PUT` · `PATCH` · `DELETE` · `HEAD`.
bodyoptional	string	Request body for non-GET methods. Pair with the appropriate `Content-Type` in `headers`.
headersoptional	object	Map of header name → value to forward to the target. Hop-by-hop and `Authorization` headers are stripped.
timeoutoptional	number	Total request budget in milliseconds, 1,000–120,000. Defaults to 60,000.
follow_redirectsoptional	boolean	Follow up to 10 redirects. Defaults to `true`.

Rendering

Name	Type	Description
renderoptional	boolean	Execute JavaScript in a real browser before returning. Costs 5× credits. See Rendering.
wait_untiloptional	enum	`load` · `domcontentloaded` · `networkidle` · `networkidle2`. Default `networkidle2`.
wait_foroptional	string	CSS selector to wait for before returning. Times out after `timeout`.
delayoptional	number	Extra milliseconds to wait after `wait_until` fires, up to 30,000.
scriptoptional	string	JavaScript to execute in the page after load (sandboxed, max 5 s wall-clock).
browseroptional	enum	`chromium` (default) · `firefox` · `webkit`.

Proxy & geo

Name	Type	Description
proxy_typeoptional	enum	`datacenter` (default) · `residential` · `mobile`. See Proxies guide.
countryoptional	string	ISO 3166-1 alpha-2 country code (`us`, `fr`, `jp`…). Forces an exit IP in that country.
regionoptional	string	State or province code (US: `ca`, `ny`; FR: `idf`…). Requires `country`.
cityoptional	string	City name (`new-york`, `paris`…). Requires `country`. Residential / mobile only.
asnoptional	integer	Pin to a specific ASN. Useful for testing how a target sees Comcast vs. AT&T traffic.
stealthoptional	boolean	Rotate TLS fingerprint (JA3/JA4), canvas, WebGL, and font enumeration. Default `true`.

Output controls

Name	Type	Description
extractoptional	object	Plain-English or JSON-Schema extraction spec. Adds a `data` field to the response. See AI Parser.
screenshotoptional	object	Returns a screenshot. See Screenshots.
markdownoptional	boolean	Add a `markdown` field with the page converted to LLM-friendly Markdown (boilerplate stripped).
htmloptional	boolean	Include the raw HTML in the response. Default `true`. Set to `false` with `extract` to save bandwidth.
formatoptional	enum	`json` (default) — JSON envelope · `raw` — raw upstream body with original headers.

Operational

Name	Type	Description
sessionoptional	string	Session ID (any opaque string). Pins proxy IP and cookies for ~10 minutes. See Sessions.
retryoptional	object	`{ "on": ["5xx", "block"], "max": 2 }`. Server-side retries (free up to `max=2`).
cacheoptional	enum / object	`"hit"` for a free cache lookup · `{ "ttl": 3600 }` to write through. Default `"miss"`.
webhookoptional	string	URL to POST the response to. If set, the API returns `{ "status": "queued", "job_id": "…" }` immediately. See Webhooks.
metadataoptional	object	Arbitrary key-value tags (max 16 keys, 256 chars each). Echoed back on the response and stored on the job for filtering in the dashboard.

05Rendering

When render: true, HypeData routes the request to a Chromium pool, loads the page like a real browser, executes JavaScript, and returns the post-render HTML — exactly what a human would see.

When to use

Turn on rendering only if the data you need isn't in the initial HTML response. Most product pages, SERPs, and CMS-driven sites do have data in the HTML. Test without rendering first; you'll save 5× on credits and ~2 s on latency.

Wait strategies

Picking the right wait_until is the difference between a fast scrape and a flaky one.

Value	Best for	Typical timing
`domcontentloaded`	Server-rendered pages where the HTML is the data.	~600 ms
`load`	Pages with images / fonts you actually need.	~1.2 s
`networkidle2` (default)	Most SPAs — waits until ≤2 connections for 500 ms.	~1.8 s
`networkidle`	Heavy SPAs with chatty telemetry. Last resort — many sites never reach this.	~3.5 s+

For SPAs that render after a known element appears, use wait_for instead. It's much more reliable than a fixed delay.

Pre-execution script

The script parameter accepts arbitrary JS that runs after the page loads. Useful for clicking "Show more", scrolling, or filling forms.

{
  "url": "https://news.example.com/articles",
  "render": true,
  "script": "for (let i=0;i<5;i++){window.scrollTo(0,document.body.scrollHeight); await new Promise(r=>setTimeout(r,500));}",
  "wait_until": "networkidle2"
}

06Proxy & geo

By default HypeData routes through datacenter IPs — fast, cheap, and good enough for most sites. Switch to residential for sites that fingerprint at the network layer (most major retailers, ticketing, classifieds), and mobile for the strictest targets (some banks, telco self-service portals).

Targeting examples

// US-only residential, NYC if available
{ "url": "…", "proxy_type": "residential", "country": "us", "city": "new-york" }

// France via Free Mobile (AS12322)
{ "url": "…", "proxy_type": "mobile", "country": "fr", "asn": 12322 }

See the full Proxies guide for credit cost per tier, success-rate benchmarks, and decision flowchart.

07Sessions

Many flows require multiple requests from the same identity — login, then dashboard, then a detail page. Pass the same session string to all of them and HypeData will:

Pin the proxy to the same exit IP for up to 10 minutes.
Persist cookies, localStorage, and sessionStorage.
Reuse the same browser fingerprint across requests.

// 1. Log in
hd.scrape({
  url: "https://target.example/login",
  method: "POST",
  body: "email=…&password=…",
  session: "order-history-42"
});

// 2. Use the authenticated session
hd.scrape({
  url: "https://target.example/orders",
  session: "order-history-42"
});

Session IDs are opaque to HypeData — use any string, but keep it secret-grade for keys you don't want others on your team to "borrow".

08Screenshots

Available only when render: true. Pass an object:

{
  "url": "https://example.com",
  "render": true,
  "screenshot": {
    "format": "png",         // png | jpeg | pdf
    "full_page": true,        // scroll & stitch
    "width": 1440,
    "height": 900,
    "omit_background": false,
    "clip": { "x":0, "y":0, "w":1440, "h":600 }
  }
}

The response gains a screenshot_url pointing to a signed CDN URL valid for 24 hours (configurable in workspace settings, up to 30 days).

09Inline extraction

Pass an extract object and the response will include a typed data field. Two formats are accepted: plain-English ("schema sketch") and JSON Schema. The plain-English form is more concise; JSON Schema gives you strict validation.

{
  "extract": {
    "title": "string · article headline",
    "author": "string · byline name only",
    "published_at": "string · ISO 8601 date",
    "tags": "array of strings",
    "word_count": "integer"
  }
}

{
  "extract": {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["title", "published_at"],
    "properties": {
      "title":        { "type": "string" },
      "author":       { "type": "string" },
      "published_at": { "type": "string", "format": "date-time" },
      "tags":         { "type": "array", "items": {"type":"string"} },
      "word_count":   { "type": "integer", "minimum": 0 }
    }
  }
}

Full mechanics — confidence scores, citations, multi-pass extraction — live in AI Parser.

10Response shape

status

string

Always "ok" on 2xx. On non-2xx, the body switches to the error envelope.

url

string

The URL you requested, normalized.

final_url

string

URL after all redirects. Differs from url when the target redirected.

http_status

integer

Upstream HTTP status code (e.g. 200, 404).

html

string · nullable

Full HTML. Omitted if you passed html: false.

headers

object

Upstream response headers, lower-cased keys.

array

Cookies set by the target, in Set-Cookie tuple form.

data

object · nullable

Present iff extract was provided. Conforms to your schema.

markdown

string · nullable

Present iff markdown: true. Boilerplate-stripped Markdown.

screenshot_url

string · nullable

Signed URL to the screenshot if requested. Valid 24 h by default.

meta.latency_ms

integer

End-to-end latency including proxy hop and rendering.

meta.proxy

string

Slug of the proxy pool used (e.g. residential.us-east).

meta.render

boolean

Whether rendering was actually performed (it can be skipped on cache hits).

meta.credits_used

integer

Credits charged for this request. See Credit cost.

meta.trace_id

string

Opaque identifier. Include it in any support ticket.

Example — extracted product

200OK · render · residential.us · 6 credits

{
  "status": "ok",
  "url": "https://shop.example.com/p/alpha-coat",
  "final_url": "https://shop.example.com/p/alpha-coat",
  "http_status": 200,
  "data": {
    "name": "Alpha Coat — Tobacco",
    "price": 219.00,
    "currency": "EUR",
    "in_stock": true,
    "images": ["https://cdn.example.com/.../1.jpg", "…"]
  },
  "meta": {
    "latency_ms": 1812,
    "proxy": "residential.us-east",
    "render": true,
    "credits_used": 6,
    "trace_id": "trc_3F2D1A77B0E1"
  }
}

11Credit cost

Every Scrape API request is billed by adding up modifiers:

Component	Cost
Base fetch (datacenter, no render)	1 credit
Residential proxy	+1
Mobile proxy	+4
JavaScript rendering	+4
AI extraction (`extract`)	+5
Screenshot	+1
Markdown conversion	+0 — free
Cache hit (`cache: "hit"`)	0 credits
HTTP error before render	0 credits — refunded

So a residential + render + extract call costs 1 + 1 + 4 + 5 = 11 credits. You can preview the cost without making the request: POST /v1/scrape/estimate returns the same envelope shape but the data field is null.

12Errors

Common Scrape-specific errors. The full catalog lives in Errors & status codes.

HTTP	Code	Meaning
400	`invalid_url`	Not a valid `http(s)://` URL.
400	`conflict_params`	You combined mutually exclusive params (e.g. `screenshot` without `render`).
422	`schema_invalid`	Your `extract` object is not valid JSON Schema or plain-English sketch.
429	`rate_limited`	You hit your plan's per-second cap. Retry after the `Retry-After` header.
451	`aup_violation`	Target is on our block-list (see Acceptable Use).
504	`upstream_timeout`	The target didn't respond within `timeout`. Free retry.
522	`blocked`	Anti-bot wall after exhausting retries. Try residential proxy or stealth.

Scrape API.

01Overview

02Endpoint

03Minimal request

04Request parameters

Core

Rendering

Proxy & geo

Output controls

Operational

05Rendering

Wait strategies

Pre-execution script

06Proxy & geo

Targeting examples

07Sessions

08Screenshots

09Inline extraction

10Response shape

Example — extracted product

11Credit cost

12Errors