01Overview
The Scrape API is the workhorse endpoint. Give it a URL, optionally describe how you want it fetched, and it returns:
- The final HTML, after redirects and JS execution if requested.
- A structured
dataobject if you supplied anextractschema. - An optional
screenshot(PNG, JPEG, or PDF). - An optional
markdownrendering for LLM ingestion. - Telemetry — latency, proxy used, credits charged, retries, and a
trace_id.
02Endpoint
Use GET when all your parameters fit comfortably in a URL (under 2 KB total) — it's marginally faster and trivially idempotent. Use POST for everything else, especially when you're passing an extract schema, custom headers, a session blob, or a JS pre-script.
03Minimal request
curl https://api.hypedata.io/v1/scrape \ -H "Authorization: Bearer $HYPEDATA_API_KEY" \ -G --data-urlencode "url=https://example.com"
const page = await hd.scrape({ url: "https://example.com" });
page = hd.scrape(url="https://example.com")
page, err := hd.Scrape(ctx, &hypedata.ScrapeReq{URL: "https://example.com"})
04Request parameters
Core
| Name | Type | Description |
|---|---|---|
| urlrequired | string | Absolute URL to fetch. Must be http:// or https://. Maximum 8 KB. |
| methodoptional | enum | GET (default) · POST · PUT · PATCH · DELETE · HEAD. |
| bodyoptional | string | Request body for non-GET methods. Pair with the appropriate Content-Type in headers. |
| headersoptional | object | Map of header name → value to forward to the target. Hop-by-hop and Authorization headers are stripped. |
| timeoutoptional | number | Total request budget in milliseconds, 1,000–120,000. Defaults to 60,000. |
| follow_redirectsoptional | boolean | Follow up to 10 redirects. Defaults to true. |
Rendering
| Name | Type | Description |
|---|---|---|
| renderoptional | boolean | Execute JavaScript in a real browser before returning. Costs 5× credits. See Rendering. |
| wait_untiloptional | enum | load · domcontentloaded · networkidle · networkidle2. Default networkidle2. |
| wait_foroptional | string | CSS selector to wait for before returning. Times out after timeout. |
| delayoptional | number | Extra milliseconds to wait after wait_until fires, up to 30,000. |
| scriptoptional | string | JavaScript to execute in the page after load (sandboxed, max 5 s wall-clock). |
| browseroptional | enum | chromium (default) · firefox · webkit. |
Proxy & geo
| Name | Type | Description |
|---|---|---|
| proxy_typeoptional | enum | datacenter (default) · residential · mobile. See Proxies guide. |
| countryoptional | string | ISO 3166-1 alpha-2 country code (us, fr, jp…). Forces an exit IP in that country. |
| regionoptional | string | State or province code (US: ca, ny; FR: idf…). Requires country. |
| cityoptional | string | City name (new-york, paris…). Requires country. Residential / mobile only. |
| asnoptional | integer | Pin to a specific ASN. Useful for testing how a target sees Comcast vs. AT&T traffic. |
| stealthoptional | boolean | Rotate TLS fingerprint (JA3/JA4), canvas, WebGL, and font enumeration. Default true. |
Output controls
| Name | Type | Description |
|---|---|---|
| extractoptional | object | Plain-English or JSON-Schema extraction spec. Adds a data field to the response. See AI Parser. |
| screenshotoptional | object | Returns a screenshot. See Screenshots. |
| markdownoptional | boolean | Add a markdown field with the page converted to LLM-friendly Markdown (boilerplate stripped). |
| htmloptional | boolean | Include the raw HTML in the response. Default true. Set to false with extract to save bandwidth. |
| formatoptional | enum | json (default) — JSON envelope · raw — raw upstream body with original headers. |
Operational
| Name | Type | Description |
|---|---|---|
| sessionoptional | string | Session ID (any opaque string). Pins proxy IP and cookies for ~10 minutes. See Sessions. |
| retryoptional | object | { "on": ["5xx", "block"], "max": 2 }. Server-side retries (free up to max=2). |
| cacheoptional | enum / object | "hit" for a free cache lookup · { "ttl": 3600 } to write through. Default "miss". |
| webhookoptional | string | URL to POST the response to. If set, the API returns { "status": "queued", "job_id": "…" } immediately. See Webhooks. |
| metadataoptional | object | Arbitrary key-value tags (max 16 keys, 256 chars each). Echoed back on the response and stored on the job for filtering in the dashboard. |
05Rendering
When render: true, Hypedata routes the request to a Chromium pool, loads the page like a real browser, executes JavaScript, and returns the post-render HTML — exactly what a human would see.
Turn on rendering only if the data you need isn't in the initial HTML response. Most product pages, SERPs, and CMS-driven sites do have data in the HTML. Test without rendering first; you'll save 5× on credits and ~2 s on latency.
Wait strategies
Picking the right wait_until is the difference between a fast scrape and a flaky one.
| Value | Best for | Typical timing |
|---|---|---|
domcontentloaded | Server-rendered pages where the HTML is the data. | ~600 ms |
load | Pages with images / fonts you actually need. | ~1.2 s |
networkidle2 (default) | Most SPAs — waits until ≤2 connections for 500 ms. | ~1.8 s |
networkidle | Heavy SPAs with chatty telemetry. Last resort — many sites never reach this. | ~3.5 s+ |
For SPAs that render after a known element appears, use wait_for instead. It's much more reliable than a fixed delay.
Pre-execution script
The script parameter accepts arbitrary JS that runs after the page loads. Useful for clicking "Show more", scrolling, or filling forms.
{
"url": "https://news.example.com/articles",
"render": true,
"script": "for (let i=0;i<5;i++){window.scrollTo(0,document.body.scrollHeight); await new Promise(r=>setTimeout(r,500));}",
"wait_until": "networkidle2"
}06Proxy & geo
By default Hypedata routes through datacenter IPs — fast, cheap, and good enough for most sites. Switch to residential for sites that fingerprint at the network layer (most major retailers, ticketing, classifieds), and mobile for the strictest targets (some banks, telco self-service portals).
Targeting examples
// US-only residential, NYC if available { "url": "…", "proxy_type": "residential", "country": "us", "city": "new-york" } // France via Free Mobile (AS12322) { "url": "…", "proxy_type": "mobile", "country": "fr", "asn": 12322 }
See the full Proxies guide for credit cost per tier, success-rate benchmarks, and decision flowchart.
07Sessions
Many flows require multiple requests from the same identity — login, then dashboard, then a detail page. Pass the same session string to all of them and Hypedata will:
- Pin the proxy to the same exit IP for up to 10 minutes.
- Persist cookies,
localStorage, andsessionStorage. - Reuse the same browser fingerprint across requests.
// 1. Log in hd.scrape({ url: "https://target.example/login", method: "POST", body: "email=…&password=…", session: "order-history-42" }); // 2. Use the authenticated session hd.scrape({ url: "https://target.example/orders", session: "order-history-42" });
Session IDs are opaque to Hypedata — use any string, but keep it secret-grade for keys you don't want others on your team to "borrow".
08Screenshots
Available only when render: true. Pass an object:
{
"url": "https://example.com",
"render": true,
"screenshot": {
"format": "png", // png | jpeg | pdf
"full_page": true, // scroll & stitch
"width": 1440,
"height": 900,
"omit_background": false,
"clip": { "x":0, "y":0, "w":1440, "h":600 }
}
}The response gains a screenshot_url pointing to a signed CDN URL valid for 24 hours (configurable in workspace settings, up to 30 days).
09Inline extraction
Pass an extract object and the response will include a typed data field. Two formats are accepted: plain-English ("schema sketch") and JSON Schema. The plain-English form is more concise; JSON Schema gives you strict validation.
{
"extract": {
"title": "string · article headline",
"author": "string · byline name only",
"published_at": "string · ISO 8601 date",
"tags": "array of strings",
"word_count": "integer"
}
}
{
"extract": {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"required": ["title", "published_at"],
"properties": {
"title": { "type": "string" },
"author": { "type": "string" },
"published_at": { "type": "string", "format": "date-time" },
"tags": { "type": "array", "items": {"type":"string"} },
"word_count": { "type": "integer", "minimum": 0 }
}
}
}
Full mechanics — confidence scores, citations, multi-pass extraction — live in AI Parser.
10Response shape
"ok" on 2xx. On non-2xx, the body switches to the error envelope.url when the target redirected.200, 404).html: false.Set-Cookie tuple form.extract was provided. Conforms to your schema.markdown: true. Boilerplate-stripped Markdown.residential.us-east).Example — extracted product
{
"status": "ok",
"url": "https://shop.example.com/p/alpha-coat",
"final_url": "https://shop.example.com/p/alpha-coat",
"http_status": 200,
"data": {
"name": "Alpha Coat — Tobacco",
"price": 219.00,
"currency": "EUR",
"in_stock": true,
"images": ["https://cdn.example.com/.../1.jpg", "…"]
},
"meta": {
"latency_ms": 1812,
"proxy": "residential.us-east",
"render": true,
"credits_used": 6,
"trace_id": "trc_3F2D1A77B0E1"
}
}11Credit cost
Every Scrape API request is billed by adding up modifiers:
| Component | Cost |
|---|---|
| Base fetch (datacenter, no render) | 1 credit |
| Residential proxy | +1 |
| Mobile proxy | +4 |
| JavaScript rendering | +4 |
AI extraction (extract) | +5 |
| Screenshot | +1 |
| Markdown conversion | +0 — free |
Cache hit (cache: "hit") | 0 credits |
| HTTP error before render | 0 credits — refunded |
So a residential + render + extract call costs 1 + 1 + 4 + 5 = 11 credits. You can preview the cost without making the request: POST /v1/scrape/estimate returns the same envelope shape but the data field is null.
12Errors
Common Scrape-specific errors. The full catalog lives in Errors & status codes.
| HTTP | Code | Meaning |
|---|---|---|
| 400 | invalid_url | Not a valid http(s):// URL. |
| 400 | conflict_params | You combined mutually exclusive params (e.g. screenshot without render). |
| 422 | schema_invalid | Your extract object is not valid JSON Schema or plain-English sketch. |
| 429 | rate_limited | You hit your plan's per-second cap. Retry after the Retry-After header. |
| 451 | aup_violation | Target is on our block-list (see Acceptable Use). |
| 504 | upstream_timeout | The target didn't respond within timeout. Free retry. |
| 522 | blocked | Anti-bot wall after exhausting retries. Try residential proxy or stealth. |