Scrape API.

The Scrape API fetches any URL on the public web and returns the response — optionally rendered with a real browser, optionally parsed into structured JSON. One endpoint, three dozen knobs, sane defaults for 90% of cases.

Base URL: api.hypedata.io
Version: v2.4
Default timeout: 60 s
Max payload: 25 MB
Contents
  1. Overview
  2. Endpoint
  3. Minimal request
  4. Request parameters
  5. Rendering
  6. Proxy & geo
  7. Sessions
  8. Screenshots
  9. Inline extraction
  10. Response shape
  11. Credit cost
  12. Errors

01 Overview

The Scrape API is the workhorse endpoint. Give it a URL, optionally describe how you want it fetched, and it returns:

  • The final HTML, after redirects and JS execution if requested.
  • A structured data object if you supplied an extract schema.
  • An optional screenshot (PNG, JPEG, or PDF).
  • An optional markdown rendering for LLM ingestion.
  • Telemetry — latency, proxy used, credits charged, retries, and a trace_id.

REST · JSON · Synchronous · Idempotent on GET · ~180 ms p50

02 Endpoint

GET https://api.hypedata.io/v1/scrape
POST https://api.hypedata.io/v1/scrape

Use GET when all your parameters fit comfortably in a URL (under 2 KB total): it's marginally faster and trivially idempotent. Use POST for everything else, especially when you're passing an extract schema, custom headers, a request body, or a script.
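The GET/POST rule of thumb can be turned into a small client-side check. This is a sketch, not SDK behavior: `choose_method` is a hypothetical helper, it assumes booleans are sent as lowercase `true`/`false` in the query string, and it routes every object-valued parameter (extract, headers, screenshot, retry, and so on) through POST because no GET encoding for objects is documented here.

```python
from urllib.parse import urlencode

GET_LIMIT_BYTES = 2 * 1024  # "under 2 KB total", per the guidance above
SCALARS = (str, int, float, bool)


def _encode(value):
    # Assumption: booleans travel as lowercase "true"/"false" in a GET query.
    return str(value).lower() if isinstance(value, bool) else value


def choose_method(params: dict) -> str:
    """Hypothetical helper: pick GET or POST for a Scrape API call."""
    # Object-valued params (extract, headers, screenshot, retry...) and scripts go via POST.
    if "script" in params or any(not isinstance(v, SCALARS) for v in params.values()):
        return "POST"
    query = urlencode({k: _encode(v) for k, v in params.items()})
    return "GET" if len(query.encode("ascii")) < GET_LIMIT_BYTES else "POST"
```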

03 Minimal request

# curl
curl https://api.hypedata.io/v1/scrape \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -G --data-urlencode "url=https://example.com"

// JavaScript SDK
const page = await hd.scrape({
  url: "https://example.com"
});

# Python SDK
page = hd.scrape(url="https://example.com")

// Go SDK
page, err := hd.Scrape(ctx, &hypedata.ScrapeReq{URL: "https://example.com"})
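Without an SDK, the curl call maps directly onto Python's standard library. A minimal sketch: `build_minimal_request` is a hypothetical helper that builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SCRAPE_ENDPOINT = "https://api.hypedata.io/v1/scrape"


def build_minimal_request(target_url: str, api_key: str) -> Request:
    """Hypothetical helper: the stdlib equivalent of the curl example (not sent)."""
    # curl's -G --data-urlencode appends the percent-encoded pair to the query string.
    query = urlencode({"url": target_url})
    return Request(
        f"{SCRAPE_ENDPOINT}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=65)`; the 65 s client timeout is an arbitrary choice that leaves headroom over the 60 s server-side default.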

04 Request parameters

Core

| Name | Type | Description |
|---|---|---|
| url | string, required | Absolute URL to fetch. Must be http:// or https://. Maximum 8 KB. |
| method | enum, optional | GET (default) · POST · PUT · PATCH · DELETE · HEAD. |
| body | string, optional | Request body for non-GET methods. Pair with the appropriate Content-Type in headers. |
| headers | object, optional | Map of header name → value to forward to the target. Hop-by-hop and Authorization headers are stripped. |
| timeout | number, optional | Total request budget in milliseconds, 1,000–120,000. Defaults to 60,000. |
| follow_redirects | boolean, optional | Follow up to 10 redirects. Defaults to true. |
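Catching bad core parameters before they cost a round trip is straightforward. The checks below mirror the Core table; `validate_core` is hypothetical, and treating 8 KB as 8,192 bytes of UTF-8 is an assumption.

```python
from urllib.parse import urlparse

ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"}
MAX_URL_BYTES = 8 * 1024  # assumption: "8 KB" means 8,192 bytes


def validate_core(params: dict) -> list:
    """Hypothetical client-side checks mirroring the Core table; returns problems."""
    problems = []
    url = params.get("url")
    if not isinstance(url, str):
        problems.append("url is required")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")
        if len(url.encode("utf-8")) > MAX_URL_BYTES:
            problems.append("url exceeds 8 KB")
    method = params.get("method", "GET")
    if method not in ALLOWED_METHODS:
        problems.append(f"unsupported method {method!r}")
    if "body" in params and method == "GET":
        problems.append("body is only for non-GET methods")
    timeout = params.get("timeout", 60_000)
    if not isinstance(timeout, (int, float)) or not 1_000 <= timeout <= 120_000:
        problems.append("timeout must be 1,000-120,000 ms")
    return problems
```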

Rendering

| Name | Type | Description |
|---|---|---|
| render | boolean, optional | Execute JavaScript in a real browser before returning. Adds 4 credits (5× the cost of a plain datacenter fetch). See Rendering. |
| wait_until | enum, optional | load · domcontentloaded · networkidle · networkidle2. Default networkidle2. |
| wait_for | string, optional | CSS selector to wait for before returning. Gives up when timeout expires. |
| delay | number, optional | Extra milliseconds to wait after wait_until fires, up to 30,000. |
| script | string, optional | JavaScript to execute in the page after load (sandboxed, max 5 s wall-clock). |
| browser | enum, optional | chromium (default) · firefox · webkit. |
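The rendering fields can be assembled with their documented limits enforced. `render_options` is a hypothetical helper; it always sets `render: true` on the assumption that the other rendering fields only matter when a browser is used.

```python
WAIT_UNTIL = ("load", "domcontentloaded", "networkidle", "networkidle2")
BROWSERS = ("chromium", "firefox", "webkit")
MAX_DELAY_MS = 30_000


def render_options(wait_until="networkidle2", wait_for=None, delay=0,
                   browser="chromium", script=None) -> dict:
    """Hypothetical helper: build the rendering fields of a request body."""
    if wait_until not in WAIT_UNTIL:
        raise ValueError(f"wait_until must be one of {WAIT_UNTIL}")
    if browser not in BROWSERS:
        raise ValueError(f"browser must be one of {BROWSERS}")
    if not 0 <= delay <= MAX_DELAY_MS:
        raise ValueError("delay must be between 0 and 30,000 ms")
    # Assumption: the remaining fields are only meaningful with a browser.
    opts = {"render": True, "wait_until": wait_until, "browser": browser}
    if wait_for:
        opts["wait_for"] = wait_for
    if delay:
        opts["delay"] = delay
    if script:
        opts["script"] = script
    return opts
```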

Proxy & geo

| Name | Type | Description |
|---|---|---|
| proxy_type | enum, optional | datacenter (default) · residential · mobile. See Proxies guide. |
| country | string, optional | ISO 3166-1 alpha-2 country code (us, fr, jp…). Forces an exit IP in that country. |
| region | string, optional | State or province code (US: ca, ny; FR: idf…). Requires country. |
| city | string, optional | City name (new-york, paris…). Requires country. Residential / mobile only. |
| asn | integer, optional | Pin to a specific ASN. Useful for testing how a target sees Comcast vs. AT&T traffic. |
| stealth | boolean, optional | Rotate TLS fingerprint (JA3/JA4), canvas, WebGL, and font enumeration. Default true. |
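The geo rules in the table (region and city require country; city is residential / mobile only) can be checked locally. `validate_geo` is hypothetical, and it only checks that country looks like a two-letter code, not that the code is actually assigned in ISO 3166-1.

```python
PROXY_TYPES = ("datacenter", "residential", "mobile")


def validate_geo(params: dict) -> list:
    """Hypothetical client-side check of the Proxy & geo rules above."""
    problems = []
    proxy_type = params.get("proxy_type", "datacenter")
    if proxy_type not in PROXY_TYPES:
        problems.append(f"unknown proxy_type {proxy_type!r}")
    country = params.get("country")
    # Shape check only: two ASCII letters, e.g. "us" or "fr".
    if country is not None and not (len(country) == 2 and country.isascii() and country.isalpha()):
        problems.append("country must be an ISO 3166-1 alpha-2 code")
    for field in ("region", "city"):
        if field in params and country is None:
            problems.append(f"{field} requires country")
    if "city" in params and proxy_type == "datacenter":
        problems.append("city targeting is residential / mobile only")
    if "asn" in params and not isinstance(params["asn"], int):
        problems.append("asn must be an integer")
    return problems
```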

Output controls

| Name | Type | Description |
|---|---|---|
| extract | object, optional | Plain-English or JSON-Schema extraction spec. Adds a data field to the response. See AI Parser. |
| screenshot | object, optional | Returns a screenshot. Requires render: true. See Screenshots. |
| markdown | boolean, optional | Add a markdown field with the page converted to LLM-friendly Markdown (boilerplate stripped). |
| html | boolean, optional | Include the raw HTML in the response. Default true. Set to false when using extract to save bandwidth. |
| format | enum, optional | json (default): JSON envelope · raw: raw upstream body with original headers. |

Operational

| Name | Type | Description |
|---|---|---|
| session | string, optional | Session ID (any opaque string). Pins proxy IP and cookies for ~10 minutes. See Sessions. |
| retry | object, optional | { "on": ["5xx", "block"], "max": 2 }. Server-side retries (free up to max=2). |
| cache | enum or object, optional | "hit" for a free cache lookup · { "ttl": 3600 } to write through. Default "miss". |
| webhook | string, optional | URL to POST the response to. If set, the API returns { "status": "queued", "job_id": "…" } immediately. See Webhooks. |
| metadata | object, optional | Arbitrary key-value tags (max 16 keys, 256 chars each). Echoed back on the response and stored on the job for filtering in the dashboard. |
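A local check for the Operational limits follows. `validate_operational` is hypothetical; it assumes the 256-character metadata limit applies to both keys and values, and it only flags `retry.max` above 2 because retries are documented as free up to that point, not forbidden beyond it.

```python
MAX_METADATA_KEYS = 16
MAX_METADATA_CHARS = 256


def validate_operational(params: dict) -> list:
    """Hypothetical client-side check for the Operational parameters."""
    problems = []
    metadata = params.get("metadata", {})
    if len(metadata) > MAX_METADATA_KEYS:
        problems.append("metadata allows at most 16 keys")
    for key, value in metadata.items():
        # Assumption: the 256-character limit applies to keys and values alike.
        if len(str(key)) > MAX_METADATA_CHARS or len(str(value)) > MAX_METADATA_CHARS:
            problems.append(f"metadata entry {key!r} exceeds 256 characters")
    cache = params.get("cache", "miss")
    ttl_form = isinstance(cache, dict) and isinstance(cache.get("ttl"), int)
    if not (cache in ("hit", "miss") or ttl_form):
        problems.append('cache must be "hit", "miss", or {"ttl": <seconds>}')
    retry = params.get("retry", {})
    if retry.get("max", 0) > 2:
        # A warning rather than an error: only max <= 2 is documented as free.
        problems.append("retry.max > 2 may be billed")
    return problems
```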

05 Rendering

When render: true, Hypedata routes the request to a browser pool (Chromium unless you choose another browser), loads the page like a real browser, executes JavaScript, and returns the post-render HTML: what a human visitor would see.

When to use

Turn on rendering only if the data you need isn't in the initial HTML response. Most product pages, SERPs, and CMS-driven sites already include their data in the HTML. Test without rendering first: on a datacenter proxy a plain fetch costs 1 credit instead of 5, and it typically returns about 2 s sooner.
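The "test without rendering first" advice can be automated: fetch once without rendering, and repeat with `render: true` only if a known marker is missing. In this sketch `fetch` is a stand-in for your client (for example, a function that calls `hd.scrape` and returns the JSON envelope as a dict), and `marker` is any string you know appears in the data, such as a CSS class name.

```python
from typing import Callable

Fetch = Callable[[dict], dict]  # hypothetical: takes request params, returns the envelope


def scrape_with_fallback(fetch: Fetch, url: str, marker: str) -> dict:
    """Try a plain fetch first; pay for rendering only if `marker` is missing."""
    page = fetch({"url": url})
    if marker in (page.get("html") or ""):
        return page  # the data is already in the initial HTML
    return fetch({"url": url, "render": True})
```

The trade-off: when rendering turns out to be necessary you pay for both calls (1 + 5 credits on datacenter), so this pays off only when most of your pages don't need a browser.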

Wait strategies

Picking the right wait_until is the difference between a fast scrape and a flaky one.

| Value | Best for | Typical timing |
|---|---|---|
| domcontentloaded | Server-rendered pages where the HTML is the data. | ~600 ms |
| load | Pages with images / fonts you actually need. | ~1.2 s |
| networkidle2 (default) | Most SPAs. Waits until there are at most 2 open connections for 500 ms. | ~1.8 s |
| networkidle | Heavy SPAs with chatty telemetry. Last resort: many sites never reach this. | ~3.5 s+ |

For SPAs that render after a known element appears, use wait_for instead. It's much more reliable than a fixed delay.

Post-load script

The script parameter accepts arbitrary JS that runs in the page after it loads (sandboxed, with a 5 s wall-clock limit). It's useful for clicking "Show more", scrolling, or filling in forms.

{
  "url": "https://news.example.com/articles",
  "render": true,
  "script": "for (let i=0;i<5;i++){window.scrollTo(0,document.body.scrollHeight); await new Promise(r=>setTimeout(r,500));}",
  "wait_until": "networkidle2"
}
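Building the script string by hand invites escaping mistakes. The sketch below generates the same scroll loop, serializes the body with `json.dumps`, and refuses loops whose sleeps alone would exceed the 5 s script budget. `scroll_script` is a hypothetical helper; scrolling and layout also take time, so real scripts should stay well under the limit.

```python
import json

SCRIPT_BUDGET_MS = 5_000  # `script` wall-clock cap from the parameter table


def scroll_script(iterations: int, pause_ms: int) -> str:
    """Hypothetical helper: build the scroll-and-wait loop from the example."""
    # Only the sleeps are counted here; scrolling itself also consumes budget.
    if iterations * pause_ms >= SCRIPT_BUDGET_MS:
        raise ValueError("sleeps alone would exceed the 5 s script budget")
    return (
        f"for (let i = 0; i < {iterations}; i++) {{"
        " window.scrollTo(0, document.body.scrollHeight);"
        f" await new Promise(r => setTimeout(r, {pause_ms}));"
        " }"
    )


payload = {
    "url": "https://news.example.com/articles",
    "render": True,
    "script": scroll_script(5, 500),
    "wait_until": "networkidle2",
}
body = json.dumps(payload)  # json.dumps handles quoting inside the script string
```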

06 Proxy & geo

By default Hypedata routes through datacenter IPs — fast, cheap, and good enough for most sites. Switch to residential for sites that fingerprint at the network layer (most major retailers, ticketing, classifieds), and mobile for the strictest targets (some banks, telco self-service portals).

Targeting examples

// US-only residential, NYC if available
{ "url": "…", "proxy_type": "residential", "country": "us", "city": "new-york" }

// France via Free Mobile (AS12322)
{ "url": "…", "proxy_type": "mobile", "country": "fr", "asn": 12322 }

See the full Proxies guide for credit cost per tier, success-rate benchmarks, and decision flowchart.

07 Sessions

Many flows require multiple requests from the same identity — login, then dashboard, then a detail page. Pass the same session string to all of them and Hypedata will:

  • Pin the proxy to the same exit IP for up to 10 minutes.
  • Persist cookies, localStorage, and sessionStorage.
  • Reuse the same browser fingerprint across requests.
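
// 1. Log in (await it so the session cookies exist before the next call)
await hd.scrape({
  url: "https://target.example/login",
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: "email=…&password=…",
  session: "order-history-42"
});

// 2. Reuse the authenticated session
const orders = await hd.scrape({
  url: "https://target.example/orders",
  session: "order-history-42"
});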

Session IDs are opaque to Hypedata, so any string works. Because a session carries its cookies and exit IP, treat the ID like a secret: use a hard-to-guess value for sessions you don't want others on your team to "borrow".
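One way to get hard-to-guess session IDs is Python's `secrets` module, which draws from the operating system's cryptographically secure random source. `new_session_id` and its prefix convention are hypothetical; any sufficiently random string works.

```python
import secrets


def new_session_id(prefix: str = "sess") -> str:
    """Hypothetical helper: an unguessable value for the `session` parameter."""
    # 16 random bytes, base64url-encoded without padding (22 characters).
    return f"{prefix}-{secrets.token_urlsafe(16)}"
```

Generate the ID once per flow and pass the same value to every request in it; the readable prefix only helps when scanning jobs in the dashboard.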

08 Screenshots

Available only when render: true. Pass an object (the // comments below are annotations; remove them before sending, since JSON has no comments):

{
  "url": "https://example.com",
  "render": true,
  "screenshot": {
    "format": "png",         // png | jpeg | pdf
    "full_page": true,        // scroll & stitch
    "width": 1440,
    "height": 900,
    "omit_background": false,
    "clip": { "x":0, "y":0, "w":1440, "h":600 }
  }
}

The response then includes a screenshot_url field: a signed CDN URL valid for 24 hours by default (configurable in workspace settings, up to 30 days).
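The screenshot rules above can be checked before a request goes out. `check_screenshot` is hypothetical; defaulting `format` to PNG and rejecting a clip wider than the viewport are assumptions, not documented behavior.

```python
SCREENSHOT_FORMATS = ("png", "jpeg", "pdf")


def check_screenshot(payload: dict) -> None:
    """Hypothetical pre-flight check; raises ValueError on a bad screenshot spec."""
    shot = payload.get("screenshot")
    if shot is None:
        return
    if payload.get("render") is not True:
        # The API answers this case with 400 conflict_params.
        raise ValueError("screenshot requires render: true")
    fmt = shot.get("format", "png")  # assumption: png when format is omitted
    if fmt not in SCREENSHOT_FORMATS:
        raise ValueError(f"format must be one of {SCREENSHOT_FORMATS}")
    clip = shot.get("clip")
    if clip and "width" in shot and clip["x"] + clip["w"] > shot["width"]:
        # Assumption: a clip wider than the viewport is a client-side mistake.
        raise ValueError("clip extends past the viewport width")
```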

09 Inline extraction

Pass an extract object and the response will include a typed data field. Two formats are accepted: plain-English ("schema sketch") and JSON Schema. The plain-English form is more concise; JSON Schema gives you strict validation.

Plain-English sketch:

{
  "extract": {
    "title": "string · article headline",
    "author": "string · byline name only",
    "published_at": "string · ISO 8601 date",
    "tags": "array of strings",
    "word_count": "integer"
  }
}

The same fields as JSON Schema:

{
  "extract": {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["title", "published_at"],
    "properties": {
      "title":        { "type": "string" },
      "author":       { "type": "string" },
      "published_at": { "type": "string", "format": "date-time" },
      "tags":         { "type": "array", "items": {"type":"string"} },
      "word_count":   { "type": "integer", "minimum": 0 }
    }
  }
}

Full mechanics — confidence scores, citations, multi-pass extraction — live in AI Parser.
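`data` is documented to conform to your schema, but a defensive check is cheap if your schemas live in code that may drift. The helper below is a hypothetical, deliberately tiny subset of JSON Schema (only `required` and top-level `type`); it ignores `format`, `minimum`, and `items`.

```python
TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}


def check_data(data: dict, schema: dict) -> list:
    """Hypothetical check: `required` keys plus top-level `type` of each property."""
    problems = [f"missing {name}" for name in schema.get("required", []) if name not in data]
    for name, spec in schema.get("properties", {}).items():
        if name not in data or spec.get("type") not in TYPES:
            continue
        value, expected = data[name], spec["type"]
        # bool is a subclass of int in Python, but JSON Schema keeps them distinct.
        if isinstance(value, bool) and expected in ("integer", "number"):
            problems.append(f"{name}: expected {expected}")
        elif not isinstance(value, TYPES[expected]):
            problems.append(f"{name}: expected {expected}")
    return problems
```

For complete validation, the third-party `jsonschema` package provides `jsonschema.validate(instance=data, schema=schema)`.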

10 Response shape

| Field | Type | Description |
|---|---|---|
| status | string | Always "ok" on 2xx. On non-2xx responses, the body switches to the error envelope. |
| url | string | The URL you requested, normalized. |
| final_url | string | URL after all redirects. Differs from url when the target redirected. |
| http_status | integer | Upstream HTTP status code (e.g. 200, 404). |
| html | string · nullable | Full HTML. Omitted if you passed html: false. |
| headers | object | Upstream response headers, with lower-cased keys. |
| cookies | array | Cookies set by the target, in Set-Cookie tuple form. |
| data | object · nullable | Present only when extract was provided. Conforms to your schema. |
| markdown | string · nullable | Present only when markdown: true. Boilerplate-stripped Markdown. |
| screenshot_url | string · nullable | Signed URL to the screenshot, if requested. Valid for 24 h by default. |
| meta.latency_ms | integer | End-to-end latency, including the proxy hop and rendering. |
| meta.proxy | string | Slug of the proxy pool used (e.g. residential.us-east). |
| meta.render | boolean | Whether rendering was actually performed (it can be skipped on cache hits). |
| meta.credits_used | integer | Credits charged for this request. See Credit cost. |
| meta.trace_id | string | Opaque identifier. Include it in any support ticket. |

Example: extracted product

200 OK · render · residential.us · 11 credits
{
  "status": "ok",
  "url": "https://shop.example.com/p/alpha-coat",
  "final_url": "https://shop.example.com/p/alpha-coat",
  "http_status": 200,
  "data": {
    "name": "Alpha Coat — Tobacco",
    "price": 219.00,
    "currency": "EUR",
    "in_stock": true,
    "images": ["https://cdn.example.com/.../1.jpg", "…"]
  },
  "meta": {
    "latency_ms": 1812,
    "proxy": "residential.us-east",
    "render": true,
    "credits_used": 11,
    "trace_id": "trc_3F2D1A77B0E1"
  }
}
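Most callers log a handful of envelope fields. A minimal sketch, assuming the success envelope shown above; `summarize` is hypothetical, and the error envelope's shape is not covered here.

```python
import json


def summarize(envelope_text: str) -> dict:
    """Hypothetical helper: pull commonly logged fields from a success envelope."""
    env = json.loads(envelope_text)
    if env.get("status") != "ok":
        # Non-2xx responses use the error envelope, which this sketch doesn't parse.
        raise RuntimeError(f"not a success envelope: status={env.get('status')!r}")
    meta = env.get("meta", {})
    return {
        "http_status": env.get("http_status"),
        "redirected": env.get("final_url") != env.get("url"),
        "credits_used": meta.get("credits_used"),
        "trace_id": meta.get("trace_id"),  # quote this in support tickets
    }
```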

11 Credit cost

Every Scrape API request is billed by adding up modifiers:

| Component | Cost |
|---|---|
| Base fetch (datacenter, no render) | 1 credit |
| Residential proxy | +1 |
| Mobile proxy | +4 |
| JavaScript rendering | +4 |
| AI extraction (extract) | +5 |
| Screenshot | +1 |
| Markdown conversion | +0 (free) |
| Cache hit (cache: "hit") | 0 credits |
| HTTP error before render | 0 credits (refunded) |

So a residential + render + extract call costs 1 + 1 + 4 + 5 = 11 credits. You can preview the cost without running the scrape: POST /v1/scrape/estimate returns the same envelope shape, with the data field set to null.
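The additive pricing is easy to reproduce locally for budgeting. `estimate_credits` is a hypothetical sketch of the table above; it does not model refunds for HTTP errors, so treat POST /v1/scrape/estimate as the authoritative number.

```python
# Per-component costs from the table above.
COSTS = {"base": 1, "residential": 1, "mobile": 4, "render": 4,
         "extract": 5, "screenshot": 1, "markdown": 0}


def estimate_credits(params: dict, served_from_cache: bool = False) -> int:
    """Hypothetical local estimate of a request's credit cost."""
    if served_from_cache:
        return 0  # cache hits are free
    total = COSTS["base"]
    total += COSTS.get(params.get("proxy_type", "datacenter"), 0)  # datacenter adds 0
    for feature in ("render", "extract", "screenshot", "markdown"):
        if params.get(feature):
            total += COSTS[feature]
    return total
```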

12 Errors

Common Scrape-specific errors. The full catalog lives in Errors & status codes.

| HTTP | Code | Meaning |
|---|---|---|
| 400 | invalid_url | Not a valid http(s):// URL. |
| 400 | conflict_params | You combined incompatible params (e.g. screenshot without render: true). |
| 422 | schema_invalid | Your extract object is neither valid JSON Schema nor a valid plain-English sketch. |
| 429 | rate_limited | You hit your plan's per-second cap. Retry after the delay given in the Retry-After header. |
| 451 | aup_violation | Target is on our block-list (see Acceptable Use). |
| 504 | upstream_timeout | The target didn't respond within timeout. Retrying is free. |
| 522 | blocked | Anti-bot wall after exhausting retries. Try a residential or mobile proxy, and keep stealth enabled (it is on by default). |
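A client can map these codes to a next step. The policy below is a hypothetical sketch, not SDK behavior: it retries `rate_limited` and `upstream_timeout`, escalates the proxy tier on `blocked`, and gives up on request errors. It assumes `Retry-After` carries a number of seconds (the header may also hold an HTTP date) and falls back to an arbitrary 1 s.

```python
from typing import Optional, Tuple

# Proxy tiers in increasing order of cost (see Proxy & geo).
ESCALATION = {"datacenter": "residential", "residential": "mobile"}


def next_action(code: str, retry_after: Optional[str] = None) -> Tuple[str, float]:
    """Hypothetical client policy: map an error code to (action, wait_seconds)."""
    if code == "rate_limited":
        # Assumption: Retry-After is delta-seconds; 1.0 s is an arbitrary fallback.
        return "retry", float(retry_after) if retry_after else 1.0
    if code == "upstream_timeout":
        return "retry", 0.0
    if code == "blocked":
        return "escalate_proxy", 0.0
    # invalid_url, conflict_params, schema_invalid, aup_violation: fix the request.
    return "fail", 0.0


def escalate(proxy_type: str) -> Optional[str]:
    """Next proxy tier to try after `blocked`, or None when already on mobile."""
    return ESCALATION.get(proxy_type)
```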
hypedata. SHERIDAN, WY · EST. 2024
HYPELABS, LLC · v2.4.0

Production-grade web data infrastructure. Operated by HypeLabs, LLC under the laws of Wyoming, USA.

© 2026 HYPELABS, LLC · EIN 35-2851293 · SHERIDAN, WY