01 Why streaming
Sequential calls cap out at a few thousand URLs per hour per connection. Batch jobs introduce minutes of round-trip latency before you see the first row. The Stream API splits the difference — you stay in one HTTP request but get parallelism, ordering-free delivery, and live progress.
Use Stream API when
You have 100 – 10,000 URLs to fetch, you want results to start landing in your code within seconds, and your process can stay up for the duration.
Use Jobs API when
You have 10,000+ URLs, results don't need to be live (overnight is fine), or your code can't hold a long connection (e.g. serverless with 60-second timeouts).
02 Protocol
The Stream API is a single long-lived HTTP POST. The request body is a JSON payload describing the batch and per-URL overrides; the response body is a text/event-stream emitting one event per completed scrape.
Compared to WebSockets, SSE is firewall-friendly, survives most proxies unchanged, and auto-reconnects in browsers. Compared to long-polling, SSE delivers events the instant they're produced server-side — no polling interval to tune.
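The text/event-stream wire format is plain text: each event is a block of `field: value` lines, and blocks are separated by a blank line. A minimal parser for that framing, in plain Python with no SDK (field handling per the SSE specification; the sample payload below is illustrative), looks like:

```python
def parse_sse(raw: str):
    """Parse a text/event-stream payload into a list of event dicts.

    Each event block is separated by a blank line; lines carry
    "event:", "id:", or "data:" fields per the SSE specification.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        ev = {"event": "message", "id": None, "data": ""}
        for line in block.splitlines():
            field, _, value = line.partition(":")
            value = value.lstrip(" ")
            if field == "event":
                ev["event"] = value
            elif field == "id":
                ev["id"] = value
            elif field == "data":
                ev["data"] += value
        events.append(ev)
    return events

raw = (
    "event: page\n"
    "id: 0042\n"
    'data: {"url":"https://example.com","http_status":200}\n'
    "\n"
    "event: progress\n"
    'data: {"done":1,"total":10}\n'
)
events = parse_sse(raw)
```

In practice the SDKs do this parsing for you; the sketch only shows why SSE needs no special client support beyond a line reader.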
03 Opening a stream
import { Hypedata } from "@hypedata/sdk";

const hd = new Hypedata();

const stream = hd.stream({
  urls: urlList,            // up to 10,000 strings or objects
  render: true,
  proxy_type: "residential",
  country: "us",
  concurrency: 16,          // 1..100
  extract: { name: "string", price: "number" },
});

for await (const ev of stream) {
  switch (ev.type) {
    case "page":
      await save(ev.data);
      break;
    case "error":
      console.warn(ev.url, ev.code);
      break;
    case "progress":
      console.log(`${ev.done}/${ev.total}`);
      break;
    case "end":
      console.log("done");
      break;
  }
}
async with hd.stream(
    urls=url_list,
    render=True,
    proxy_type="residential",
    country="us",
    concurrency=16,
    extract={"name": "string", "price": "number"},
) as stream:
    async for ev in stream:
        if ev.type == "page":
            await save(ev.data)
        elif ev.type == "error":
            log.warning("failed", url=ev.url, code=ev.code)
curl -N -X POST https://api.hypedata.io/v1/stream \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  --data-binary @batch.json
04 Event types
The server emits one of five event types per chunk. The event: line on the wire matches the type field in SDK objects.
page
One successful fetch. Payload is the same envelope as /v1/scrape.
event: page
id: 0042
data: {"url":"https://…","http_status":200,"data":{"name":"Alpha","price":219},"meta":{...}}

error
A URL that exhausted retries. Includes code, http_status (if any), and the original url.
progress
Emitted every 250 ms or every 50 URLs (whichever comes first). Contains done, errored, queued, total, and credits_used.
warning
Non-fatal advisories, e.g. {"code":"low_credits","balance":1234}. Won't terminate the stream.
end
Final event. The connection closes immediately after. Includes summary counts and trace_id.
05 Backpressure
If your consumer pauses reading the SSE stream (TCP-level), Hypedata pauses scheduling new fetches against the upstream once the in-flight buffer fills. This means a slow database, a paused debugger, or a flaky downstream API will gracefully throttle the pipeline rather than burn credits.
Use concurrency to cap parallelism per stream — your plan also has a global ceiling, listed on the Rate limits page.
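The mechanism can be sketched locally with a bounded asyncio.Queue: once the buffer is full, the producer's put blocks until the consumer drains it, which is roughly what the in-flight buffer does at the TCP level. This is a toy model of the principle, not the Hypedata implementation:

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int, scheduled: list):
    # Stands in for the scheduler: put() blocks once the buffer is
    # full, so a slow consumer automatically throttles new "fetches".
    for i in range(n):
        await queue.put(i)
        scheduled.append(i)
    await queue.put(None)  # sentinel: stream finished

async def consumer(queue: asyncio.Queue, results: list):
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # simulate a slow database write
        results.append(item)

async def main():
    queue = asyncio.Queue(maxsize=4)  # the in-flight buffer
    scheduled, results = [], []
    await asyncio.gather(
        producer(queue, 20, scheduled),
        consumer(queue, results),
    )
    return scheduled, results

scheduled, results = asyncio.run(main())
```

The practical upshot: you rarely need explicit rate limiting in your consumer; reading slower is itself the throttle.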
06 Resuming a dropped stream
SSE includes a built-in resume mechanism. When a connection drops, reconnect with the Last-Event-ID header set to the highest id: you received. Hypedata will skip URLs you've already been told about and continue.
curl -N -X POST https://api.hypedata.io/v1/stream/$STREAM_ID \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -H "Last-Event-ID: 4271"
Streams remain resumable for 15 minutes after disconnect. After that, completed-but-undelivered results are still retrievable from the dashboard or via the Jobs API using the stream's job_id.
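Resuming therefore only requires remembering the highest id you have processed and sending it back on reconnect. A sketch of that bookkeeping in plain Python (the helper names here are ours, not part of the SDK):

```python
def highest_event_id(event_ids):
    """Track the largest numeric id: value seen so far; send it back as
    Last-Event-ID on reconnect so already-delivered events are skipped."""
    last = None
    for eid in event_ids:
        n = int(eid)  # wire ids like "0042" are zero-padded integers
        if last is None or n > last:
            last = n
    return last

def resume_headers(api_key: str, last_id):
    """Build the headers for a reconnect; omit Last-Event-ID on first connect."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if last_id is not None:
        headers["Last-Event-ID"] = str(last_id)
    return headers

last = highest_event_id(["0042", "4271", "0107"])
headers = resume_headers("sk_test", last)
```

Tracking the maximum (rather than the most recent) id is a defensive choice: delivery is ordering-free, so the last event you saw is not necessarily the highest-numbered one.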
07 SDK helpers
All official SDKs expose an async iterator over events. Node and Python additionally include collect() helpers that buffer the entire stream into an array — useful for small batches where you'd rather treat the call as synchronous.
// Node — collect into an array
const results = await hd.stream({ urls, render: true }).collect();
08 Limits
- URLs per stream: 10,000.
- Concurrent connections per workspace: Free 2 · Pro 8 · Scale 32 · Enterprise unlimited.
- Concurrency per stream: 100.
- Max wall time: 6 hours.
- Max payload size per event: 25 MB (same as Scrape API).
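A client can check these caps before opening a connection rather than discovering them as a rejected request. A minimal pre-flight check, with the constants taken from the list above (the function name is ours, not an SDK helper):

```python
MAX_URLS_PER_STREAM = 10_000  # URLs per stream
MAX_CONCURRENCY = 100         # concurrency per stream

def validate_batch(urls, concurrency):
    """Raise ValueError for a batch the Stream API would reject."""
    if not urls:
        raise ValueError("batch is empty")
    if len(urls) > MAX_URLS_PER_STREAM:
        raise ValueError(
            f"{len(urls)} URLs exceeds the {MAX_URLS_PER_STREAM} per-stream cap"
        )
    if not 1 <= concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency must be 1..{MAX_CONCURRENCY}")
    return True

ok = validate_batch(["https://example.com"] * 500, concurrency=16)
```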