01 Why streaming
Sequential calls cap out at a few thousand URLs per hour per connection. Batch jobs introduce minutes of round-trip latency before you see the first row. The Stream API splits the difference — you stay in one HTTP request but get parallelism, ordering-free delivery, and live progress.
Use Stream API when
You have 100 – 10,000 URLs to fetch, you want results to start landing in your code within seconds, and your process can stay up for the duration.
Use Jobs API when
You have 10,000+ URLs, results don't need to be live (overnight is fine), or your code can't hold a long connection (e.g. serverless with 60-second timeouts).
02 Protocol
The Stream API is a single long-lived HTTP POST. The request body is a JSON payload describing the batch and per-URL overrides; the response body is a text/event-stream emitting one event per completed scrape.
Compared to WebSockets, SSE is firewall-friendly, survives most proxies unchanged, and auto-reconnects in browsers. Compared to long-polling, SSE delivers events the instant they're produced server-side — no polling interval to tune.
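The text/event-stream wire format is plain text: each event is a block of `field: value` lines, and blocks are separated by a blank line. A minimal parser for that framing, in plain Python with no SDK (field handling per the SSE specification; the sample payload below is illustrative), looks like:

```python
def parse_sse(raw: str):
    """Parse a text/event-stream payload into a list of event dicts.

    Each event block is separated by a blank line; lines carry
    "event:", "id:", or "data:" fields per the SSE specification.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        ev = {"event": "message", "id": None, "data": ""}
        for line in block.splitlines():
            field, _, value = line.partition(":")
            value = value.lstrip(" ")
            if field == "event":
                ev["event"] = value
            elif field == "id":
                ev["id"] = value
            elif field == "data":
                ev["data"] += value
        events.append(ev)
    return events

raw = (
    "event: page\n"
    "id: 0042\n"
    'data: {"url":"https://example.com","http_status":200}\n'
    "\n"
    "event: progress\n"
    'data: {"done":1,"total":10}\n'
)
events = parse_sse(raw)
```

In practice the SDKs do this parsing for you; the sketch only shows why SSE needs no special client support beyond a line reader.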
03 Opening a stream
import { Hypedata } from "@hypedata/sdk";

const hd = new Hypedata();

const stream = hd.stream({
  urls: urlList,            // up to 10,000 strings or objects
  render: true,
  proxy_type: "residential",
  country: "us",
  concurrency: 16,          // 1..100
  extract: { name: "string", price: "number" },
});

for await (const ev of stream) {
  switch (ev.type) {
    case "page":
      await save(ev.data);
      break;
    case "error":
      console.warn(ev.url, ev.code);
      break;
    case "progress":
      console.log(`${ev.done}/${ev.total}`);
      break;
    case "end":
      console.log("done");
      break;
  }
}
async with hd.stream(
    urls=url_list,
    render=True,
    proxy_type="residential",
    country="us",
    concurrency=16,
    extract={"name": "string", "price": "number"},
) as stream:
    async for ev in stream:
        if ev.type == "page":
            await save(ev.data)
        elif ev.type == "error":
            log.warning("failed", url=ev.url, code=ev.code)
curl -N -X POST https://api.hypedata.io/v1/stream \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  --data-binary @batch.json
04 Event types
The server emits one of five event types per chunk. The event: line on the wire matches the type field in SDK objects.
page
One successful fetch. Payload is the same envelope as /v1/scrape.
event: page
id: 0042
data: {"url":"https://…","http_status":200,"data":{"name":"Alpha","price":219},"meta":{...}}

error
A URL that exhausted retries. Includes code, http_status (if any), and the original url.
progress
Emitted every 250 ms or every 50 URLs (whichever comes first). Contains done, errored, queued, total, and credits_used.
warning
Non-fatal advisories, e.g. {"code":"low_credits","balance":1234}. Won't terminate the stream.
end
Final event. The connection closes immediately after. Includes summary counts and trace_id.
05 Backpressure
If your consumer pauses reading the SSE stream (TCP-level), Hypedata pauses scheduling new fetches against the upstream once the in-flight buffer fills. This means a slow database, a paused debugger, or a flaky downstream API will gracefully throttle the pipeline rather than burn credits.
Use concurrency to cap parallelism per stream — your plan also has a global ceiling, listed on the Rate limits page.
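The mechanism can be sketched locally with a bounded asyncio.Queue: once the buffer is full, the producer's put blocks until the consumer drains it, which is roughly what the in-flight buffer does at the TCP level. This is a toy model of the principle, not the Hypedata implementation:

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int, scheduled: list):
    # Stands in for the scheduler: put() blocks once the buffer is
    # full, so a slow consumer automatically throttles new "fetches".
    for i in range(n):
        await queue.put(i)
        scheduled.append(i)
    await queue.put(None)  # sentinel: stream finished

async def consumer(queue: asyncio.Queue, results: list):
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # simulate a slow database write
        results.append(item)

async def main():
    queue = asyncio.Queue(maxsize=4)  # the in-flight buffer
    scheduled, results = [], []
    await asyncio.gather(
        producer(queue, 20, scheduled),
        consumer(queue, results),
    )
    return scheduled, results

scheduled, results = asyncio.run(main())
```

The practical upshot: you rarely need explicit rate limiting in your consumer; reading slower is itself the throttle.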
06 Resuming a dropped stream
SSE includes a built-in resume mechanism. When a connection drops, reconnect with the Last-Event-ID header set to the highest id: you received. Hypedata will skip URLs you've already been told about and continue.
curl -N -X POST https://api.hypedata.io/v1/stream/$STREAM_ID \
  -H "Authorization: Bearer $HYPEDATA_API_KEY" \
  -H "Last-Event-ID: 4271"
Streams remain resumable for 15 minutes after disconnect. After that, completed-but-undelivered results are still retrievable from the dashboard or via the Jobs API using the stream's job_id.
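Resuming therefore only requires remembering the highest id you have processed and sending it back on reconnect. A sketch of that bookkeeping in plain Python (the helper names here are ours, not part of the SDK):

```python
def highest_event_id(event_ids):
    """Track the largest numeric id: value seen so far; send it back as
    Last-Event-ID on reconnect so already-delivered events are skipped."""
    last = None
    for eid in event_ids:
        n = int(eid)  # wire ids like "0042" are zero-padded integers
        if last is None or n > last:
            last = n
    return last

def resume_headers(api_key: str, last_id):
    """Build the headers for a reconnect; omit Last-Event-ID on first connect."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if last_id is not None:
        headers["Last-Event-ID"] = str(last_id)
    return headers

last = highest_event_id(["0042", "4271", "0107"])
headers = resume_headers("sk_test", last)
```

Tracking the maximum (rather than the most recent) id is a defensive choice: delivery is ordering-free, so the last event you saw is not necessarily the highest-numbered one.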
07 SDK helpers
All official SDKs expose an async iterator over events. Node and Python additionally include collect() helpers that buffer the entire stream into an array — useful for small batches where you'd rather treat the call as synchronous.
// Node — collect into an array
const results = await hd.stream({ urls, render: true }).collect();
08 Limits
- URLs per stream: 10,000.
- Concurrent connections per workspace: Free 2 · Pro 8 · Scale 32 · Enterprise unlimited.
- Concurrency per stream: 100.
- Max wall time: 6 hours.
- Max payload size per event: 25 MB (same as Scrape API).
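A client can check these caps before opening a connection rather than discovering them as a rejected request. A minimal pre-flight check, with the constants taken from the list above (the function name is ours, not an SDK helper):

```python
MAX_URLS_PER_STREAM = 10_000  # URLs per stream
MAX_CONCURRENCY = 100         # concurrency per stream

def validate_batch(urls, concurrency):
    """Raise ValueError for a batch the Stream API would reject."""
    if not urls:
        raise ValueError("batch is empty")
    if len(urls) > MAX_URLS_PER_STREAM:
        raise ValueError(
            f"{len(urls)} URLs exceeds the {MAX_URLS_PER_STREAM} per-stream cap"
        )
    if not 1 <= concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency must be 1..{MAX_CONCURRENCY}")
    return True

ok = validate_batch(["https://example.com"] * 500, concurrency=16)
```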