01 Create a job
POST /v1/jobs
{
"name": "nightly-catalog-2026-05-12",
"input": { "upload_id": "upl_8K2nB7" }, // or "urls": [...] inline
"defaults": {
"render": true,
"proxy_type": "residential",
"extract": { "name": "string", "price": "number" }
},
"concurrency": 32,
"output": { "format": "ndjson", "gzip": true },
"webhook": "https://your-app.com/hooks/jobs"
}

Response:

{
"id": "job_3F2D1A77B0E1",
"status": "queued",
"urls_total": 128400,
"eta_s": 2700,
"created_at": "2026-05-12T22:00:14Z"
02 Input formats
Three ways to deliver URLs:
- Inline. "urls": ["https://…", …], up to 1,000 URLs. Great for ad-hoc runs.
- Upload. POST /v1/uploads with an NDJSON or CSV file (1 GB max), then pass the returned upload_id.
- S3 / GCS. "input": { "s3_uri": "s3://bucket/path.ndjson", "role_arn": "…" }. We assume your role and stream the file.
Per-URL overrides are supported — supply each line as a JSON object with at minimum "url", and any subset of Scrape parameters to override defaults.
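For example, an NDJSON upload where the second line overrides the job defaults' proxy_type and extract schema:

{"url": "https://example.com/products/1"}
{"url": "https://example.com/products/2", "proxy_type": "datacenter", "extract": {"sku": "string"}}

And a sketch of wiring the upload into a job, assuming the multipart field name "file" and that the upload response carries the returned upload_id under "id" (both assumptions):

import requests

API_BASE = "https://api.example.com"             # placeholder, as above
HEADERS = {"Authorization": "Bearer sk_live_…"}  # placeholder credential; auth scheme assumed

with open("urls.ndjson", "rb") as f:
    up = requests.post(
        f"{API_BASE}/v1/uploads",
        headers=HEADERS,
        files={"file": ("urls.ndjson", f)},  # multipart field name assumed
        timeout=300,
    )
up.raise_for_status()
upload_id = up.json()["id"]  # response shape assumed

job = requests.post(
    f"{API_BASE}/v1/jobs",
    headers=HEADERS,
    json={"name": "catalog-from-upload", "input": {"upload_id": upload_id}},
    timeout=30,
).json()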
03 Poll status
{
"id": "job_3F2D1A77B0E1",
"status": "running", // queued | running | completed | cancelled | failed
"urls_total": 128400,
"urls_done": 48217,
"urls_errored": 312,
"credits_used": 293482,
"eta_s": 1820,
"download_url": null, // present once status=completed
"download_url_expires_at": null
}

Prefer the job.completed webhook over polling — it's more accurate, lower-latency, and free.
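If you do poll anyway, keep a fixed interval and stop on a terminal status. A sketch; the GET /v1/jobs/{id} route is inferred from the POST route above and is an assumption:

import time
import requests

API_BASE = "https://api.example.com"             # placeholder, as above
HEADERS = {"Authorization": "Bearer sk_live_…"}  # placeholder credential; auth scheme assumed

def wait_for_job(job_id: str, interval_s: float = 30.0) -> dict:
    """Poll until the job reaches a terminal state (GET route assumed)."""
    while True:
        r = requests.get(f"{API_BASE}/v1/jobs/{job_id}", headers=HEADERS, timeout=30)
        r.raise_for_status()
        job = r.json()
        if job["status"] in ("completed", "cancelled", "failed"):
            return job
        time.sleep(interval_s)

job = wait_for_job("job_3F2D1A77B0E1")
print(job["status"], job["download_url"])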
04 Output formats
- ndjson (default) — one JSON line per URL. Streaming-friendly.
- csv — flat CSV with the extracted fields as columns. Requires an extract schema in defaults.
- parquet — Apache Parquet, compressed (zstd by default). Same column rules as CSV.
The download URL is a signed S3 link valid for 24 hours by default (configurable up to 30 days). Failed URLs are included in the output with "status": "error".
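A sketch of consuming the default gzipped NDJSON output as a stream, splitting rows on the "status": "error" marker described above (the shape of successful rows beyond that is not specified here):

import gzip
import json
import requests

download_url = "https://…"  # the signed S3 link from the completed job object

with requests.get(download_url, stream=True, timeout=300) as r:
    r.raise_for_status()
    ok = errored = 0
    # the file itself is gzip-compressed, so gunzip the raw byte stream as we read
    with gzip.open(r.raw, "rt", encoding="utf-8") as lines:
        for line in lines:
            row = json.loads(line)
            if row.get("status") == "error":
                errored += 1  # failed URLs are included inline
            else:
                ok += 1
print(f"{ok} succeeded, {errored} errored")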
05 List · cancel · retry
Cancellation is graceful — in-flight URLs finish, queued ones are skipped, the partial output becomes available. Retry produces a new job containing only the URLs that errored in the original.
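This section doesn't spell out the routes, so the following sketch assumes conventional REST shapes; every path in it is an assumption:

import requests

API_BASE = "https://api.example.com"             # placeholder, as above
HEADERS = {"Authorization": "Bearer sk_live_…"}  # placeholder credential; auth scheme assumed

# List jobs (route assumed)
jobs = requests.get(f"{API_BASE}/v1/jobs", headers=HEADERS, timeout=30).json()

# Graceful cancel: in-flight URLs finish, queued ones are skipped (route assumed)
requests.post(f"{API_BASE}/v1/jobs/job_3F2D1A77B0E1/cancel", headers=HEADERS, timeout=30)

# Retry: yields a NEW job containing only the URLs that errored (route assumed)
retry = requests.post(f"{API_BASE}/v1/jobs/job_3F2D1A77B0E1/retry", headers=HEADERS, timeout=30).json()
print(retry["id"])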
06 Limits
- 1,000,000 URLs per job. Need more? Chain jobs from a webhook (see the sketch after this list).
- Maximum job lifetime: 24 hours. Jobs that run longer are auto-cancelled, and everything completed up to that point is retained.
- Concurrency per job: 256 (subject to plan concurrency cap).
- Maximum result file size: 50 GB (gzipped). Larger jobs are split into multi-part files.
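A sketch of chaining jobs past the per-job cap from the job.completed webhook, using Flask. The event payload shape ("type" plus a job object) and the pre-uploaded batch IDs are assumptions:

from flask import Flask, request
import requests

app = Flask(__name__)
API_BASE = "https://api.example.com"             # placeholder, as above
HEADERS = {"Authorization": "Bearer sk_live_…"}  # placeholder credential; auth scheme assumed

# hypothetical pre-uploaded chunks, each within the 1,000,000-URL cap
BATCHES = ["upl_part1", "upl_part2", "upl_part3"]
next_batch = 1  # batch 0 was submitted to start the chain

@app.post("/hooks/jobs")
def on_job_event():
    global next_batch
    event = request.get_json()
    # event shape ("type" plus a "data" job object) is an assumption
    if event.get("type") == "job.completed" and next_batch < len(BATCHES):
        requests.post(
            f"{API_BASE}/v1/jobs",
            headers=HEADERS,
            json={
                "name": f"catalog-part-{next_batch + 1}",
                "input": {"upload_id": BATCHES[next_batch]},
                "webhook": "https://your-app.com/hooks/jobs",
            },
            timeout=30,
        )
        next_batch += 1
    return "", 204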