
AI Parser.

Describe the data you want in plain English (or JSON Schema). The AI Parser reads the page and returns validated, typed JSON. No CSS selectors, no XPath — and no maintenance the next time the target redesigns.

Endpoint: POST /v1/parse
Model: hypedata-extract-2
Cost: 5 credits
Validation: JSON Schema 2020-12
Contents
  1. Why an AI parser
  2. Endpoint
  3. Sketch schemas
  4. JSON Schema
  5. Nested & arrays
  6. Confidence & citations
  7. Plan caching
  8. Failure modes

01 Why an AI parser

Hand-written selectors are brittle. They break when the target adds a CSS class, swaps a div for a section, or A/B-tests a new layout. They require a custom parser per site and constant maintenance.

The Hypedata AI Parser is a structured-output model trained on hundreds of millions of web pages and their canonical extractions. You describe what you want; it figures out where to find it. When the target redesigns, the parser keeps working, with no code change on your end.

When NOT to use it

If you're scraping a single highly-structured endpoint (e.g. a public JSON-LD product page) where the schema is already explicit, parse it yourself — it's cheaper and instantaneous. The AI Parser shines on messy, inconsistent, or evolving pages.
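For the "parse it yourself" case, a minimal sketch using only the Python standard library might look like the following. The class and function names are illustrative and not part of any Hypedata SDK; the code simply collects every JSON-LD block embedded in a page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD block found in html."""
    collector = JsonLdCollector()
    collector.feed(html)
    return collector.documents
```

If the list comes back empty or the fields you need are missing, that is a reasonable signal to fall back to the AI Parser.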

02 Endpoint

POST https://api.hypedata.io/v1/parse

You can use the parser two ways: inline as part of a /v1/scrape call (extract parameter), or standalone by POSTing HTML you already have.

Inline, as part of a scrape:

POST /v1/scrape
{
  "url": "https://shop.example.com/p/alpha",
  "render": true,
  "extract": { "name": "string", "price": "number" }
}

Standalone, with HTML you already have (url is optional, but it helps the model):

POST /v1/parse
{
  "html": "<html>…</html>",
  "url": "https://shop.example.com/p/alpha",
  "schema": { "name": "string", "price": "number" }
}
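As a rough illustration, the standalone call could be assembled from Python with the standard library as sketched below. The request body mirrors the fields shown above. The Authorization header format and the build_parse_request helper are assumptions made for this example, so check the API reference for the real authentication scheme.

```python
import json
import urllib.request

API_BASE = "https://api.hypedata.io"


def build_parse_request(html, schema, api_key, url=None):
    """Build (but do not send) a POST /v1/parse request."""
    payload = {"html": html, "schema": schema}
    if url is not None:
        payload["url"] = url  # optional, helps the model
    return urllib.request.Request(
        f"{API_BASE}/v1/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Sending it is then a matter of passing the returned object to urllib.request.urlopen and decoding the JSON response.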

03 Sketch schemas

Sketch schemas are a compact JSON dialect where each value is a one-line type-and-hint. The parser infers the rest. They're ideal for prototyping and 90% of production cases.

{
  "title":        "string · article headline",
  "author":       "string · byline name only — no titles",
  "published_at": "string · ISO 8601 date, in UTC",
  "reading_time": "integer · minutes",
  "tags":         "array of strings · all tag/category labels",
  "paywall":      "boolean · true if any part of the body is gated"
}

Recognized type prefixes:

  • string, integer, number, boolean
  • array of <type>
  • object — followed by nested keys (see Nested & arrays)
  • enum — followed by a list of allowed values: "enum · in_stock | out_of_stock | preorder"
  • nullable <type> — when missing fields should be null rather than failing
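To make the value grammar concrete, here is a small client-side sketch that splits a sketch value into its type and hint and checks the type against the prefixes listed above. It is only an illustration of the format for linting your own schemas, not how the parser itself reads them; both function names are hypothetical.

```python
SCALARS = {"string", "integer", "number", "boolean"}


def parse_sketch_value(value):
    """Split a sketch value such as "string · article headline" into its parts."""
    type_part, _, hint = value.partition("·")
    type_part = type_part.strip()
    nullable = type_part.startswith("nullable ")
    if nullable:
        type_part = type_part[len("nullable "):].strip()
    return {"type": type_part, "nullable": nullable, "hint": hint.strip() or None}


def is_recognized(type_name):
    """Return True if type_name matches one of the documented type prefixes."""
    if type_name in SCALARS or type_name in ("object", "enum"):
        return True
    if type_name.startswith("array of "):
        item = type_name[len("array of "):].strip()
        # The article's own example writes "array of strings", so accept a plural item type.
        singular = item[:-1] if item.endswith("s") else item
        return is_recognized(item) or is_recognized(singular)
    return False
```

Running every value of a sketch through these two functions before sending it catches typos such as "interger" locally, without spending a request.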

04 JSON Schema

For strict pipelines, supply a full JSON Schema 2020-12 document. The parser validates the result against it. If the output does not conform, the request fails with validation_failed (see Failure modes) instead of returning non-conforming JSON.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["name", "price", "currency"],
  "properties": {
    "name":     { "type": "string", "minLength": 1 },
    "price":    { "type": "number", "minimum": 0 },
    "currency": { "type": "string", "pattern": "^[A-Z]{3}$" },
    "in_stock": { "type": "boolean" },
    "images":   { "type": "array", "items": { "type": "string", "format": "uri" } }
  }
}

We support type, required, properties, items, enum, format (uri/email/date-time/uuid), pattern, minimum/maximum, minLength/maxLength, anyOf, and oneOf. Unsupported keywords are silently ignored.
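Because unsupported keywords are dropped without warning, it can be worth checking a schema locally before relying on it. The sketch below walks a schema and reports every keyword outside the supported list. The helper name is hypothetical, and treating "$schema" as accepted metadata is an assumption based on the example above.

```python
SUPPORTED = {
    "type", "required", "properties", "items", "enum", "format", "pattern",
    "minimum", "maximum", "minLength", "maxLength", "anyOf", "oneOf",
}
# Assumption: "$schema" is accepted as metadata, since the article's example uses it.
METADATA = {"$schema"}


def find_ignored_keywords(schema, path="#"):
    """Return JSON-pointer-like paths of keywords outside the supported list."""
    ignored = []
    if not isinstance(schema, dict):
        return ignored
    for key, value in schema.items():
        here = f"{path}/{key}"
        if key not in SUPPORTED and key not in METADATA:
            ignored.append(here)
        elif key == "properties" and isinstance(value, dict):
            # Keys under "properties" are field names, not keywords; recurse into each.
            for name, sub in value.items():
                ignored += find_ignored_keywords(sub, f"{here}/{name}")
        elif key == "items":
            ignored += find_ignored_keywords(value, here)
        elif key in ("anyOf", "oneOf") and isinstance(value, list):
            for index, sub in enumerate(value):
                ignored += find_ignored_keywords(sub, f"{here}/{index}")
    return ignored
```

A schema that relies on, say, additionalProperties or exclusiveMinimum would show up in the returned list, which tells you those constraints will not be enforced.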

05 Nested & arrays

Nested objects and arrays of objects work the same way in both formats. Example: scraping a list of reviews from a product page.

{
  "product_name": "string",
  "average_rating": "number · 0–5, one decimal",
  "reviews": {
    "_type": "array",
    "_items": {
      "author":    "string",
      "rating":    "integer · 1–5",
      "date":      "string · ISO 8601",
      "verified":  "boolean",
      "body":      "string"
    }
  }
}

For pagination-aware extractions (collect all reviews across paginated pages), combine the parser with the crawl loop pattern — the parser itself works on a single page at a time.
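One possible shape for such a loop is sketched below. Here fetch_page is a hypothetical callable you supply: it scrapes page N with the reviews schema above and returns the parsed data object. The loop stops at the first page with no reviews and removes duplicates that appear on more than one page.

```python
def collect_reviews(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... and merge the "reviews" arrays."""
    seen = set()
    merged = []
    for page_number in range(1, max_pages + 1):
        data = fetch_page(page_number)
        reviews = data.get("reviews") or []
        if not reviews:
            break  # an empty page is treated as the end of the listing
        for review in reviews:
            # Identify a review by author, date, and body to drop repeats across pages.
            key = (review.get("author"), review.get("date"), review.get("body"))
            if key not in seen:
                seen.add(key)
                merged.append(review)
    return merged
```

The max_pages cap is a safety limit for sites whose pagination never returns an empty page.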

06 Confidence & citations

Pass "return_confidence": true on the request and each leaf value in the response gains a sibling {field}_confidence in the 0..1 range, plus a {field}_citation pointing to the source span in the HTML.

{
  "data": {
    "name": "Alpha Coat — Tobacco",
    "name_confidence": 0.98,
    "name_citation": { "selector": "h1.pdp-title", "offset": 0, "length": 20 },
    "price": 219,
    "price_confidence": 0.94
  }
}

Low-confidence values (< 0.6) are usually a sign the page doesn't actually contain the field, or you need a more specific hint in the sketch. Confidence reporting adds 1 credit per request.
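A short sketch of how a client might act on these scores: the function below scans the data object for {field}_confidence siblings and returns the fields that fall under the 0.6 threshold mentioned above. The function name is illustrative, and it only inspects top-level keys, since this page does not show how confidence is reported for nested values.

```python
LOW_CONFIDENCE = 0.6
SUFFIX = "_confidence"


def low_confidence_fields(data, threshold=LOW_CONFIDENCE):
    """Map each top-level field name to its score when the score is below threshold."""
    flagged = {}
    for key, value in data.items():
        is_score = key.endswith(SUFFIX) and isinstance(value, (int, float))
        if is_score and value < threshold:
            flagged[key[: -len(SUFFIX)]] = value
    return flagged
```

Flagged fields are good candidates for a more specific hint in the sketch, or for being marked nullable if the page often lacks them.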

07 Plan caching

For a given (hostname, schema) pair, the parser compiles an extraction plan on the first request. Subsequent requests within 24 hours reuse the plan: same accuracy, ~10× lower latency, and a lower price (3 credits instead of 5).

Plan caching is automatic and per-workspace. It's the main reason high-volume catalog scrapes get cheap fast: the first 100 product pages train the plan, the next 100,000 ride on it.
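To see what this means for a budget, here is a back-of-the-envelope estimate. It assumes the simplest reading of the pricing above: every page shares one (hostname, schema) pair, all requests land inside the 24-hour window, only the first request pays the full 5 credits, and confidence reporting is off. Real bills may differ.

```python
FIRST_REQUEST_CREDITS = 5   # full price while the plan is compiled
CACHED_REQUEST_CREDITS = 3  # price when a cached plan is reused


def estimate_credits(pages):
    """Estimate total credits for `pages` requests against one cached plan."""
    if pages <= 0:
        return 0
    return FIRST_REQUEST_CREDITS + (pages - 1) * CACHED_REQUEST_CREDITS


def estimate_without_cache(pages):
    """Total credits if every request paid full price."""
    return max(pages, 0) * FIRST_REQUEST_CREDITS
```

Under these assumptions, 100,000 pages cost 300,002 credits with caching versus 500,000 without, a saving of roughly 40%.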

08 Failure modes

Code | Cause | Recovery
schema_invalid | Sketch can't be parsed or JSON Schema is malformed. | Fix the schema and re-request.
validation_failed | The model produced output but it doesn't pass your strict JSON Schema. | Loosen required, or read partial_data in the error body.
page_unparseable | HTML was empty, binary, or so heavily obfuscated the parser refused. | Verify render: true is set; check the screenshot_url to see what loaded.
model_timeout | The page was too large to process within budget. | Trim with extract_root selector, or split into multiple passes.
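A client can turn this table into a small dispatch helper, as sketched below. The shape of the error body is an assumption: the example supposes a JSON object with a "code" key and, for validation_failed, a "partial_data" key. Only the partial_data name appears on this page, so verify the rest against a real error response.

```python
RECOVERY = {
    "schema_invalid": "Fix the schema and re-request.",
    "validation_failed": "Loosen required, or read partial_data in the error body.",
    "page_unparseable": "Verify render: true is set; check the screenshot_url.",
    "model_timeout": "Trim with extract_root selector, or split into multiple passes.",
}


def triage_error(error_body):
    """Return (recovery_hint, partial_data) for a parsed error response."""
    code = error_body.get("code")  # assumption: the code is exposed under "code"
    hint = RECOVERY.get(code, "Unknown error code; inspect the full error body.")
    # Only validation_failed is documented as carrying partial output.
    partial = error_body.get("partial_data") if code == "validation_failed" else None
    return hint, partial
```

Returning the partial data alongside the hint lets a pipeline keep whatever fields did validate while it logs the failure.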
hypedata. SHERIDAN, WY · EST. 2024
HYPELABS, LLC · v2.4.0
© 2026 HYPELABS, LLC · EIN 35-2851293 · SHERIDAN, WY