API Reference

Complete reference for the Helix Parse API.

Base URL

https://api.feeds.onhelix.ai

Authentication

All requests require API key authentication using the Bearer token scheme:

Authorization: Bearer YOUR_API_KEY

See the Authentication Guide for details on obtaining and using API keys.
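
As a minimal sketch, the header can be assembled in Python as follows. The `build_headers` helper is hypothetical (it is not part of any Helix SDK); only the `Authorization: Bearer` scheme itself comes from this reference.

```python
def build_headers(api_key: str) -> dict:
    """Return request headers that use the Bearer token scheme."""
    if not api_key:
        # Fail early instead of sending a request that will be rejected with 401.
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable, rather than hard-coding it, keeps it out of source control.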

Response Envelope

All successful responses use the standard wrapper:

{
  "success": true,
  "data": { ... }
}
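
One way to consume the wrapper is a small helper that checks `success` before returning `data`. This is a sketch, not an official client; the `unwrap` name is an assumption made for illustration.

```python
def unwrap(envelope: dict) -> dict:
    """Return the `data` payload of a successful response envelope."""
    if envelope.get("success") is not True:
        # Error responses carry `success: false` and no `data` field.
        raise ValueError(f"Request was not successful: {envelope!r}")
    return envelope["data"]
```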

Parse Content

Extract structured content from a URL or raw HTML.

POST /parse

Request

Headers:

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Body:

{
  "url": "https://www.bbc.com/news/technology-67988517"
}

Parameters:

Parameter	Type	Required	Description
`url`	string (URL)	Conditional	URL to scrape and parse. Required if `html` not provided.
`html`	string	Conditional	Raw HTML to parse. Required if `url` not provided. Max 2MB.
`title`	string	No	Page title hint for extraction. Max 1,000 chars.
`jobId`	string	No	Custom job ID for idempotency. Max 256 chars.

Validation: At least one of url or html must be provided. If both are provided, the HTML is used for extraction (no scrape is performed) and the URL is stored as metadata. The url value must be a well-formed URL; a malformed URL is rejected with a 400 INVALID_BODY error.
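
The rules above can be mirrored on the client to catch mistakes before a request is sent. The sketch below is illustrative only: the server remains the source of truth, the `validate_parse_body` helper is hypothetical, and two details are assumptions (that "2MB" means 2 × 1024 × 1024 UTF-8 bytes, and that only absolute http/https URLs are accepted).

```python
from urllib.parse import urlparse

MAX_HTML_BYTES = 2 * 1024 * 1024  # assumption: "2MB" measured in UTF-8 bytes
MAX_TITLE_CHARS = 1000
MAX_JOB_ID_CHARS = 256


def validate_parse_body(body: dict) -> list:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    url, html = body.get("url"), body.get("html")

    if url is None and html is None:
        problems.append("Either url or html must be provided")

    if url is not None:
        parts = urlparse(url)
        # Assumption: the API expects an absolute http(s) URL.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("Invalid URL format")

    if html is not None and len(html.encode("utf-8")) > MAX_HTML_BYTES:
        problems.append("HTML exceeds 2MB size limit")

    title = body.get("title")
    if title is not None and len(title) > MAX_TITLE_CHARS:
        problems.append("Title exceeds 1,000 character limit")

    job_id = body.get("jobId")
    if job_id is not None and len(job_id) > MAX_JOB_ID_CHARS:
        problems.append("jobId exceeds 256 character limit")

    return problems
```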

Response

Status: 200 OK

{
  "success": true,
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "hasPrimaryContent": true,
    "consumability": {
      "isConsumable": true,
      "reason": "Page contains a full news article with headline, body text, and publication metadata."
    },
    "primaryContent": {
      "title": "Apple Vision Pro: First weekend sees steady sales at stores",
      "description": "Apple's new mixed-reality headset goes on sale in the US, with steady demand reported at stores across the country.",
      "author": "Zoe Kleinman",
      "publisher": "BBC News",
      "publishedAt": "2024-02-04T12:30:00.000Z",
      "updatedAt": "2024-02-04T15:45:00.000Z",
      "isSponsored": false,
      "isDigest": false,
      "accessRestrictionType": null,
      "text": {
        "simplifiedHtml": "<p>Apple's Vision Pro headset has seen steady sales during its first weekend on sale in the US, with reports of consistent demand at Apple stores across the country.</p><p>The $3,499 device, which Apple calls a \"spatial computer\", went on sale on Friday.</p><p>Some stores saw queues, though they were shorter than those seen for recent iPhone launches.</p>"
      },
      "video": null,
      "primaryImage": {
        "url": "https://ichef.bbci.co.uk/news/1024/branded_news/1234/production/_132567890_visionpro.jpg",
        "caption": "A customer tries on the Apple Vision Pro at an Apple Store",
        "credit": "Getty Images"
      },
      "originallyPublished": {
        "syndicated": false,
        "domain": null,
        "url": null,
        "publisher": null,
        "publishedAt": null
      }
    },
    "scrape": {
      "httpStatus": 200
    }
  }
}

Response Fields:

Field	Type	Description
`jobId`	string	Job identifier (your provided jobId or auto-generated UUID)
`hasPrimaryContent`	boolean	Whether meaningful primary content was extracted
`consumability`	object	Content quality assessment (see below)
`primaryContent`	object|null	Extracted content (see below). Null if nothing extracted.
`scrape`	object	HTTP scrape metadata. Present only in URL mode; absent in HTML mode.

Consumability Object:

Field	Type	Description
`isConsumable`	boolean	Whether the page has meaningful standalone content
`reason`	string	Natural language explanation of the assessment

Primary Content Object:

Field	Type	Description
`title`	string|null	Page or article title
`description`	string|null	Summary or meta description
`author`	string|null	Content author
`publisher`	string|null	Publishing organization
`publishedAt`	string|null	Publication date (ISO 8601)
`updatedAt`	string|null	Last update date (ISO 8601)
`isSponsored`	boolean|null	Whether the content is sponsored
`isDigest`	boolean|null	Whether the page is a digest of other content
`accessRestrictionType`	string[]|null	Detected access restrictions (see below)
`text`	object|null	Body content (see below)
`video`	object|null	Video content (see below)
`primaryImage`	object|null	Primary image (see below)
`originallyPublished`	object|null	Original source for syndicated content (see below)

Text Object:

Field	Type	Description
`simplifiedHtml`	string	Simplified HTML of the body content

Video Object:

Field	Type	Description
`url`	string	Video URL
`duration`	string	Video duration

Primary Image Object:

Field	Type	Description
`url`	string	Image URL
`caption`	string|null	Image caption
`credit`	string|null	Image credit or attribution

Originally Published Object:

Field	Type	Description
`syndicated`	boolean|null	Whether the content is syndicated
`domain`	string|null	Domain of the original publication
`url`	string|null	URL of the original publication
`publisher`	string|null	Name of the original publisher
`publishedAt`	string|null	Original publication date (ISO 8601)
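
The date fields are nullable ISO 8601 strings, so client code typically needs a tolerant parser. The sketch below assumes timestamps look like the one in the sample response (`2024-02-04T12:30:00.000Z`); other ISO 8601 variants may need a dedicated parsing library. Both helper names are hypothetical, and preferring the original publication date for syndicated content is a design choice made here, not documented API behavior.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Convert a timestamp such as 2024-02-04T12:30:00.000Z to a datetime."""
    if value is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def original_publication_date(primary_content: dict) -> Optional[datetime]:
    """Prefer the original source's date when the content is syndicated."""
    original = primary_content.get("originallyPublished") or {}
    if original.get("syndicated") and original.get("publishedAt"):
        return parse_timestamp(original["publishedAt"])
    return parse_timestamp(primary_content.get("publishedAt"))
```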

Access Restriction Types:

Value	Description
`subscription-required`	Content behind a paywall
`bot-detected`	Bot detection challenge served
`captcha`	CAPTCHA presented
`adblock-detected`	Ad blocker detection blocked content
`login-required`	Login required to view content
`geo`	Geographic restriction
`other`	Other restriction
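
Because `accessRestrictionType` is a nullable array, callers need to handle both `null` and a list of values. A minimal sketch, with descriptions copied from the table above; the `describe_restrictions` helper is hypothetical.

```python
from typing import Optional

# Descriptions copied from the Access Restriction Types table.
RESTRICTION_DESCRIPTIONS = {
    "subscription-required": "Content behind a paywall",
    "bot-detected": "Bot detection challenge served",
    "captcha": "CAPTCHA presented",
    "adblock-detected": "Ad blocker detection blocked content",
    "login-required": "Login required to view content",
    "geo": "Geographic restriction",
    "other": "Other restriction",
}


def describe_restrictions(primary_content: Optional[dict]) -> list:
    """Return human-readable descriptions of any detected restrictions."""
    if not primary_content:
        return []
    values = primary_content.get("accessRestrictionType") or []
    # Values not listed in the table are passed through unchanged.
    return [RESTRICTION_DESCRIPTIONS.get(value, value) for value in values]
```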

Scrape Object:

Field	Type	Description
`httpStatus`	number	HTTP status code from the page scrape
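
Putting the response fields together, a caller might reduce a result to the few values it acts on. The sketch below is one possible reading of the fields, not a prescribed workflow: the `summarize` helper is hypothetical, and treating a result as usable only when both `hasPrimaryContent` and `isConsumable` are true is an assumption.

```python
def summarize(data: dict) -> dict:
    """Condense the `data` object of a parse response."""
    scrape = data.get("scrape")  # present only in URL mode
    content = data.get("primaryContent")  # None if nothing was extracted
    return {
        "mode": "url" if scrape is not None else "html",
        "http_status": scrape["httpStatus"] if scrape is not None else None,
        "title": content.get("title") if content else None,
        # Assumption: require both flags before treating the result as usable.
        "usable": bool(
            data["hasPrimaryContent"] and data["consumability"]["isConsumable"]
        ),
        "reason": data["consumability"]["reason"],
    }
```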

Errors

400 Bad Request errors use a nested error envelope:

{
  "success": false,
  "error": {
    "code": "INVALID_BODY",
    "message": "Invalid request body: Either url or html must be provided"
  }
}

401 Unauthorized errors use a flat error format:

{
  "success": false,
  "error": "Authentication failed for strategy: api-key",
  "code": "AUTHENTICATION_FAILED"
}
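
Since the two error formats differ in shape, client code may want to normalize them into one form. The following sketch handles both layouts shown above; the `normalize_error` name and the `"UNKNOWN"` fallback code are illustrative assumptions.

```python
def normalize_error(body: dict) -> tuple:
    """Return (code, message) for either error format."""
    error = body.get("error")
    if isinstance(error, dict):
        # Nested envelope (400): {"error": {"code": ..., "message": ...}}
        return error.get("code", "UNKNOWN"), error.get("message", "")
    # Flat format (401): {"error": "<message>", "code": "<code>"}
    return body.get("code", "UNKNOWN"), error or ""
```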

Error Cases:

400 Bad Request -- INVALID_BODY (validation errors):

Missing both url and html
Invalid URL format
HTML exceeds 2MB size limit
Title exceeds 1,000 character limit

400 Bad Request -- VALIDATION_FAILED (workflow errors):

"Previous parse job failed. Please retry with a new jobId."
"Parse job failed: {cause}. Please retry with a new jobId."

401 Unauthorized -- AUTHENTICATION_FAILED:

Missing or invalid API key
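
Each error code above points to a different follow-up, which a client can encode as a lookup. This is a sketch only: the action labels and the `next_step` helper are hypothetical, and codes not listed in this reference fall through to a generic label.

```python
# Action labels are illustrative; only the error codes come from this reference.
NEXT_STEPS = {
    "INVALID_BODY": "fix-request",  # the same body will fail again
    "VALIDATION_FAILED": "retry-with-new-job-id",  # the earlier job failed
    "AUTHENTICATION_FAILED": "check-api-key",  # missing or invalid key
}


def next_step(code: str) -> str:
    """Map an error code to a suggested follow-up action."""
    return NEXT_STEPS.get(code, "inspect-response")
```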

Idempotency

The jobId parameter enables request deduplication, scoped per organization.

When you provide a jobId, any subsequent request with the same jobId within the same organization reconnects to the existing workflow result rather than starting a new parse.
Without a jobId, a random UUID is generated for each request.
If a previous job with the same jobId failed, the API returns a 400 error asking you to retry with a new jobId.
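
One way to take advantage of deduplication is to derive the jobId deterministically from the input, so repeated submissions of the same URL map to the same job. The sketch below hashes the URL with SHA-256. Both helpers and the `-retry` suffix are hypothetical conventions, and it is an assumption that hex digits and hyphens are acceptable jobId characters.

```python
import hashlib

MAX_JOB_ID_CHARS = 256  # limit from the Parameters table


def job_id_for_url(url: str, prefix: str = "parse") -> str:
    """Derive a stable jobId so identical URLs reuse the same job."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()  # 64 hex chars
    job_id = f"{prefix}-{digest}"
    if len(job_id) > MAX_JOB_ID_CHARS:
        raise ValueError("jobId exceeds 256 characters")
    return job_id


def retry_job_id(job_id: str, attempt: int) -> str:
    """Build a new jobId after a failure, since a failed jobId cannot be reused."""
    new_id = f"{job_id}-retry{attempt}"
    if len(new_id) > MAX_JOB_ID_CHARS:
        raise ValueError("jobId exceeds 256 characters")
    return new_id
```

A deterministic jobId means a resubmitted URL returns the existing result, so append a version or date component to the prefix if you need a fresh parse of a page that may have changed.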

Examples

curl -- URL mode:

curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.bbc.com/news/technology-67988517"
  }'

curl -- HTML mode:

curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html><head><title>Example Article</title></head><body><article><h1>Breaking News</h1><p>Article content here...</p></article></body></html>",
    "title": "Example Article"
  }'

JavaScript:

const API_KEY = process.env.HELIX_API_KEY;
const BASE_URL = 'https://api.feeds.onhelix.ai';

async function parseUrl(url) {
  try {
    const response = await fetch(`${BASE_URL}/parse`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url }),
    });

    if (!response.ok) {
      const error = await response.json();
      throw new Error(
        `Parse failed (${response.status}): ${JSON.stringify(error)}`
      );
    }

    const { data } = await response.json();

    if (!data.hasPrimaryContent) {
      console.log('No primary content extracted');
      return null;
    }

    return data.primaryContent;
  } catch (error) {
    console.error('Parse request failed:', error.message);
    throw error;
  }
}

Python:

import os
import time

import requests

API_KEY = os.environ["HELIX_API_KEY"]
BASE_URL = "https://api.feeds.onhelix.ai"


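def parse_url(url, max_retries=3):
    """Parse a URL, retrying network failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f"{BASE_URL}/parse",
                json={"url": url},
                headers={"Authorization": f"Bearer {API_KEY}"},
                # requests has no default timeout and can wait indefinitely;
                # 60 seconds is an arbitrary choice, adjust as needed.
                timeout=60,
            )
            response.raise_for_status()

            data = response.json()["data"]

            if not data["hasPrimaryContent"]:
                return None

            return data["primaryContent"]

        except requests.exceptions.HTTPError as e:
            # 4xx/5xx responses are not retried. Log the raw body, which
            # may not be valid JSON, and re-raise.
            print(f"HTTP {e.response.status_code}: {e.response.text}")
            raise

        except requests.exceptions.RequestException:
            # Connection errors and timeouts: back off 1s, 2s, 4s, ...
            if attempt == max_retries - 1:
                raise
            time.sleep(2**attempt)

    return None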

Next Steps

Quickstart: Get parsing working in under 2 minutes
Overview: How Parse works and what it returns
Authentication: API key best practices
