API Reference

Complete reference for the Helix Parse API.

Base URL

https://api.feeds.onhelix.ai

Authentication

All requests require API key authentication using the Bearer token scheme:

Authorization: Bearer YOUR_API_KEY

See the Authentication Guide for details on obtaining and using API keys.

Response Envelope

All successful responses use the standard wrapper:

{
  "success": true,
  "data": { ... }
}

Parse Content

Extract structured content from a URL or raw HTML.

POST /parse

Request

Headers:

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Body:

{
  "url": "https://www.bbc.com/news/technology-67988517"
}

Parameters:

Parameter  Type          Required     Description
url        string (URL)  Conditional  URL to scrape and parse. Required if html not provided.
html       string        Conditional  Raw HTML to parse. Required if url not provided. Max 2MB.
title      string        No           Page title hint for extraction. Max 1,000 chars.
jobId      string        No           Custom job ID for idempotency. Max 256 chars.

Validation: At least one of url or html must be provided. Both can be sent together; in that case the HTML is used for extraction (no scrape is performed) and the URL is stored as metadata. The url field must be a well-formed URL; requests with a malformed URL are rejected with a 400 INVALID_BODY error.
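The rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: build_parse_body is a hypothetical helper, the reading of "2MB" as 2 * 1024 * 1024 UTF-8 bytes is an assumption, and the http/https scheme check is only a rough stand-in for whatever URL validation the server applies.

```python
from urllib.parse import urlparse

# Limits taken from the parameter table. Reading "2MB" as 2 * 1024 * 1024
# bytes of UTF-8 is an assumption, not documented behaviour.
MAX_HTML_BYTES = 2 * 1024 * 1024
MAX_TITLE_CHARS = 1000
MAX_JOB_ID_CHARS = 256


def build_parse_body(url=None, html=None, title=None, job_id=None):
    """Hypothetical helper: build a /parse request body or raise ValueError."""
    if url is None and html is None:
        raise ValueError("Either url or html must be provided")

    body = {}
    if url is not None:
        parsed = urlparse(url)
        # Rough well-formedness check; the server's own validation may differ.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Invalid URL format: {url!r}")
        body["url"] = url
    if html is not None:
        if len(html.encode("utf-8")) > MAX_HTML_BYTES:
            raise ValueError("HTML exceeds 2MB size limit")
        body["html"] = html
    if title is not None:
        if len(title) > MAX_TITLE_CHARS:
            raise ValueError("Title exceeds 1,000 character limit")
        body["title"] = title
    if job_id is not None:
        if len(job_id) > MAX_JOB_ID_CHARS:
            raise ValueError("jobId exceeds 256 character limit")
        body["jobId"] = job_id
    return body
```

The returned dictionary can be passed as the JSON body of POST /parse. Client-side checks like these only save a round trip; the server's response remains the authority on what is valid.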

Response

Status: 200 OK

{
  "success": true,
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "hasPrimaryContent": true,
    "consumability": {
      "isConsumable": true,
      "reason": "Page contains a full news article with headline, body text, and publication metadata."
    },
    "primaryContent": {
      "title": "Apple Vision Pro: First weekend sees steady sales at stores",
      "description": "Apple's new mixed-reality headset goes on sale in the US, with steady demand reported at stores across the country.",
      "author": "Zoe Kleinman",
      "publisher": "BBC News",
      "publishedAt": "2024-02-04T12:30:00.000Z",
      "updatedAt": "2024-02-04T15:45:00.000Z",
      "isSponsored": false,
      "isDigest": false,
      "accessRestrictionType": null,
      "text": {
        "simplifiedHtml": "<p>Apple's Vision Pro headset has seen steady sales during its first weekend on sale in the US, with reports of consistent demand at Apple stores across the country.</p><p>The $3,499 device, which Apple calls a \"spatial computer\", went on sale on Friday.</p><p>Some stores saw queues, though they were shorter than those seen for recent iPhone launches.</p>"
      },
      "video": null,
      "primaryImage": {
        "url": "https://ichef.bbci.co.uk/news/1024/branded_news/1234/production/_132567890_visionpro.jpg",
        "caption": "A customer tries on the Apple Vision Pro at an Apple Store",
        "credit": "Getty Images"
      },
      "originallyPublished": {
        "syndicated": false,
        "domain": null,
        "url": null,
        "publisher": null,
        "publishedAt": null
      }
    },
    "scrape": {
      "httpStatus": 200
    }
  }
}

Response Fields:

Field              Type         Description
jobId              string       Job identifier (your provided jobId or auto-generated UUID)
hasPrimaryContent  boolean      Whether meaningful primary content was extracted
consumability      object       Content quality assessment (see below)
primaryContent     object|null  Extracted content (see below). Null if nothing extracted.
scrape             object       HTTP scrape metadata. Present only in URL mode; absent in HTML mode.

Consumability Object:

Field         Type     Description
isConsumable  boolean  Whether the page has meaningful standalone content
reason        string   Natural language explanation of the assessment

Primary Content Object:

Field                  Type           Description
title                  string|null    Page or article title
description            string|null    Summary or meta description
author                 string|null    Content author
publisher              string|null    Publishing organization
publishedAt            string|null    Publication date (ISO 8601)
updatedAt              string|null    Last update date (ISO 8601)
isSponsored            boolean|null   Whether the content is sponsored
isDigest               boolean|null   Whether the page is a digest of other content
accessRestrictionType  string[]|null  Detected access restrictions (see below)
text                   object|null    Body content (see below)
video                  object|null    Video content (see below)
primaryImage           object|null    Primary image (see below)
originallyPublished    object|null    Original source for syndicated content (see below)

Text Object:

Field           Type    Description
simplifiedHtml  string  Simplified HTML of the body content

Video Object:

Field     Type    Description
url       string  Video URL
duration  string  Video duration

Primary Image Object:

Field    Type         Description
url      string       Image URL
caption  string|null  Image caption
credit   string|null  Image credit or attribution

Originally Published Object:

Field        Type          Description
syndicated   boolean|null  Whether the content is syndicated
domain       string|null   Domain of the original publication
url          string|null   URL of the original publication
publisher    string|null   Name of the original publisher
publishedAt  string|null   Original publication date (ISO 8601)

Access Restriction Types:

Value                  Description
subscription-required  Content behind a paywall
bot-detected           Bot detection challenge served
captcha                CAPTCHA presented
adblock-detected       Ad blocker detection blocked content
login-required         Login required to view content
geo                    Geographic restriction
other                  Other restriction

Scrape Object:

Field       Type    Description
httpStatus  number  HTTP status code from the page scrape
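Several of the fields above are nullable or conditionally present, so client code has to guard each access. The sketch below shows one way to reduce the "data" object to a few values a caller might act on. summarize_parse_result is a hypothetical helper, and treating a result as "usable" only when both hasPrimaryContent and isConsumable are true is an assumed policy, not something the API prescribes.

```python
def summarize_parse_result(data):
    """Hypothetical helper: pick out the fields a caller is likely to branch on."""
    content = data.get("primaryContent")  # null when nothing was extracted
    scrape = data.get("scrape")  # present only in URL mode
    restrictions = (content or {}).get("accessRestrictionType") or []
    text = (content or {}).get("text") or {}
    return {
        "job_id": data["jobId"],
        # Inferred from the presence of the scrape object.
        "mode": "url" if scrape is not None else "html",
        "http_status": scrape["httpStatus"] if scrape else None,
        # Assumed policy: require both flags before using the content.
        "usable": bool(
            data["hasPrimaryContent"] and data["consumability"]["isConsumable"]
        ),
        "reason": data["consumability"]["reason"],
        "restrictions": restrictions,
        "paywalled": "subscription-required" in restrictions,
        "html": text.get("simplifiedHtml"),
    }
```

A caller could log the reason string whenever usable is false, and inspect restrictions to decide whether a retry is likely to help.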

Errors

400 Bad Request errors use a nested error envelope:

{
  "success": false,
  "error": {
    "code": "INVALID_BODY",
    "message": "Invalid request body: Either url or html must be provided"
  }
}

401 Unauthorized errors use a flat error format:

{
  "success": false,
  "error": "Authentication failed for strategy: api-key",
  "code": "AUTHENTICATION_FAILED"
}

Error Cases:

400 Bad Request -- INVALID_BODY (validation errors):

  • Missing both url and html
  • Invalid URL format
  • HTML exceeds 2MB size limit
  • Title exceeds 1,000 character limit

400 Bad Request -- VALIDATION_FAILED (workflow errors):

  • "Previous parse job failed. Please retry with a new jobId."
  • "Parse job failed: {cause}. Please retry with a new jobId."

401 Unauthorized -- AUTHENTICATION_FAILED:

  • Missing or invalid API key
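Because 400 and 401 responses use different shapes, client code may find it convenient to normalize both into one form. The sketch below does this for the two formats shown in this section; normalize_error is a hypothetical helper name, and any error shape not shown here is outside what it handles.

```python
def normalize_error(body):
    """Hypothetical helper: return (code, message) from either error shape."""
    error = body.get("error")
    if isinstance(error, dict):
        # Nested envelope, as shown for 400 responses.
        return error.get("code"), error.get("message")
    # Flat format, as shown for 401 responses: "error" is the message string.
    return body.get("code"), error
```

With this in place, a caller can branch on the code (INVALID_BODY, VALIDATION_FAILED, AUTHENTICATION_FAILED) without caring which status produced it.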

Idempotency

The jobId parameter enables request deduplication, scoped per organization.

  • When you provide a jobId, any subsequent request with the same jobId within the same organization reconnects to the existing workflow result rather than starting a new parse.
  • Without a jobId, a random UUID is generated for each request.
  • If a previous job with the same jobId failed, the API returns a 400 error asking you to retry with a new jobId.
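One way to use these rules is to derive the jobId deterministically from the input, so that a repeated call reconnects to the earlier result, and to bump a counter when the API reports that the previous job failed. The sketch below is illustrative only: the "parse-<sha256>-<attempt>" naming scheme and both helper names are assumptions, not part of the API.

```python
import hashlib


def make_job_id(url, attempt=0):
    """Hypothetical scheme: deterministic jobId from the URL and an attempt counter."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()  # 64 hex characters
    # Well under the documented 256-character limit.
    return f"parse-{digest}-{attempt}"


def needs_new_job_id(status_code, body):
    """True when a 400 response carries the VALIDATION_FAILED workflow error."""
    error = body.get("error")
    code = error.get("code") if isinstance(error, dict) else body.get("code")
    return status_code == 400 and code == "VALIDATION_FAILED"
```

A caller would send make_job_id(url) first and, if needs_new_job_id returns True for the response, retry with make_job_id(url, attempt=1). Because deduplication is scoped per organization, the same scheme can be reused across organizations without collisions between them.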

Examples

curl -- URL mode:

curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.bbc.com/news/technology-67988517"
  }'

curl -- HTML mode:

curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html><head><title>Example Article</title></head><body><article><h1>Breaking News</h1><p>Article content here...</p></article></body></html>",
    "title": "Example Article"
  }'

JavaScript:

const API_KEY = process.env.HELIX_API_KEY;
const BASE_URL = 'https://api.feeds.onhelix.ai';

// Parse a URL and return its primaryContent object,
// or null when no primary content was extracted.
async function parseUrl(url) {
  try {
    const response = await fetch(`${BASE_URL}/parse`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url }),
    });

    // Non-2xx responses carry an error body (nested for 400, flat for 401).
    if (!response.ok) {
      const error = await response.json();
      throw new Error(
        `Parse failed (${response.status}): ${JSON.stringify(error)}`
      );
    }

    // Successful responses wrap the result in { success, data }.
    const { data } = await response.json();

    if (!data.hasPrimaryContent) {
      console.log('No primary content extracted');
      return null;
    }

    return data.primaryContent;
  } catch (error) {
    console.error('Parse request failed:', error.message);
    throw error;
  }
}

Python:

import os
import time

import requests

API_KEY = os.environ["HELIX_API_KEY"]
BASE_URL = "https://api.feeds.onhelix.ai"


def parse_url(url, max_retries=3):
    """Parse a URL; return primaryContent, or None if nothing was extracted."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f"{BASE_URL}/parse",
                json={"url": url},
                headers={"Authorization": f"Bearer {API_KEY}"},
                timeout=30,  # avoid hanging indefinitely on a stalled connection
            )
            response.raise_for_status()

            data = response.json()["data"]

            if not data["hasPrimaryContent"]:
                return None

            return data["primaryContent"]

        except requests.exceptions.HTTPError as e:
            # HTTP errors (400, 401, ...) are not retried; log the body and re-raise.
            try:
                error_body = e.response.json()
            except ValueError:
                error_body = e.response.text
            print(f"HTTP {e.response.status_code}: {error_body}")
            raise

        except requests.exceptions.RequestException:
            # Network-level failures are retried with exponential backoff.
            if attempt == max_retries - 1:
                raise
            time.sleep(2**attempt)

    return None

Next Steps