News Feeds

Collect and monitor news content from websites, sitemaps, and index pages with structured data extraction.

Overview

News feeds aggregate articles from multiple sources into a single, organized stream. Each feed continuously monitors its sources for new content, automatically extracting structured data including titles, descriptions, and metadata.

Key benefits:

Automatic discovery: New articles are detected and processed without manual intervention
Clean data extraction: Structured content with titles, descriptions, and metadata
Flexible retrieval: Access via API polling or webhooks
Historical backfill: Adding a source automatically pulls in recent articles through a 14-day backfill

Quick Example

# Create a feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Tech News", "description": "Technology coverage"}'

# Add a source
curl -X POST https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sourceType": "site", "domain": "techcrunch.com"}'

# Retrieve articles
curl "https://api.feeds.onhelix.ai/feeds/news/{feedId}/items?limit=20" \
  -H "Authorization: Bearer YOUR_API_KEY"

See the Quickstart Guide for a complete walkthrough.
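For callers working from Python rather than the shell, the three requests above could be built roughly as follows. This is a sketch using only the standard library. The helper name `build_request` and the feed ID `feed_123` are placeholders introduced here, not part of the Helix API, and the requests are constructed but not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.feeds.onhelix.ai"


def build_request(method, path, api_key, body=None, query=None):
    """Build (but do not send) an authenticated request to the feeds API."""
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies carry an explicit Content-Type, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


feed_id = "feed_123"  # placeholder for your feed's ID

create_feed = build_request(
    "POST", "/feeds/news", "YOUR_API_KEY",
    body={"name": "Tech News", "description": "Technology coverage"},
)
add_source = build_request(
    "POST", f"/feeds/news/{feed_id}/sources", "YOUR_API_KEY",
    body={"sourceType": "site", "domain": "techcrunch.com"},
)
list_items = build_request(
    "GET", f"/feeds/news/{feed_id}/items", "YOUR_API_KEY",
    query={"limit": 20},
)
```

Passing any of these objects to `urllib.request.urlopen` would send the request.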

Source Types

News feeds support three types of sources, each optimized for different monitoring needs.

Website Sources

Monitor entire domains for new articles.

Use when:

You want to track all content from a publisher
The site has clear article pages (news sites, blogs)
You don't need to filter by specific sections

How it works:

Helix crawls the domain to discover article pages
New pages are detected automatically as they're published
Articles are extracted with clean, structured data

Example:

{
  "sourceType": "site",
  "domain": "techcrunch.com"
}

Monitors all articles published on techcrunch.com.

Sitemap Sources

Track specific XML sitemaps for more targeted content discovery.

Use when:

The site publishes a news sitemap
You want faster, more reliable article detection
You need to monitor a specific content subset

How it works:

Helix monitors the sitemap for new URLs
New articles are detected as they appear in the sitemap
Content is extracted from the discovered URLs

Requirements:

The sitemap must be publicly accessible
URLs must point to article pages

Example:

{
  "sourceType": "sitemap",
  "url": "https://techcrunch.com/news-sitemap.xml",
  "siteId": "site_abc123"
}

Getting a siteId: When you create a site source, Helix returns a sourceId that represents the site. Use this sourceId as the siteId when creating sitemap or index page sources for the same domain.
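The siteId handoff can be sketched in Python as below. The response shape is an assumption: the text above only says that a sourceId is returned, so this sketch assumes it appears as a top-level `sourceId` field in the JSON response. `sitemap_source_from_site` is a hypothetical helper name.

```python
def sitemap_source_from_site(site_source_response, sitemap_url):
    """Build a sitemap source payload that reuses a site source's ID.

    Assumes (not confirmed by the documentation) that the site source
    response is a dict with a top-level "sourceId" key.
    """
    site_id = site_source_response["sourceId"]
    return {
        "sourceType": "sitemap",
        "url": sitemap_url,
        "siteId": site_id,
    }


# Hypothetical response from creating the techcrunch.com site source.
site_response = {"sourceId": "site_abc123"}
payload = sitemap_source_from_site(
    site_response, "https://techcrunch.com/news-sitemap.xml"
)
```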

Index Page Sources

Monitor specific pages that list articles, such as category or section pages.

Use when:

You want to track a specific category or section
The site organizes articles on dedicated index pages
You need focused monitoring of particular topics

How it works:

Helix monitors the index page for article links
New links are extracted and their content processed
Articles appear in your feed as they're added to the index

Requirements:

The index page must contain clear article links
The page should update regularly with new content

Example:

{
  "sourceType": "siteIndexPage",
  "url": "https://techcrunch.com/category/artificial-intelligence/",
  "siteId": "site_abc123"
}

Getting a siteId: Same as with sitemap sources—create a site source first, then use its sourceId.
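A client-side check can catch a missing field before a source payload is sent. The sketch below infers the required fields from the three examples in this section; the API may accept or require other fields, and `missing_source_fields` is a hypothetical helper.

```python
# Required fields per source type, inferred from the examples above.
REQUIRED_FIELDS = {
    "site": {"domain"},
    "sitemap": {"url", "siteId"},
    "siteIndexPage": {"url", "siteId"},
}


def missing_source_fields(payload):
    """Return the sorted list of fields missing from a source payload.

    Raises ValueError for a sourceType that the examples do not show.
    """
    source_type = payload.get("sourceType")
    if source_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown sourceType: {source_type!r}")
    return sorted(REQUIRED_FIELDS[source_type] - set(payload))
```

An empty list means the payload has every field used in the matching example.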

How Processing Works

Adding a Source

When you add a source to a feed:

Immediate validation: Helix verifies the source is accessible
Automatic backfill: A 14-day backfill begins to populate your feed with recent articles
Continuous monitoring: The source is monitored for new content
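To estimate which articles the backfill should cover, the window can be computed as below. This sketch assumes the 14 days are counted back from the moment the source is added, which the text above does not state explicitly; `backfill_cutoff` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

BACKFILL_DAYS = 14  # window stated in the text above


def backfill_cutoff(source_added_at):
    """Earliest publication time the backfill would be expected to cover.

    Assumes the window is counted back from when the source was added.
    """
    return source_added_at - timedelta(days=BACKFILL_DAYS)


added = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
cutoff = backfill_cutoff(added)
print(cutoff.isoformat())  # 2024-01-01T12:00:00+00:00
```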

Feed Items

Each item in your feed includes structured data:

Field	Description
`id`	Unique identifier for the feed item
`newsPageId`	Reference to the underlying news page
`type`	Source type (`site` or `instagram`)
`data.url`	Direct link to the article
`data.title`	Article headline
`data.description`	Article summary or excerpt
`createdAt`	When the item was added to your feed
`updatedAt`	Last time the item was modified

Example feed item:

{
  "id": "item_abc123",
  "newsPageId": "page_def456",
  "type": "site",
  "data": {
    "url": "https://example.com/article",
    "title": "Breaking: Major Announcement",
    "description": "Details about the announcement...",
    "sitePageId": "550e8400-e29b-41d4-a716-446655440001",
    "sitePageVersionId": "550e8400-e29b-41d4-a716-446655440002"
  },
  "createdAt": "2024-01-15T12:00:00.000Z",
  "updatedAt": "2024-01-15T12:00:00.000Z"
}
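When polling, the same item may come back on more than one request, so a consumer will usually want to parse items and skip those already seen. The following is a minimal sketch based only on the fields shown above. `FeedItem`, `parse_feed_item`, and `new_items` are names introduced here, and keeping seen IDs in an in-memory set is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeedItem:
    id: str
    url: str
    title: str
    description: str
    created_at: datetime


def parse_timestamp(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_feed_item(raw):
    """Convert one raw feed item dict into a FeedItem."""
    data = raw["data"]
    return FeedItem(
        id=raw["id"],
        url=data["url"],
        title=data["title"],
        description=data["description"],
        created_at=parse_timestamp(raw["createdAt"]),
    )


def new_items(raw_items, seen_ids):
    """Return items whose id has not been seen yet, recording them as seen."""
    fresh = []
    for raw in raw_items:
        if raw["id"] not in seen_ids:
            seen_ids.add(raw["id"])
            fresh.append(parse_feed_item(raw))
    return fresh


sample = {
    "id": "item_abc123",
    "newsPageId": "page_def456",
    "type": "site",
    "data": {
        "url": "https://example.com/article",
        "title": "Breaking: Major Announcement",
        "description": "Details about the announcement...",
    },
    "createdAt": "2024-01-15T12:00:00.000Z",
    "updatedAt": "2024-01-15T12:00:00.000Z",
}

seen = set()
first_poll = new_items([sample], seen)   # one parsed FeedItem
second_poll = new_items([sample], seen)  # empty: the id was already seen
```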

Feed Management

Enabling and Disabling Sources

Temporarily pause monitoring without deleting the source:

# Disable a source
curl -X PATCH https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources/{sourceId} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Re-enable later
curl -X PATCH https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources/{sourceId} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

When disabled, the source stops monitoring for new articles. Re-enabling resumes monitoring but does not backfill articles published while disabled.
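Because articles published during a pause are not backfilled, it can help to record the pause window on your side. The sketch below assumes your own code stores the times at which it disabled and re-enabled the source; nothing above says the API reports them, and `published_during_pause` is a hypothetical helper.

```python
from datetime import datetime, timezone


def published_during_pause(published_at, disabled_at, enabled_at):
    """True if an article was published while the source was disabled."""
    return disabled_at <= published_at < enabled_at


# Pause window recorded by the caller (illustrative values).
disabled_at = datetime(2024, 1, 10, tzinfo=timezone.utc)
enabled_at = datetime(2024, 1, 12, tzinfo=timezone.utc)

in_gap = published_during_pause(
    datetime(2024, 1, 11, tzinfo=timezone.utc), disabled_at, enabled_at
)
after_gap = published_during_pause(
    datetime(2024, 1, 13, tzinfo=timezone.utc), disabled_at, enabled_at
)
```

Here `in_gap` is true and `after_gap` is false: an article from the first date would not be expected in the feed.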

Organizing Multiple Feeds

Create separate feeds for different purposes:

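# News monitoring feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Daily News", "description": "General news monitoring"}'

# Competitor analysis feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Competitors", "description": "Competitor announcements"}'

# Research feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Research Topics", "description": "Academic and research content"}'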

Webhooks

Receive notifications when new articles are added to your feeds.

Available events:

news.item_added - Triggered when a new article is published and added to your feed

When webhooks are sent:

A webhook fires as soon as all of the following have happened:

A new article is discovered from any of your sources
The article passes through content extraction and processing
The article is successfully added to your feed

What you receive:

Each webhook includes complete article metadata:

Article title, description, and URL
Source information (domain, name)
Publication timestamp
Categories and relevance score
Sentiment analysis (optional)

Example webhook payload:

{
  "event": "news.item_added",
  "timestamp": "2025-11-08T12:34:56.789Z",
  "data": {
    "item_id": "news_1234567890",
    "title": "Article Title",
    "summary": "Brief summary of the article",
    "url": "https://example.com/article",
    "source": {
      "name": "Example News",
      "domain": "example.com"
    },
    "published_at": "2025-11-08T12:00:00.000Z"
  }
}

See the News Webhooks documentation for complete payload details, security verification, and setup instructions.
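A receiver for this event might extract the fields it needs as sketched below. The sketch relies only on the fields in the example payload. It omits signature verification, which the News Webhooks documentation covers, and the guard against repeated item IDs is a precaution added here rather than documented delivery behavior. `handle_webhook` is a hypothetical function name.

```python
import json


def handle_webhook(body, seen_item_ids):
    """Parse a webhook body and return the new article's fields, or None.

    Returns None for events other than news.item_added and for item IDs
    that were already handled.
    """
    payload = json.loads(body)
    if payload.get("event") != "news.item_added":
        return None
    data = payload["data"]
    if data["item_id"] in seen_item_ids:
        return None
    seen_item_ids.add(data["item_id"])
    return {
        "item_id": data["item_id"],
        "title": data["title"],
        "url": data["url"],
        "source_domain": data["source"]["domain"],
        "published_at": data["published_at"],
    }


body = json.dumps({
    "event": "news.item_added",
    "timestamp": "2025-11-08T12:34:56.789Z",
    "data": {
        "item_id": "news_1234567890",
        "title": "Article Title",
        "summary": "Brief summary of the article",
        "url": "https://example.com/article",
        "source": {"name": "Example News", "domain": "example.com"},
        "published_at": "2025-11-08T12:00:00.000Z",
    },
})

handled = set()
article = handle_webhook(body, handled)  # dict with the article's fields
repeat = handle_webhook(body, handled)   # None: item_id already handled
```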
