News Feeds
Collect and monitor news content from websites, sitemaps, and index pages with structured data extraction.
Overview
News feeds aggregate articles from multiple sources into a single, organized stream. Each feed continuously monitors its sources for new content, automatically extracting structured data including titles, descriptions, and metadata.
Key benefits:
- Automatic discovery: New articles are detected and processed without manual intervention
- Clean data extraction: Structured content with titles, descriptions, and metadata
- Flexible retrieval: Access via API polling or webhooks
- Historical backfill: Recent articles are included automatically when you add a source
Quick Example
# Create a feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Tech News", "description": "Technology coverage"}'
# Add a source
curl -X POST https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sourceType": "site", "domain": "techcrunch.com"}'
# Retrieve articles
curl "https://api.feeds.onhelix.ai/feeds/news/{feedId}/items?limit=20" \
  -H "Authorization: Bearer YOUR_API_KEY"
See the Quickstart Guide for a complete walkthrough.
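When retrieving articles by polling, a client typically has to remember which items it has already handled. The following is a minimal Python sketch of that bookkeeping, not part of the Helix API: `fetch_items` is a hypothetical callable standing in for the "Retrieve articles" request above, and it assumes each returned item is a dict carrying an `id`.

```python
from typing import Callable, Iterable, List, Set


def collect_new_items(
    fetch_items: Callable[[], Iterable[dict]],
    seen_ids: Set[str],
) -> List[dict]:
    """Return items not seen in earlier polls and record their ids.

    fetch_items is a hypothetical stand-in for
    GET /feeds/news/{feedId}/items; it is assumed to yield dicts with an "id".
    """
    fresh = []
    for item in fetch_items():
        item_id = item["id"]
        if item_id in seen_ids:
            continue  # already handled in an earlier poll
        seen_ids.add(item_id)
        fresh.append(item)
    return fresh
```

Calling `collect_new_items` repeatedly with the same `seen_ids` set returns each item only once, even when consecutive polls overlap.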
Source Types
News feeds support three types of sources, each optimized for different monitoring needs.
Website Sources
Monitor entire domains for new articles.
Use when:
- You want to track all content from a publisher
- The site has clear article pages (news sites, blogs)
- You don't need to filter by specific sections
How it works:
- Helix crawls the domain to discover article pages
- New pages are detected automatically as they're published
- Articles are extracted with clean, structured data
Example:
{
  "sourceType": "site",
  "domain": "techcrunch.com"
}
Monitors all articles published on techcrunch.com.
Sitemap Sources
Track specific XML sitemaps for more targeted content discovery.
Use when:
- The site publishes a news sitemap
- You want faster, more reliable article detection
- You need to monitor a specific content subset
How it works:
- Helix monitors the sitemap for new URLs
- New articles are detected as they appear in the sitemap
- Content is extracted from the discovered URLs
Requirements:
- The sitemap must be publicly accessible
- URLs must point to article pages
Example:
{
  "sourceType": "sitemap",
  "url": "https://techcrunch.com/news-sitemap.xml",
  "siteId": "site_abc123"
}
Getting a siteId: When you create a site source, Helix returns a sourceId that represents the site. Use this sourceId as the siteId when creating sitemap or index page sources for the same domain.
Index Page Sources
Monitor specific pages that list articles, such as category or section pages.
Use when:
- You want to track a specific category or section
- The site organizes articles on dedicated index pages
- You need focused monitoring of particular topics
How it works:
- Helix monitors the index page for article links
- New links are extracted and their content processed
- Articles appear in your feed as they're added to the index
Requirements:
- The index page must contain clear article links
- The page should update regularly with new content
Example:
{
  "sourceType": "siteIndexPage",
  "url": "https://techcrunch.com/category/artificial-intelligence/",
  "siteId": "site_abc123"
}
Getting a siteId: Same as with sitemap sources: create a site source first, then use its sourceId as the siteId.
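The three source payloads, and the way a site source's sourceId feeds into sitemap and index page sources, can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the create-source response is a JSON object with a top-level `sourceId` field; this page does not show the response shape.

```python
from typing import Iterable, List


def site_source(domain: str) -> dict:
    """Payload for a website source covering a whole domain."""
    return {"sourceType": "site", "domain": domain}


def sitemap_source(url: str, site_id: str) -> dict:
    """Payload for a sitemap source tied to an existing site source."""
    return {"sourceType": "sitemap", "url": url, "siteId": site_id}


def index_page_source(url: str, site_id: str) -> dict:
    """Payload for an index page source tied to an existing site source."""
    return {"sourceType": "siteIndexPage", "url": url, "siteId": site_id}


def dependent_sources(
    site_source_response: dict,
    sitemap_urls: Iterable[str] = (),
    index_page_urls: Iterable[str] = (),
) -> List[dict]:
    """Build sitemap and index page payloads that reuse one site's id."""
    # ASSUMPTION: the response exposes the id as a top-level "sourceId".
    site_id = site_source_response["sourceId"]
    payloads = [sitemap_source(u, site_id) for u in sitemap_urls]
    payloads += [index_page_source(u, site_id) for u in index_page_urls]
    return payloads
```

Each returned dict can be sent as the body of the "Add a source" request.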
How Processing Works
Adding a Source
When you add a source to a feed:
- Immediate validation: Helix verifies the source is accessible
- Automatic backfill: A 14-day backfill begins to populate your feed with recent articles
- Continuous monitoring: The source is monitored for new content
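To estimate which articles the backfill is likely to cover, the 14-day window can be worked out with plain date arithmetic. This is a rough Python sketch under two assumptions that this page does not confirm: the window is counted back from the moment the source is added, and it is based on an article's publication time.

```python
from datetime import datetime, timedelta, timezone

BACKFILL_DAYS = 14  # backfill length stated above


def backfill_cutoff(source_added_at: datetime) -> datetime:
    """Earliest publication time assumed to fall inside the backfill."""
    return source_added_at - timedelta(days=BACKFILL_DAYS)


def in_backfill_window(published_at: datetime, source_added_at: datetime) -> bool:
    """True if an article published at published_at falls in the assumed window."""
    return backfill_cutoff(source_added_at) <= published_at <= source_added_at


if __name__ == "__main__":
    added = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    print(backfill_cutoff(added).isoformat())
```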
Feed Items
Each item in your feed includes structured data:
| Field | Description |
|---|---|
| id | Unique identifier for the feed item |
| newsPageId | Reference to the underlying news page |
| type | Source type (site or instagram) |
| data.url | Direct link to the article |
| data.title | Article headline |
| data.description | Article summary or excerpt |
| createdAt | When the item was added to your feed |
| updatedAt | Last time the item was modified |
Example feed item:
{
  "id": "item_abc123",
  "newsPageId": "page_def456",
  "type": "site",
  "data": {
    "url": "https://example.com/article",
    "title": "Breaking: Major Announcement",
    "description": "Details about the announcement...",
    "sitePageId": "550e8400-e29b-41d4-a716-446655440001",
    "sitePageVersionId": "550e8400-e29b-41d4-a716-446655440002"
  },
  "createdAt": "2024-01-15T12:00:00.000Z",
  "updatedAt": "2024-01-15T12:00:00.000Z"
}
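A client will often want to turn a feed item into a typed object before working with it. The sketch below shows one way to do that in Python using only the fields from the table above; the `FeedItem` class and `parse_feed_item` function are illustrative names, not part of any Helix SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeedItem:
    id: str
    news_page_id: str
    type: str
    url: str
    title: str
    description: str
    created_at: datetime
    updated_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_feed_item(raw: dict) -> FeedItem:
    """Convert a decoded feed item (as in the example above) to a FeedItem."""
    data = raw["data"]
    return FeedItem(
        id=raw["id"],
        news_page_id=raw["newsPageId"],
        type=raw["type"],
        url=data["url"],
        title=data["title"],
        description=data["description"],
        created_at=_parse_timestamp(raw["createdAt"]),
        updated_at=_parse_timestamp(raw["updatedAt"]),
    )
```

The sketch raises `KeyError` if a listed field is missing; a production client may prefer to treat some fields as optional.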
Feed Management
Enabling and Disabling Sources
Temporarily pause monitoring without deleting the source:
# Disable a source
curl -X PATCH https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources/{sourceId} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
# Re-enable later
curl -X PATCH https://api.feeds.onhelix.ai/feeds/news/{feedId}/sources/{sourceId} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'
When disabled, the source stops monitoring for new articles. Re-enabling resumes monitoring but does not backfill articles published while disabled.
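The same toggle can be prepared from Python with the standard library. This sketch only builds the request object, mirroring the curl commands above; `build_toggle_request` is a hypothetical helper name, and the response format is not covered here.

```python
import json
import urllib.request

API_BASE = "https://api.feeds.onhelix.ai"


def build_toggle_request(
    feed_id: str, source_id: str, enabled: bool, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that enables or disables a source."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/feeds/news/{feed_id}/sources/{source_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Declare the JSON body explicitly.
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it. Because re-enabling does not backfill, a client that needs the missed articles has to obtain them some other way.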
Organizing Multiple Feeds
Create separate feeds for different purposes:
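# News monitoring feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Daily News", "description": "General news monitoring"}'
# Competitor analysis feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Competitors", "description": "Competitor announcements"}'
# Research feed
curl -X POST https://api.feeds.onhelix.ai/feeds/news \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Research Topics", "description": "Academic and research content"}'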
Webhooks
Receive notifications when new articles are added to your feeds.
Available events:
- news.item_added: Triggered when a new article is published and added to your feed
When webhooks fire:
A webhook fires as soon as all of the following have happened:
- A new article is discovered from any of your sources
- The article passes through content extraction and processing
- The article is successfully added to your feed
What you receive:
Each webhook carries article metadata, which can include:
- Article title, description, and URL
- Source information (domain, name)
- Publication timestamp
- Categories and relevance score
- Sentiment analysis (optional)
Example webhook payload:
{
  "event": "news.item_added",
  "timestamp": "2025-11-08T12:34:56.789Z",
  "data": {
    "item_id": "news_1234567890",
    "title": "Article Title",
    "summary": "Brief summary of the article",
    "url": "https://example.com/article",
    "source": {
      "name": "Example News",
      "domain": "example.com"
    },
    "published_at": "2025-11-08T12:00:00.000Z"
  }
}
See the News Webhooks documentation for complete payload details, security verification, and setup instructions.
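On the receiving side, a handler usually decodes the body, checks the event name, and pulls out the fields it needs. The Python sketch below does this for the example payload above, using only field names that appear in that example; `extract_added_article` is a hypothetical helper, and signature verification is deliberately left out because it is covered in the News Webhooks documentation.

```python
import json
from typing import Optional


def extract_added_article(raw_body: bytes) -> Optional[dict]:
    """Return the core article fields for a news.item_added event, else None."""
    payload = json.loads(raw_body)
    if payload.get("event") != "news.item_added":
        return None  # ignore events this handler does not know about

    data = payload["data"]
    source = data.get("source", {})
    return {
        "item_id": data["item_id"],
        "title": data["title"],
        "url": data["url"],
        "source_domain": source.get("domain"),
        "published_at": data.get("published_at"),
    }
```

Returning `None` for unrecognized events keeps the handler tolerant of event types that may be added later.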