Data Sources

Pipelines can pull data from multiple providers to enrich their processing context.

Overview

Data sources (also called providers) supply data to pipelines during processing. Pipelines can use one or more data sources to access:

HubSpot CRM data
LinkedIn profiles and activity
External data (DNS, news, financing)
Website content

HubSpot Providers

hubspot_contact

Contact properties and associations.

Available Fields:

firstname, lastname, email
phone, mobilephone
jobtitle, company
address, city, state, zip, country
hs_linkedin_url
associatedcompanyid
All custom properties

Use Cases:

Contact enrichment
Data validation
Personalization
Company association

Example Mapping:

providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email]

hubspot_company

Company properties.

Available Fields:

name, domain
industry, type
numberofemployees, annualrevenue
phone, address, city, state, zip, country
linkedin_company_page
All custom properties

Use Cases:

Company enrichment
Firmographics
Account data
Industry analysis

Example Mapping:

providers:
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__

hubspot_deal

Deal properties.

Available Fields:

dealname, amount
dealstage, pipeline
closedate
All custom properties

Use Cases:

Deal context in sequences
Account prioritization
Opportunity analysis
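
Example Mapping (a sketch under assumptions, not taken from the product reference): the snippet below assumes hubspot_deal accepts the same alwaysIncludeFields and outputFieldNamePrefix options shown for the other HubSpot providers. The deal__ prefix is an illustrative choice, and how the deal record itself is resolved (for example, through an inputFieldMapping) is not covered here.

```yaml
providers:
  - identifier: hubspot_deal
    # Assumption: hubspot_deal supports the same options as hubspot_contact and hubspot_company.
    alwaysIncludeFields: [dealname, amount, dealstage, closedate]
    # Illustrative prefix that keeps deal fields apart from contact and company fields.
    outputFieldNamePrefix: deal__
```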

hubspot_previous_communication

Historical communications.

Includes:

Email threads
Call logs
Meeting notes
Timeline activity

Use Cases:

Avoid duplicate outreach
Reference previous conversations
Context for sequences

hubspot_other_company_communication

Company-wide communication history.

Includes:

All communications with the company
Communications across multiple contacts
Team activity

Use Cases:

Account-based context
Team coordination
Avoid overlap

LinkedIn Providers

linkedin_person

Individual LinkedIn profiles.

Available Fields:

Full name
Current job title
Current company
Employment history
Education
Location
Profile URL

Use Cases:

Contact enrichment
Employment verification
Personalization data

Cost: Included in contact enrichment pricing

Example:

providers:
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url

linkedin_organization

Company LinkedIn profiles.

Available Fields:

Company name
Industry
Company size (employee range)
Headquarters location
Company page URL
About/description

Use Cases:

Company enrichment
Firmographic data
Industry classification

Cost: Included in company enrichment pricing
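
Example (hypothetical): no mapping for linkedin_organization is documented here, so the following is only a sketch. It assumes the provider takes a linkedin_url input, by analogy with linkedin_person, and feeds it from the linkedin_company_page field of hubspot_company. The linkedin_org__ prefix is also an illustrative choice. Check the provider's actual input name before relying on this.

```yaml
providers:
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, linkedin_company_page]

  - identifier: linkedin_organization
    inputFieldMapping:
      # Assumption: the provider input is named linkedin_url, as it is for linkedin_person.
      company__linkedin_company_page: linkedin_url
    # Illustrative prefix, not a documented default.
    outputFieldNamePrefix: linkedin_org__
```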

linkedin_person_posts

Recent LinkedIn activity (personal).

Includes:

Recent posts
Engagement metrics
Topics discussed
Activity frequency

Use Cases:

Deep personalization
Timing outreach
Topic relevance

Cost: Additional (use selectively)

linkedin_organization_posts

Company LinkedIn activity.

Includes:

Company posts
Announcements
Content themes
Engagement

Use Cases:

Company news awareness
Timing relevance
Content topics

Cost: Additional (use selectively)

External Data Providers

dns_records

DNS lookups for domain validation.

Record Types:

A (IPv4 address)
AAAA (IPv6 address)
MX (Mail exchange)
CNAME (Canonical name)
TXT (Text records)
NS (Name servers)

Use Cases:

Email deliverability validation
Domain verification
Technical validation

Cost: Free

Example:

providers:
  - identifier: dns_records
    inputFieldMapping:
      company__domain: domain
    outputFieldNamePrefix: dns__

predictleads_financing_event

Company funding and investment data.

Includes:

Funding rounds
Investment amounts
Investors
Dates

Use Cases:

Timely outreach
Qualification
Personalization

Cost: Additional, volume-based (when enabled)

predictleads_news

Company news articles.

Includes:

Recent news
Press releases
Media mentions
Publication dates

Use Cases:

Timely relevance
Conversation starters
Context awareness

Cost: Additional, volume-based (when enabled)

website_summary

AI-generated website summaries.

Includes:

Company description
Products/services
Value proposition
Key information

Use Cases:

Company research
Context for outreach
Quick company understanding

Cost: Included when enabled

website_content_by_url

Specific page content extraction.

Use Cases:

Verify company information
Check specific claims
Research context

Cost: Included when enabled

ordered_value_selector

Priority-based value selection.

Purpose: Select values from multiple sources with priority rules.

Use Cases:

Sender selection in sequences
Fallback logic
Multi-source data

Field Mapping

Input Field Mapping

Maps pipeline fields to provider inputs:

inputFieldMapping:
  associatedcompanyid: hubspot_id

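Explanation:

The key (associatedcompanyid) is the field available in the pipeline
The value (hubspot_id) is the input the provider expects
The mapping passes the pipeline field's value to that provider input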

Output Field Prefix

Namespaces provider outputs to avoid collisions:

outputFieldNamePrefix: company__

Results:

name → company__name
domain → company__domain
industry → company__industry
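
Prefixed output names are also what a later provider refers to in its inputFieldMapping. The sketch below combines the mappings shown elsewhere on this page into one chain. It assumes providers run in the order listed, so that each provider can read the outputs of those above it; that ordering is implied by the examples here but not stated outright.

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [email, associatedcompanyid]

  - identifier: hubspot_company
    inputFieldMapping:
      # Contact field on the left, company provider input on the right.
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain]

  - identifier: dns_records
    inputFieldMapping:
      # The company's domain output is exposed as company__domain because of the prefix above.
      company__domain: domain
    outputFieldNamePrefix: dns__
```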

Always Include Fields

Forces the provider to fetch specific fields:

alwaysIncludeFields: [firstname, lastname, email]

This ensures:

Listed fields are always fetched, even if they are not in the provider's default set
Data requirements are explicit in the configuration
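
Because the HubSpot providers expose custom properties, the same option can presumably be used to request one explicitly. In this sketch, preferred_language is a hypothetical custom property; replace it with the internal name of a property that exists in your HubSpot account.

```yaml
providers:
  - identifier: hubspot_contact
    # preferred_language is a hypothetical custom property, used for illustration only.
    alwaysIncludeFields: [firstname, lastname, email, preferred_language]
```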

Common Patterns

Contact with Company

providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]

  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain, industry]

Contact with LinkedIn

providers:
  - identifier: hubspot_contact

  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url

Company with DNS Validation

providers:
  - identifier: hubspot_company
    alwaysIncludeFields: [name, domain]

  - identifier: dns_records
    inputFieldMapping:
      domain: domain
    outputFieldNamePrefix: dns__

Duplicate Pair Analysis

providers:
  - identifier: hubspot_company
    inputFieldMapping:
      object_1_id: hubspot_id
    outputFieldNamePrefix: company_1__

  - identifier: hubspot_company
    inputFieldMapping:
      object_2_id: hubspot_id
    outputFieldNamePrefix: company_2__

Cost Considerations

Free Providers

hubspot_contact
hubspot_company
hubspot_deal
dns_records
hubspot_previous_communication

Included with Enrichment

linkedin_person (contact enrichment)
linkedin_organization (company enrichment)
website_summary
website_content_by_url

Additional Cost

linkedin_person_posts
linkedin_organization_posts
predictleads_news (volume-based)
predictleads_financing_event (volume-based)

Best Practices

Be Selective

Start Minimal

Initial setup:

Core HubSpot data only
Test pipeline
Add enrichment sources if needed
Measure impact
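
One way to follow these steps is to keep the enrichment provider commented out until the minimal pipeline has been tested. The configuration below is a sketch that reuses only mappings shown earlier on this page; the two-stage comments are an illustrative convention, not a product feature.

```yaml
providers:
  # Stage 1: core HubSpot data only. Test the pipeline with this first.
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email]

  # Stage 2: uncomment only if stage 1 results show a gap that enrichment would fill.
  # - identifier: linkedin_person
  #   inputFieldMapping:
  #     hs_linkedin_url: linkedin_url
```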

Prioritize Free Sources

Free & valuable:

HubSpot contact/company
DNS records
Previous communication

Use selectively:

LinkedIn posts (expensive)
External news (variable value)

Test Impact

Before adding a source:

Run pipeline without it
Note quality/success rate
Add source
Re-run pipeline
Compare results
Keep only if meaningful improvement

Troubleshooting

Missing Data

Problem: Expected fields not populated

Check:

Field exists in HubSpot?
Field mapping correct?
Provider has access?
alwaysIncludeFields set?

Solutions:

Add to alwaysIncludeFields
Check field name spelling
Verify provider configuration
Review execution logs

Performance Issues

Problem: Pipeline processing slowly

Likely causes:

Too many data sources
Slow external API calls
Large data fetches

Solutions:

Disable unnecessary sources
Use only required fields
Limit enrichment sources
Process smaller batches

Unexpected Costs

Problem: Higher costs than expected

Review:

Which providers enabled?
LinkedIn posts usage?
External API calls?
Processing volume?

Solutions:

Disable expensive sources
Use targeted lists
Process high-value records only
Monitor the Usage tab

Next Steps

Pipeline Kinds — Choose the right pipeline kind
HubSpot Integration — Configure CRM
Best Practices — Optimization tips