
Data Sources

Pipelines can pull data from multiple providers to enrich their processing context.

Data sources (also called providers) supply data to pipelines during processing. Pipelines can use one or more data sources to access:

  • HubSpot CRM data
  • LinkedIn profiles and activity
  • External data (DNS, news, financing)
  • Website content

hubspot_contact: Contact properties and associations.

Available Fields:

  • firstname, lastname, email
  • phone, mobilephone
  • jobtitle, company
  • address, city, state, zip, country
  • hs_linkedin_url
  • associatedcompanyid
  • All custom properties

Use Cases:

  • Contact enrichment
  • Data validation
  • Personalization
  • Company association

Example Mapping:

providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email]

hubspot_company: Company properties.

Available Fields:

  • name, domain
  • industry, type
  • numberofemployees, annualrevenue
  • phone, address, city, state, zip, country
  • linkedin_company_page
  • All custom properties

Use Cases:

  • Company enrichment
  • Firmographics
  • Account data
  • Industry analysis

Example Mapping:

providers:
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__

hubspot_deal: Deal properties.

Available Fields:

  • dealname, amount
  • dealstage, pipeline
  • closedate
  • All custom properties

Use Cases:

  • Deal context in sequences
  • Account prioritization
  • Opportunity analysis
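There is no mapping example for deals. A minimal sketch, assuming hubspot_deal is keyed by a record ID the same way hubspot_company is (input hubspot_id); the pipeline field deal_id and the prefix deal__ are hypothetical names chosen for illustration:

```yaml
providers:
  - identifier: hubspot_deal
    # Assumption: the provider input is hubspot_id, as for hubspot_company.
    # deal_id is a hypothetical pipeline field holding the deal's record ID.
    inputFieldMapping:
      deal_id: hubspot_id
    # Hypothetical prefix that keeps deal fields apart from contact/company fields.
    outputFieldNamePrefix: deal__
    alwaysIncludeFields: [dealname, amount, dealstage, pipeline, closedate]
```

With the prefix in place, the fetched fields would appear as deal__dealname, deal__amount, and so on.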

hubspot_previous_communication: Historical communications.

Includes:

  • Email threads
  • Call logs
  • Meeting notes
  • Timeline activity

Use Cases:

  • Avoid duplicate outreach
  • Reference previous conversations
  • Context for sequences
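A sketch of enabling communication history alongside contact data. It assumes this section describes the hubspot_previous_communication provider from the provider list, and that, like hubspot_contact in the LinkedIn pattern, it needs no inputFieldMapping on a contact pipeline. Both are assumptions to check against the provider configuration; the history__ prefix is hypothetical:

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email]
  # Assumption: resolves history for the current record without a mapping.
  - identifier: hubspot_previous_communication
    # Hypothetical prefix to keep history fields separate from contact fields.
    outputFieldNamePrefix: history__
```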

Company-wide communication history.

Includes:

  • All communications with company
  • Multiple contacts
  • Team activity

Use Cases:

  • Account-based context
  • Team coordination
  • Avoid overlap

linkedin_person: Individual LinkedIn profiles.

Available Fields:

  • Full name
  • Current job title
  • Current company
  • Employment history
  • Education
  • Location
  • Profile URL

Use Cases:

  • Contact enrichment
  • Employment verification
  • Personalization data

Cost: Included in contact enrichment pricing

Example:

providers:
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url

linkedin_organization: Company LinkedIn profiles.

Available Fields:

  • Company name
  • Industry
  • Company size (employee range)
  • Headquarters location
  • Company page URL
  • About/description

Use Cases:

  • Company enrichment
  • Firmographic data
  • Industry classification

Cost: Included in company enrichment pricing
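A sketch for a company pipeline, assuming linkedin_organization accepts a linkedin_url input the way linkedin_person does. That input name and the li_org__ prefix are assumptions, not documented values:

```yaml
providers:
  - identifier: hubspot_company
    # Fetch the LinkedIn company page URL so it can be passed on.
    alwaysIncludeFields: [name, domain, linkedin_company_page]
  - identifier: linkedin_organization
    inputFieldMapping:
      # Assumed input name, mirroring linkedin_person.
      linkedin_company_page: linkedin_url
    # Hypothetical prefix so LinkedIn fields cannot collide with HubSpot fields such as name.
    outputFieldNamePrefix: li_org__
```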

linkedin_person_posts: Recent personal LinkedIn activity.

Includes:

  • Recent posts
  • Engagement metrics
  • Topics discussed
  • Activity frequency

Use Cases:

  • Deep personalization
  • Timing outreach
  • Topic relevance

Cost: Additional (use selectively)

linkedin_organization_posts: Company LinkedIn activity.

Includes:

  • Company posts
  • Announcements
  • Content themes
  • Engagement

Use Cases:

  • Company news awareness
  • Timing relevance
  • Content topics

Cost: Additional (use selectively)
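Because both posts providers carry an additional cost, one way to use them selectively is to enable them only in a pipeline that runs on a targeted, high-value list. A sketch for the person-level feed, assuming linkedin_person_posts takes the same linkedin_url input as linkedin_person (an assumption; the li_posts__ prefix is hypothetical too). linkedin_organization_posts would presumably be configured the same way from a company page URL:

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, hs_linkedin_url]
  - identifier: linkedin_person_posts
    inputFieldMapping:
      # Assumed input name, mirroring linkedin_person.
      hs_linkedin_url: linkedin_url
    # Hypothetical prefix for post data.
    outputFieldNamePrefix: li_posts__
```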

dns_records: DNS lookups for domain validation.

Record Types:

  • A (IPv4 address)
  • AAAA (IPv6 address)
  • MX (Mail exchange)
  • CNAME (Canonical name)
  • TXT (Text records)
  • NS (Name servers)

Use Cases:

  • Email deliverability validation
  • Domain verification
  • Technical validation

Cost: Free

Example:

providers:
  - identifier: dns_records
    inputFieldMapping:
      company__domain: domain
    outputFieldNamePrefix: dns__

predictleads_financing_event: Company funding and investment data.

Includes:

  • Funding rounds
  • Investment amounts
  • Investors
  • Dates

Use Cases:

  • Timely outreach
  • Qualification
  • Personalization

Cost: Included when enabled

predictleads_news: Company news articles.

Includes:

  • Recent news
  • Press releases
  • Media mentions
  • Publication dates

Use Cases:

  • Timely relevance
  • Conversation starters
  • Context awareness

Cost: Included when enabled
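A sketch enabling both external company sources on a company pipeline. Their inputs are not listed above; keying them by company domain (input name domain) is an assumption, and the funding__ and news__ prefixes are hypothetical:

```yaml
providers:
  - identifier: hubspot_company
    alwaysIncludeFields: [name, domain]
  - identifier: predictleads_financing_event
    inputFieldMapping:
      domain: domain          # assumed: looked up by company domain
    outputFieldNamePrefix: funding__
  - identifier: predictleads_news
    inputFieldMapping:
      domain: domain          # assumed: looked up by company domain
    outputFieldNamePrefix: news__
```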

website_summary: AI-generated website summaries.

Includes:

  • Company description
  • Products/services
  • Value proposition
  • Key information

Use Cases:

  • Company research
  • Context for outreach
  • Quick company understanding

Cost: Included when enabled

website_content_by_url: Content extraction from a specific page.

Use Cases:

  • Verify company information
  • Check specific claims
  • Research context

Cost: Included when enabled
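A sketch combining both website providers on a company pipeline. The input names (domain for website_summary, url for website_content_by_url), the pipeline field pricing_page_url, and both prefixes are hypothetical:

```yaml
providers:
  - identifier: hubspot_company
    alwaysIncludeFields: [name, domain]
  - identifier: website_summary
    inputFieldMapping:
      domain: domain               # assumed input name
    outputFieldNamePrefix: site__  # hypothetical prefix
  - identifier: website_content_by_url
    inputFieldMapping:
      # Hypothetical pipeline field holding the page to extract.
      pricing_page_url: url        # assumed input name
    outputFieldNamePrefix: page__  # hypothetical prefix
```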

Priority-based value selection.

Purpose: Select values from multiple sources with priority rules.

Use Cases:

  • Sender selection in sequences
  • Fallback logic
  • Multi-source data

inputFieldMapping maps pipeline fields to provider inputs:

inputFieldMapping:
  associatedcompanyid: hubspot_id

Explanation:

  • Pipeline has associatedcompanyid field
  • Provider needs hubspot_id field
  • Mapping connects them
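A mapping can also point at a field produced by another provider, as the dns_records example does with company__domain. A sketch of a three-step chain on a contact pipeline, built only from mappings shown on this page (that providers can consume each other's prefixed outputs is inferred from that dns_records example):

```yaml
providers:
  # Step 1: contact fields, including the associated company ID.
  - identifier: hubspot_contact
    alwaysIncludeFields: [email, associatedcompanyid]
  # Step 2: associatedcompanyid -> hubspot_id; outputs become company__*.
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain]
  # Step 3: company__domain (from step 2) -> domain; outputs become dns__*.
  - identifier: dns_records
    inputFieldMapping:
      company__domain: domain
    outputFieldNamePrefix: dns__
```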

outputFieldNamePrefix namespaces provider outputs to avoid collisions:

outputFieldNamePrefix: company__

Results:

name → company__name
domain → company__domain
industry → company__industry
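Contacts and companies both expose phone and city, so fetching both without a prefix would risk two values competing for one field name. A sketch of how the prefix keeps them apart, assuming hubspot_contact output stays unprefixed as in the examples on this page:

```yaml
providers:
  - identifier: hubspot_contact
    # Lands as phone and city (no prefix).
    alwaysIncludeFields: [phone, city, associatedcompanyid]
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    # Lands as company__phone and company__city.
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [phone, city]
```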

alwaysIncludeFields forces the provider to fetch specific fields:

alwaysIncludeFields: [firstname, lastname, email]

Ensures:

  • The listed fields are always present, even if they are not in the provider's default field set
  • Data requirements are explicit

Example configurations:

Contact with its associated company:

providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain, industry]

Contact with LinkedIn profile enrichment:

providers:
  - identifier: hubspot_contact
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url

Company with DNS validation:

providers:
  - identifier: hubspot_company
    alwaysIncludeFields: [name, domain]
  - identifier: dns_records
    inputFieldMapping:
      domain: domain
    outputFieldNamePrefix: dns__

Two companies per record, each with its own prefix:

providers:
  - identifier: hubspot_company
    inputFieldMapping:
      object_1_id: hubspot_id
    outputFieldNamePrefix: company_1__
  - identifier: hubspot_company
    inputFieldMapping:
      object_2_id: hubspot_id
    outputFieldNamePrefix: company_2__

Available provider identifiers:

  • hubspot_contact
  • hubspot_company
  • hubspot_deal
  • dns_records
  • hubspot_previous_communication
  • linkedin_person (contact enrichment)
  • linkedin_organization (company enrichment)
  • website_summary
  • website_content_by_url
  • linkedin_person_posts
  • linkedin_organization_posts
  • predictleads_news (volume-based)
  • predictleads_financing_event (volume-based)

Initial setup:

  1. Core HubSpot data only
  2. Test pipeline
  3. Add enrichment sources if needed
  4. Measure impact

Free & valuable:

  • HubSpot contact/company
  • DNS records
  • Previous communication

Use selectively:

  • LinkedIn posts (expensive)
  • External news (variable value)

Before adding a source:

  1. Run pipeline without it
  2. Note quality/success rate
  3. Add source
  4. Re-run pipeline
  5. Compare results
  6. Keep only if meaningful improvement
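The comparison is easiest to read when the two runs differ by exactly one source. A sketch of the two configurations, shown as two YAML documents separated by --- (each is meant to be used as a separate pipeline configuration); linkedin_person stands in as the candidate source purely as an illustration:

```yaml
# Run 1: baseline, core HubSpot data only.
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, hs_linkedin_url]
---
# Run 2: identical, plus one candidate source.
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, hs_linkedin_url]
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url
```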

Problem: Expected fields not populated

Check:

  1. Field exists in HubSpot?
  2. Field mapping correct?
  3. Provider has access?
  4. alwaysIncludeFields set?

Solutions:

  • Add to alwaysIncludeFields
  • Check field name spelling
  • Verify provider configuration
  • Review execution logs
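A sketch of the first solution, listing the missing fields explicitly. The examples on this page use HubSpot internal property names (firstname rather than a display label), so a custom property presumably needs its internal name too; lead_priority_notes below is a hypothetical custom property:

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields:
      - firstname
      - lastname
      - email
      - hs_linkedin_url
      # Hypothetical custom property; use the internal name, not the label.
      - lead_priority_notes
```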

Problem: Pipeline is processing slowly

Likely causes:

  • Too many data sources
  • Slow external API calls
  • Fetching large amounts of data

Solutions:

  • Disable unnecessary sources
  • Use only required fields
  • Limit enrichment sources
  • Process smaller batches

Problem: Higher costs than expected

Review:

  • Which providers enabled?
  • LinkedIn posts usage?
  • External API calls?
  • Processing volume?

Solutions:

  • Disable expensive sources
  • Use targeted lists
  • Process high-value records only
  • Monitor Usage tab