Data Sources
Pipelines can pull data from multiple providers to enrich their processing context.
Overview
Data sources (also called providers) supply data to pipelines during processing. Pipelines can use one or more data sources to access:
- HubSpot CRM data
- LinkedIn profiles and activity
- External data (DNS, news, financing)
- Website content
HubSpot Providers
hubspot_contact
Contact properties and associations.
Available Fields:
- firstname, lastname, email
- phone, mobilephone
- jobtitle, company
- address, city, state, zip, country
- hs_linkedin_url
- associatedcompanyid
- All custom properties
Use Cases:
- Contact enrichment
- Data validation
- Personalization
- Company association
Example Mapping:

    providers:
      - identifier: hubspot_contact
        alwaysIncludeFields: [firstname, lastname, email]

hubspot_company
Company properties.
Available Fields:
- name, domain
- industry, type
- numberofemployees, annualrevenue
- phone, address, city, state, zip, country
- linkedin_company_page
- All custom properties
Use Cases:
- Company enrichment
- Firmographics
- Account data
- Industry analysis
Example Mapping:

    providers:
      - identifier: hubspot_company
        inputFieldMapping:
          associatedcompanyid: hubspot_id
        outputFieldNamePrefix: company__

hubspot_deal
Deal properties.
Available Fields:
- dealname, amount
- dealstage, pipeline
- closedate
- All custom properties
Use Cases:
- Deal context in sequences
- Account prioritization
- Opportunity analysis
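The other HubSpot providers above each come with an example mapping; a comparable sketch for hubspot_deal might look like the following. The field names are taken from the Available Fields list above. The deal__ prefix is an illustrative assumption, and how a deal is linked to the pipeline record through inputFieldMapping is not documented here, so verify against your own pipeline configuration.

```yaml
providers:
  - identifier: hubspot_deal
    # Fields taken from the Available Fields list above
    alwaysIncludeFields: [dealname, amount, dealstage, closedate]
    # Assumed prefix, chosen only to keep deal fields distinct from contact fields
    outputFieldNamePrefix: deal__
```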
hubspot_previous_communication
Historical communications.
Includes:
- Email threads
- Call logs
- Meeting notes
- Timeline activity
Use Cases:
- Avoid duplicate outreach
- Reference previous conversations
- Context for sequences
hubspot_other_company_communication
Company-wide communication history.
Includes:
- All communications with company
- Multiple contacts
- Team activity
Use Cases:
- Account-based context
- Team coordination
- Avoid overlap
LinkedIn Providers
linkedin_person
Individual LinkedIn profiles.
Available Fields:
- Full name
- Current job title
- Current company
- Employment history
- Education
- Location
- Profile URL
Use Cases:
- Contact enrichment
- Employment verification
- Personalization data
Cost: Included in contact enrichment pricing
Example:

    providers:
      - identifier: linkedin_person
        inputFieldMapping:
          hs_linkedin_url: linkedin_url

linkedin_organization
Company LinkedIn profiles.
Available Fields:
- Company name
- Industry
- Company size (employee range)
- Headquarters location
- Company page URL
- About/description
Use Cases:
- Company enrichment
- Firmographic data
- Industry classification
Cost: Included in company enrichment pricing
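No example is given for linkedin_organization. By analogy with the linkedin_person example, a configuration might look like the sketch below. It assumes the provider accepts a linkedin_url input (documented here only for linkedin_person) and that hubspot_company runs first with the company__ prefix, so the page URL is available as company__linkedin_company_page. The linkedin_org__ prefix is likewise illustrative.

```yaml
providers:
  - identifier: hubspot_company
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain, linkedin_company_page]
  - identifier: linkedin_organization
    inputFieldMapping:
      # Assumption: the input is named linkedin_url, as it is for linkedin_person
      company__linkedin_company_page: linkedin_url
    # Assumed prefix, for illustration only
    outputFieldNamePrefix: linkedin_org__
```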
linkedin_person_posts
Recent LinkedIn activity (personal).
Includes:
- Recent posts
- Engagement metrics
- Topics discussed
- Activity frequency
Use Cases:
- Deep personalization
- Timing outreach
- Topic relevance
Cost: Additional (use selectively)
linkedin_organization_posts
Company LinkedIn activity.
Includes:
- Company posts
- Announcements
- Content themes
- Engagement
Use Cases:
- Company news awareness
- Timing relevance
- Content topics
Cost: Additional (use selectively)
External Data Providers
dns_records
DNS lookups for domain validation.
Record Types:
- A (IPv4 address)
- AAAA (IPv6 address)
- MX (Mail exchange)
- CNAME (Canonical name)
- TXT (Text records)
- NS (Name servers)
Use Cases:
- Email deliverability validation
- Domain verification
- Technical validation
Cost: Free
Example:

    providers:
      - identifier: dns_records
        inputFieldMapping:
          company__domain: domain
        outputFieldNamePrefix: dns__

predictleads_financing_event
Company funding and investment data.
Includes:
- Funding rounds
- Investment amounts
- Investors
- Dates
Use Cases:
- Timely outreach
- Qualification
- Personalization
Cost: Included when enabled
predictleads_news
Company news articles.
Includes:
- Recent news
- Press releases
- Media mentions
- Publication dates
Use Cases:
- Timely relevance
- Conversation starters
- Context awareness
Cost: Included when enabled
website_summary
AI-generated website summaries.
Includes:
- Company description
- Products/services
- Value proposition
- Key information
Use Cases:
- Company research
- Context for outreach
- Quick company understanding
Cost: Included when enabled
website_content_by_url
Specific page content extraction.
Use Cases:
- Verify company information
- Check specific claims
- Research context
Cost: Included when enabled
ordered_value_selector
Priority-based value selection.
Purpose: Select values from multiple sources with priority rules.
Use Cases:
- Sender selection in sequences
- Fallback logic
- Multi-source data
Field Mapping
Input Field Mapping
Maps pipeline fields to provider inputs:

    inputFieldMapping:
      associatedcompanyid: hubspot_id

Explanation:
- The pipeline record has an associatedcompanyid field
- The provider expects a hubspot_id input
- The mapping passes the value of the first into the second
Output Field Prefix
Namespaces provider outputs to avoid collisions:

    outputFieldNamePrefix: company__

Results:

    name → company__name
    domain → company__domain
    industry → company__industry

Always Include Fields
Forces provider to fetch specific fields:

    alwaysIncludeFields: [firstname, lastname, email]

Ensures:
- Fields always present
- Even if not in default set
- Explicit data requirements
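The three options in this section are usually combined, with the prefixed output of one provider feeding the input mapping of the next. The sketch below chains only identifiers and field names that appear elsewhere on this page; the comments describe the expected flow rather than confirmed behavior.

```yaml
providers:
  - identifier: hubspot_contact
    # Fetch associatedcompanyid explicitly so the next provider can map it
    alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]

  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id   # pipeline field: provider input
    outputFieldNamePrefix: company__    # domain becomes company__domain
    alwaysIncludeFields: [name, domain]

  - identifier: dns_records
    inputFieldMapping:
      company__domain: domain           # reads the prefixed company field
    outputFieldNamePrefix: dns__
```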
Common Patterns
Contact with Company

    providers:
      - identifier: hubspot_contact
        alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]
      - identifier: hubspot_company
        inputFieldMapping:
          associatedcompanyid: hubspot_id
        outputFieldNamePrefix: company__
        alwaysIncludeFields: [name, domain, industry]

Contact with LinkedIn

    providers:
      - identifier: hubspot_contact
      - identifier: linkedin_person
        inputFieldMapping:
          hs_linkedin_url: linkedin_url

Company with DNS Validation

    providers:
      - identifier: hubspot_company
        alwaysIncludeFields: [name, domain]
      - identifier: dns_records
        inputFieldMapping:
          domain: domain
        outputFieldNamePrefix: dns__

Duplicate Pair Analysis

    providers:
      - identifier: hubspot_company
        inputFieldMapping:
          object_1_id: hubspot_id
        outputFieldNamePrefix: company_1__
      - identifier: hubspot_company
        inputFieldMapping:
          object_2_id: hubspot_id
        outputFieldNamePrefix: company_2__

Cost Considerations
Free Providers
- hubspot_contact
- hubspot_company
- hubspot_deal
- dns_records
- hubspot_previous_communication
Included with Enrichment
- linkedin_person (contact enrichment)
- linkedin_organization (company enrichment)
- website_summary
- website_content_by_url
Additional Cost
- linkedin_person_posts
- linkedin_organization_posts
- predictleads_news (volume-based)
- predictleads_financing_event (volume-based)
Best Practices
Be Selective
Start Minimal
Initial setup:
- Core HubSpot data only
- Test pipeline
- Add enrichment sources if needed
- Measure impact
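As a rough illustration of these steps, a starting configuration could contain only the core HubSpot providers, with enrichment left commented out until testing shows it is needed. The provider entries reuse examples from this page; keeping a candidate source in the file as a comment is just one possible convention, not a documented feature.

```yaml
providers:
  # Start: core HubSpot data only
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__

  # Later: enable one enrichment source at a time, then measure the impact
  # - identifier: linkedin_person
  #   inputFieldMapping:
  #     hs_linkedin_url: linkedin_url
```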
Prioritize Free Sources
Free & valuable:
- HubSpot contact/company
- DNS records
- Previous communication
Use selectively:
- LinkedIn posts (expensive)
- External news (variable value)
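A configuration limited to the free sources named above might look like the following sketch. The first three entries follow examples given elsewhere on this page. hubspot_previous_communication is listed by identifier only, since any input mapping it may require is not documented here.

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, associatedcompanyid]
  - identifier: hubspot_company
    inputFieldMapping:
      associatedcompanyid: hubspot_id
    outputFieldNamePrefix: company__
    alwaysIncludeFields: [name, domain]
  - identifier: dns_records
    inputFieldMapping:
      company__domain: domain
    outputFieldNamePrefix: dns__
  # Assumption: no additional mapping is needed for this provider
  - identifier: hubspot_previous_communication
```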
Test Impact
Before adding a source:
- Run pipeline without it
- Note quality/success rate
- Add source
- Re-run pipeline
- Compare results
- Keep only if meaningful improvement
Troubleshooting
Missing Data
Problem: Expected fields not populated
Check:
- Field exists in HubSpot?
- Field mapping correct?
- Provider has access?
- alwaysIncludeFields set?
Solutions:
- Add to alwaysIncludeFields
- Check field name spelling
- Verify provider configuration
- Review execution logs
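For example, if linkedin_person returns nothing because hs_linkedin_url was never fetched, the first solution could be applied as sketched below. Whether hs_linkedin_url is part of the default field set is not stated on this page, so listing it explicitly is a precaution rather than a confirmed requirement.

```yaml
providers:
  - identifier: hubspot_contact
    # hs_linkedin_url added explicitly so it is fetched for the mapping below
    alwaysIncludeFields: [firstname, lastname, email, hs_linkedin_url]
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url
```

If the field is still empty after this change, the remaining checks (field name spelling, provider access, execution logs) are the next places to look.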
Performance Issues
Problem: Pipeline processing slowly
Likely causes:
- Too many data sources
- External API calls slow
- Large data fetching
Solutions:
- Disable unnecessary sources
- Use only required fields
- Limit enrichment sources
- Process smaller batches
Unexpected Costs
Problem: Higher costs than expected
Review:
- Which providers enabled?
- LinkedIn posts usage?
- External API calls?
- Processing volume?
Solutions:
- Disable expensive sources
- Use targeted lists
- Process high-value records only
- Monitor Usage tab
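One way to act on "Disable expensive sources" is sketched below. It assumes a provider is disabled by removing or commenting out its entry in the providers list, which this page does not state explicitly. The posts provider is shown by identifier only, since its input mapping is not documented here.

```yaml
providers:
  - identifier: hubspot_contact
    alwaysIncludeFields: [firstname, lastname, email, hs_linkedin_url]
  # Included in contact enrichment pricing, so kept
  - identifier: linkedin_person
    inputFieldMapping:
      hs_linkedin_url: linkedin_url
  # Additional cost: commented out until a targeted list justifies it
  # - identifier: linkedin_person_posts
```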
Next Steps
- Pipeline Kinds: Choose the right pipeline kind
- HubSpot Integration: Configure CRM
- Best Practices: Optimization tips