Data Deduplication Overview
Understand how Sellestial detects and helps you manage duplicate contacts and companies in your HubSpot CRM.
What is Data Deduplication?
Maintaining clean CRM data is critical for accurate reporting, effective outreach, and operational efficiency. The Data Deduplication feature provides a centralized system for:
- Detecting duplicates using configurable matching rules
- Reviewing potential duplicates as Pairs (one-to-one) or Clusters (groups of 2+)
- Managing deduplication rules for contacts and companies
- Integrating with pipelines to automate cleanup workflows
- Enrolling clusters directly into processing pipelines for batch handling
Why Deduplication Matters
Duplicate records create serious problems:
- Inaccurate reporting — Metrics and dashboards show inflated numbers
- Poor customer experience — Multiple outreach to the same person/company
- Wasted resources — Enrichment and automation credits spent on duplicates
- Team confusion — Which record is the “real” one?
- Data decay — Updates to one record while the duplicate remains stale
Key Capabilities
Two Review Views:
- Pairs — One-to-one duplicate comparison with rule attribution
- Clusters — Groups of 2+ matching records for batch processing
Flexible Matching:
- Six match types: Exact, Fuzzy, Numeric, Domain, Nickname, Phonetic
- Combine multiple fields with AND logic
- Separate rule sets for Contacts and Companies
Draft & Publish Workflow:
- Configure rules without affecting production
- Preview changes before activation
- Background processing with status indicators
Pipeline Integration:
- Use pairs as input sources for automated processing
- Enroll clusters directly into pipelines
- Connect detection to cleanup workflows
Two Ways to Review Duplicates
Pairs View — Review suspected duplicates one-by-one with detailed field comparison and rule attribution. Best for verifying high-confidence matches and setting up automated processing.
Clusters View — Review groups of 2+ records that all match together. Best for batch operations and handling systematic data quality issues (e.g., many companies with the same domain).
Both views let you:
- See which rule flagged each match
- Open records directly in HubSpot
- Enroll into processing pipelines
- Search and filter results
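To make the relationship between the two views concrete, the sketch below groups pairwise matches into clusters. How Sellestial actually forms clusters is not described here, so this assumes a cluster is any set of records connected through one or more detected pairs; the function name `build_clusters` and the record IDs are hypothetical.

```python
from collections import defaultdict


def build_clusters(pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group record IDs that are linked by duplicate pairs (union-find).

    Assumption: records connected through a chain of pairs share a cluster.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups: dict[str, list[str]] = defaultdict(list)
    for record_id in list(parent):
        groups[find(record_id)].append(record_id)
    return sorted(sorted(group) for group in groups.values())


pairs = [("c1", "c2"), ("c2", "c3"), ("c7", "c8")]
print(build_clusters(pairs))  # [['c1', 'c2', 'c3'], ['c7', 'c8']]
```

Under this assumption, three pairs collapse into two clusters, which is why the Clusters view suits batch handling while the Pairs view suits one-by-one verification.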
How It Works
- Configure Rules — Define which fields to match and how (Settings modal)
- Publish — Activate your rules and trigger background processing
- Detection — System checks all eligible records asynchronously
- Results — Matching records appear as Pairs (one-to-one) or Clusters (groups)
- Action — Process via pipelines or merge manually in HubSpot
Within a rule: ALL conditions must match (AND logic)
Across rules: Matching ANY rule flags a duplicate (OR logic)
Example: A pair appears if it matches Rule #1 OR Rule #2 OR Rule #3
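The AND/OR logic above can be sketched in a few lines. This is an illustration only, not Sellestial's implementation: the `Condition` and `Rule` classes, the `exact` matcher, and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]
Matcher = Callable[[str, str], bool]


def exact(a: str, b: str) -> bool:
    """Case-insensitive exact comparison; empty values never match."""
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a) and a == b


@dataclass
class Condition:
    field: str        # hypothetical property name, e.g. "email"
    matcher: Matcher  # how the two values are compared


@dataclass
class Rule:
    name: str
    conditions: list[Condition]

    def matches(self, left: Record, right: Record) -> bool:
        # AND logic: every condition in the rule must hold.
        # A rule with no conditions matches nothing.
        return bool(self.conditions) and all(
            c.matcher(left.get(c.field, ""), right.get(c.field, ""))
            for c in self.conditions
        )


def matching_rules(rules: list[Rule], left: Record, right: Record) -> list[str]:
    # OR logic: the pair is flagged if at least one rule matches.
    # Returning the rule names mirrors the "rule attribution" idea.
    return [rule.name for rule in rules if rule.matches(left, right)]


rules = [
    Rule("Rule #1", [Condition("email", exact)]),
    Rule("Rule #2", [Condition("firstname", exact), Condition("lastname", exact)]),
]

a = {"email": "ann@example.com", "firstname": "Ann", "lastname": "Lee"}
b = {"email": "a.lee@example.com", "firstname": "ann", "lastname": "Lee"}
c = {"email": "bob@example.com", "firstname": "Ann", "lastname": "Ray"}

print(matching_rules(rules, a, b))  # ['Rule #2']
print(matching_rules(rules, a, c))  # []
```

Records `a` and `b` are flagged because Rule #2 matches on both name fields, even though Rule #1 fails on email. Records `a` and `c` share only a first name, so no rule is fully satisfied and no pair appears.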
Common Use Cases
Clean Up Historical Duplicates
Process accumulated duplicates in your existing database:
When to use Pairs:
- Review high-confidence matches (Exact on LinkedIn URL, Email, Domain)
- Verify matches before merging
- Handle sensitive or complex cases manually
When to use Clusters:
- Process large groups efficiently (e.g., many companies with same domain)
- Batch enroll into cleaning/normalization pipelines
- Handle systematic issues (e.g., all “Freelance” companies)
Prevent Future Duplicates
Keep rules active for ongoing detection:
- Continuous background checking as new records are created
- Weekly or monthly review of new Pairs/Clusters
- Automated pipeline processing with human review
Handle Import Duplicates
After importing data from external systems:
- Temporarily add strict rules for import-specific fields
- Use Clusters view to find groups created by import
- Enroll into merge pipelines to consolidate
- Remove temporary rules after cleanup
Normalize Messy Data
Fix systemic data quality issues:
- Use Domain matching to group companies by base domain
- Use Fuzzy/Phonetic matching to find name variations
- Enroll clusters into normalization pipelines
- Let StructuredData pipelines clean and standardize
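As a rough illustration of grouping companies by base domain, the sketch below normalizes website values and collects records that share a domain. It is a simplified assumption of how Domain matching could behave, not the product's algorithm: the `id` and `domain` field names are hypothetical, and the "last two labels" heuristic mishandles multi-part suffixes such as `co.uk`.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def base_domain(value: str) -> str:
    """Naive base domain: the last two labels of the host name.

    Assumption: suffixes like co.uk are out of scope for this sketch.
    """
    value = value.strip().lower()
    if not value:
        return ""
    if "://" not in value:
        value = "//" + value  # lets urlsplit read bare text as a host
    host = urlsplit(value).hostname or ""
    labels = [part for part in host.split(".") if part]
    return ".".join(labels[-2:]) if len(labels) >= 2 else ""


def group_by_domain(companies: list[dict[str, str]]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for company in companies:
        key = base_domain(company.get("domain", ""))
        if key:
            groups[key].append(company["id"])
    # Only groups of 2+ records are duplicate candidates.
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


companies = [
    {"id": "1", "domain": "https://www.acme.com/about"},
    {"id": "2", "domain": "shop.acme.com"},
    {"id": "3", "domain": "ACME.com"},
    {"id": "4", "domain": "example.org"},
]
print(group_by_domain(companies))  # {'acme.com': ['1', '2', '3']}
```

A production setup would more likely rely on a public-suffix-aware parser so that unrelated companies under a shared suffix are not grouped together.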
Integrating with Pipelines
Deduplication detects duplicates; pipelines process them.
Two integration methods:
1. Pairs as Pipeline Input Source
Configure pipelines to automatically process detected pairs as they’re discovered.
2. Enroll Clusters Directly
Manually enroll specific cluster groups into processing pipelines.
The Duplicate Resolver (Marketplace) is an AI Agent pipeline purpose-built for duplicate resolution. It ingests pairs, researches using web tools, and intelligently merges confirmed duplicates.
It is available for Companies now; a Contact version is coming soon.
See the Configure Deduplication page for setup details.
Best Practices
Pairs vs Clusters: When to Use Each
Use Pairs when:
- You want to review matches one-by-one
- Verifying high-confidence duplicates before merging
- Investigating specific duplicate issues
- Setting up automated pipeline processing
Use Clusters when:
- You have groups of 5+ matching records
- Processing systematic data quality issues
- Enrolling batches into cleaning pipelines
- Handling import duplicates efficiently
Switch between views:
- Use the View Clusters button on the Pairs page
- Use the View Pairs button on the Clusters page
- Both views respect your Type filter (Company/Contact)
Frequently Asked Questions
Can I review Company and Contact duplicates at the same time?
No. The Type filter is single-select — choose Company dedup or Contact dedup.
Do changes take effect immediately?
No. Changes to rules are drafts until you click Publish. After publishing, background processing takes time to complete.
How long does background processing take?
It depends on database size: minutes for small databases, up to several hours for databases with 100K+ records.
Can I undo a published rule?
Yes. Edit the Rules list, disable or delete the rule, then Publish again. The system will re-run detection.
What’s the difference between Pairs and Clusters?
Pairs show one-to-one duplicates. Clusters show groups of 2+ records that all match together.
How do I actually merge duplicates?
Deduplication only detects duplicates. To merge them, use one of these options:
- Install processing pipelines from Marketplace and configure them to use pairs as input
- Enroll clusters directly into pipelines via Add to pipeline
- Merge manually in HubSpot using the external links
Can I customize which fields to match on?
Yes. Click + Add Rule in Settings to create custom rules with any HubSpot fields.
Next Steps
Ready to set up deduplication?
→ Configure Deduplication — Step-by-step setup guide with interface reference
Need processing pipelines?
→ Template Marketplace — Browse merge and cleaning pipelines
Want to understand the tech?
→ Pipeline Kinds — Learn about Agent, Code, and StructuredData pipelines