Data Deduplication Overview

Understand how Sellestial detects and helps you manage duplicate contacts and companies in your HubSpot CRM.

What is Data Deduplication?

Maintaining clean CRM data is critical for accurate reporting, effective outreach, and operational efficiency. The Data Deduplication feature provides a centralized system for:

Detecting duplicates using configurable matching rules
Reviewing potential duplicates as Pairs (one-to-one) or Clusters (groups of 2+)
Managing deduplication rules for contacts and companies
Integrating with pipelines to automate cleanup workflows
Enrolling clusters directly into processing pipelines for batch handling

Why Deduplication Matters

Duplicate records create serious problems:

Inaccurate reporting — Metrics and dashboards show inflated numbers
Poor customer experience — Repeated outreach to the same person or company
Wasted resources — Enrichment and automation credits spent on duplicates
Team confusion — Which record is the “real” one?
Data decay — Updates land on one record while its duplicate goes stale

Key Capabilities

Two Review Views:

Pairs — One-to-one duplicate comparison with rule attribution
Clusters — Groups of 2+ matching records for batch processing

Flexible Matching:

Six match types: Exact, Fuzzy, Numeric, Domain, Nickname, Phonetic
Combine multiple fields with AND logic
Separate rule sets for Contacts and Companies
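To make the match types more concrete, here is a minimal sketch of how three of them (Exact, Numeric, Nickname) could compare two field values. The function names, the nickname table, and the exact normalization steps are assumptions made for illustration; the product's actual matching behavior may differ, and the Fuzzy, Domain, and Phonetic types are not shown.

```python
import re

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def exact_match(a: str, b: str) -> bool:
    """Assumed Exact behavior: equal after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    return a != "" and a == b


def numeric_match(a: str, b: str) -> bool:
    """Assumed Numeric behavior: compare digits only, ignoring formatting."""
    digits_a, digits_b = re.sub(r"\D", "", a), re.sub(r"\D", "", b)
    return digits_a != "" and digits_a == digits_b


def nickname_match(a: str, b: str) -> bool:
    """Assumed Nickname behavior: map known nicknames to a canonical name."""
    def canonical(name: str) -> str:
        name = name.strip().lower()
        return NICKNAMES.get(name, name)

    return canonical(a) != "" and canonical(a) == canonical(b)


# Empty values never match, so blank fields do not flag duplicates.
print(exact_match("Ann@Example.com ", "ann@example.com"))      # True
print(numeric_match("+1 (555) 010-2000", "1 555 010 2000"))    # True
print(nickname_match("Bob", "Robert"))                         # True
```

Treating empty values as non-matching is a design choice in this sketch: without it, every pair of records with a blank field would be flagged.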

Draft & Publish Workflow:

Configure rules without affecting production
Preview changes before activation
Background processing with status indicators
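The draft and publish split can be pictured as two copies of the rule list, where edits touch only the draft and detection reads only the published copy. The sketch below is a hypothetical model for illustration; the class, field names, and status values are assumptions, not the product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical model: edits go to `draft`, detection reads `published`."""
    draft: list[str] = field(default_factory=list)
    published: list[str] = field(default_factory=list)
    status: str = "idle"  # assumed status indicator

    def add_rule(self, rule: str) -> None:
        self.draft.append(rule)

    def remove_rule(self, rule: str) -> None:
        self.draft.remove(rule)

    def has_unpublished_changes(self) -> bool:
        return self.draft != self.published

    def publish(self) -> None:
        # Copy the draft so later edits do not leak into production.
        self.published = list(self.draft)
        self.status = "processing"  # background detection would start here


rules = RuleSet()
rules.add_rule("Exact match on email")
print(rules.published, rules.has_unpublished_changes())  # [] True
rules.publish()
print(rules.published, rules.status)  # ['Exact match on email'] processing
```

Undoing a rule follows the same path in this model: remove it from the draft, then publish again.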

Pipeline Integration:

Use pairs as input sources for automated processing
Enroll clusters directly into pipelines
Connect detection to cleanup workflows

Two Ways to Review Duplicates

Pairs View — Review suspected duplicates one-by-one with detailed field comparison and rule attribution. Best for verifying high-confidence matches and setting up automated processing.

Clusters View — Review groups of 2+ records that all match together. Best for batch operations and handling systematic data quality issues (e.g., many companies with the same domain).

Both views let you:

See which rule flagged each match
Open records directly in HubSpot
Enroll into processing pipelines
Search and filter results

How It Works

1. Configure Rules — Define which fields to match and how (Settings modal)
2. Publish — Activate your rules and trigger background processing
3. Detection — System checks all eligible records asynchronously
4. Results — Matching records appear as Pairs (one-to-one) or Clusters (groups)
5. Action — Process via pipelines or merge manually in HubSpot
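One way to picture how the Results step relates Pairs to Clusters is to treat each detected pair as a link between two records and group every set of linked records. The sketch below does this with a small union-find structure. It assumes clusters are formed by following links transitively, which is an assumption made for illustration rather than a description of how Sellestial builds clusters; the record IDs and function name are hypothetical.

```python
from collections import defaultdict


def build_clusters(pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group record IDs that are linked by at least one chain of pairs."""
    parent: dict[str, str] = {}

    def find(record_id: str) -> str:
        # Return the representative ID of the group containing record_id.
        parent.setdefault(record_id, record_id)
        while parent[record_id] != record_id:
            parent[record_id] = parent[parent[record_id]]  # path halving
            record_id = parent[record_id]
        return record_id

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: dict[str, list[str]] = defaultdict(list)
    for record_id in parent:
        groups[find(record_id)].append(record_id)
    return sorted(sorted(members) for members in groups.values())


pairs = [("c1", "c2"), ("c2", "c3"), ("c7", "c8")]
print(build_clusters(pairs))  # [['c1', 'c2', 'c3'], ['c7', 'c8']]
```

Under this assumption, three pairwise matches on the same domain collapse into a single cluster, which is why Clusters suit batch handling better than reviewing each pair separately.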

Rule matching logic

Within a rule: ALL conditions must match (AND logic)

Across rules: Matching ANY rule flags a duplicate (OR logic)

Example: A pair appears if it matches Rule #1 OR Rule #2 OR Rule #3
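The AND/OR logic above can be sketched in a few lines. This is an illustrative model only: the `Condition` and `Rule` classes, the `exact` matcher, and the field names are hypothetical and are not Sellestial's API.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass
class Condition:
    field: str
    matches: Callable[[str, str], bool]


@dataclass
class Rule:
    name: str
    conditions: list[Condition]


def exact(a: str, b: str) -> bool:
    """Hypothetical matcher: case-insensitive equality, empty never matches."""
    return a.strip() != "" and a.strip().lower() == b.strip().lower()


def rule_matches(rule: Rule, a: Record, b: Record) -> bool:
    # Within a rule: ALL conditions must match (AND logic).
    return bool(rule.conditions) and all(
        c.matches(a.get(c.field, ""), b.get(c.field, "")) for c in rule.conditions
    )


def matching_rules(rules: list[Rule], a: Record, b: Record) -> list[str]:
    # Across rules: ANY matching rule flags the pair (OR logic).
    # Returning the names also gives the rule attribution shown in review.
    return [rule.name for rule in rules if rule_matches(rule, a, b)]


rules = [
    Rule("Rule #1", [Condition("email", exact)]),
    Rule("Rule #2", [Condition("firstname", exact), Condition("lastname", exact)]),
]
a = {"email": "ann@example.com", "firstname": "Ann", "lastname": "Lee"}
b = {"email": "a.lee@example.com", "firstname": "ann", "lastname": "LEE"}
c = {"email": "", "firstname": "Ann", "lastname": "Kim"}

print(matching_rules(rules, a, b))  # ['Rule #2']
print(matching_rules(rules, a, c))  # []
```

Records `a` and `b` fail Rule #1 (different emails) but satisfy both conditions of Rule #2, so the pair is flagged. Records `a` and `c` share only a first name, which satisfies neither rule in full.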

Common Use Cases

Clean Up Historical Duplicates

Process accumulated duplicates in your existing database:

When to use Pairs:

Review high-confidence matches (Exact on LinkedIn URL, Email, Domain)
Verify matches before merging
Handle sensitive or complex cases manually

When to use Clusters:

Process large groups efficiently (e.g., many companies with same domain)
Batch enroll into cleaning/normalization pipelines
Handle systematic issues (e.g., all “Freelance” companies)

Prevent Future Duplicates

Keep rules active for ongoing detection:

Continuous background checking as new records are created
Weekly or monthly review of new Pairs/Clusters
Automated pipeline processing with human review

Handle Import Duplicates

After importing data from external systems:

Temporarily add strict rules for import-specific fields
Use Clusters view to find groups created by import
Enroll into merge pipelines to consolidate
Remove temporary rules after cleanup

Normalize Messy Data

Fix systemic data quality issues:

Use Domain matching to group companies by base domain
Use Fuzzy/Phonetic matching to find name variations
Enroll clusters into normalization pipelines
Let StructuredData pipelines clean and standardize
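As a rough illustration of the first two ideas above, the sketch below normalizes website values to a comparable domain, groups companies that share one, and scores name similarity with Python's standard `difflib`. The function names and record fields are assumptions. The normalization only strips the scheme, path, and a leading `www.`; a true base-domain match would also need a public suffix list to handle subdomains. Phonetic matching is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare host to a lowercase host without 'www.'."""
    value = value.strip().lower()
    if "//" not in value:
        value = "//" + value  # lets urlsplit parse a bare host as a netloc
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; a stand-in for a Fuzzy match score."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def group_by_domain(companies: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return domains shared by 2+ companies, with the company names."""
    groups: dict[str, list[str]] = defaultdict(list)
    for company in companies:
        key = normalize_domain(company.get("domain", ""))
        if key:  # skip records with no usable domain
            groups[key].append(company["name"])
    return {domain: names for domain, names in groups.items() if len(names) >= 2}


companies = [
    {"name": "Acme", "domain": "https://www.acme.com/about"},
    {"name": "ACME Inc", "domain": "acme.com"},
    {"name": "Solo", "domain": "solo.io"},
]
print(group_by_domain(companies))  # {'acme.com': ['Acme', 'ACME Inc']}
print(name_similarity("Acme Inc", "Acme Inc.") > 0.9)  # True
```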

Integrating with Pipelines

Deduplication detects duplicates; pipelines process them.

Two integration methods:

1. Pairs as Pipeline Input Source
Configure pipelines to automatically process detected pairs as they’re discovered.

2. Enroll Clusters Directly
Manually enroll specific cluster groups into processing pipelines.

Recommended: Duplicate Resolver

The Duplicate Resolver (Marketplace) is an AI Agent pipeline purpose-built for duplicate resolution. It ingests pairs, researches using web tools, and intelligently merges confirmed duplicates.

Available for Companies now; a Contact version is coming soon.

See the Configure Deduplication page for setup details.

Best Practices

Pairs vs Clusters: When to Use Each

Use Pairs when:

Reviewing matches one by one
Verifying high-confidence duplicates before merging
Investigating specific duplicate issues
Setting up automated pipeline processing

Use Clusters when:

Working with groups of 5+ matching records
Processing systematic data quality issues
Enrolling batches into cleaning pipelines
Handling import duplicates efficiently

Switch between views:

Use the View Clusters button on the Pairs page
Use the View Pairs button on the Clusters page
Both views respect your Type filter (Company/Contact)

FAQ

Can I review Company and Contact duplicates at the same time?
No. The Type filter is single-select, so you review either Company or Contact duplicates at any one time.

Do changes take effect immediately?
No. Changes to rules are drafts until you click Publish. After publishing, background processing takes time to complete.

How long does background processing take?
It depends on database size: minutes for small databases, up to several hours for databases with 100K+ records.

Can I undo a published rule?
Yes. Edit the Rules list, disable or delete the rule, then Publish again. The system will re-run detection.

What’s the difference between Pairs and Clusters?
Pairs show one-to-one duplicates. Clusters show groups of 2+ records that all match together.

How do I actually merge duplicates?
Deduplication only detects duplicates. To merge them, use one of these options:

Install processing pipelines from Marketplace and configure them to use pairs as input
Enroll clusters directly into pipelines via Add to pipeline
Merge manually in HubSpot using the external links

Can I customize which fields to match on?
Yes. Click + Add Rule in Settings to create custom rules using any HubSpot field.

Next Steps

Ready to set up deduplication?
→ Configure Deduplication — Step-by-step setup guide with interface reference

Need processing pipelines?
→ Template Marketplace — Browse merge and cleaning pipelines

Want to understand the tech?
→ Pipeline Kinds — Learn about Agent, Code, and StructuredData pipelines