Data Deduplication Overview

Understand how Sellestial detects and helps you manage duplicate contacts and companies in your HubSpot CRM.

Maintaining clean CRM data is critical for accurate reporting, effective outreach, and operational efficiency. The Data Deduplication feature provides a centralized system for:

  • Detecting duplicates using configurable matching rules
  • Reviewing potential duplicates as Pairs (one-to-one) or Clusters (groups of 2+)
  • Managing deduplication rules for contacts and companies
  • Integrating with pipelines to automate cleanup workflows
  • Enrolling clusters directly into processing pipelines for batch handling

Why this matters

Duplicate records create serious problems:

  • Inaccurate reporting — Metrics and dashboards show inflated numbers
  • Poor customer experience — Repeated outreach to the same person or company
  • Wasted resources — Enrichment and automation credits spent on duplicates
  • Team confusion — Which record is the “real” one?
  • Data decay — One record gets updated while its duplicate goes stale

Two Review Views:

  • Pairs — One-to-one duplicate comparison with rule attribution
  • Clusters — Groups of 2+ matching records for batch processing

Flexible Matching:

  • Six match types: Exact, Fuzzy, Numeric, Domain, Nickname, Phonetic
  • Combine multiple fields with AND logic
  • Separate rule sets for Contacts and Companies
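
To make the match types more concrete, here is a minimal Python sketch of how four of them (Exact, Fuzzy, Nickname, Numeric) could behave. It is an illustration only and not Sellestial's implementation: the function names, the 0.85 similarity threshold, the tiny nickname table, and the digits-only reading of Numeric are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table for illustration; a real one would be much larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(value: str) -> str:
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def exact_match(a: str, b: str) -> bool:
    """Exact: values are equal after basic normalization."""
    return normalize(a) == normalize(b)


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy: similarity ratio meets an assumed threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def nickname_match(a: str, b: str) -> bool:
    """Nickname: both names map to the same canonical first name."""
    def canonical(name: str) -> str:
        key = normalize(name)
        return NICKNAMES.get(key, key)
    return canonical(a) == canonical(b)


def numeric_match(a: str, b: str) -> bool:
    """Numeric (assumed meaning): compare only the digits, e.g. phone numbers."""
    def digits(value: str) -> str:
        return "".join(ch for ch in value if ch.isdigit())
    return digits(a) != "" and digits(a) == digits(b)
```

With these definitions, "Bob" and "Robert" match under Nickname but not under Exact, which is why the choice of match type per field matters.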

Draft & Publish Workflow:

  • Configure rules without affecting production
  • Preview changes before activation
  • Background processing with status indicators

Pipeline Integration:

  • Use pairs as input sources for automated processing
  • Enroll clusters directly into pipelines
  • Connect detection to cleanup workflows

Pairs View — Review suspected duplicates one-by-one with detailed field comparison and rule attribution. Best for verifying high-confidence matches and setting up automated processing.

Clusters View — Review groups of 2+ records that all match together. Best for batch operations and handling systematic data quality issues (e.g., many companies with the same domain).

Both views let you:

  • See which rule flagged each match
  • Open records directly in HubSpot
  • Enroll into processing pipelines
  • Search and filter results

  1. Configure Rules — Define which fields to match and how (Settings modal)
  2. Publish — Activate your rules and trigger background processing
  3. Detection — System checks all eligible records asynchronously
  4. Results — Matching records appear as Pairs (one-to-one) or Clusters (groups)
  5. Action — Process via pipelines or merge manually in HubSpot

Rule matching logic

Within a rule: ALL conditions must match (AND logic)

Across rules: Matching ANY rule flags a duplicate (OR logic)

Example: A pair appears if it matches Rule #1 OR Rule #2 OR Rule #3
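
The AND/OR logic above can be sketched in a few lines of Python. This is a hedged illustration, not the product's code: the rule structure, the `same` comparator, and the field names are hypothetical, and empty fields are assumed never to match.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Condition = Tuple[str, Callable[[str, str], bool]]  # (field name, comparator)
Rule = Tuple[str, List[Condition]]                  # (rule label, conditions)


def same(a: str, b: str) -> bool:
    """Case-insensitive equality, standing in for an Exact match."""
    return a.strip().lower() == b.strip().lower()


def rule_matches(rule: Rule, left: Record, right: Record) -> bool:
    """Within a rule, ALL conditions must match (AND logic)."""
    _, conditions = rule
    if not conditions:
        return False
    return all(
        bool(left.get(field)) and bool(right.get(field))
        and compare(left[field], right[field])
        for field, compare in conditions
    )


def matching_rules(rules: List[Rule], left: Record, right: Record) -> List[str]:
    """Across rules, ANY matching rule flags the pair (OR logic)."""
    return [rule[0] for rule in rules if rule_matches(rule, left, right)]


rules: List[Rule] = [
    ("Rule #1", [("email", same)]),
    ("Rule #2", [("first_name", same), ("last_name", same), ("company", same)]),
]

a = {"email": "ann@example.com", "first_name": "Ann", "last_name": "Lee", "company": "Acme"}
b = {"email": "a.lee@example.com", "first_name": "ann", "last_name": "LEE", "company": "acme"}

# The emails differ, so Rule #1 fails, but all three conditions of Rule #2 hold.
print(matching_rules(rules, a, b))  # ['Rule #2']
```

Returning the labels of the rules that fired, rather than a bare boolean, mirrors the rule attribution shown in the review views.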

Process accumulated duplicates in your existing database:

When to use Pairs:

  • Review high-confidence matches (Exact on LinkedIn URL, Email, Domain)
  • Verify matches before merging
  • Handle sensitive or complex cases manually

When to use Clusters:

  • Process large groups efficiently (e.g., many companies with same domain)
  • Batch enroll into cleaning/normalization pipelines
  • Handle systematic issues (e.g., all “Freelance” companies)

Keep rules active for ongoing detection:

  • Continuous background checking as new records are created
  • Weekly or monthly review of new Pairs/Clusters
  • Automated pipeline processing with human review

After importing data from external systems:

  • Temporarily add strict rules for import-specific fields
  • Use the Clusters view to find groups created by the import
  • Enroll into merge pipelines to consolidate
  • Remove temporary rules after cleanup

Fix systemic data quality issues:

  • Use Domain matching to group companies by base domain
  • Use Fuzzy/Phonetic matching to find name variations
  • Enroll clusters into normalization pipelines
  • Let StructuredData pipelines clean and standardize
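
As a rough illustration of grouping companies by base domain, the Python sketch below reduces each website value to its last two labels and groups record IDs by the result. The record shape and helper names are hypothetical, and the two-label shortcut is a simplifying assumption: it mishandles suffixes such as `co.uk`, which need a public suffix list. How Sellestial derives the base domain is not described here.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse


def base_domain(value: str) -> str:
    """Reduce a URL or bare hostname to its last two labels (naive)."""
    value = value.strip().lower()
    if "//" not in value:
        value = "//" + value  # lets urlparse treat a bare host as a netloc
    host = urlparse(value).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def group_by_base_domain(companies: List[dict]) -> Dict[str, List[int]]:
    """Return base domain -> record IDs, keeping only groups of 2+."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for company in companies:
        domain = base_domain(company.get("domain", ""))
        if domain:
            groups[domain].append(company["id"])
    return {d: ids for d, ids in groups.items() if len(ids) >= 2}


companies = [
    {"id": 1, "domain": "https://www.acme.com"},
    {"id": 2, "domain": "shop.acme.com/path"},
    {"id": 3, "domain": "globex.com"},
    {"id": 4, "domain": "ACME.com"},
]
print(group_by_base_domain(companies))  # {'acme.com': [1, 2, 4]}
```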

Deduplication detects duplicates; pipelines process them.

Two integration methods:

1. Pairs as Pipeline Input Source
Configure pipelines to automatically process detected pairs as they’re discovered.

2. Enroll Clusters Directly
Manually enroll specific cluster groups into processing pipelines.

Recommended: Duplicate Resolver

The Duplicate Resolver (Marketplace) is an AI Agent pipeline purpose-built for duplicate resolution. It ingests pairs, researches using web tools, and intelligently merges confirmed duplicates.

Available for Companies now; a Contact version is coming soon.

See the Configure page for setup details.

Use Pairs when:

  • You want to review matches one-by-one
  • You are verifying high-confidence duplicates before merging
  • You are investigating specific duplicate issues
  • You are setting up automated pipeline processing

Use Clusters when:

  • You have groups of 5+ matching records
  • You are processing systematic data quality issues
  • You are enrolling batches into cleaning pipelines
  • You need to handle import duplicates efficiently

Switch between views:

  • Use the View Clusters button on the Pairs page
  • Use the View Pairs button on the Clusters page
  • Both views respect your Type filter (Company/Contact)

Can I review Company and Contact duplicates at the same time?
No. The Type filter is single-select, so you review either Company duplicates or Contact duplicates at a time.

Do changes take effect immediately?
No. Changes to rules are drafts until you click Publish. After publishing, background processing takes time to complete.

How long does background processing take?
It depends on database size: minutes for small databases, up to several hours for 100K+ records.

Can I undo a published rule?
Yes. Edit the Rules list, disable or delete the rule, then Publish again. The system will re-run detection.

What’s the difference between Pairs and Clusters?
Pairs show one-to-one duplicates. Clusters show groups of 2+ records that all match together.
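
One way to picture the relationship is to derive groups from pairs. The Python sketch below merges pairs that share a record using a union-find structure. This is only one possible reading and an assumption on our part: it treats a cluster as a connected group of pairs, whereas the product may require every record in a cluster to match every other. The function name and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_clusters(pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group record IDs so that any two IDs linked by a chain of pairs
    end up in the same cluster (connected components via union-find)."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups: Dict[int, Set[int]] = defaultdict(set)
    for record_id in list(parent):
        groups[find(record_id)].add(record_id)

    # Sort for a stable, readable result.
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


# Pairs (1, 2) and (2, 3) share record 2, so they collapse into one cluster.
print(build_clusters([(1, 2), (2, 3), (7, 8)]))  # [[1, 2, 3], [7, 8]]
```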

How do I actually merge duplicates?
Deduplication only detects duplicates. To merge them, use one of the following options:

  • Install processing pipelines from Marketplace and configure them to use pairs as input
  • Enroll clusters directly into pipelines via Add to pipeline
  • Merge manually in HubSpot using the external links

Can I customize which fields to match on?
Yes. Click + Add Rule in Settings to create custom rules with any HubSpot fields.

Ready to set up deduplication?
Configure Deduplication — Step-by-step setup guide with interface reference

Need processing pipelines?
Template Marketplace — Browse merge and cleaning pipelines

Want to understand the tech?
Pipeline Kinds — Learn about Agent, Code, and StructuredData pipelines