Configure Deduplication

Step-by-step guide to setting up and using the Deduplication feature, including interface reference and configuration details.

Getting Started

Navigate to Deduplication

Click Deduplication in the left sidebar to access the Pairs view.

You’ll see the Potential Duplicate Pairs interface with filters, search, and a Settings button.

Open Settings

Click the Settings button (top right) to configure matching rules.

The Settings modal shows:
- Company dedup and Contact dedup sections (collapsible)
- Active toggle to enable deduplication for each object type
- Active Rules block (currently running rules)
- Rules list (your draft rules being edited)
- + Add Rule button to create new rules
- Publish button to activate changes
- Object count showing scope (e.g., “114,214 objects are ready to be checked”)

Review Default Rules

Both Company and Contact sections include proven default rules optimized for common scenarios. You can use these as-is or customize them.

Enable and Publish

- Toggle Active for Company and/or Contact deduplication
- Review your draft rules (the defaults are a good starting point)
- Click Publish to activate

Confirm the publish action. This creates a new search index and applies your rules.

Wait for Background Processing

Watch for status indicators:
- “New rules being applied…” (in Settings)
- “Deduplication check is running in the background…” (on Pairs/Clusters pages)

Processing time varies with database size: minutes for fewer than 10K records, several hours for 100K+ records.

Review Results

After processing completes:
- Browse Pairs for one-to-one comparisons
- Switch to Clusters (via View Clusters button) for groups
- Click eye icons to see Rule Details
- Use external links to open records in HubSpot

Take Action

Process duplicates using pipelines or manual merging:
- Recommended: Install Duplicate Resolver from the Marketplace (an AI Agent that validates and merges company duplicates)
- Configure the pipeline to use “Deduplication type source” so it ingests pairs automatically
- Enable “Require Human Review” for merge confirmation
- Or enroll specific clusters via the Add to pipeline button
- Or merge manually in HubSpot

Recommended: Use Duplicate Resolver

The Duplicate Resolver pipeline (available in Marketplace) is specifically designed to work with deduplication:

- An AI Agent uses web research to validate whether pairs are true duplicates
- Selects the primary record based on data quality
- Merges records while preserving associations
- Available for Companies (Contact version coming soon)

See the Pipeline Integration section below for details.

Interface Reference

Potential Duplicate Pairs

URL: /deduplication/candidates

Review one-to-one suspected duplicates:

Table Columns:

Found — Timestamp when the pair was flagged
Type — Contact dedup or Company dedup
Object A / Object B — Details of each record in the pair
- Each includes an external-link icon to open the HubSpot CRM record
- For contacts: First name, Last name, Company name
- For companies: Company name, domain
Rule — Which rule matched (click the eye icon to open Rule Details modal)

Filters & Controls:

Type — Select exactly one: Company dedup or Contact dedup
Latest rules only toggle — Show only pairs from current published rules, or include historical
Search — Free-text search across all pair data
Settings button — Opens Deduplication Settings

Rule Details Modal:

Click the eye icon next to any rule to see exactly which fields matched:

Field	Match Type
LinkedIn URL (`hs_linkedin_url`)	Exact

Shows field names, HubSpot property names, and the match type used.

Potential Duplicate Clusters

URL: /deduplication/clusters

Review groups of 2+ records that match together:

Table Columns:

Created — Timestamp when cluster was generated
Type — Company dedup or Contact dedup
Size — Number of records in the cluster (badge)
Objects — Preview of member names + “View all X members” link
Rule — The rule that produced the cluster
Actions → Add to pipeline — Enroll cluster into a processing pipeline

Filters & Controls:

Type — Select exactly one: Company dedup or Contact dedup
Size — Filter by minimum cluster size (2+, 3+, 4+, 5+)
View Pairs button — Jump to the Pairs view for the same object type
Settings button — Opens Deduplication Settings

All Cluster Members Dialog:

Click “View all X members” to see the complete list of records in the cluster.

Enroll Cluster into Pipeline:

Click Add to pipeline to open a picker showing available processing pipelines. Select a pipeline and click Enroll to queue the cluster for processing.
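
This page does not describe how clusters are derived from pairs. One common approach, shown here purely as an assumption, is to treat each pair as a link between two records and group everything that is connected through any chain of links. The record IDs and the `build_clusters` name are hypothetical.

```python
from collections import defaultdict


def build_clusters(pairs):
    """Group record IDs so that records linked by any chain of pairs share a cluster."""
    parent = {}

    def find(record):
        # Follow parent links to the representative of this record's group.
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]  # shorten the path as we go
            record = parent[record]
        return record

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for record in list(parent):
        groups[find(record)].add(record)

    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


pairs = [("A", "B"), ("B", "C"), ("D", "E")]
clusters = build_clusters(pairs)
print(clusters)
```

Under this assumption, A, B, and C end up in one cluster of size 3 even though A and C were never flagged as a direct pair, which is why a cluster can be larger than any single pair suggests.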

Deduplication Settings Reference

The Settings modal (opened with the Settings button on the Pairs or Clusters pages) provides complete control over duplicate detection rules.

Layout Structure

Two Object Type Sections:

Company dedup (collapsible)
Contact dedup (collapsible)

Each section contains:

Element	Description
Active toggle	Enable/disable deduplication for this object type (shown on right)
Active Rules block	Currently published and running rules. Shows “No active rules found” until you publish.
Rules list	Draft rules you’re editing. Shows count like “Rules (3)”. Editable but not active until published.
+ Add Rule button	Create new matching rules
Publish button	Apply draft rules and trigger background processing
Object count	Example: “114,214 objects are ready to be checked for duplicates”

Active Rules vs Draft Rules

Critical distinction

Active Rules block = Currently running in production

Rules list = Your working draft (not active)

Message: “Rules will be activated when published.”

Click Publish to promote drafts to active.
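
The draft/active split can be pictured as two separate lists, where editing only touches the draft and Publish copies the draft over the active set. The sketch below is a mental model only; the `RuleSet` class and its method names are hypothetical and not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical model of one object type's rules in the Settings modal."""

    draft: list = field(default_factory=list)   # the "Rules" list you edit
    active: list = field(default_factory=list)  # the "Active Rules" block

    def add_rule(self, rule):
        # Editing changes the draft only; matching keeps using `active`.
        self.draft.append(rule)

    def publish(self):
        # Publishing promotes a copy of the draft to active.
        self.active = list(self.draft)
        return self.active


rules = RuleSet()
rules.add_rule("Exact on hs_linkedin_url")
print(rules.active)   # still empty: nothing has been published
rules.publish()
print(rules.active)   # now contains the published rule
```

The practical consequence is that a rule you add after publishing has no effect on Pairs or Clusters until you publish again.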

Publishing Workflow

When you click Publish, a confirmation dialog appears asking:

“Are you sure you want to publish these rules? This will create a new search index and use the current rules for duplicate matching. The process may take some time to complete.”

Actions:

Cancel — Discard and return to editing
Publish — Confirm and start background processing

After publishing:

“New rules being applied…” appears in Settings
Background process checks all eligible objects
Pairs/Clusters update as results are computed

Match Types Reference

Exact

Values must be identical:

Case-sensitive comparison
No typos or variations allowed
Best for: Domains, URLs, email addresses, text IDs
Example: acme.com matches acme.com but NOT Acme.com

Fuzzy

Tolerant of minor spelling differences:

Handles typos and variations
Similarity threshold applied
Best for: Names, company names, text fields
Example: Acme Corp matches ACME Corporation
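
To build intuition for what a similarity threshold does, here is a minimal sketch using Python's standard `difflib`. The algorithm and threshold the product actually uses are not documented on this page; the `fuzzy_match` name, the case-folding step, and the 0.7 default are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Return True when the case-insensitive similarity ratio reaches the threshold."""
    left, right = a.casefold().strip(), b.casefold().strip()
    return SequenceMatcher(None, left, right).ratio() >= threshold


print(fuzzy_match("Acme Corp", "ACME Corporation"))
print(fuzzy_match("Acme Corp", "Globex Inc"))
print(fuzzy_match("Acme Corp", "ACME Corporation", threshold=0.9))
```

Raising the threshold makes the match stricter: in this sketch the same two names match at 0.7 but not at 0.9, which mirrors the precision versus recall trade-off discussed later on this page.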

Numeric

Numeric equality comparison:

Compares numeric values
Best for: HubSpot IDs, employee counts, any numeric identifiers
Example: 12345 matches 12345
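
The difference between Exact and Numeric is easiest to see side by side. In the sketch below, Exact compares the raw strings and Numeric compares the parsed values. How the product parses numbers (for example, whether `12345.0` equals `12345`) is not stated here, so treat the parsing rules and function names as assumptions.

```python
from decimal import Decimal, InvalidOperation


def exact_match(a: str, b: str) -> bool:
    """Case-sensitive string equality with no normalization."""
    return a == b


def numeric_match(a: str, b: str) -> bool:
    """Compare by numeric value; values that are not numbers never match."""
    try:
        return Decimal(a.strip()) == Decimal(b.strip())
    except InvalidOperation:
        return False


print(exact_match("acme.com", "Acme.com"))
print(numeric_match("12345", "12345"))
print(exact_match("12345", "12345.0"), numeric_match("12345", "12345.0"))
```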

Domain

Second-level domain matching:

Ignores subdomains and protocols
Groups by base domain (e.g., acme.com)
Best for: Company website URLs, email domains
Example: www.acme.com matches blog.acme.com (both are acme.com)
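
The idea of reducing a URL to its base domain can be sketched as follows. This is a deliberately naive version that keeps the last two labels of the host name, so it would mishandle multi-part suffixes such as `co.uk`; a production implementation would typically consult the Public Suffix List. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_domain(value: str) -> str:
    """Return the last two host labels, e.g. 'blog.acme.com' -> 'acme.com'."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlsplit treat a bare host as the network location
    host = urlsplit(value).hostname or ""
    labels = [label for label in host.split(".") if label]
    return ".".join(labels[-2:])


def domain_match(a: str, b: str) -> bool:
    """Match when both values reduce to the same non-empty base domain."""
    left, right = base_domain(a), base_domain(b)
    return bool(left) and left == right


print(base_domain("https://www.acme.com/about"))
print(domain_match("www.acme.com", "blog.acme.com"))
```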

Nickname

Resolves common nickname variations:

Matches nicknames to formal names
Best for: Contact first names
Example: Bill matches William, Bob matches Robert, Liz matches Elizabeth
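
Nickname matching can be thought of as a lookup that maps each name to a canonical form before comparing. The three-entry table below only covers the examples above and is illustrative; the product's actual nickname dictionary is not documented here, and the function names are hypothetical.

```python
# Tiny illustrative table; a real nickname dictionary would be far larger.
NICKNAMES = {
    "bill": "william",
    "bob": "robert",
    "liz": "elizabeth",
}


def canonical_first_name(name: str) -> str:
    """Map a known nickname to its formal name; leave other names unchanged."""
    key = name.strip().casefold()
    return NICKNAMES.get(key, key)


def nickname_match(a: str, b: str) -> bool:
    """Match when both names share the same canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)


print(nickname_match("Bill", "William"))
print(nickname_match("Bill", "Robert"))
```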

Phonetic

Matches phonetically similar strings:

Uses Soundex-like phonetic algorithms
Best for: Names with variant spellings
Example: Smith matches Smythe, Catherine matches Katherine
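
For reference, here is a sketch of classic American Soundex, one well-known member of this family of algorithms. It is not necessarily what the product runs: classic Soundex keeps the first letter as-is, so it matches Smith with Smythe but gives Catherine and Katherine different codes. The Catherine/Katherine example above therefore suggests the product's variant treats leading letters differently, a detail this page does not specify.

```python
# Letter-to-digit table for classic Soundex; vowels, h, w, and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character classic Soundex code, or '' for empty input."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        if ch not in "hw":  # h and w do not separate letters with the same code
            previous = code
    return (first.upper() + "".join(digits) + "000")[:4]


def phonetic_match(a: str, b: str) -> bool:
    """Match when both names have the same non-empty Soundex code."""
    code = soundex(a)
    return bool(code) and code == soundex(b)


print(soundex("Smith"), soundex("Smythe"))
print(soundex("Catherine"), soundex("Katherine"))
```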

Combining match types

A rule can have multiple conditions with different match types. For a pair/cluster to match the rule, ALL conditions must match (AND logic).

Example: Full name (Fuzzy) AND Company ID (Numeric) means the names must be similar AND the company IDs must be exactly equal.
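
The AND logic can be sketched as a loop that fails as soon as any condition fails. The property names `hs_full_name_or_email` and `associatedcompanyid` come from the Rule Strategy Guide on this page; the matcher implementations, the 0.8 threshold, and the choice that empty values never match are assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation
from difflib import SequenceMatcher


def fuzzy(a, b, threshold=0.8):
    """Assumed fuzzy matcher: case-insensitive similarity ratio."""
    return SequenceMatcher(None, str(a).casefold(), str(b).casefold()).ratio() >= threshold


def numeric(a, b):
    """Assumed numeric matcher: equality of parsed values."""
    try:
        return Decimal(str(a)) == Decimal(str(b))
    except InvalidOperation:
        return False


MATCHERS = {"Fuzzy": fuzzy, "Numeric": numeric}

# One rule is a list of (property, match type) conditions.
RULE = [("hs_full_name_or_email", "Fuzzy"), ("associatedcompanyid", "Numeric")]


def rule_matches(rule, record_a, record_b):
    """Return True only if every condition in the rule matches (AND logic)."""
    for prop, match_type in rule:
        left, right = record_a.get(prop), record_b.get(prop)
        if left in (None, "") or right in (None, ""):
            return False  # assumption: a missing value can never satisfy a condition
        if not MATCHERS[match_type](left, right):
            return False
    return True


a = {"hs_full_name_or_email": "Jon Smith", "associatedcompanyid": "12345"}
b = {"hs_full_name_or_email": "John Smith", "associatedcompanyid": 12345}
c = {"hs_full_name_or_email": "John Smith", "associatedcompanyid": 99999}
print(rule_matches(RULE, a, b))
print(rule_matches(RULE, a, c))
```

Here `a` and `c` have similar names but different company IDs, so the rule rejects them. Adding conditions can only narrow what a rule matches, which is why multi-field rules tend to be more precise.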

Default Rules Reference

When you first open Settings, you’ll see these proven default rules:

Company Deduplication

Rule	Field	Match Type	Description
Rule #1	Domain (`domain`)	Domain	Groups by second-level domain (www.acme.com = blog.acme.com)
Rule #2	LinkedIn company page (`linkedin_company_page`)	Exact	Authoritative identifier
Rule #3	Name (`name`)	Exact	Identical company names

Contact Deduplication

Rule	Fields	Match Types	Description
Rule #1	Full name or email + Company ID	Fuzzy + Numeric	Same person at same company (high precision)
Rule #2	Full name or email + Company name	Fuzzy + Fuzzy	Same person at same company (by name)
Rule #3	Full name or email + Associated company name	Fuzzy + Fuzzy	With company association
Rule #4	LinkedIn URL (`hs_linkedin_url`)	Exact	Highest confidence signal

Why these defaults

Company: Rule #1 uses Domain matching to catch subdomain variations. Rules #2-3 use Exact for high precision.

Contact: Rules #1-3 combine Fuzzy name matching with company identifiers for precision. Rule #4 is the highest confidence signal.

Rule Strategy Guide

For Companies

Start with (high confidence):

Domain match on domain field — catches subdomain variations
Exact on linkedin_company_page — authoritative identifier
Exact on name — identical company names

Add if needed (broader recall):

Fuzzy on name for typo variations
Phonetic on name for spelling variants
Numeric on linkedin_numeric_id if you have LinkedIn data

For Contacts

Start with (high confidence):

Exact on hs_linkedin_url — unique personal identifier
Exact on email — very reliable
Fuzzy on hs_full_name_or_email + Numeric on associatedcompanyid — same person at company

Add if needed (broader recall):

Nickname on first name fields
Phonetic on name fields for variants
Fuzzy combinations with company name fields

Building Effective Rules

High precision (fewer false positives):

Use Exact or Numeric match types
Combine multiple fields with AND logic
Match on unique identifiers

Broader recall (find more duplicates):

Add Fuzzy, Domain, Nickname, or Phonetic
Use fewer field combinations
Match on common fields

Balance both:

Start strict, add broader rules gradually
Review results after each publish
Disable rules that generate too many false positives
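
One way to decide which rules generate too many false positives is to review a sample of pairs per rule and compute the share that turned out not to be duplicates. The sketch below assumes you have recorded your own review decisions as `(rule name, is true duplicate)` tuples; that data format, the function names, and the 0.5 cutoff are hypothetical, not a product feature.

```python
from collections import Counter


def false_positive_rate_by_rule(reviewed_pairs):
    """Return {rule: share of reviewed pairs that were not true duplicates}."""
    totals, false_positives = Counter(), Counter()
    for rule, is_duplicate in reviewed_pairs:
        totals[rule] += 1
        if not is_duplicate:
            false_positives[rule] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


def rules_to_disable(reviewed_pairs, max_rate=0.5):
    """List rules whose false-positive rate exceeds the chosen cutoff."""
    rates = false_positive_rate_by_rule(reviewed_pairs)
    return sorted(rule for rule, rate in rates.items() if rate > max_rate)


reviews = [
    ("Rule #1", True), ("Rule #1", True),
    ("Rule #3", False), ("Rule #3", False), ("Rule #3", True),
]
print(false_positive_rate_by_rule(reviews))
print(rules_to_disable(reviews))
```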

Match Type Selection Guide

By Confidence Level

High Confidence (Exact, Numeric):

Safe for automated processing
Minimal false positives
Best for: LinkedIn URLs, emails, domains, IDs

Medium Confidence (Domain, Fuzzy + constraints):

Good for manual review
Some false positives expected
Best for: Domains with subdomains, names with company context

Lower Confidence (Fuzzy, Nickname, Phonetic alone):

Broader recall, more false positives
Requires careful review
Best for: Discovery, then filtering

By Field Type

Unique Identifiers:

LinkedIn URLs → Exact
Email addresses → Exact
HubSpot IDs → Numeric

Domain Fields:

Company websites → Domain (groups subdomains)
Email domains → Domain or Exact

Name Fields:

Company names → Exact or Fuzzy
Contact names → Fuzzy + other constraints
First names → Nickname (with constraints)

Text Fields:

Short text → Exact or Fuzzy
Addresses → Fuzzy
Multi-word → Phonetic (for variants)

Troubleshooting

Too Many False Positives

Solutions:

Switch from Fuzzy/Phonetic to Exact or Domain matching
Add more fields to matching criteria (AND logic increases precision)
Disable overly broad single-field Fuzzy rules
Use Numeric matching on IDs for stricter comparison
Publish changes and wait for new results

Missing Duplicates

Solutions:

Add Fuzzy matching to handle typo variations
Try Domain matching instead of Exact for website fields
Add Nickname matching for contact first names
Add Phonetic matching for names with variant spellings
Verify that fields have data in both records
Ensure rules are published and Active toggle is on

No Pairs/Clusters Appearing

Check:

Rules are published (not just saved)
Active toggle is enabled for object type
Background processing has completed (the “New rules being applied…” indicator is gone)
“Latest rules only” isn’t filtering out results
Fields in rules actually have data in HubSpot

“View Pairs” or “View Clusters” Shows Empty

Possible causes:

No matches for the current filter settings
Rules are too strict (for example, only Exact matching on rarely populated fields)
Background processing still running
Object type filter mismatch

Solutions:

Clear filters and try again
Check “Latest rules only” toggle
Add broader match types to rules
Wait for background processing to complete

Pipeline Integration

Duplicate Resolver (Recommended)

The Duplicate Resolver is an AI Agent pipeline from the Marketplace specifically designed to process duplicates detected by this feature.

How it works with Deduplication:

Deduplication rules detect potential duplicates → create pairs
Duplicate Resolver ingests pairs via “Deduplication type source”
Agent researches each pair using web tools (Google, websites, LinkedIn)
Agent classifies: CONFIRMED DUPLICATE, NOT DUPLICATE, or NEEDS HUMAN REVIEW
For confirmed duplicates, intelligently merges into primary record

Key capabilities:

External verification (doesn’t rely solely on CRM data)
Intelligent primary record selection (based on data completeness and reliability)
Safe merging with association preservation
Manual entry prioritization over enrichment data
Large merge safety (>30 associations require review)
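
The routing described above can be summarized as a small decision function. The three verdict labels and the 30-association limit come from this page; how the “Require Human Review” setting combines with them is an assumption, as are the function name and the `skip`/`review`/`merge` action labels.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED_DUPLICATE = "CONFIRMED DUPLICATE"
    NOT_DUPLICATE = "NOT DUPLICATE"
    NEEDS_HUMAN_REVIEW = "NEEDS HUMAN REVIEW"


LARGE_MERGE_THRESHOLD = 30  # more than 30 associations require review


def next_action(verdict, association_count, require_human_review=True):
    """Return 'skip', 'review', or 'merge' for one classified pair."""
    if verdict is Verdict.NOT_DUPLICATE:
        return "skip"
    if verdict is Verdict.NEEDS_HUMAN_REVIEW:
        return "review"
    # Confirmed duplicate: still route to a person when review is required
    # (assumed behavior) or when the merge touches many associations.
    if require_human_review or association_count > LARGE_MERGE_THRESHOLD:
        return "review"
    return "merge"


print(next_action(Verdict.CONFIRMED_DUPLICATE, 12, require_human_review=False))
print(next_action(Verdict.CONFIRMED_DUPLICATE, 45, require_human_review=False))
```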

Availability:

✅ Company Duplicate Resolver — Available now
⏳ Contact Duplicate Resolver — Coming soon

Setup:

Browse Marketplace → Install “Duplicate Resolver”
In pipeline Settings: Set input source to “Deduplication type source”
Select object type: Company
Enable “Require Human Review”
Deploy pipeline
Agent processes pairs automatically as deduplication detects them

Other Processing Options

Using Any Pipeline with Pairs:

Configure pipeline input source: “Deduplication type source”
Select specific rules or “All rules”
Pipeline processes pairs continuously

Enrolling Clusters:

Click Add to pipeline on any cluster
Choose from available pipelines
Enroll entire group at once

Pipeline Types:

Agent — Research-backed decisions (Duplicate Resolver)
Code — Deterministic logic
StructuredData — Normalization and cleaning

System Behavior

After Publishing Rules:

Status indicators appear:

“New rules being applied…” (in Settings)
“Deduplication check is running in the background…” (on Pairs/Clusters pages)

Processing time:

Small databases (< 10K): Minutes
Medium databases (10K-100K): 30 minutes to 2 hours
Large databases (100K+): Several hours
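
If you want to set expectations programmatically (for example, in an internal runbook script), the tiers above translate into a simple lookup. The function name is hypothetical, and treating exactly 100,000 records as “large” is an assumption, since the tiers above overlap at that boundary.

```python
def estimated_processing_time(record_count: int) -> str:
    """Map a record count to the rough processing-time tier listed above."""
    if record_count < 10_000:
        return "minutes"
    if record_count < 100_000:
        return "30 minutes to 2 hours"
    return "several hours"


print(estimated_processing_time(114_214))
```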

Continuous detection:

New records checked automatically
Existing records re-evaluated when rules change
No impact on HubSpot performance

Next Steps

Need conceptual background?
→ Data Deduplication Overview — Understand why and when to use deduplication

Ready to process duplicates?
→ Template Marketplace — Find merge and cleaning pipelines

Want deeper pipeline knowledge?
→ Pipeline Kinds — Learn about Agent, Code, and StructuredData capabilities