Configure Deduplication

Step-by-step guide to setting up and using the Deduplication feature, including interface reference and configuration details.

  1. Navigate to Deduplication

    Click Deduplication in the left sidebar to access the Pairs view.

    You’ll see the Potential Duplicate Pairs interface with filters, search, and a Settings button.

  2. Open Settings

    Click the Settings button (top right) to configure matching rules.

    The Settings modal shows:

    • Company dedup and Contact dedup sections (collapsible)
    • Active toggle to enable deduplication for each object type
    • Active Rules block (currently running rules)
    • Rules list (your draft rules being edited)
    • + Add Rule button to create new rules
    • Publish button to activate changes
    • Object count showing scope (e.g., “114,214 objects are ready to be checked”)

  3. Review Default Rules

    Both Company and Contact sections include proven default rules optimized for common scenarios. You can use these as-is or customize them.

  4. Enable and Publish

    • Toggle Active for Company and/or Contact deduplication
    • Review your draft rules (the defaults are a good starting point)
    • Click Publish to activate

    Confirm the publish action. This creates a new search index and applies your rules.

  5. Wait for Background Processing

    Watch for status indicators:

    • “New rules being applied…” (in Settings)
    • “Deduplication check is running in the background…” (on Pairs/Clusters pages)

    Processing time varies: minutes for small databases, hours for 100K+ records.

  6. Review Results

    After processing completes:

    • Browse Pairs for one-to-one comparisons
    • Switch to Clusters (via View Clusters button) for groups
    • Click eye icons to see Rule Details
    • Use external links to open records in HubSpot

  7. Take Action

    Process duplicates using pipelines or manual merging:

    • Recommended: Install Duplicate Resolver from Marketplace (AI Agent that validates and merges company duplicates)
    • Configure pipeline to use “Deduplication type source” to ingest pairs automatically
    • Enable “Require Human Review” for merge confirmation
    • Or enroll specific clusters via Add to pipeline button
    • Or merge manually in HubSpot

Recommended: Use Duplicate Resolver

The Duplicate Resolver pipeline (available in Marketplace) is specifically designed to work with deduplication:

  • AI Agent with web research validates if pairs are true duplicates
  • Intelligently selects primary record based on data quality
  • Safely merges while preserving associations
  • Available for Companies (Contact version coming soon)

See Pipeline Integration section below for details.

Pairs view URL: /deduplication/candidates

Review one-to-one suspected duplicates:

Table Columns:

  • Found — Timestamp when the pair was flagged
  • Type — Contact dedup or Company dedup
  • Object A / Object B — Details of each record in the pair
    • Each includes an external-link icon to open the HubSpot CRM record
    • For contacts: First name, Last name, Company name
    • For companies: Company name, domain
  • Rule — Which rule matched (click the eye icon to open Rule Details modal)

Filters & Controls:

  • Type — Select exactly one: Company dedup or Contact dedup
  • Latest rules only toggle — Show only pairs from current published rules, or include historical
  • Search — Free-text search across all pair data
  • Settings button — Opens Deduplication Settings

Rule Details Modal:

Click the eye icon next to any rule to see exactly which fields matched:

Field | Match Type
LinkedIn URL (hs_linkedin_url) | Exact

Shows field names, HubSpot property names, and the match type used.

Clusters view URL: /deduplication/clusters

Review groups of 2+ records that match together:

Table Columns:

  • Created — Timestamp when cluster was generated
  • Type — Company dedup or Contact dedup
  • Size — Number of records in the cluster (badge)
  • Objects — Preview of member names + “View all X members” link
  • Rule — The rule that produced the cluster
  • Actions — Add to pipeline: Enroll cluster into a processing pipeline

Filters & Controls:

  • Type — Select exactly one: Company dedup or Contact dedup
  • Size — Filter by minimum cluster size (2+, 3+, 4+, 5+)
  • View Pairs button — Jump to the Pairs view for the same object type
  • Settings button — Opens Deduplication Settings

All Cluster Members Dialog:

Click “View all X members” to see the complete list of records in the cluster.

Enroll Cluster into Pipeline:

Click Add to pipeline to open a picker showing available processing pipelines. Select a pipeline and click Enroll to queue the cluster for processing.
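How clusters are computed is not described here. As a rough mental model only, a cluster can be pictured as a connected group of pairwise matches. The sketch below assumes that model; the record IDs and the function names `build_clusters` and `filter_by_min_size` are hypothetical and are not part of the product.

```python
from collections import defaultdict

def build_clusters(pairs):
    """Group record IDs into clusters, assuming a cluster is a connected
    component of pairwise matches (an assumption, not documented behavior)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = defaultdict(set)
    for record in parent:
        groups[find(record)].add(record)
    # Largest clusters first, then alphabetical for a stable order
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

def filter_by_min_size(clusters, min_size):
    """Mimic the Size filter (2+, 3+, 4+, 5+) on the Clusters view."""
    return [c for c in clusters if len(c) >= min_size]

# Hypothetical pairs: c1~c2 and c2~c3 chain into one cluster of three
pairs = [("c1", "c2"), ("c2", "c3"), ("c7", "c8")]
clusters = build_clusters(pairs)
print(clusters)
print(filter_by_min_size(clusters, 3))
```

Under this model, a record that pairs with two different records pulls all three into one cluster, which is why reviewing large clusters before enrolling them is worthwhile.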

The Settings modal (accessed via Settings button on Pairs or Clusters pages) provides complete control over duplicate detection rules.

Two Object Type Sections:

  • Company dedup (collapsible)
  • Contact dedup (collapsible)

Each section contains:

Element | Description
Active toggle | Enable/disable deduplication for this object type (shown on right)
Active Rules block | Currently published and running rules. Shows “No active rules found” until you publish.
Rules list | Draft rules you’re editing. Shows count like “Rules (3)”. Editable but not active until published.
+ Add Rule button | Create new matching rules
Publish button | Apply draft rules and trigger background processing
Object count | Example: “114,214 objects are ready to be checked for duplicates”

Critical distinction

Active Rules block = Currently running in production

Rules list = Your working draft (not active)

Message: “Rules will be activated when published.”

Click Publish to promote drafts to active.
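The draft/active split can be pictured with a small state model. This is only an illustration of the behavior described above; the class `DedupSettings` and its methods are hypothetical names, not a product API.

```python
from dataclasses import dataclass, field

@dataclass
class DedupSettings:
    """Toy model of one object-type section in the Settings modal."""
    active: bool = False                               # the Active toggle
    draft_rules: list = field(default_factory=list)    # the Rules list
    active_rules: list = field(default_factory=list)   # the Active Rules block

    def add_rule(self, rule):
        # Edits only ever touch the draft
        self.draft_rules.append(rule)

    def publish(self):
        # Publishing promotes a snapshot of the draft to active
        self.active_rules = list(self.draft_rules)

    def running_rules(self):
        # Rules only run when published AND the Active toggle is on
        return self.active_rules if self.active else []

settings = DedupSettings()
settings.add_rule("Exact on linkedin_company_page")
before_publish = settings.running_rules()   # nothing runs yet

settings.publish()
settings.active = True
after_publish = settings.running_rules()

settings.add_rule("Fuzzy on name")           # new draft edit, not yet published
after_edit = settings.running_rules()
print(before_publish, after_publish, after_edit)
```

The last step shows the point of the distinction: editing the draft after a publish does not change what is running until you publish again.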

When you click Publish, a confirmation dialog appears asking:

“Are you sure you want to publish these rules? This will create a new search index and use the current rules for duplicate matching. The process may take some time to complete.”

Actions:

  • Cancel — Discard and return to editing
  • Publish — Confirm and start background processing

After publishing:

  • “New rules being applied…” appears in Settings
  • Background process checks all eligible objects
  • Pairs/Clusters update as results are computed

Exact (values must be identical):

  • Case-sensitive comparison
  • No typos or variations allowed
  • Best for: Domains, URLs, email addresses, text IDs
  • Example: acme.com matches acme.com but NOT Acme.com

Fuzzy (tolerant of minor spelling differences):

  • Handles typos and variations
  • Similarity threshold applied
  • Best for: Names, company names, text fields
  • Example: Acme Corp matches ACME Corporation
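The similarity algorithm and threshold used by the product are not stated. As a minimal sketch of threshold-based fuzzy matching, the example below uses Python's standard `difflib.SequenceMatcher` with an assumed threshold of 0.7 and case-insensitive comparison; both choices are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1]; case is ignored (an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_match(a, b, threshold=0.7):
    """Match when similarity reaches the threshold (0.7 is an assumed value)."""
    return similarity(a, b) >= threshold

print(similarity("Acme Corp", "ACME Corporation"))  # 0.72
print(fuzzy_match("Acme Corp", "ACME Corporation"))
print(fuzzy_match("Acme Corp", "Globex Inc"))
```

With this measure the pair from the example scores 0.72, so it only matches because the threshold is set below that. Raising the threshold trades recall for precision.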

Numeric (numeric equality comparison):

  • Compares numeric values
  • Best for: HubSpot IDs, employee counts, any numeric identifiers
  • Example: 12345 matches 12345

Domain (second-level domain matching):

  • Ignores subdomains and protocols
  • Groups by base domain (e.g., acme.com)
  • Best for: Company website URLs, email domains
  • Example: www.acme.com matches blog.acme.com (both are acme.com)
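Reducing a URL or hostname to its base domain can be sketched as follows. This is a naive illustration that keeps the last two hostname labels; it is an assumption about the general technique, not the product's implementation, and it would mishandle multi-part suffixes such as `co.uk` (a real implementation would typically consult the Public Suffix List). The function names are hypothetical.

```python
from urllib.parse import urlsplit

def base_domain(value):
    """Strip protocol, path, and subdomains; keep the last two labels.
    Naive: does not handle suffixes like co.uk."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value          # lets urlsplit treat bare hosts as a netloc
    host = urlsplit(value).hostname or ""
    labels = [part for part in host.split(".") if part]
    return ".".join(labels[-2:])

def domain_match(a, b):
    """Two values match when both reduce to the same non-empty base domain."""
    base_a, base_b = base_domain(a), base_domain(b)
    return bool(base_a) and base_a == base_b

print(base_domain("www.acme.com"))
print(base_domain("https://blog.acme.com/pricing"))
print(domain_match("www.acme.com", "https://blog.acme.com/pricing"))
```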

Nickname (resolves common nickname variations):

  • Matches nicknames to formal names
  • Best for: Contact first names
  • Example: Bill matches William, Bob matches Robert, Liz matches Elizabeth
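Nickname resolution is usually a lookup from nickname to formal name. The sketch below uses a tiny hand-written table containing only the three examples above; the table contents beyond those examples, and the function names, are assumptions, and the product's actual nickname list is not documented here.

```python
# Tiny illustrative table; a real list would be far larger
NICKNAMES = {
    "bill": "william",
    "bob": "robert",
    "liz": "elizabeth",
}

def canonical_first_name(name):
    """Map a nickname to its formal name; unknown names map to themselves."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def nickname_match(a, b):
    """Two first names match when they share a canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)

print(nickname_match("Bill", "William"))
print(nickname_match("Bob", "Robert"))
print(nickname_match("Bill", "Robert"))
```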

Phonetic (matches phonetically similar strings):

  • Uses Soundex-like phonetic algorithms
  • Best for: Names with variant spellings
  • Example: Smith matches Smythe, Catherine matches Katherine
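To make "Soundex-like" concrete, here is a sketch of classic American Soundex. It is shown only as a reference point; the exact phonetic algorithm the product uses is not specified, and the function names are illustrative.

```python
# Consonant groups used by classic Soundex; vowels, h, w, and y have no code
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}

def soundex(name):
    """Return the 4-character Soundex code (letter + three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                      # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code                       # a vowel resets prev to ""
    return (encoded + "000")[:4]

def phonetic_match(a, b):
    code_a, code_b = soundex(a), soundex(b)
    return bool(code_a) and code_a == code_b

print(soundex("Smith"), soundex("Smythe"))
print(soundex("Catherine"), soundex("Katherine"))
```

Smith and Smythe share a code under classic Soundex. Catherine and Katherine do not, because Soundex keeps the first letter verbatim, so the Catherine/Katherine example suggests the product uses a different phonetic scheme than plain Soundex.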
Combining match types

A rule can have multiple conditions with different match types. For a pair/cluster to match the rule, ALL conditions must match (AND logic).

Example: Full name (Fuzzy) AND Company ID (Numeric) means both the name must be similar AND the company ID must be exactly equal.
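The AND logic can be sketched as a rule evaluator in which every condition must pass. The matcher implementations, the 0.7 fuzzy threshold, and the treatment of empty fields as non-matching are assumptions made for this illustration; the property names `hs_full_name_or_email` and `associatedcompanyid` are the ones this guide uses in its contact recommendations, and the sample records are invented.

```python
from difflib import SequenceMatcher

def fuzzy(a, b, threshold=0.7):
    # Assumed similarity measure and threshold
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio() >= threshold

def numeric(a, b):
    # Compare as numbers so "12345" and 12345 are equal
    return float(a) == float(b)

def exact(a, b):
    return a == b

MATCHERS = {"Fuzzy": fuzzy, "Numeric": numeric, "Exact": exact}

def rule_matches(rule, record_a, record_b):
    """A rule is a list of (field, match_type); ALL conditions must match."""
    for field_name, match_type in rule:
        value_a, value_b = record_a.get(field_name), record_b.get(field_name)
        if value_a in (None, "") or value_b in (None, ""):
            return False  # assumption: an empty field cannot satisfy a condition
        if not MATCHERS[match_type](value_a, value_b):
            return False
    return True

rule = [("hs_full_name_or_email", "Fuzzy"), ("associatedcompanyid", "Numeric")]
jon = {"hs_full_name_or_email": "Jon Smith", "associatedcompanyid": "12345"}
john = {"hs_full_name_or_email": "John Smith", "associatedcompanyid": 12345}
john_elsewhere = {"hs_full_name_or_email": "John Smith", "associatedcompanyid": 67890}

print(rule_matches(rule, jon, john))            # similar name, same company
print(rule_matches(rule, jon, john_elsewhere))  # similar name, different company
```

The second call fails on the Numeric condition alone, which is how adding a condition narrows a rule.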

When you first open Settings, you’ll see these proven default rules:

Company dedup defaults:

Rule | Field | Match Type | Description
Rule #1 | Domain (domain) | Domain | Groups by second-level domain (www.acme.com = blog.acme.com)
Rule #2 | LinkedIn company page (linkedin_company_page) | Exact | Authoritative identifier
Rule #3 | Name (name) | Exact | Identical company names

Contact dedup defaults:

Rule | Fields | Match Types | Description
Rule #1 | Full name or email + Company ID | Fuzzy + Numeric | Same person at same company (high precision)
Rule #2 | Full name or email + Company name | Fuzzy + Fuzzy | Same person at same company (by name)
Rule #3 | Full name or email + Associated company name | Fuzzy + Fuzzy | With company association
Rule #4 | LinkedIn URL (hs_linkedin_url) | Exact | Highest confidence signal

Why these defaults

Company: Rule #1 uses Domain matching to catch subdomain variations. Rules #2-3 use Exact for high precision.

Contact: Rules #1-3 combine Fuzzy name matching with company identifiers for precision. Rule #4 is the highest confidence signal.

For companies, start with (high confidence):

  1. Domain match on domain field — catches subdomain variations
  2. Exact on linkedin_company_page — authoritative identifier
  3. Exact on name — identical company names

Add if needed (broader recall):

  • Fuzzy on name for typo variations
  • Phonetic on name for spelling variants
  • Numeric on linkedin_numeric_id if you have LinkedIn data

For contacts, start with (high confidence):

  1. Exact on hs_linkedin_url — unique personal identifier
  2. Exact on email — very reliable
  3. Fuzzy on hs_full_name_or_email + Numeric on associatedcompanyid — same person at company

Add if needed (broader recall):

  • Nickname on first name fields
  • Phonetic on name fields for variants
  • Fuzzy combinations with company name fields

High precision (fewer false positives):

  • Use Exact or Numeric match types
  • Combine multiple fields with AND logic
  • Match on unique identifiers

Broader recall (find more duplicates):

  • Add Fuzzy, Domain, Nickname, or Phonetic
  • Use fewer field combinations
  • Match on common fields

Balance both:

  • Start strict, add broader rules gradually
  • Review results after each publish
  • Disable rules that generate too many false positives
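When reviewing results after a publish, the trade-off can be put in numbers. The sketch below computes precision and recall for two rules; all counts are invented for illustration and assume you have manually reviewed a sample and know how many true duplicates exist.

```python
def precision(true_positives, false_positives):
    """Share of flagged pairs that are real duplicates."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0

def recall(true_positives, false_negatives):
    """Share of real duplicates that the rule flagged."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0

# Invented review results against 100 known duplicate pairs
strict = {"tp": 48, "fp": 2, "fn": 52}     # e.g. an Exact rule
broad = {"tp": 90, "fp": 110, "fn": 10}    # e.g. a single-field Fuzzy rule

for label, r in (("strict", strict), ("broad", broad)):
    print(label,
          round(precision(r["tp"], r["fp"]), 2),
          round(recall(r["tp"], r["fn"]), 2))
```

In this made-up sample the strict rule is almost always right but finds under half of the duplicates, while the broad rule finds most of them at the cost of more than half of its pairs being false positives.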

High Confidence (Exact, Numeric):

  • Safe for automated processing
  • Minimal false positives
  • Best for: LinkedIn URLs, emails, domains, IDs

Medium Confidence (Domain, Fuzzy + constraints):

  • Good for manual review
  • Some false positives expected
  • Best for: Domains with subdomains, names with company context

Lower Confidence (Fuzzy, Nickname, Phonetic alone):

  • Broader recall, more false positives
  • Requires careful review
  • Best for: Discovery, then filtering
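These tiers can be turned into a simple routing helper. The sketch below classifies a rule by the set of match types it uses; the tier boundaries follow the lists above, while the treatment of combinations the lists do not cover (for example Nickname plus an Exact constraint counted as medium) is an assumption, as are the function names.

```python
STRICT = {"Exact", "Numeric"}
LOOSE = {"Fuzzy", "Nickname", "Phonetic"}

def confidence_tier(match_types):
    """Classify a rule by the match types of its conditions."""
    types = set(match_types)
    if not types:
        raise ValueError("a rule needs at least one condition")
    if types <= STRICT:
        return "high"                     # only Exact and/or Numeric
    has_loose = bool(types & LOOSE)
    has_strict = bool(types & STRICT)
    if not has_loose or has_strict:
        return "medium"                   # Domain, or loose types plus a strict constraint
    return "lower"                        # loose types without a strict constraint

def suggested_handling(match_types):
    return {
        "high": "automated processing",
        "medium": "manual review",
        "lower": "careful review",
    }[confidence_tier(match_types)]

print(confidence_tier(["Exact"]))
print(confidence_tier(["Fuzzy", "Numeric"]))
print(confidence_tier(["Phonetic"]))
print(suggested_handling(["Domain"]))
```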

Unique Identifiers:

  • LinkedIn URLs → Exact
  • Email addresses → Exact
  • HubSpot IDs → Numeric

Domain Fields:

  • Company websites → Domain (groups subdomains)
  • Email domains → Domain or Exact

Name Fields:

  • Company names → Exact or Fuzzy
  • Contact names → Fuzzy + other constraints
  • First names → Nickname (with constraints)

Text Fields:

  • Short text → Exact or Fuzzy
  • Addresses → Fuzzy
  • Multi-word → Phonetic (for variants)

If you see too many false positives, try:

  • Switch from Fuzzy/Phonetic to Exact or Domain matching
  • Add more fields to matching criteria (AND logic increases precision)
  • Disable overly broad single-field Fuzzy rules
  • Use Numeric matching on IDs for stricter comparison
  • Publish changes and wait for new results

If expected duplicates are missing, try:

  • Add Fuzzy matching to handle typo variations
  • Try Domain matching instead of Exact for website fields
  • Add Nickname matching for contact first names
  • Add Phonetic matching for names with variant spellings
  • Verify that fields have data in both records
  • Ensure rules are published and Active toggle is on

If no results appear, check:

  • Rules are published (not just saved)
  • Active toggle is enabled for object type
  • Background processing completed (“New rules being applied…” gone)
  • “Latest rules only” isn’t filtering out results
  • Fields in rules actually have data in HubSpot

“View Pairs” or “View Clusters” Shows Empty

Possible causes:

  • No matches for the current filter settings
  • Rules are too strict (all Exact on rare fields)
  • Background processing still running
  • Object type filter mismatch

Solutions:

  • Clear filters and try again
  • Check “Latest rules only” toggle
  • Add broader match types to rules
  • Wait for background processing to complete

Pipeline Integration

The Duplicate Resolver is an AI Agent pipeline from the Marketplace specifically designed to process duplicates detected by this feature.

How it works with Deduplication:

  1. Deduplication rules detect potential duplicates → create pairs
  2. Duplicate Resolver ingests pairs via “Deduplication type source”
  3. Agent researches each pair using web tools (Google, websites, LinkedIn)
  4. Agent classifies: CONFIRMED DUPLICATE, NOT DUPLICATE, or NEEDS HUMAN REVIEW
  5. For confirmed duplicates, intelligently merges into primary record

Key capabilities:

  • External verification (doesn’t rely solely on CRM data)
  • Intelligent primary record selection (based on data completeness and reliability)
  • Safe merging with association preservation
  • Manual entry prioritization over enrichment data
  • Large merge safety (>30 associations require review)
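The decision flow described above can be summarized as a small routing function. This is a sketch of the documented outcomes only: the three verdict labels and the 30-association limit come from this guide, while the function name, its parameters, and the returned step names are hypothetical and do not represent the agent's internals.

```python
from enum import Enum

class Verdict(Enum):
    CONFIRMED_DUPLICATE = "CONFIRMED DUPLICATE"
    NOT_DUPLICATE = "NOT DUPLICATE"
    NEEDS_HUMAN_REVIEW = "NEEDS HUMAN REVIEW"

LARGE_MERGE_ASSOCIATIONS = 30   # merges above this count require review

def next_step(verdict, association_count, require_human_review=True):
    """Return 'skip', 'review', or 'merge' for one classified pair."""
    if verdict is Verdict.NOT_DUPLICATE:
        return "skip"
    if verdict is Verdict.NEEDS_HUMAN_REVIEW:
        return "review"
    # Confirmed duplicate: still gate on the review setting and merge size
    if require_human_review or association_count > LARGE_MERGE_ASSOCIATIONS:
        return "review"
    return "merge"

print(next_step(Verdict.CONFIRMED_DUPLICATE, 5, require_human_review=False))
print(next_step(Verdict.CONFIRMED_DUPLICATE, 31, require_human_review=False))
print(next_step(Verdict.CONFIRMED_DUPLICATE, 5))
print(next_step(Verdict.NOT_DUPLICATE, 5))
```

Note that even with "Require Human Review" turned off, a confirmed duplicate with more than 30 associations is still routed to review in this sketch, mirroring the large merge safety rule.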

Availability:

  • Company Duplicate Resolver — Available now
  • Contact Duplicate Resolver — Coming soon

Setup:

  1. Browse Marketplace → Install “Duplicate Resolver”
  2. In pipeline Settings: Set input source to “Deduplication type source”
  3. Select object type: Company
  4. Enable “Require Human Review”
  5. Deploy pipeline
  6. Agent processes pairs automatically as deduplication detects them

Using Any Pipeline with Pairs:

  • Configure pipeline input source: “Deduplication type source”
  • Select specific rules or “All rules”
  • Pipeline processes pairs continuously

Enrolling Clusters:

  • Click Add to pipeline on any cluster
  • Choose from available pipelines
  • Enroll entire group at once

Pipeline Types:

  • Agent — Research-backed decisions (Duplicate Resolver)
  • Code — Deterministic logic
  • StructuredData — Normalization and cleaning

After Publishing Rules:

Status indicators appear:

  • “New rules being applied…” (in Settings)
  • “Deduplication check is running in the background…” (on Pairs/Clusters pages)

Processing time:

  • Small databases (< 10K): Minutes
  • Medium databases (10K-100K): 30 minutes to 2 hours
  • Large databases (100K+): Several hours
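For planning purposes, the size bands above can be expressed as a lookup. The function name is hypothetical, and treating exactly 100,000 records as "large" is an assumption, since the bands above overlap at that boundary.

```python
def estimated_processing_time(record_count):
    """Map an object count to the rough processing-time band listed above."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count < 10_000:
        return "minutes"
    if record_count < 100_000:
        return "30 minutes to 2 hours"
    return "several hours"

# 114,214 is the example object count shown in the Settings modal
print(estimated_processing_time(114_214))
print(estimated_processing_time(8_500))
```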

Continuous detection:

  • New records checked automatically
  • Existing records re-evaluated when rules change
  • No impact on HubSpot performance

Need conceptual background?
Data Deduplication Overview — Understand why and when to use deduplication

Ready to process duplicates?
Template Marketplace — Find merge and cleaning pipelines

Want deeper pipeline knowledge?
Pipeline Kinds — Learn about Agent, Code, and StructuredData capabilities