Pipeline Kinds

Learn about Sellestial’s five pipeline kinds and choose the right one for your use case.

Sellestial offers five pipeline kinds, each optimized for different tasks:

  • Code — Deterministic logic without AI
  • Classifier — AI categorization with confidence scores
  • StructuredData — LLM-powered structured data extraction with a predefined schema
  • Agent — Autonomous AI with tools and research
  • Sequence — Multi-step outreach content generation

Choosing the right pipeline kind

🎯 Need outreach content? → Use Sequence

🔍 Need autonomous research or web verification? → Use Agent

📊 Need to extract structured data with a predefined schema? → Use StructuredData

🏷️ Need to assign discrete categories? → Use Classifier

⚡ Need deterministic logic or calculations? → Use Code

Code

Pure Python logic without AI generation.

Processing: Deterministic code execution
Speed: Fastest
Cost: Lowest
Reliability: Highest (no AI variability)

When to use Code
  • ✅ Logic is deterministic and rule-based
  • ✅ No AI judgment needed
  • ✅ Speed and cost are priorities
  • ✅ Exact, repeatable results required

Email Validation:

  • Extract domain from email
  • DNS MX lookup
  • Return status based on records
  • No AI needed

Data Transformation:

  • Parse and reformat fields
  • Extract data from strings
  • Calculate derived values
  • API calls with predictable responses

Validation:

  • Check field formats
  • Verify required fields present
  • Validate against rules
  • Flag anomalies
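
The checks above are plain, rule-based logic with no AI involved. As a rough illustration (not Sellestial-specific code), a validation step might look like the sketch below; the field names and regex are hypothetical:

```python
import re

# Illustrative only: field names and rules are hypothetical,
# not part of Sellestial's Code pipeline API.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ["email", "firstname"]

def validate_record(record: dict) -> dict:
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing:{field}")
    email = record.get("email", "")
    if email and not EMAIL_RE.match(email):
        issues.append("invalid_email_format")
    return {"status": "OK" if not issues else "FLAGGED", "issues": issues}

# validate_record({"email": "jane@acme.com", "firstname": "Jane"})
# -> {"status": "OK", "issues": []}
```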

Can Do:

  • Execute Python code
  • Call functions and APIs
  • Access data sources
  • Transform data
  • Make deterministic decisions

Cannot Do:

  • Generate natural language
  • Make subjective judgments
  • Learn or adapt
  • Handle ambiguity

Example spec:

```yaml
kind: Code
spec:
  code:
    dataFields: [email, hs_object_id]
    onGenerate:
      code: |
        # Your Python code here
        domain = extract_domain(email)
        mx_records = functions.lookup_dns_record(domain, "MX")
        return {"status": "DOMAIN_OK" if mx_records else "NO_MX"}
```

Best for:

  • High-volume processing
  • Cost-sensitive operations
  • Real-time requirements
  • Deterministic workflows

Classifier

AI assigns discrete categories with justification.

Processing: LLM categorization
Speed: Fast
Cost: Low to moderate (typically 0.1-0.5 credits per record)
Output: Category + justification (≤25 words)

When to use Classifier
  • ✅ Need to assign categories
  • ✅ Want confidence scores
  • ✅ Categories are predefined
  • ✅ Need justification for decisions

Contact Quality Classification:

  • Analyzes contact data
  • Classifies as: LIKELY_REAL_PERSON, INVALID, MADE_UP, etc.
  • Provides justification
  • Handles edge cases with AI judgment

Lead Quality Scoring:

  • Reviews firmographic data
  • Assigns: HIGH_QUALITY, MEDIUM_QUALITY, LOW_QUALITY
  • Scores confidence
  • Justifies classification

Data Quality Assessment:

  • Evaluates field completeness
  • Categorizes: COMPLETE, PARTIAL, POOR, MISSING
  • Confidence level
  • Lists issues found

Can Do:

  • Classify into predefined categories
  • Provide confidence scores
  • Generate justifications
  • Handle ambiguous data
  • Apply nuanced judgment

Cannot Do:

  • Create new categories dynamically
  • Perform web research
  • Generate long-form content
  • Use external tools

Example spec:

```yaml
kind: Classifier
spec:
  classifier:
    llmModel: openai/o4-mini
    categories:
      - name: HIGH_QUALITY
        description: "Complete profile with business email"
      - name: LOW_QUALITY
        description: "Missing critical information"
    promptTemplate: |
      Contact: {{ firstname }} {{ lastname }}
      Email: {{ email }}
      Company: {{ company__name }}
    systemPromptTemplate: |
      Classify the lead quality. Return JSON:
      {"classification": "<CATEGORY>", "justification": "<reason>"}
```

Best for:

  • Data quality assessment
  • Lead scoring
  • Segmentation
  • Content categorization

StructuredData

LLM extracts structured information using a predefined schema.

Processing: LLM extraction with schema validation
Speed: Fast
Cost: Low to moderate (typically 0.1-0.5 credits per record)
Output: Structured JSON matching predefined schema

When to use StructuredData
  • ✅ Need to extract specific fields from unstructured data
  • ✅ Output schema is predefined and consistent
  • ✅ Working with text, websites, or documents
  • ✅ Need structured output for CRM field population

Company Information Extraction:

  • Extract company name, industry, size, and location from a website
  • Populate HubSpot company properties
  • Validate against the predefined schema
  • Return NOT_FOUND or INSUFFICIENT_INFORMATION sentinels when data is unavailable

Contact Data Parsing:

  • Extract name, title, email, phone from text
  • Structure messy contact information
  • Normalize to consistent format
  • Populate contact fields

Job Posting Analysis:

  • Extract job title, requirements, salary range
  • Structure job description data
  • Categorize by seniority and function
  • Populate custom fields

Can Do:

  • Extract specific fields from unstructured text
  • Validate against predefined schema
  • Handle missing data with sentinel values
  • Structure data for CRM field population
  • Process websites, documents, text

Cannot Do:

  • Make subjective judgments (use Classifier)
  • Perform web research (use Agent)
  • Generate long-form content (use Sequence)
  • Create dynamic schemas

Example spec:

```yaml
kind: StructuredData
spec:
  structuredData:
    llmModel: openai/gpt-4.1
    promptTemplate: |
      Extract company information from the website content below.
      Website: {{ website_summary }}
    systemPromptTemplate: |
      Extract structured company data. Return JSON with exact schema.
      Use "NOT_FOUND" when data is missing.
    outputFormat:
      name: CompanyInfo
      description: Structured company information
      fields:
        company_name:
          type: string
          description: Official company name
          nullable: true
        industry:
          type: string
          description: Primary industry
          nullable: true
        employee_count:
          type: integer
          description: Number of employees
          nullable: true
        headquarters_location:
          type: string
          description: HQ city and country
          nullable: true
```

Returns JSON matching the defined schema:

```json
{
  "company_name": "Acme Corporation",
  "industry": "Software & Technology",
  "employee_count": 500,
  "headquarters_location": "San Francisco, USA"
}
```

Or with missing data:

```json
{
  "company_name": "Acme Corporation",
  "industry": "NOT_FOUND",
  "employee_count": null,
  "headquarters_location": "INSUFFICIENT_INFORMATION"
}
```
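
Because sentinels are returned explicitly rather than omitted, downstream steps can decide what is safe to write to the CRM. A hypothetical post-processing sketch (not part of Sellestial's API) that drops sentinels and nulls before updating properties:

```python
# Hypothetical helper: keep only usable extracted values so that
# "NOT_FOUND" or "INSUFFICIENT_INFORMATION" never overwrites an
# existing CRM property.
SENTINELS = {"NOT_FOUND", "INSUFFICIENT_INFORMATION"}

def usable_fields(extracted: dict) -> dict:
    return {
        key: value
        for key, value in extracted.items()
        if value is not None and value not in SENTINELS
    }

extracted = {
    "company_name": "Acme Corporation",
    "industry": "NOT_FOUND",
    "employee_count": None,
    "headquarters_location": "INSUFFICIENT_INFORMATION",
}
print(usable_fields(extracted))  # {'company_name': 'Acme Corporation'}
```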

Supported models:

  • openai/gpt-5
  • openai/gpt-oss-120b
  • openai/o1
  • openai/o4-mini
  • openai/o3
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • gemini/gemini-2.5-pro
  • gemini/gemini-2.0-flash
  • gemini/gemini-2.5-flash

Best for:

  • Data extraction from unstructured sources
  • CRM field population
  • Website scraping with structure
  • Document parsing
  • Contact data normalization

Agent

Autonomous AI with tools and research capabilities.

Processing: Multi-step AI with tool use
Speed: Slower (research-intensive)
Cost: Higher (typically 1-5 credits per record)
Capabilities: Web search, website visits, HubSpot analysis

When to use Agent
  • ✅ Need web research
  • ✅ Complex multi-step reasoning required
  • ✅ Decision requires external verification
  • ✅ High-value, low-volume operations

Duplicate Analysis:

  • Fetches HubSpot property history
  • Visits both company websites
  • Searches for additional information
  • Makes merge decision with confidence
  • Preserves best data from both records

LinkedIn URL Validation:

  • Checks property history for data source
  • Validates current LinkedIn URL
  • Searches Google for alternatives
  • Detects redirects
  • Applies source-specific thresholds

Employment Change Detection:

  • Fetches LinkedIn profile
  • Compares with CRM data
  • Detects job changes
  • Creates new records if needed
  • Updates associations

Can Do:

  • Use tools (web search, websites, HubSpot API)
  • Multi-step reasoning
  • Iterative research
  • Complex decision-making
  • Generate structured outputs

Cannot Do:

  • Generate marketing content
  • Create long-form text
  • Run for extended periods
  • Access non-approved tools

Available tools:

  • aget_google_search_results — Web search
  • aget_website_content — Fetch and analyze websites
  • aget_hubspot_object — HubSpot data with history
  • aget_linkedin_profile — LinkedIn personal profiles
  • aget_linkedin_organization — LinkedIn company pages

Example spec:

```yaml
kind: Agent
spec:
  agent:
    agentDef:
      identifier: company-research-agent
      llmModel: gemini/gemini-2.0-flash
      tools: [aget_google_search_results, aget_website_content]
      objective: |
        Find and verify key information about companies.
        Use web search and website visits to gather data.
      inputPrompt: |
        Company: {{ company__name }}
        Domain: {{ company__domain }}
```

Best for:

  • Duplicate detection
  • Data verification
  • Research tasks
  • Complex analysis

Sequence

Multi-step outreach content generation.

Processing: AI generates personalized emails/messages
Speed: Moderate
Cost: Varies by steps and data sources
Output: Multi-step campaign content

When to use Sequence
  • ✅ Generating outbound content
  • ✅ Multiple touchpoint campaigns
  • ✅ Personalization from multiple data sources
  • ✅ Email/LinkedIn/call scripts needed

Conference Outreach:

  • Generates Day 1 email
  • Creates Day 3 follow-up
  • Personalizes based on LinkedIn, news, etc.
  • Maps to HubSpot tasks

Can Do:

  • Generate email content
  • Create LinkedIn messages
  • Write call scripts
  • Personalize based on data
  • Multi-step cadences

Cannot Do:

  • Send emails directly
  • Make phone calls
  • Post to LinkedIn automatically
  • Perform research

Example spec:

```yaml
kind: Sequence
spec:
  prompt:
    parts:
      - content_type: TEXT
        title: Instructions
        text: |
          Create a 2-email sequence...
  steps:
    - action_type: email
      day: 1
    - action_type: email
      day: 4
```
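
The steps above carry only day offsets; the generated content can then be mapped to HubSpot tasks. As a rough illustration (not Sellestial code) of how those offsets translate into concrete dates:

```python
from datetime import date, timedelta

# Illustrative only: turns the day offsets above into send dates
# for a campaign starting today.
steps = [
    {"action_type": "email", "day": 1},
    {"action_type": "email", "day": 4},
]
campaign_start = date.today()
for step in steps:
    send_date = campaign_start + timedelta(days=step["day"] - 1)
    print(f"{step['action_type']} on {send_date}")
```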

Best for:

  • Sales sequences
  • Marketing campaigns
  • Event follow-ups
  • Personalized outreach

Comparison

| Feature | Code | Classifier | StructuredData | Agent | Sequence |
|---|---|---|---|---|---|
| AI Usage | None | LLM | LLM | LLM + Tools | LLM |
| Speed | Fastest | Fast | Fast | Slow | Moderate |
| Cost | Lowest | Low-Moderate | Low-Moderate | High | Moderate-High |
| Research | No | No | No | Yes | No |
| Tools | Functions | None | None | Multiple | None |
| Output Type | Structured data | Category | Structured data | Structured data | Content |
| Use Case | Deterministic logic | Categorization | Data extraction | Complex analysis | Outreach |

Decision guide

1. Do you need AI?

  • No → Code
  • Yes → Continue

2. Is output a category?

  • Yes → Classifier
  • No → Continue

3. Need structured data extraction?

  • Yes → StructuredData
  • No → Continue

4. Need web research?

  • Yes → Agent
  • No → Continue

5. Generating outreach content?

  • Yes → Sequence
  • No → Code or StructuredData
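
Read as code, the five questions above collapse into a simple cascade (illustrative only, not a Sellestial API):

```python
# Encodes the decision guide above as a single function.
def pick_pipeline_kind(needs_ai: bool, output_is_category: bool,
                       needs_structured_extraction: bool,
                       needs_web_research: bool,
                       generates_outreach: bool) -> str:
    if not needs_ai:
        return "Code"
    if output_is_category:
        return "Classifier"
    if needs_structured_extraction:
        return "StructuredData"
    if needs_web_research:
        return "Agent"
    if generates_outreach:
        return "Sequence"
    return "Code or StructuredData"

print(pick_pipeline_kind(True, False, True, False, False))  # StructuredData
```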

Volume and cost considerations

High Volume (1000s of records):

  • Best: Code
  • Good: Classifier, StructuredData
  • Avoid: Agent (too slow/expensive)

Low Volume (10s of records):

  • Any kind works
  • Choose based on functionality needed

Cost-Sensitive:

  • Best: Code (often free)
  • Good: Classifier, StructuredData (0.1-0.5/record)
  • Expensive: Agent (1-5/record)
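
A quick back-of-envelope check using the typical per-record ranges quoted above (these are the page's published ranges, not a quote for any specific account):

```python
# Rough cost comparison for 5,000 records, using the typical
# per-record credit ranges listed above.
records = 5_000
classifier_low, classifier_high = 0.1 * records, 0.5 * records   # 500 to 2,500 credits
agent_low, agent_high = 1 * records, 5 * records                 # 5,000 to 25,000 credits
print(f"Classifier/StructuredData: {classifier_low:.0f}-{classifier_high:.0f} credits")
print(f"Agent: {agent_low}-{agent_high} credits")
```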

Combining pipeline kinds

Research → Extraction → Classification → Action

1. Agent: Research company
2. StructuredData: Extract key information
3. Classifier: Assess fit
4. Code: Update CRM

Validate → Clean → Extract → Enrich

1. Code: Email validation
2. Classifier: Contact quality assessment
3. StructuredData: Extract missing fields
4. Agent: Contact cleaning and normalization
5. Code: Email discovery