Pipeline Kinds

Learn about Sellestial’s five pipeline kinds and choose the right one for your use case.

Sellestial offers five pipeline kinds, each optimized for different tasks:

  • Code — Deterministic logic without AI
  • Classifier — AI categorization with confidence scores
  • StructuredData — LLM-powered structured data extraction with a predefined schema
  • Agent — Autonomous AI with tools and research
  • Sequence — Multi-step outreach content generation

Choosing the right pipeline kind

🎯 Need outreach content? → Use Sequence

🔍 Need autonomous research or web verification? → Use Agent

📊 Need to extract structured data with a predefined schema? → Use StructuredData

🏷️ Need to assign discrete categories? → Use Classifier

⚡ Need deterministic logic or calculations? → Use Code

Code

Pure Python logic without AI generation.

Processing: Deterministic code execution
Speed: Fastest
Cost: Lowest
Reliability: Highest (no AI variability)

When to use Code
  • ✅ Logic is deterministic and rule-based
  • ✅ No AI judgment needed
  • ✅ Speed and cost are priorities
  • ✅ Exact, repeatable results required

Email Validation:

  • Extract domain from email
  • DNS MX lookup
  • Return status based on records
  • No AI needed

Data Transformation:

  • Parse and reformat fields
  • Extract data from strings
  • Calculate derived values
  • API calls with predictable responses

Validation:

  • Check field formats
  • Verify required fields present
  • Validate against rules
  • Flag anomalies
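
The checks above are plain, rule-based logic with no AI involved. As a rough illustration (not Sellestial-specific code), a validation step might look like the sketch below; the field names and regex are hypothetical:

```python
import re

# Illustrative only: field names and rules are hypothetical,
# not part of Sellestial's Code pipeline API.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ["email", "firstname"]

def validate_record(record: dict) -> dict:
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing:{field}")
    email = record.get("email", "")
    if email and not EMAIL_RE.match(email):
        issues.append("invalid_email_format")
    return {"status": "OK" if not issues else "FLAGGED", "issues": issues}

# validate_record({"email": "jane@acme.com", "firstname": "Jane"})
# -> {"status": "OK", "issues": []}
```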

Can Do:

  • Execute Python code
  • Call functions and APIs
  • Access data sources
  • Transform data
  • Make deterministic decisions

Cannot Do:

  • Generate natural language
  • Make subjective judgments
  • Learn or adapt
  • Handle ambiguity

Example spec:

```yaml
kind: Code
spec:
  code:
    dataFields: [email, hs_object_id]
    onGenerate:
      code: |
        # Your Python code here
        domain = extract_domain(email)
        mx_records = functions.lookup_dns_record(domain, "MX")
        return {"status": "DOMAIN_OK" if mx_records else "NO_MX"}
```

Best for:

  • High-volume processing
  • Cost-sensitive operations
  • Real-time requirements
  • Deterministic workflows

Classifier

AI assigns discrete categories with justification.

Processing: LLM categorization
Speed: Fast
Cost: Low to moderate (typically 0.1-0.5 credits per record)
Output: Category + justification (≤25 words)

When to use Classifier
  • ✅ Need to assign categories
  • ✅ Want confidence scores
  • ✅ Categories are predefined
  • ✅ Need justification for decisions

Contact Quality Classification:

  • Analyzes contact data
  • Classifies as: LIKELY_REAL_PERSON, INVALID, MADE_UP, etc.
  • Provides justification
  • Handles edge cases with AI judgment

Lead Quality Scoring:

  • Reviews firmographic data
  • Assigns: HIGH_QUALITY, MEDIUM_QUALITY, LOW_QUALITY
  • Scores confidence
  • Justifies classification

Data Quality Assessment:

  • Evaluates field completeness
  • Categorizes: COMPLETE, PARTIAL, POOR, MISSING
  • Confidence level
  • Lists issues found

Can Do:

  • Classify into predefined categories
  • Provide confidence scores
  • Generate justifications
  • Handle ambiguous data
  • Apply nuanced judgment

Cannot Do:

  • Create new categories dynamically
  • Perform web research
  • Generate long-form content
  • Use external tools

Example spec:

```yaml
kind: Classifier
spec:
  classifier:
    llmModel: openai/o4-mini
    categories:
      - name: HIGH_QUALITY
        description: "Complete profile with business email"
      - name: LOW_QUALITY
        description: "Missing critical information"
    promptTemplate: |
      Contact: {{ firstname }} {{ lastname }}
      Email: {{ email }}
      Company: {{ company__name }}
    systemPromptTemplate: |
      Classify the lead quality. Return JSON:
      {"classification": "<CATEGORY>", "justification": "<reason>"}
```

Best for:

  • Data quality assessment
  • Lead scoring
  • Segmentation
  • Content categorization

StructuredData

LLM extracts structured information using a predefined schema.

Processing: LLM extraction with schema validation
Speed: Fast
Cost: Low to moderate (typically 0.1-0.5 credits per record)
Output: Structured JSON matching predefined schema

When to use StructuredData
  • ✅ Need to extract specific fields from unstructured data
  • ✅ Output schema is predefined and consistent
  • ✅ Working with text, websites, or documents
  • ✅ Need structured output for CRM field population

Company Information Extraction:

  • Extract company name, industry, size, and location from a website
  • Populate HubSpot company properties
  • Validate against the predefined schema
  • Return NOT_FOUND or INSUFFICIENT_INFORMATION sentinels when data is unavailable

Contact Data Parsing:

  • Extract name, title, email, phone from text
  • Structure messy contact information
  • Normalize to consistent format
  • Populate contact fields

Job Posting Analysis:

  • Extract job title, requirements, salary range
  • Structure job description data
  • Categorize by seniority and function
  • Populate custom fields

Can Do:

  • Extract specific fields from unstructured text
  • Validate against predefined schema
  • Handle missing data with sentinel values
  • Structure data for CRM field population
  • Process websites, documents, text

Cannot Do:

  • Make subjective judgments (use Classifier)
  • Perform web research (use Agent)
  • Generate long-form content (use Sequence)
  • Create dynamic schemas

Example spec:

```yaml
kind: StructuredData
spec:
  structuredData:
    llmModel: openai/gpt-4.1
    promptTemplate: |
      Extract company information from the website content below.
      Website: {{ website_summary }}
    systemPromptTemplate: |
      Extract structured company data. Return JSON with exact schema.
      Use "NOT_FOUND" when data is missing.
    outputFormat:
      name: CompanyInfo
      description: Structured company information
      fields:
        company_name:
          type: string
          description: Official company name
          nullable: true
        industry:
          type: string
          description: Primary industry
          nullable: true
        employee_count:
          type: integer
          description: Number of employees
          nullable: true
        headquarters_location:
          type: string
          description: HQ city and country
          nullable: true
```

Returns JSON matching the defined schema:

```json
{
  "company_name": "Acme Corporation",
  "industry": "Software & Technology",
  "employee_count": 500,
  "headquarters_location": "San Francisco, USA"
}
```

Or with missing data:

```json
{
  "company_name": "Acme Corporation",
  "industry": "NOT_FOUND",
  "employee_count": null,
  "headquarters_location": "INSUFFICIENT_INFORMATION"
}
```
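
Because sentinels are returned explicitly rather than omitted, downstream steps can decide what is safe to write to the CRM. A hypothetical post-processing sketch (not part of Sellestial's API) that drops sentinels and nulls before updating properties:

```python
# Hypothetical helper: keep only usable extracted values so that
# "NOT_FOUND" or "INSUFFICIENT_INFORMATION" never overwrites an
# existing CRM property.
SENTINELS = {"NOT_FOUND", "INSUFFICIENT_INFORMATION"}

def usable_fields(extracted: dict) -> dict:
    return {
        key: value
        for key, value in extracted.items()
        if value is not None and value not in SENTINELS
    }

extracted = {
    "company_name": "Acme Corporation",
    "industry": "NOT_FOUND",
    "employee_count": None,
    "headquarters_location": "INSUFFICIENT_INFORMATION",
}
print(usable_fields(extracted))  # {'company_name': 'Acme Corporation'}
```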

Supported models:

  • openai/gpt-5
  • openai/gpt-oss-120b
  • openai/o1
  • openai/o4-mini
  • openai/o3
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • gemini/gemini-2.5-pro
  • gemini/gemini-2.0-flash
  • gemini/gemini-2.5-flash

Best for:

  • Data extraction from unstructured sources
  • CRM field population
  • Website scraping with structure
  • Document parsing
  • Contact data normalization

Agent

Autonomous AI with tools and research capabilities.

Processing: Multi-step AI with tool use
Speed: Slower (research-intensive)
Cost: Higher (typically 1-5 credits per record)
Capabilities: Web search, website visits, HubSpot analysis

When to use Agent
  • ✅ Need web research
  • ✅ Complex multi-step reasoning required
  • ✅ Decision requires external verification
  • ✅ High-value, low-volume operations

Duplicate Analysis:

  • Fetches HubSpot property history
  • Visits both company websites
  • Searches for additional information
  • Makes merge decision with confidence
  • Preserves best data from both records

LinkedIn URL Validation:

  • Checks property history for data source
  • Validates current LinkedIn URL
  • Searches Google for alternatives
  • Detects redirects
  • Applies source-specific thresholds

Employment Change Detection:

  • Fetches LinkedIn profile
  • Compares with CRM data
  • Detects job changes
  • Creates new records if needed
  • Updates associations

Can Do:

  • Use tools (web search, websites, HubSpot API)
  • Multi-step reasoning
  • Iterative research
  • Complex decision-making
  • Generate structured outputs

Cannot Do:

  • Generate marketing content
  • Create long-form text
  • Run for extended periods
  • Access non-approved tools

Available tools:

  • aget_google_search_results — Web search
  • aget_website_content — Fetch and analyze websites
  • aget_hubspot_object — HubSpot data with history
  • aget_linkedin_profile — LinkedIn personal profiles
  • aget_linkedin_organization — LinkedIn company pages

Example spec:

```yaml
kind: Agent
spec:
  agent:
    agentDef:
      identifier: company-research-agent
      llmModel: gemini/gemini-2.0-flash
      tools: [aget_google_search_results, aget_website_content]
      objective: |
        Find and verify key information about companies.
        Use web search and website visits to gather data.
      inputPrompt: |
        Company: {{ company__name }}
        Domain: {{ company__domain }}
```

Best for:

  • Duplicate detection
  • Data verification
  • Research tasks
  • Complex analysis

Sequence

Multi-step outreach content generation.

Processing: AI generates personalized emails/messages
Speed: Moderate
Cost: Varies by steps and data sources
Output: Multi-step campaign content

When to use Sequence
  • ✅ Generating outbound content
  • ✅ Multiple touchpoint campaigns
  • ✅ Personalization from multiple data sources
  • ✅ Email/LinkedIn/call scripts needed

Conference Outreach:

  • Generates Day 1 email
  • Creates Day 3 follow-up
  • Personalizes based on LinkedIn, news, etc.
  • Maps to HubSpot tasks

Can Do:

  • Generate email content
  • Create LinkedIn messages
  • Write call scripts
  • Personalize based on data
  • Multi-step cadences

Cannot Do:

  • Send emails directly
  • Make phone calls
  • Post to LinkedIn automatically
  • Perform research

Example spec:

```yaml
kind: Sequence
spec:
  prompt:
    parts:
      - content_type: TEXT
        title: Instructions
        text: |
          Create a 2-email sequence...
  steps:
    - action_type: email
      day: 1
    - action_type: email
      day: 4
```
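
The steps above carry only day offsets; the generated content can then be mapped to HubSpot tasks. As a rough illustration (not Sellestial code) of how those offsets translate into concrete dates:

```python
from datetime import date, timedelta

# Illustrative only: turns the day offsets above into send dates
# for a campaign starting today.
steps = [
    {"action_type": "email", "day": 1},
    {"action_type": "email", "day": 4},
]
campaign_start = date.today()
for step in steps:
    send_date = campaign_start + timedelta(days=step["day"] - 1)
    print(f"{step['action_type']} on {send_date}")
```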

Best for:

  • Sales sequences
  • Marketing campaigns
  • Event follow-ups
  • Personalized outreach

Comparison

| Feature | Code | Classifier | StructuredData | Agent | Sequence |
|---|---|---|---|---|---|
| AI Usage | None | LLM | LLM | LLM + Tools | LLM |
| Speed | Fastest | Fast | Fast | Slow | Moderate |
| Cost | Lowest | Low-Moderate | Low-Moderate | High | Moderate-High |
| Research | No | No | No | Yes | No |
| Tools | Functions | None | None | Multiple | None |
| Output Type | Structured data | Category | Structured data | Structured data | Content |
| Use Case | Deterministic logic | Categorization | Data extraction | Complex analysis | Outreach |

Decision guide

1. Do you need AI?

  • No → Code
  • Yes → Continue

2. Is output a category?

  • Yes → Classifier
  • No → Continue

3. Need structured data extraction?

  • Yes → StructuredData
  • No → Continue

4. Need web research?

  • Yes → Agent
  • No → Continue

5. Generating outreach content?

  • Yes → Sequence
  • No → Code or StructuredData
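
Read as code, the five questions above collapse into a simple cascade (illustrative only, not a Sellestial API):

```python
# Encodes the decision guide above as a single function.
def pick_pipeline_kind(needs_ai: bool, output_is_category: bool,
                       needs_structured_extraction: bool,
                       needs_web_research: bool,
                       generates_outreach: bool) -> str:
    if not needs_ai:
        return "Code"
    if output_is_category:
        return "Classifier"
    if needs_structured_extraction:
        return "StructuredData"
    if needs_web_research:
        return "Agent"
    if generates_outreach:
        return "Sequence"
    return "Code or StructuredData"

print(pick_pipeline_kind(True, False, True, False, False))  # StructuredData
```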

Volume and cost considerations

High Volume (1000s of records):

  • Best: Code
  • Good: Classifier, StructuredData
  • Avoid: Agent (too slow/expensive)

Low Volume (10s of records):

  • Any kind works
  • Choose based on functionality needed

Cost-Sensitive:

  • Best: Code (often free)
  • Good: Classifier, StructuredData (0.1-0.5/record)
  • Expensive: Agent (1-5/record)
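
A quick back-of-envelope check using the typical per-record ranges quoted above (these are the page's published ranges, not a quote for any specific account):

```python
# Rough cost comparison for 5,000 records, using the typical
# per-record credit ranges listed above.
records = 5_000
classifier_low, classifier_high = 0.1 * records, 0.5 * records   # 500 to 2,500 credits
agent_low, agent_high = 1 * records, 5 * records                 # 5,000 to 25,000 credits
print(f"Classifier/StructuredData: {classifier_low:.0f}-{classifier_high:.0f} credits")
print(f"Agent: {agent_low}-{agent_high} credits")
```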

Combining pipeline kinds

Research → Extraction → Classification → Action

1. Agent: Research company
2. StructuredData: Extract key information
3. Classifier: Assess fit
4. Code: Update CRM

Validate → Clean → Extract → Enrich

1. Code: Email validation
2. Classifier: Contact quality assessment
3. StructuredData: Extract missing fields
4. Agent: Contact cleaning and normalization
5. Code: Email discovery