Skip to main content

Intelligent Browser Automation Architecture

Current Implementation (v0.3.78)

WYN360-CLI features a sophisticated browser automation system that uses a DOM-first approach with intelligent fallback strategies. The system combines fast DOM analysis, AI-powered Stagehand automation, and vision-based fallbacks to provide reliable web interaction capabilities while maintaining cost efficiency and performance.

Overview

The intelligent browser automation enables Claude to:

  • Analyze web page structure using fast DOM extraction
  • Execute precise automation using DOM selectors when possible
  • Fallback to Stagehand AI automation for complex scenarios
  • Adapt using vision-based approaches for dynamic content
  • Optimize cost and performance through intelligent approach selection

This creates a layered automation system that achieves high success rates while minimizing costs through intelligent approach selection.

Architecture

Cost & Performance Optimization

Performance Comparison

ApproachCost per TaskSpeedReliabilityUse Cases
DOM Automation~$0.001-0.01⚡ 1-3s95% on static sitesForms, clicks, simple navigation
Stagehand AI~$0.05-0.15🚀 5-15s85% on dynamic sitesComplex workflows, modern SPAs
Vision Fallback~$0.20-0.50🐌 20-60s75% on any siteVisual elements, CAPTCHAs

Intelligent Approach Selection

The system automatically selects the optimal approach based on:

  • DOM Confidence Score - Calculated from element accessibility
  • Page Complexity - Number of interactive elements and dynamic content
  • Task Requirements - Specific actions needed (forms vs navigation)
  • Historical Success - Learning from previous attempts

Components

1. DOM Analyzer

Fast webpage structure analysis for automation planning.

Features:

  • Interactive element detection and accessibility scoring
  • Form analysis with field identification
  • Navigation structure mapping
  • Confidence score calculation for automation success

File: wyn360_cli/tools/browser/dom_analyzer.py

2. Browser Manager

Centralized browser lifecycle and session management.

Features:

  • Playwright browser instance management
  • Session persistence and cookie handling
  • Resource cleanup and memory management
  • Concurrent session support

File: wyn360_cli/tools/browser/browser_manager.py

3. Unified Automation Interface

High-level orchestration layer for approach selection and execution.

Features:

  • Automatic approach selection based on confidence scores
  • Intelligent fallback progression (DOM → Stagehand → Vision)
  • Performance monitoring and cost optimization
  • Error handling and retry logic

File: wyn360_cli/tools/browser/unified_automation_interface.py

4. Stagehand Integration

AI-powered automation for complex web interactions.

Features:

  • Natural language to web action conversion
  • Dynamic element handling and adaptation
  • Modern SPA and JavaScript-heavy site support
  • Context-aware automation planning

File: wyn360_cli/tools/browser/stagehand_integration.py

5. Vision Fallback Integration

Visual analysis and interaction for complex scenarios.

Features:

  • Screenshot-based element detection
  • Visual similarity matching
  • CAPTCHA and dynamic content handling
  • Accessibility-independent interaction

File: wyn360_cli/tools/browser/vision_fallback_integration.py

6. Enhanced Automation Orchestrator

Coordination layer managing multiple automation approaches.

Features:

  • Multi-approach coordination and selection
  • Performance monitoring and analytics
  • Error classification and recovery strategies
  • Learning from success/failure patterns

File: wyn360_cli/tools/browser/enhanced_automation_orchestrator.py

Browser Control Features

Show Browser Flag

Control browser visibility for debugging and development:

# Show browser window during automation
wyn360 --show-browser

# Environment variable option
export WYN360_BROWSER_SHOW=1
wyn360

Headless Mode (Default)

  • Default: Browser runs invisibly for performance
  • Debugging: Use --show-browser to watch automation
  • Development: Visual feedback for troubleshooting

Available Functions

analyze_page_dom()

Analyze webpage DOM structure for intelligent automation planning.

async def analyze_page_dom(
ctx: RunContext[None],
url: str,
task_description: Optional[str] = None,
confidence_threshold: float = 0.7,
show_browser: Optional[bool] = None
) -> str:
"""
Analyze webpage DOM structure for automation planning.

Args:
url: Website URL to analyze
task_description: Optional description of intended task
confidence_threshold: Minimum confidence for DOM approach (0.0-1.0)
show_browser: Whether to show browser window for debugging

Returns:
Structured analysis with confidence scores and recommendations
"""

intelligent_browse()

Main intelligent browsing function with DOM-first approach and fallbacks.

async def intelligent_browse(
ctx: RunContext[None],
task: str,
url: str,
max_steps: int = 15,
confidence_threshold: float = 0.7,
show_browser: Optional[bool] = None
) -> str:
"""
Intelligently browse website using DOM-first approach with fallbacks.

Args:
task: Natural language description of what to accomplish
url: Starting URL for the browsing session
max_steps: Maximum number of automation steps
confidence_threshold: Minimum confidence for DOM approach
show_browser: Override default browser visibility

Returns:
Formatted result with actions taken and data extracted
"""

browse_and_find()

Simplified browsing function for basic navigation and data extraction.

async def browse_and_find(
ctx: RunContext[None],
url: str,
task: str,
max_attempts: int = 3,
show_browser: Optional[bool] = None
) -> str:
"""
Browse website and find specific information or complete tasks.

Args:
url: Target website URL
task: Description of what to find or accomplish
max_attempts: Number of retry attempts if initial approach fails
show_browser: Whether to show browser window

Returns:
Results of the browsing task with approach used
"""

login_to_website()

Automated website authentication with form detection.

async def login_to_website(
ctx: RunContext[None],
url: str,
username: str,
password: str,
save_credentials: bool = False,
show_browser: Optional[bool] = None
) -> str:
"""
Automatically login to websites with form detection.

Args:
url: Login page URL
username: Username or email for login
password: Password for login
save_credentials: Whether to save encrypted credentials
show_browser: Whether to show browser window

Returns:
Login result and session status
"""

fetch_website()

Simple page content fetching and analysis.

async def fetch_website(
ctx: RunContext[None],
url: str,
max_tokens: int = 50000,
show_browser: Optional[bool] = None
) -> str:
"""
Fetch and analyze website content.

Args:
url: Website URL to fetch
max_tokens: Maximum content tokens to return
show_browser: Whether to show browser window

Returns:
Formatted website content and metadata
"""

Usage Examples

Example 1: E-commerce Product Search (DOM-First)

Task: Search for products with specific criteria

You: "Browse to Amazon and find wireless headphones under $50 with good reviews"

System:
🔍 **DOM Analysis:** High confidence (0.9) - Well-structured search forms
⚡ **DOM Automation Selected** - Fast and cost-effective
✅ **Success** - Found 12 products matching criteria
💰 **Cost:** $0.02 | ⏱️ **Time:** 4 seconds

Example 2: Complex Web App (Stagehand AI)

Task: Navigate modern SPA with dynamic content

You: "Login to my GitHub account and create a new repository named 'test-project'"

System:
🔍 **DOM Analysis:** Medium confidence (0.6) - Dynamic login flow
🤖 **Stagehand AI Selected** - Better for complex workflows
✅ **Success** - Repository created successfully
💰 **Cost:** $0.12 | ⏱️ **Time:** 18 seconds

Example 3: Fallback Progression

Task: Handle site with unusual interface

You: "Complete the contact form on this custom website"

System:
🔍 **DOM Analysis:** Low confidence (0.3) - Unusual form structure
👁️ **Vision Fallback Selected** - Visual element detection needed
✅ **Success** - Form submitted using visual navigation
💰 **Cost:** $0.35 | ⏱️ **Time:** 45 seconds

Example 4: Intelligent Approach Learning

Task: Repeated automation on familiar site

You: "Check my Gmail inbox for new messages"

System:
🧠 **Historical Analysis:** Previous DOM success on Gmail (0.95 confidence)
⚡ **DOM Automation Selected** - Learned pattern applied
✅ **Success** - 5 new messages found
💰 **Cost:** $0.01 | ⏱️ **Time:** 2 seconds

Configuration

Browser Control

# Environment variables for browser behavior
export WYN360_BROWSER_SHOW=1 # Show browser window
export WYN360_DOM_CONFIDENCE_THRESHOLD=0.7 # DOM approach threshold
export WYN360_MAX_AUTOMATION_STEPS=15 # Maximum steps per session
export WYN360_AUTOMATION_TIMEOUT=300 # Total timeout in seconds

Approach Selection

# Fine-tune approach selection in agent configuration
automation_config = {
"dom_confidence_threshold": 0.7,
"enable_stagehand_fallback": True,
"enable_vision_fallback": True,
"max_retries_per_approach": 2,
"total_timeout_seconds": 300
}

Cost Controls

# Budget-aware automation settings
cost_limits = {
"max_cost_per_task": 0.50, # Stop if exceeding 50¢
"prefer_dom_approach": True, # Favor fast DOM when possible
"track_spending": True, # Real-time cost tracking
"enable_approach_learning": True # Learn from success patterns
}

Advanced Features

Token Counting & Cost Tracking

Enhanced /tokens command shows detailed breakdown:

{
"total_tokens": 6420,
"costs": {
"total": "$0.18",
"dom_automation": "$0.03",
"stagehand_ai": "$0.12",
"vision_fallback": "$0.03"
},
"browser_automation": {
"total_operations": 12,
"dom_approach_count": 8,
"stagehand_count": 3,
"vision_count": 1,
"average_success_rate": 0.92
},
"approach_distribution": {
"dom_automation": 65,
"stagehand_ai": 25,
"vision_fallback": 10
}
}

Confidence Score Analytics

# Approach effectiveness tracking
confidence_analytics = {
"dom_average_confidence": 0.82,
"dom_success_rate": 0.94,
"stagehand_success_rate": 0.87,
"vision_success_rate": 0.78,
"overall_success_rate": 0.91
}

Performance Monitoring

# Real-time performance metrics
performance_stats = {
"average_dom_speed": "2.3s",
"average_stagehand_speed": "12.1s",
"average_vision_speed": "34.7s",
"cost_per_successful_task": "$0.08",
"approach_selection_accuracy": 0.89
}

Troubleshooting

DOM Analysis Issues

Symptom: Low confidence scores on seemingly simple sites

Solutions:

  1. Check page complexity: Sites with heavy JavaScript may need Stagehand
  2. Verify element accessibility: Ensure proper form labels and IDs
  3. Enable browser visibility: wyn360 --show-browser to debug
  4. Lower confidence threshold: Allow DOM approach on borderline cases

Stagehand Integration Problems

Symptom: Stagehand automation fails or times out

Solutions:

  1. Check API connectivity: Verify Stagehand service availability
  2. Simplify task description: Break complex tasks into smaller steps
  3. Enable browser visibility: Watch automation for debugging
  4. Try vision fallback: Force fallback for problematic sites

Vision Fallback Issues

Symptom: Vision approach fails to find elements

Solutions:

  1. Check screenshot quality: Ensure browser window is properly sized
  2. Verify element visibility: Target elements must be in viewport
  3. Simplify visual targets: Focus on clearly visible elements
  4. Enable browser visibility: See what the vision system sees

Performance Concerns

Symptom: Automation is slower than expected

Solutions:

  1. Check approach selection: Ensure DOM approach is being used when possible
  2. Monitor confidence scores: Investigate why DOM confidence is low
  3. Optimize task descriptions: Be specific about automation goals
  4. Enable learning: Let system learn patterns for repeated tasks

Best Practices

1. Optimize for DOM-First Success

  • Use websites with proper semantic HTML
  • Target sites with good accessibility practices
  • Provide clear, specific task descriptions
  • Enable approach learning for repeated tasks

2. Leverage Intelligent Fallbacks

  • Let the system automatically select the best approach
  • Don't force specific automation methods unless needed
  • Monitor success rates to understand site complexity
  • Use browser visibility for debugging complex sites

3. Use Browser Visibility Strategically

# Development and debugging
wyn360 --show-browser

# Production and scripts
wyn360 # headless mode for performance

4. Monitor Performance & Costs

  • Check /tokens regularly for approach distribution
  • Monitor confidence scores to understand automation challenges
  • Enable learning to improve future performance
  • Set appropriate confidence thresholds for your use cases

Real-World Performance

Approach Distribution (Typical Usage)

  • DOM Automation: 65-75% of tasks
  • Stagehand AI: 20-30% of tasks
  • Vision Fallback: 5-15% of tasks

Success Rates by Site Type

  • Static Sites (news, blogs): 95% DOM success
  • E-commerce (Amazon, eBay): 85% DOM success
  • Modern SPAs (GitHub, Gmail): 60% DOM, 30% Stagehand
  • Custom/Complex Sites: 40% DOM, 35% Stagehand, 25% Vision

Cost Distribution

  • Average cost per task: $0.08
  • DOM-heavy workloads: $0.02-0.05 per task
  • Stagehand-heavy workloads: $0.10-0.20 per task
  • Vision-heavy workloads: $0.25-0.50 per task

Future Enhancements

Planned Improvements

  • Multi-tab workflow coordination - Handle complex cross-page workflows
  • Advanced pattern learning - Improve automation from usage patterns
  • Real-time confidence tuning - Dynamic threshold adjustment
  • Site-specific optimization - Custom approaches for popular sites
  • Performance analytics dashboard - Real-time monitoring and insights

Support

  • Documentation: Complete guides in /docs
  • Examples: Real-world use cases in USE_CASES.md
  • Issues: Report bugs on GitHub
  • Community: Discussions for questions and feedback

Generated with WYN360-CLI v0.3.78 Intelligent Browser Automation with DOM-First Architecture