
Vision Mode

Process images, charts, diagrams, and visual content within documents using AI vision capabilities.

Overview

Vision Mode enables WYN360 CLI to understand and analyze visual content in documents, providing insights from charts, diagrams, screenshots, and other images.

Supported Formats

  • Word Documents (.docx) - Images, charts, diagrams
  • PDF Files (.pdf) - Scanned documents, technical diagrams
  • Excel Files (.xlsx) - Embedded charts and visualizations
  • Direct Images (.png, .jpg, .gif) - Screenshots, diagrams
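Word and Excel files are ZIP packages (Office Open XML), and embedded raster images normally live under `word/media/` or `xl/media/`. As a rough sketch of where image input comes from, the hypothetical helper `list_embedded_images` below lists those files with Python's standard `zipfile` module. It is not WYN360's implementation. Native Word and Excel charts are stored as XML chart parts (for example `word/charts/chart1.xml`) rather than as images, so they would need separate handling, and PDFs are not covered here.

```python
import os
import zipfile

# Media folders used by the Office Open XML package layout.
MEDIA_PREFIXES = {".docx": "word/media/", ".xlsx": "xl/media/"}
# Raster formats named in the Supported Formats list.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


def list_embedded_images(path: str) -> list[str]:
    """Return archive paths of raster images embedded in a .docx or .xlsx file."""
    prefix = MEDIA_PREFIXES.get(os.path.splitext(path)[1].lower())
    if prefix is None:
        raise ValueError(f"not a .docx/.xlsx container: {path}")
    with zipfile.ZipFile(path) as zf:
        # Keep only files in the media folder whose extension is a supported image type.
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith(prefix)
            and os.path.splitext(name)[1].lower() in IMAGE_EXTS
        )
```

Chart parts under `word/charts/` are deliberately skipped by this filter, since they contain XML, not pixels.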

Processing Modes

Skip Mode (Default)

You: Read report.docx

WYN360: [Processes text only, ignores images]
# Cost: $0.00 for images

Describe Mode

You: Read report.docx with describe mode

WYN360: [Extracts alt text and captions only]
📊 [Image 1]: Revenue chart showing quarterly data
📐 [Image 2]: System architecture diagram
# Cost: $0.00 for images (no API calls)
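Describe mode works from text already stored in the document, which is why it needs no API calls. In a .docx, the alt text entered for a picture is stored in the `descr` attribute of the `wp:docPr` element inside `word/document.xml`. The sketch below assumes you already have that XML as a string and reads those attributes with `xml.etree.ElementTree`. `extract_alt_text` is a hypothetical name, and captions (ordinary paragraphs, usually in a Caption style) are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML drawing elements (wp:inline, wp:anchor, wp:docPr).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def extract_alt_text(document_xml: str) -> list[str]:
    """Label each drawing with its alt text, falling back to title, then name."""
    root = ET.fromstring(document_xml)
    labels = []
    for i, doc_pr in enumerate(root.iter(f"{{{WP_NS}}}docPr"), start=1):
        text = doc_pr.get("descr") or doc_pr.get("title") or doc_pr.get("name", "")
        labels.append(f"[Image {i}]: {text}")
    return labels
```

When an author never entered alt text, the fallback is Word's generic shape name (such as "Picture 1"), which carries little meaning; that gap is what vision mode fills.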

Vision Mode

You: Read report.docx with vision mode

WYN360: [Full AI analysis of images]
📊 **[Image 1]:** Bar chart showing quarterly revenue growth from Q1 to Q4.
Q4 shows the highest revenue at approximately $2.5M, representing a 23%
increase from Q3. All quarters show positive growth year-over-year.

📐 **[Image 2]:** System architecture diagram depicting three layers:
frontend (React), API layer (FastAPI), and database (PostgreSQL).
Shows data flow from user requests through authentication middleware.

💰 **Vision API Cost:** $0.06 (2 images processed)
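The three modes differ only in what happens to each image. The hypothetical dispatcher below illustrates this; `analyze` is a placeholder for whatever vision-model call the CLI actually makes. The point of the sketch is that skip and describe mode never invoke it, which is why they report $0.00 for images.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DocImage:
    alt_text: Optional[str]  # alt text or caption, if the document has one
    data: bytes              # raw image bytes


def render_images(images: list[DocImage], mode: str,
                  analyze: Callable[[bytes], str]) -> list[str]:
    """Return one output line per image according to the processing mode."""
    if mode == "skip":
        return []  # text only: images are ignored, so there is no vision cost
    if mode == "describe":
        # Reuse stored alt text/captions; no model call is made.
        return [f"[Image {i}]: {img.alt_text or '(no alt text)'}"
                for i, img in enumerate(images, start=1)]
    if mode == "vision":
        # Each image goes to the (placeholder) vision model: this is the billed path.
        return [f"[Image {i}]: {analyze(img.data)}"
                for i, img in enumerate(images, start=1)]
    raise ValueError(f"unknown vision mode: {mode!r}")
```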

Use Cases

Technical Documentation

  • Architecture Diagrams - Understand system designs
  • Flowcharts - Process workflow analysis
  • UML Diagrams - Class and sequence diagram interpretation

Data Analysis

  • Charts & Graphs - Extract insights from visualizations
  • Dashboards - Understand KPIs and metrics
  • Infographics - Convert visual data to text insights

UI/UX Design

  • Mockups - Analyze interface designs
  • Wireframes - Understand user flow
  • Screenshots - Capture current state for analysis

Cost Management

Vision API Pricing

  • Cost per Image: ~$0.01-0.05, depending on image complexity
  • Separate Tracking - Vision costs are reported separately from text-processing costs
  • Token Efficiency - Smart chunking reduces processing costs
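Separate tracking can be pictured as two running totals. The `CostLedger` class below is a hypothetical illustration, not WYN360's internal API. It uses `Decimal` so that many small per-image charges do not accumulate floating-point error.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CostLedger:
    """Keeps text and vision spend in separate buckets (illustrative only)."""
    text_usd: Decimal = Decimal("0")
    vision_usd: Decimal = Decimal("0")
    images: int = 0

    def add_text(self, cost_usd: Decimal) -> None:
        self.text_usd += cost_usd

    def add_image(self, cost_usd: Decimal) -> None:
        self.vision_usd += cost_usd
        self.images += 1

    @property
    def total(self) -> Decimal:
        return self.text_usd + self.vision_usd

    def summary(self) -> str:
        return (f"Text: ${self.text_usd:.3f} | "
                f"Vision: ${self.vision_usd:.2f} ({self.images} images) | "
                f"Total: ${self.total:.3f}")
```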

Usage Examples

Document with 5 images:
- Text processing: 2,000 tokens = $0.006
- Vision processing: 5 images = $0.15
- Total cost: ~$0.156
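The arithmetic in this example can be reproduced with two rates. Both are inferred from the example itself ($0.006 for 2,000 tokens implies $3 per million text tokens; $0.15 for 5 images implies $0.03 per image, inside the ~$0.01-0.05 range) and are assumptions, not published pricing.

```python
from decimal import Decimal

# Assumed rates, back-calculated from the worked example; not official pricing.
TEXT_USD_PER_MTOK = Decimal("3")        # USD per million text tokens
VISION_USD_PER_IMAGE = Decimal("0.03")  # USD per image


def estimate_cost(text_tokens: int, images: int) -> dict[str, Decimal]:
    """Split an estimated bill into text, vision, and total (USD)."""
    text = TEXT_USD_PER_MTOK * text_tokens / Decimal(1_000_000)
    vision = VISION_USD_PER_IMAGE * images
    return {"text": text, "vision": vision, "total": text + vision}


print(estimate_cost(2_000, 5))
```

With these rates, 2,000 tokens and 5 images give $0.006 + $0.15 = $0.156, matching the figures listed.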

Configuration

# Vision processing settings
vision_mode: "skip" # skip, describe, vision
vision_batch_size: 5 # Number of images processed per batch
vision_quality: "standard" # standard, high
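A hypothetical loader for these settings might reject unknown values and split images into groups of `vision_batch_size`. The function names and defaults below are assumptions (the defaults mirror the example values, with `skip` as the documented default mode); how WYN360 actually reads this configuration is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Allowed values taken from the comments in the configuration example.
ALLOWED = {
    "vision_mode": {"skip", "describe", "vision"},
    "vision_quality": {"standard", "high"},
}
# Assumed defaults: "skip" is the documented default mode; the rest mirror the example.
DEFAULTS = {"vision_mode": "skip", "vision_batch_size": 5, "vision_quality": "standard"}


def load_vision_settings(raw: dict) -> dict:
    """Merge user settings over the defaults and validate them."""
    settings = {**DEFAULTS, **raw}
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {settings[key]!r}")
    size = settings["vision_batch_size"]
    if isinstance(size, bool) or not isinstance(size, int) or size < 1:
        raise ValueError("vision_batch_size must be a positive integer")
    return settings


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

With the default batch size of 5, seven images would be sent as one batch of five followed by one batch of two.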

Examples

See Usage Examples for detailed workflows with visual content processing.