What LLMs Can Do

Exploring the capabilities of Large Language Models across different domains

Chat

Status: mature. Score: 90/100

Advanced conversational abilities with strong reasoning and multilingual support (Based on Google Sheets MMLU scores)

Subsections

General Conversation
Status: mature. Score: 90/100

Natural dialogue with high context awareness and multilingual capabilities

Context Awareness: Maintains context across conversation turns
  • Claude 3 Opus: Full (200k token context)
  • GPT-4 Turbo: Full (128k token context)
  • Gemini 1.5 Pro: Full (1M token context)
Multilingual: Support for multiple languages
  • Claude 3 Opus: Full (100+ languages)
  • GPT-4 Turbo: Full (all major languages)
  • Gemini 1.5 Pro: Full (all major languages)
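
The context-window sizes listed above (200k, 128k, and 1M tokens) bound how much conversation history fits in one request. The following is a minimal sketch of that check, not a vendor API. It assumes a rough heuristic of about four characters per token, which varies by tokenizer and language; exact counts require each vendor's own tokenizer. The names `CONTEXT_LIMITS`, `estimate_tokens`, and `models_that_fit`, and the 4,000-token reply reserve, are hypothetical.

```python
# Context-window sizes as listed in the comparison above (in tokens).
CONTEXT_LIMITS = {
    "Claude 3 Opus": 200_000,
    "GPT-4 Turbo": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

# Assumption: roughly 4 characters per token. This is only a heuristic;
# real token counts depend on the tokenizer and the language of the text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text`, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(text: str, reserve_for_reply: int = 4_000) -> list[str]:
    """Return the models whose context window holds `text` plus a reply budget."""
    needed = estimate_tokens(text) + reserve_for_reply
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


if __name__ == "__main__":
    transcript = "x" * 600_000  # about 150k estimated tokens
    print(models_that_fit(transcript))
```

Under these assumptions, a 600,000-character transcript needs about 154k tokens including the reply reserve, so the script prints `['Claude 3 Opus', 'Gemini 1.5 Pro']`.
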
Reasoning And Problem Solving
Status: mature. Score: 85/100

Complex reasoning, mathematical problem-solving, and logical deduction

Mathematical Reasoning: Solving complex mathematical problems
  • Claude 3 Opus: Full (highest MMLU math)
  • GPT-4 Turbo: Full (90th percentile UBE)
  • Gemini 1.5 Pro: Partial (MMLU-Pro 76.4%)
Logical Deduction: Step-by-step logical reasoning
  • Claude 3 Opus: Full (93.7% HumanEval score)
  • GPT-4 Turbo: Full (advanced reasoning)
  • Gemini 1.5 Pro: Full (strong reasoning)

Top Players

Claude 3 Opus
Anthropic
  • 93.7% HumanEval score
  • 200k token context
  • Best-in-class reasoning
  • Highest MMLU scores
GPT-4 Turbo
OpenAI
  • 128k context window
  • 90th percentile UBE
  • Advanced reasoning
  • Comprehensive API
Gemini 1.5 Pro
Google
  • 1M token context
  • Strong performance
  • Video understanding
  • Efficient processing

Coding

Status: mature. Score: 80/100

Advanced code generation and analysis with specialized tools and models (Based on SWE-bench data)

Subsections

Code Completion
Status: mature. Score: 80/100

Real-time code suggestions with multi-language support and context awareness

Real Time Suggestions: Intelligent code suggestions in real time
  • GitHub Copilot: Full (real-time completions)
  • Cursor: Full (context-aware suggestions)
  • Windsurf: Full (smart completions)
  • All Hands: Partial (basic suggestions)
Multi Language: Support for multiple programming languages
  • GitHub Copilot: Full (40+ languages)
  • Cursor: Full (all major languages)
  • Windsurf: Partial (major languages)
  • All Hands: Partial (major languages)
Context Aware: Understanding project context and dependencies
  • GitHub Copilot: Full (full repository context)
  • Cursor: Full (codebase understanding)
  • Windsurf: Full (full repo context)
  • All Hands: Full (team context aware)
Code Modification
Status: emerging. Score: 75/100

Code refactoring and optimization with AI-powered improvements

Refactoring: Automated code refactoring and improvements
  • GitHub Copilot: Partial (basic refactoring)
  • Cursor: Full (AI-powered edits)
  • Windsurf: Full (smart transformations)
  • All Hands: Partial (collaborative improvements)
Bug Fixing: Automated bug detection and fixing
  • GitHub Copilot: Full (advanced bug detection)
  • Cursor: Full (smart bug fixing)
  • Windsurf: Full (automated fixes)
  • All Hands: None
PR Review
Status: emerging. Score: 70/100

Automated code review and analysis of pull requests

Automated Review: AI-powered code review comments
  • GitHub Copilot: Partial (through Copilot Chat)
  • Cursor: Partial (basic review)
  • Windsurf: Full (comprehensive review)
  • All Hands: Full (team-focused review)
Security Analysis: Identifying security issues and vulnerabilities
  • GitHub Copilot: Full (security features)
  • Cursor: None
  • Windsurf: Partial (basic security checks)
  • All Hands: Partial (basic security review)
Agentic Programming
Status: early. Score: 65/100

Autonomous development and task planning capabilities

Autonomous Development: Independent code development capabilities
  • GitHub Copilot: Partial (basic automation)
  • Cursor: Full (advanced autonomous coding)
  • All Hands: Full (full autonomous development)
Task Planning: Planning and organizing coding tasks
  • GitHub Copilot: Partial (basic planning)
  • Cursor: Full (advanced task planning)
  • All Hands: Full (team-aware planning)
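
The support levels used in the coding tables above (Full, Partial, None) can be treated as an ordered scale when shortlisting tools. The following is a minimal sketch using rows copied from the Code Modification and Agentic Programming tables. The `Support` enum, its numeric ordering, and the `tools_with` helper are illustrative assumptions and not part of the source data.

```python
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical ordering of the support levels used in the tables."""
    NONE = 0
    PARTIAL = 1
    FULL = 2


# Rows copied from the comparison tables above; a tool missing from a row
# was simply not listed for that feature.
MATRIX = {
    "Refactoring": {
        "GitHub Copilot": Support.PARTIAL,
        "Cursor": Support.FULL,
        "Windsurf": Support.FULL,
        "All Hands": Support.PARTIAL,
    },
    "Bug Fixing": {
        "GitHub Copilot": Support.FULL,
        "Cursor": Support.FULL,
        "Windsurf": Support.FULL,
        "All Hands": Support.NONE,
    },
    "Autonomous Development": {
        "GitHub Copilot": Support.PARTIAL,
        "Cursor": Support.FULL,
        "All Hands": Support.FULL,
    },
}


def tools_with(feature: str, minimum: Support = Support.FULL) -> list[str]:
    """Return tools, sorted by name, that meet `minimum` support for `feature`."""
    return sorted(
        tool for tool, level in MATRIX[feature].items() if level >= minimum
    )


if __name__ == "__main__":
    print(tools_with("Refactoring"))
    print(tools_with("Bug Fixing", Support.PARTIAL))
```

With this data, `tools_with("Refactoring")` returns `['Cursor', 'Windsurf']`, and lowering the threshold to `Support.PARTIAL` for Bug Fixing returns every listed tool except All Hands.
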

Top Players

GitHub Copilot
GitHub/OpenAI
  • Real-time code suggestions
  • IDE integration
  • Context-aware completions
Cursor
Cursor
  • Context-aware completions
  • AI-powered edits and refactoring
  • Basic code review capabilities
Windsurf
Windsurf AI
  • Context-aware suggestions
  • Smart code transformations
  • Automated code review
All Hands
All Hands
  • Team-aware code suggestions
  • Collaborative code improvements
  • Team-focused code review

Multimodal Understanding

Status: emerging. Score: 75/100

Processing and understanding multiple types of input (text, images, audio)

Subsections

Vision Analysis
Status: mature. Score: 80/100

Understanding and analyzing visual content

Image Understanding: Detailed analysis of image content and context
  • Gemini Ultra: Full (advanced multimodal)
  • Gemini 1.5 Pro: Full (advanced visual processing)
  • GPT-4V: Full (high-res image analysis)
  • Claude 3 Opus: Full (detailed visual comprehension)
OCR Capabilities: Text extraction from images
  • Gemini Ultra: None
  • Gemini 1.5 Pro: Partial (basic text extraction)
  • GPT-4V: Full (advanced OCR)
  • Claude 3 Opus: Full (mathematical notation)
Audio Processing
Status: emerging. Score: 70/100

Processing and understanding audio input

Speech Recognition: Converting speech to text with high accuracy
  • Gemini Ultra: Full (advanced speech recognition)
  • Gemini 1.5 Pro: Full (multiple languages)
Cross Modal Reasoning
Status: emerging. Score: 75/100

Understanding relationships between different types of input

Visual Reasoning: Complex reasoning about visual information
  • Gemini 1.5 Pro: Full (cross-modal understanding)
  • GPT-4V: Full (visual reasoning)
  • Claude 3 Opus: Full (diagram analysis)
Multimodal Generation: Generating responses combining multiple modalities
  • Gemini 1.5 Pro: Full (text and image generation)
  • GPT-4V: Partial (text-based responses only)
  • Claude 3 Opus: Partial (text-based responses only)

Top Players

Gemini Ultra
Google
  • 91.8 MMLU score
  • Advanced multimodal
  • Cross-modal reasoning
Gemini 1.5 Pro
Google
  • 1M token context
  • Video understanding
  • Advanced visual processing
GPT-4V
OpenAI
  • High-resolution image analysis
  • Visual reasoning with GPT-4 Turbo
  • Advanced OCR capabilities
Claude 3 Opus
Anthropic
  • 200k token context
  • 93.7% HumanEval score
  • Advanced visual analysis
  • Mathematical diagram comprehension