What LLMs Can Do

Exploring the capabilities of Large Language Models across different domains

Chat

mature

Advanced conversational abilities with strong reasoning and multilingual support (based on MMLU scores recorded in Google Sheets)

Subsections

General Conversation

mature

Natural dialogue with high context awareness and multilingual capabilities

Feature	Claude 3 Opus	GPT-4 Turbo	Gemini 1.5 Pro
Context Awareness: maintains context across conversation turns	Full (200k token context)	Full (128k token context)	Full (1M token context)
Multilingual: support for multiple languages	Full (100+ languages)	Full (all major languages)	Full (all major languages)

Reasoning and Problem Solving

mature

Complex reasoning, mathematical problem-solving, and logical deduction

Feature	Claude 3 Opus	GPT-4 Turbo	Gemini 1.5 Pro
Mathematical Reasoning: solving complex mathematical problems	Full (highest MMLU math)	Full (90th percentile on the UBE)	Partial (MMLU-Pro 76.4%)
Logical Deduction: step-by-step logical reasoning	Full (84.9% HumanEval score, a code-generation benchmark)	Full (advanced reasoning)	Full (strong reasoning)

Top Players

Claude 3 Opus

Anthropic

84.9% HumanEval score
200k token context
Best-in-class reasoning
Highest MMLU scores

GPT-4 Turbo

OpenAI

128k context window
90th percentile on the Uniform Bar Exam (UBE)
Advanced reasoning
Comprehensive API

Gemini 1.5 Pro

Google

1M token context
Strong performance
Video understanding
Efficient processing

Coding

mature

Advanced code generation and analysis with specialized tools and models (Based on SWE-bench data)

Subsections

Code Completion

mature

Real-time code suggestions with multi-language support and context awareness

Feature	GitHub Copilot	Cursor	Windsurf	All Hands
Real-Time Suggestions: intelligent code suggestions in real time	Full (real-time completions)	Full (context-aware suggestions)	Full (smart completions)	Partial (basic suggestions)
Multi-Language: support for multiple programming languages	Full (40+ languages)	Full (all major languages)	Partial (major languages)	Partial (major languages)
Context Awareness: understanding of project context and dependencies	Full (full repository context)	Full (codebase understanding)	Full (full repository context)	Full (team-aware context)

Code Modification

emerging

Code refactoring and optimization with AI-powered improvements

Feature	GitHub Copilot	Cursor	Windsurf	All Hands
Refactoring: automated code refactoring and improvements	Partial (basic refactoring)	Full (AI-powered edits)	Full (smart transformations)	Partial (collaborative improvements)
Bug Fixing: automated bug detection and fixing	Full (advanced bug detection)	Full (smart bug fixing)	Full (automated fixes)	None

PR Review

emerging

Automated code review and analysis of pull requests

Feature	GitHub Copilot	Cursor	Windsurf	All Hands
Automated Review: AI-powered code review comments	Partial (through Copilot Chat)	Partial (basic review)	Full (comprehensive review)	Full (team-focused review)
Security Analysis: identifying security issues and vulnerabilities	Full (security features)	None	Partial (basic security checks)	Partial (basic security review)

Agentic Programming

early

Autonomous development and task planning capabilities

Feature	GitHub Copilot	Cursor	All Hands
Autonomous Development: independent code development capabilities	Partial (basic automation)	Full (advanced autonomous coding)	Full (full autonomous development)
Task Planning: planning and organizing coding tasks	Partial (basic planning)	Full (advanced task planning)	Full (team-aware planning)

Top Players

GitHub Copilot

GitHub/OpenAI

Real-time code suggestions
IDE integration
Context-aware completions

Cursor

Context-aware completions
AI-powered edits and refactoring
Basic code review capabilities

Windsurf

Windsurf AI

Context-aware suggestions
Smart code transformations
Automated code review

All Hands

Team-aware code suggestions
Collaborative code improvements
Team-focused code review

Multimodal Understanding

emerging

Processing and understanding multiple types of input (text, images, audio)

Subsections

Vision Analysis

mature

Understanding and analyzing visual content

Feature	Gemini Ultra	Gemini 1.5 Pro	GPT-4V	Claude 3 Opus
Image Understanding: detailed analysis of image content and context	Full (advanced multimodal)	Full (advanced visual processing)	Full (high-resolution image analysis)	Full (detailed visual comprehension)
OCR Capabilities: text extraction from images	None	Partial (basic text extraction)	Full (advanced OCR)	Full (mathematical notation)

Audio Processing

emerging

Processing and understanding audio input

Feature	Gemini Ultra	Gemini 1.5 Pro
Speech Recognition: converting speech to text with high accuracy	Full (advanced speech recognition)	Full (multiple languages)

Cross-Modal Reasoning

emerging

Understanding relationships between different types of input

Feature	Gemini 1.5 Pro	GPT-4V	Claude 3 Opus
Visual Reasoning: complex reasoning about visual information	Full (cross-modal understanding)	Full (visual reasoning)	Full (diagram analysis)
Multimodal Generation: generating responses that combine multiple modalities	Full (text and image generation)	Partial (text-based responses only)	Partial (text-based responses only)

Top Players

Gemini Ultra

Google

91.8 MMLU score
Advanced multimodal
Cross-modal reasoning

Gemini 1.5 Pro

Google

1M token context
Video understanding
Advanced visual processing

GPT-4V

OpenAI

High-resolution image analysis
Visual reasoning with GPT-4 Turbo
Advanced OCR capabilities

Claude 3 Opus

Anthropic

200k token context
84.9% HumanEval score
Advanced visual analysis
Mathematical diagram comprehension
