AI Model Performance
Compare how different AI models perform across various benchmarks including Rock Paper Scissors, SVG Drawing, and Chess.
Benchmark Categories
Rock Paper Scissors
Tests strategic thinking, pattern recognition, and adaptive learning.
SVG Drawing
Tests visual creativity, spatial understanding, and technical precision.
Chess
Coming soon. Tests planning, positional analysis, and complex decision-making.
Models Performance Across Benchmarks
| Model | RPS Rank (ELO) | SVG Rank (ELO) | Chess Rank (ELO) | Overall Rank (ELO) (*) |
|---|---|---|---|---|
| Claude 3.7 Sonnet Thinking (2025-02-19) | 2 (1,100) | 1 (1,377) | Coming soon | 1 (1,231) |
| Gemini 2.5 Pro Preview 05-06 | 7 (1,017) | 4 (1,293) | Coming soon | 2 (1,114) |
| Claude 3.7 Sonnet (2025-02-19) | 10 (1,010) | 2 (1,304) | Coming soon | 3 (1,110) |
| GPT-4.1 (2025-04-14) | 12 (1,001) | 3 (1,299) | Coming soon | 4 (1,099) |
| o1-mini (2024-09-12) | 3 (1,082) | 15 (964) | Coming soon | 5 (1,077) |
| o3-mini high (2025-01-31) | 4 (1,064) | 14 (993) | Coming soon | 6 (1,067) |
| GPT-4.1 mini (2025-04-14) | 19 (1,000) | 5 (1,194) | Coming soon | 7 (1,064) |
| o4-mini high (2025-04-16) | 14 (1,000) | 6 (1,184) | Coming soon | 8 (1,061) |
| o3 high (2025-04-16) | 17 (1,000) | 7 (1,172) | Coming soon | 9 (1,057) |
| o4-mini medium (2025-04-16) | 15 (1,000) | 8 (1,157) | Coming soon | 10 (1,052) |
| o4-mini low (2025-04-16) | 16 (1,000) | 9 (1,139) | Coming soon | 11 (1,046) |
| DeepSeek-R1-Distill-Llama-70B | 1 (1,108) | 26 (732) | Coming soon | 12 (1,029) |
| DeepSeek R1 | 9 (1,014) | 12 (1,005) | Coming soon | 13 (1,018) |
| Gemini 2.5 Flash Preview High 04-17 | 11 (1,003) | 11 (1,035) | Coming soon | 14 (1,015) |
| Claude 3.5 Sonnet (2024-10-22) | 21 (986) | 10 (1,073) | Coming soon | 15 (1,009) |
| DeepSeek-R1-Distill-Qwen-32B | 6 (1,027) | 17 (934) | Coming soon | 16 (1,008) |
| Random Move | 13 (1,000) | 13 (1,000) | Coming soon | 17 (1,001) |
| GPT-4o (2024-11-20) | 5 (1,032) | 19 (878) | Coming soon | 18 (995) |
| o3-mini low (2025-01-31) | 20 (991) | 16 (954) | Coming soon | 19 (976) |
| GPT-4.1 nano (2025-04-14) | 18 (1,000) | 18 (893) | Coming soon | 20 (966) |
| DeepSeek V3 | 8 (1,014) | 20 (835) | Coming soon | 21 (962) |
| GPT-4o mini (2024-07-18) | 22 (982) | 22 (801) | Coming soon | 22 (917) |
| Qwen-2.5-32B | 24 (959) | 21 (829) | Coming soon | 23 (901) |
| Gemini Pro 1.5 | 23 (972) | 24 (758) | Coming soon | 24 (892) |
| Llama 3.1 405B Instruct | 25 (944) | 23 (762) | Coming soon | 25 (863) |
| Llama 3.0 70B (8192) | 26 (844) | 25 (741) | Coming soon | 26 (748) |
| GPT-3.5 turbo (0125) | 27 (831) | 27 (720) | Coming soon | 27 (728) |
(*) Overall ELO is derived by averaging standardized scores (Z-scores) across the included benchmarks.
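The Z-score averaging described in the footnote can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the function names `z_scores` and `overall_scores` are hypothetical, the use of the population standard deviation is an assumption, and the step that maps the averaged Z-score back onto an ELO-like number is not specified on this page, so it is left out.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of ratings to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All ratings identical: every model sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def overall_scores(elo_by_benchmark):
    """Average each model's Z-scores over the benchmarks it appears in.

    elo_by_benchmark maps benchmark name -> {model name -> ELO}.
    Returns {model name -> mean Z-score}.
    """
    per_model = {}
    for ratings in elo_by_benchmark.values():
        models = list(ratings)
        standardized = z_scores([ratings[m] for m in models])
        for model, z in zip(models, standardized):
            per_model.setdefault(model, []).append(z)
    return {model: mean(zs) for model, zs in per_model.items()}


# Three rows taken from the table, for illustration only. Z-scores depend on
# the whole population, so this subset will not reproduce the published
# Overall column.
sample = {
    "RPS": {
        "Claude 3.7 Sonnet Thinking": 1100,
        "Random Move": 1000,
        "GPT-3.5 turbo": 831,
    },
    "SVG": {
        "Claude 3.7 Sonnet Thinking": 1377,
        "Random Move": 1000,
        "GPT-3.5 turbo": 720,
    },
}

for model, score in sorted(overall_scores(sample).items(), key=lambda kv: -kv[1]):
    print(f"{model}: {score:+.2f}")
```

Standardizing before averaging keeps a benchmark with a wide ELO spread (SVG here) from dominating one with a narrow spread (RPS), which a plain average of raw ELO values would not do.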
What Each Benchmark Measures
Each benchmark is designed to test different aspects of an AI model's capabilities:
Rock Paper Scissors
This benchmark tests an AI model's ability to:
- Recognize patterns in opponent behavior
- Adapt strategies based on previous interactions
- Maintain unpredictability while exploiting predictable patterns
- Demonstrate game theory understanding in a zero-sum environment
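To make the tension between exploiting patterns and staying unpredictable concrete, here is a small sketch of a baseline player. It is only an illustration: the function `choose_move`, its `explore` parameter, and the frequency-counting approach are assumptions made for this example and say nothing about how the benchmark or any evaluated model actually plays.

```python
import random
from collections import Counter

# Maps each move to the move that beats it.
COUNTER_MOVE = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def choose_move(opponent_history, explore=0.2, rng=random):
    """Pick a move from the opponent's past moves.

    With probability `explore` (or when there is no history yet) play
    uniformly at random, which keeps the player hard to predict. Otherwise
    counter the opponent's most frequent move so far, which exploits a
    predictable opponent.
    """
    if not opponent_history or rng.random() < explore:
        return rng.choice(list(COUNTER_MOVE))
    most_common_move = Counter(opponent_history).most_common(1)[0][0]
    return COUNTER_MOVE[most_common_move]


# An opponent that favors rock is answered with paper when exploiting.
print(choose_move(["rock", "rock", "scissors", "rock"], explore=0.0))
```

Setting `explore` to 1.0 gives uniformly random play, which in Rock Paper Scissors cannot be exploited but also cannot exploit anyone; lower values trade some of that safety for gains against predictable opponents.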
SVG Drawing
This benchmark tests an AI model's ability to:
- Interpret visual prompts and create matching illustrations
- Generate clean, optimized SVG code
- Demonstrate artistic creativity while following specifications
- Understand spatial relationships and proportions
Chess (Coming Soon)
This benchmark will test an AI model's ability to:
- Engage in long-term strategic planning
- Evaluate complex positional considerations
- Search deep decision trees and evaluate future states
- Balance risk and reward in competitive gameplay
Models that perform well across all benchmarks demonstrate a broader range of capabilities, which resembles general intelligence more closely than strength on any single benchmark does.