AI Model Performance

Compare how different AI models perform across benchmarks: Rock Paper Scissors, SVG Drawing, and (coming soon) Chess.

27 Models (including the Random Move baseline)
3051 Total Matches
2 Benchmark Categories


Benchmark Categories

Rock Paper Scissors

Tests strategic thinking, pattern recognition, and adaptive learning.

302 matches

SVG Drawing

Tests visual creativity, spatial understanding, and technical precision.

2749 matches

Chess

Coming Soon

Tests planning, positional analysis, and complex decision-making.

0 matches
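Match results from the benchmarks above are turned into the per-benchmark ELO ratings shown in the table that follows. The page does not state its exact rating rule, so the sketch below assumes the standard Elo update with a K-factor of 32 and scores each match as 1 (win), 0.5 (draw), or 0 (loss). The many entries sitting at exactly 1,000 hint at a 1,000 starting rating, but that is also an assumption. The function names `expected_score` and `elo_update` are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k=32 is an assumed K-factor, not one stated by the leaderboard.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models starting at an assumed 1,000; A wins.
    print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because the update is proportional to `score_a - expected_score(...)`, an upset (a low-rated model beating a high-rated one) moves both ratings more than an expected win does.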

Models Performance Across Benchmarks

Model                                   RPS Rank  RPS ELO  SVG Rank  SVG ELO  Chess        Overall Rank  Overall ELO (*)
Claude 3.7 Sonnet Thinking (2025-02-19) 2         1,100    1         1,377    Coming soon  1             1,231
Gemini 2.5 Pro Preview 05-06            7         1,017    4         1,293    Coming soon  2             1,114
Claude 3.7 Sonnet (2025-02-19)          10        1,010    2         1,304    Coming soon  3             1,110
GPT-4.1 (2025-04-14)                    12        1,001    3         1,299    Coming soon  4             1,099
o1-mini (2024-09-12)                    3         1,082    15        964      Coming soon  5             1,077
o3-mini high (2025-01-31)               4         1,064    14        993      Coming soon  6             1,067
GPT-4.1 mini (2025-04-14)               19        1,000    5         1,194    Coming soon  7             1,064
o4-mini high (2025-04-16)               14        1,000    6         1,184    Coming soon  8             1,061
o3 high (2025-04-16)                    17        1,000    7         1,172    Coming soon  9             1,057
o4-mini medium (2025-04-16)             15        1,000    8         1,157    Coming soon  10            1,052
o4-mini low (2025-04-16)                16        1,000    9         1,139    Coming soon  11            1,046
DeepSeek-R1-Distill-Llama-70B           1         1,108    26        732      Coming soon  12            1,029
DeepSeek R1                             9         1,014    12        1,005    Coming soon  13            1,018
Gemini 2.5 Flash Preview High 04-17     11        1,003    11        1,035    Coming soon  14            1,015
Claude 3.5 Sonnet (2024-10-22)          21        986      10        1,073    Coming soon  15            1,009
DeepSeek-R1-Distill-Qwen-32B            6         1,027    17        934      Coming soon  16            1,008
Random Move                             13        1,000    13        1,000    Coming soon  17            1,001
GPT-4o (2024-11-20)                     5         1,032    19        878      Coming soon  18            995
o3-mini low (2025-01-31)                20        991      16        954      Coming soon  19            976
GPT-4.1 nano (2025-04-14)               18        1,000    18        893      Coming soon  20            966
DeepSeek V3                             8         1,014    20        835      Coming soon  21            962
GPT-4o mini (2024-07-18)                22        982      22        801      Coming soon  22            917
Qwen-2.5-32B                            24        959      21        829      Coming soon  23            901
Gemini Pro 1.5                          23        972      24        758      Coming soon  24            892
Llama 3.1 405B Instruct                 25        944      23        762      Coming soon  25            863
Llama 3.0 70B (8192)                    26        844      25        741      Coming soon  26            748
GPT-3.5 turbo (0125)                    27        831      27        720      Coming soon  27            728

(*) Overall ELO is derived by averaging each model's standardized scores (Z-scores) across the benchmarks that currently have results (Rock Paper Scissors and SVG Drawing).
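The footnote's averaging step can be sketched as follows, using four rows from the table. Assumptions, none of which the page states: the population standard deviation is used, each model is averaged only over the benchmarks it has a rating in, and the average Z-score is mapped back to an ELO-like scale with a hypothetical center of 1,000 and spread of 100. Because the Z-scores here are computed over only four models, the output will not match the table's Overall column; the helper names are also hypothetical.

```python
from statistics import mean, pstdev

# Four rows from the table above: RPS and SVG ELO per model.
SAMPLE = {
    "rps": {
        "Claude 3.7 Sonnet Thinking (2025-02-19)": 1100,
        "DeepSeek-R1-Distill-Llama-70B": 1108,
        "Random Move": 1000,
        "GPT-3.5 turbo (0125)": 831,
    },
    "svg": {
        "Claude 3.7 Sonnet Thinking (2025-02-19)": 1377,
        "DeepSeek-R1-Distill-Llama-70B": 732,
        "Random Move": 1000,
        "GPT-3.5 turbo (0125)": 720,
    },
}


def z_scores(ratings: dict[str, float]) -> dict[str, float]:
    """Standardize one benchmark: (rating - mean) / standard deviation."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std dev (an assumption)
    return {model: (r - mu) / sigma for model, r in ratings.items()}


def average_z(benchmarks: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each model's Z-scores over the benchmarks it has a rating in."""
    per_model: dict[str, list[float]] = {}
    for ratings in benchmarks.values():
        for model, z in z_scores(ratings).items():
            per_model.setdefault(model, []).append(z)
    return {model: mean(zs) for model, zs in per_model.items()}


def overall_elo(benchmarks: dict[str, dict[str, float]],
                center: float = 1000.0, spread: float = 100.0) -> dict[str, float]:
    """Map average Z onto an ELO-like scale; center and spread are hypothetical."""
    return {model: center + spread * z for model, z in average_z(benchmarks).items()}


if __name__ == "__main__":
    scores = overall_elo(SAMPLE)
    for model in sorted(scores, key=scores.get, reverse=True):
        print(f"{scores[model]:7.1f}  {model}")
```

Standardizing first keeps the benchmark with the wider rating spread (SVG Drawing, in this table) from dominating a plain average. On this sample, DeepSeek-R1-Distill-Llama-70B's strong RPS rating and weak SVG rating roughly cancel out.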

What Each Benchmark Measures

Each benchmark is designed to test different aspects of an AI model's capabilities:

Rock Paper Scissors

This benchmark tests an AI model's ability to:

  • Recognize patterns in opponent behavior
  • Adapt strategies based on previous interactions
  • Maintain unpredictability while exploiting predictable patterns
  • Demonstrate game theory understanding in a zero-sum environment
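To make "recognize patterns" and "maintain unpredictability" concrete, here is a minimal, hypothetical player: it counts the opponent's past moves, plays the move that beats the most frequent one, and with probability `epsilon` plays uniformly at random instead. Uniform random play is the Nash equilibrium of Rock Paper Scissors, so mixing it in makes the player harder to exploit. This illustrates the skills listed above; it is not how the benchmark's models or opponents actually play, and the 0.2 default for `epsilon` is arbitrary.

```python
import random
from collections import Counter

# Maps each move to the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
MOVES = tuple(BEATS)


def choose_move(opponent_history: list[str], epsilon: float = 0.2, rng=random) -> str:
    """Exploit the opponent's most frequent move, randomizing with probability epsilon."""
    if not opponent_history or rng.random() < epsilon:
        # No data yet, or deliberately unpredictable: play uniformly at random.
        return rng.choice(MOVES)
    predicted, _count = Counter(opponent_history).most_common(1)[0]
    return BEATS[predicted]


if __name__ == "__main__":
    history = ["rock", "rock", "paper", "rock"]
    print(choose_move(history, epsilon=0.0))  # paper
```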

SVG Drawing

This benchmark tests an AI model's ability to:

  • Interpret visual prompts and create matching illustrations
  • Generate clean, optimized SVG code
  • Demonstrate artistic creativity while following specifications
  • Understand spatial relationships and proportions
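The page does not describe how SVG outputs are scored, but "clean" SVG at minimum means well-formed XML with an `<svg>` root. The sketch below is a hypothetical pre-check using Python's standard `xml.etree.ElementTree`; requiring a `viewBox` attribute is an assumed house rule, not something the benchmark is known to enforce.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def check_svg(source: str) -> list[str]:
    """Return a list of problems found in an SVG string (empty if none)."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        # Unclosed tags, stray characters, etc. make the file unusable.
        return [f"not well-formed XML: {exc}"]
    problems = []
    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SVG_NS}}}svg":
        problems.append("root element is not <svg> in the SVG namespace")
    if "viewBox" not in root.attrib:
        problems.append("missing viewBox attribute (assumed requirement)")
    return problems


if __name__ == "__main__":
    good = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><circle cx="5" cy="5" r="4"/></svg>'
    print(check_svg(good))  # []
```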

Chess (Coming Soon)

This benchmark will test an AI model's ability to:

  • Engage in long-term strategic planning
  • Evaluate complex positional considerations
  • Search deep decision trees and evaluate future states
  • Balance risk and reward in competitive gameplay
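"Search deep decision trees and evaluate future states" describes game-tree search, the idea behind classical chess engines. As an illustration only (the page does not say how the Chess benchmark will be run), here is a minimal minimax search with alpha-beta pruning over a toy tree written as nested lists, where each leaf number stands in for a static evaluation of a future position.

```python
from typing import Union

# A node is either a leaf evaluation or a list of child nodes.
Tree = Union[float, list]


def minimax(node: Tree, maximizing: bool = True,
            alpha: float = float("-inf"), beta: float = float("inf")) -> float:
    """Best achievable evaluation, assuming both sides play optimally."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation of a future state
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # opponent will never allow this line; prune the rest
        return best
    best = float("inf")
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


if __name__ == "__main__":
    # Two candidate moves, each with two opponent replies.
    print(minimax([[3, 5], [2, 9]]))  # 3
```

The first move guarantees at least 3, while the second lets the opponent force 2, so the search prefers the first even though the second contains the single best leaf (9). Real engines cut the search off at a fixed depth and apply a heuristic evaluation there.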

Models that perform well across all benchmarks demonstrate a broader range of capabilities, which more closely resembles general intelligence than strength on any single benchmark does.