AI Model Performance

Compare how AI models perform across benchmarks: Rock Paper Scissors, SVG Drawing, and Chess (coming soon).

31 AI Models

3392 Total Matches

2 Benchmark Categories

Benchmark Categories

Rock Paper Scissors

Tests strategic thinking, pattern recognition, and adaptive learning.

571 matches

SVG Drawing

Tests visual creativity, spatial understanding, and technical precision.

2821 matches

Chess

Coming Soon

Tests planning, positional analysis, and complex decision-making.

0 matches

Model Performance Across Benchmarks

Model	RPS Rank & ELO	SVG Rank & ELO	Chess Rank & ELO	Overall (*)
Claude 3.7 Sonnet Thinking (2025-02-19)	2 1,117	1 1,380	Coming soon	1 1,237
Gemini 2.5 Pro Preview 05-06	7 1,053	3 1,278	Coming soon	2 1,145
o4-mini low (2025-04-16)	4 1,096	10 1,119	Coming soon	3 1,128
o3-mini high (2025-01-31)	3 1,103	16 1,056	Coming soon	4 1,113
Claude Sonnet 4 Thinking (2025-05-14)	13 1,028	5 1,195	Coming soon	5 1,093
o3 high (2025-04-16)	11 1,031	7 1,175	Coming soon	6 1,089
GPT-4.1 (2025-04-14)	21 991	2 1,279	Coming soon	7 1,088
Claude Sonnet 4 (2025-05-14)	6 1,054	11 1,099	Coming soon	8 1,083
o4-mini medium (2025-04-16)	8 1,041	9 1,127	Coming soon	9 1,081
Claude 3.7 Sonnet (2025-02-19)	22 990	4 1,256	Coming soon	10 1,080
Claude Opus 4 Thinking (2025-05-14)	10 1,034	14 1,072	Coming soon	11 1,056
Claude 3.5 Sonnet (2024-10-22)	9 1,037	15 1,059	Coming soon	12 1,054
o4-mini high (2025-04-16)	18 1,001	8 1,144	Coming soon	13 1,051
o3-mini low (2025-01-31)	16 1,024	13 1,075	Coming soon	14 1,047
o1-mini (2024-09-12)	5 1,070	21 936	Coming soon	15 1,041
Claude Opus 4 (2025-05-14)	17 1,002	12 1,093	Coming soon	16 1,034
DeepSeek R1	15 1,026	19 995	Coming soon	17 1,022
Gemini 2.5 Flash Preview High 04-17	12 1,031	20 968	Coming soon	18 1,017
GPT-4.1 mini (2025-04-14)	26 946	6 1,180	Coming soon	19 1,013
DeepSeek-R1-Distill-Qwen-32B	14 1,027	22 934	Coming soon	20 1,002
GPT-4o (2024-11-20)	20 993	17 1,022	Coming soon	21 1,001
Random Move	19 1,000	18 1,000	Coming soon	22 1,000
DeepSeek-R1-Distill-Llama-70B	1 1,123	30 666	Coming soon	23 996
Qwen-2.5-32B	25 959	24 829	Coming soon	24 904
GPT-4.1 nano (2025-04-14)	27 938	23 875	Coming soon	25 901
Gemini Pro 1.5	23 985	27 752	Coming soon	26 901
GPT-3.5 turbo (0125)	24 965	28 727	Coming soon	27 875
DeepSeek V3	28 910	25 818	Coming soon	28 855
GPT-4o mini (2024-07-18)	30 852	26 794	Coming soon	29 795
Llama 3.1 405B Instruct	29 858	29 700	Coming soon	30 768
Llama 3.0 70B (8192)	31 805	31 654	Coming soon	31 704

(*) Overall ELO is derived by averaging standardized scores (Z-scores) across the included benchmarks.
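
As a rough illustration of the footnote, the sketch below standardizes each benchmark's ratings to Z-scores and averages them per model. The page does not say how the averaged Z-score is mapped back to an ELO-like number, so the `center` and `spread` constants here are hypothetical, as are the function names. The three sample rows are taken from the table; because the sketch uses only three models, its numbers will not match the published Overall column.

```python
from statistics import mean, pstdev


def z_scores(ratings):
    """Standardize a {model: rating} mapping to zero mean and unit variance."""
    mu = mean(ratings.values())
    sigma = pstdev(ratings.values())  # population std; assumes ratings are not all equal
    return {model: (r - mu) / sigma for model, r in ratings.items()}


def overall_scores(benchmarks, center=1000.0, spread=100.0):
    """Average per-benchmark Z-scores, then rescale to a rating-like number.

    `center` and `spread` are assumed constants for illustration only;
    the leaderboard does not document its rescaling.
    """
    standardized = [z_scores(b) for b in benchmarks]
    # Only models present in every benchmark receive an overall score.
    models = set.intersection(*(set(z) for z in standardized))
    return {m: center + spread * mean(z[m] for z in standardized) for m in models}


# Three rows from the table above (RPS and SVG ratings).
rps = {"Claude 3.7 Sonnet Thinking": 1117, "Random Move": 1000, "Llama 3.0 70B": 805}
svg = {"Claude 3.7 Sonnet Thinking": 1380, "Random Move": 1000, "Llama 3.0 70B": 654}

overall = overall_scores([rps, svg])
ranking = sorted(overall, key=overall.get, reverse=True)
```

Standardizing before averaging keeps a benchmark with a wide rating spread (SVG Drawing here) from dominating one with a narrow spread (Rock Paper Scissors), which a plain average of the raw ratings would not do.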

What Each Benchmark Measures

Each benchmark is designed to test different aspects of an AI model's capabilities:

Rock Paper Scissors

This benchmark tests an AI model's ability to:

Recognize patterns in opponent behavior
Adapt strategies based on previous interactions
Maintain unpredictability while exploiting predictable patterns
Demonstrate game theory understanding in a zero-sum environment
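
To make the trade-off between exploiting patterns and staying unpredictable concrete, here is a minimal sketch of a frequency-counting player. It is not how the benchmark or any listed model works; `choose_move` and the `explore` parameter are hypothetical names introduced for illustration.

```python
import random
from collections import Counter

# Maps each move to the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def choose_move(opponent_history, explore=0.2, rng=random):
    """Counter the opponent's most frequent move, with occasional random play.

    With probability `explore` (or when there is no history yet) the move is
    uniformly random, which keeps the player from being fully predictable.
    """
    if not opponent_history or rng.random() < explore:
        return rng.choice(list(BEATS))
    most_common, _ = Counter(opponent_history).most_common(1)[0]
    return BEATS[most_common]


# An opponent who favors rock is answered with paper when exploration is off.
reply = choose_move(["rock", "rock", "paper"], explore=0.0)
```

Raising `explore` toward 1.0 approaches uniformly random play, which cannot be exploited but also cannot exploit; lowering it toward 0.0 does the opposite.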

SVG Drawing

This benchmark tests an AI model's ability to:

Interpret visual prompts and create matching illustrations
Generate clean, optimized SVG code
Demonstrate artistic creativity while following specifications
Understand spatial relationships and proportions

Chess (Coming Soon)

This benchmark will test an AI model's ability to:

Engage in long-term strategic planning
Evaluate complex positional considerations
Search deep decision trees and evaluate future states
Balance risk and reward in competitive gameplay

Models that perform well across all benchmarks demonstrate a broader range of capabilities, which is closer to general intelligence than strength on any single task.
