Google Gemini vs Claude: Who will finish Pokémon first?

With new versions of Gemini and Claude come new tests. Some people have launched Twitch livestreams in which these AIs play Pokémon. Gemini and Claude are currently competing, and the race is tight.

How do you compare two artificial intelligences? Several serious benchmarks exist, such as LMArena, where Gemini 2.5 Pro ranks first, or OpenAI's BrowseComp. But a less boring measure also exists: Pokémon Red and Blue. Gemini and Claude are both trying to beat Régis and become the best Pokémon trainer.

Gemini and Claude compete at Pokémon through rival livestreams

As TechCrunch reported, a tweet drew reactions this week. It shows a Twitch livestream in which Gemini plays Pokémon Blue. The "trainer" is in the town of Lavanville (Lavender Town in the English versions), which puts it ahead of Claude 3.7 Sonnet, Anthropic's AI, which is also playing Pokémon.

Gemini playing Pokémon // Source: Numerama screenshot

Gemini has therefore reached Lavanville in less time than Claude: Anthropic's AI is still stuck at Mont Sélénite (Mt. Moon in the English versions) and, unfortunately, can no longer make progress. Gemini does get a little help, however, as some Reddit users have noted. The developer behind the "Gemini Plays Pokémon" livestream gives it a kind of custom map that helps the LLM identify interactive objects in the game, such as trees that can be cut down. This means it does not need to analyze many screenshots before making a decision. Claude also has a kind of map available, showing it where its character can walk.

Claude plays “Pokémon” live on Twitch // Source: Numerama

In addition, Claude suffers from a bug: when its character is on the bike, each button press moves it two tiles instead of one. With an AI that is already slow and cannot understand this, the game drags on even longer.

The best AI is not the one with a level 100 Dracaufeu

Having these two AIs play Pokémon Red and Blue is not a technical demonstration: it is above all a way of showing what they can be used for. Almost everyone knows roughly how the game works, which makes the demonstration accessible. OpenAI, Google, Microsoft, and DeepSeek do not talk about the Pokémon games their tools are capable of playing. Anthropic is one of the only companies to have done so, with Claude 3.7 Sonnet.

The game screen was divided into tiles to help Claude // Source: Anthropic

It also shows that AI rankings can differ depending on the benchmark chosen. We know, for example, that Meta developed a version of Llama 4 specially designed to score well on LMArena (the standard version of Llama 4, however, is not optimized for it). In the case of Pokémon, Claude and Gemini are not competing on equal terms, which also shows that there is no standardized Pokémon benchmark. Unless the game is sped up, it could take thousands of hours before one of them finally finishes it.
