New versions of Gemini and Claude bring new tests with them. Some developers have launched livestreams on Twitch in which these AIs play the first Pokémon games. Gemini and Claude are currently competing, and the battle is tight.
How do you compare two artificial intelligences? Several serious benchmarks exist, such as LMArena, where Gemini 2.5 Pro ranks first, or OpenAI's BrowseComp. But a less boring measure also exists: Pokémon Red and Blue. Gemini and Claude are both trying to beat Régis and become the best Pokémon trainer.
Gemini and Claude compete at Pokémon via livestreams
As TechCrunch reported, a tweet drew reactions last week. It shows a livestream on Twitch in which Gemini plays Pokémon Blue. The "trainer" is in the city of Lavanville, which puts it ahead of Claude 3.7 Sonnet, Anthropic's AI, which is also playing Pokémon.


Gemini therefore reached Lavanville in less time than Claude: Anthropic's AI is still stuck at Mont Sélénite. Unfortunately, it can no longer make progress. However, Gemini has a little help, as some users on Reddit have noted. The developer behind the "Gemini Plays Pokémon" livestream gives it a kind of custom map to help the LLM identify interactive objects in the game, such as trees that can be cut. This means it does not need to analyze many screenshots before making a decision. Claude also has a kind of map available, showing it where its character can walk.


In addition, Claude suffers from a bug: when it is on the bike, each button press moves it two tiles instead of one. For an AI that is already slow and cannot understand this, the game becomes even harder.
The best AI is not the one with a level 100 Dracaufeu
The demonstration of these two AIs in Pokémon Red and Blue is not technical: it is above all about showing what they can be used for. Almost everyone knows roughly how the game works, which makes the demonstration accessible. OpenAI, Google, Microsoft, and DeepSeek do not talk about the Pokémon games their tools are capable of playing. Anthropic is one of the only companies to have done so, with Claude 3.7 Sonnet.


It also shows that AI rankings can differ depending on the benchmark chosen. We know, for example, that Meta developed a version of Llama 4 specially designed to score well on LMArena (the base version of Llama 4, however, is not optimized for it). In the case of Pokémon, Claude and Gemini are not competing on equal terms, which also shows that there is no standardized benchmark based on Pokémon. Unless the game is sped up, it could take thousands of hours before one of them finally beats it.
