Google today announced an experimental version of Gemini 2.5 Pro, available to Gemini Advanced subscribers and developers. The release is the latest mid-generation (".5") update in Google's model lineup. Every model in the Gemini 2.5 family, including future ones, is a "thinking model" that reasons through a problem before responding, which Google says improves performance and accuracy.
Google says it is building these thinking capabilities directly into all of its models so that they can handle more complex problems and support more capable, context-aware agents. Unlike Gemini 2.0 Flash Thinking, which was first revealed in December and updated recently, the new model does not carry an explicit "Thinking" label. Users can still choose "Show thinking" in the Gemini app to view the model's reasoning process.
In AI, Google explains, a system's capacity for reasoning goes beyond classification and prediction: it means analyzing information, drawing logical conclusions, incorporating context and nuance, and making informed decisions. With Gemini 2.5, Google attributes the performance gains to the combination of a significantly upgraded base model and improved post-training.
Gemini 2.5 Pro, codenamed "nebula," is the first model in the new family and is aimed at complex tasks. Google notes that it tops the LMArena leaderboard, which measures human preferences, by a substantial margin. Google also says it leads on math (AIME 2025) and science (GPQA Diamond) benchmarks without relying on costly test-time techniques such as majority voting.
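For readers unfamiliar with the term, majority voting means sampling several answers to the same question and keeping the most common one, which multiplies inference cost by the number of samples. The sketch below illustrates the general idea only; it is not Google's evaluation code, and `sample_answer` is a hypothetical stand-in for any function that queries a model once.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most common answer after light normalization.

    Ties are resolved in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]


def sample_and_vote(
    sample_answer: Callable[[str], str],  # hypothetical: one model call per invocation
    prompt: str,
    n: int = 5,
) -> str:
    """Query the model n times and return the majority answer.

    Cost scales linearly with n, which is why this counts as a
    "costly test-time technique".
    """
    answers = [sample_answer(prompt) for _ in range(n)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Fake sampler that cycles through canned answers instead of calling a model.
    from itertools import cycle

    canned = cycle(["42", "41", "42", " 42 ", "40"])
    print(sample_and_vote(lambda _prompt: next(canned), "What is 6 * 7?", n=5))
```

A score reported without this kind of voting reflects a single attempt per question.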
Gemini 2.5 Pro also scored 18.8% on Humanity’s Last Exam, a dataset crafted by hundreds of experts to capture the human frontier of knowledge and reasoning. Google describes that result as state of the art among models that do not use tools. Google has also focused on coding performance, claiming a significant leap over Gemini 2.0, with more improvements to come.
According to Google, Gemini 2.5 Pro excels at creating visually compelling web apps and agentic code applications, as well as at code transformation and editing. With a custom agent setup, it scored 63.8% on SWE-Bench Verified, which Google calls the industry standard for agentic code evaluations.
In addition to native multimodality, Gemini 2.5 Pro ships with a 1 million token context window, with 2 million tokens coming soon. This lets it take in very large datasets and work on complex problems that draw on different information sources, including text, audio, images, video, and entire code repositories.
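To get a feel for what a 1 million token window means for text, here is a rough back-of-the-envelope check. It assumes about four characters per token, a common rule of thumb for English text and not a property of Gemini's actual tokenizer, so real counts will differ; the function names are illustrative only.

```python
import math
from typing import Iterable

CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted for Gemini 2.5 Pro
CHARS_PER_TOKEN = 4.0              # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts: Iterable[str], window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the combined estimate for all texts is within the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total <= window


if __name__ == "__main__":
    # Two stand-in "files" of 1.5 million characters each, about 750,000 tokens total.
    files = ["x" * 1_500_000, "y" * 1_500_000]
    print(fits_in_context(files))
```

Under this heuristic, 1 million tokens corresponds to roughly 4 million characters of text.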
The experimental Gemini 2.5 Pro is rolling out now to Gemini Advanced users and is available in Google AI Studio, with Vertex AI to follow in the coming weeks. Google also plans to announce pricing in the coming weeks, which will let developers use Gemini 2.5 Pro with higher rate limits for scaled production use.