Google's Gemini-SQL2 Leads Text-to-SQL Benchmarks

Google Research's Gemini-SQL2 achieved 80.04% execution accuracy on the BIRD benchmark, outperforming competitors like GPT-5.5-xhigh and Claude Opus 4.6.

Google Research has introduced Gemini-SQL2, a text-to-SQL system built on the Gemini 3.1 Pro model. The system translates natural-language questions into executable SQL queries. According to Google, Gemini-SQL2 achieved an execution accuracy of 80.04% on the BIRD benchmark, placing it at the top of the leaderboard. That puts it ahead of OpenAI's GPT-5.5-xhigh, which scored 72.8%, and Anthropic's Claude Opus 4.6, which scored 70.9%, leads of roughly 7 and 9 percentage points respectively. Models from Databricks, AWS, Tencent, and Alibaba all scored significantly lower.

The result highlights the system's ability to generate accurate, functional SQL queries, a task made particularly challenging by the complexity of data layers and business logic. The research team has not announced a public release, and no paper is available yet.
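For readers unfamiliar with the metric, execution accuracy is generally understood as the share of questions for which the generated SQL, when run against the database, returns the same result as a reference ("gold") query. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not Google's or BIRD's evaluation code: the function names, the `orders` table, and the query pairs are all hypothetical, and comparing results as unordered sets is an assumption made for simplicity.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries produce the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: compare as sets, so row order and duplicates are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, query_pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    if not query_pairs:
        return 0.0
    matches = sum(execution_match(conn, p, g) for p, g in query_pairs)
    return matches / len(query_pairs)


def build_demo_db():
    """Create a tiny in-memory database (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Ada", 120.0), (2, "Grace", 80.0), (3, "Ada", 40.0)],
    )
    return conn


# Hypothetical (predicted, gold) query pairs.
DEMO_PAIRS = [
    # Different SQL text, same result: counts as correct.
    ("SELECT customer FROM orders WHERE total > 100",
     "SELECT customer FROM orders WHERE total >= 120"),
    # Wrong filter, different result: counts as incorrect.
    ("SELECT COUNT(*) FROM orders WHERE customer = 'Grace'",
     "SELECT COUNT(*) FROM orders WHERE customer = 'Ada'"),
    # Invalid SQL (unknown column): counts as incorrect.
    ("SELECT name FROM orders", "SELECT customer FROM orders"),
]

if __name__ == "__main__":
    conn = build_demo_db()
    print(f"Execution accuracy: {execution_accuracy(conn, DEMO_PAIRS):.2%}")
```

In this toy setup only the first of the three predictions matches its reference, so the script reports an execution accuracy of 33.33%. The point of the metric is that a query is judged by what it returns, not by whether its text matches the reference.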

Key points

Google Research's Gemini-SQL2 achieved 80.04% execution accuracy on the BIRD benchmark.
Gemini-SQL2 outperforms OpenAI's GPT-5.5-xhigh, which scored 72.8%.
Anthropic's Claude Opus 4.6 scored 70.9% on the same benchmark.
Models from Databricks, AWS, Tencent, and Alibaba scored significantly lower than Gemini-SQL2.
Gemini-SQL2 leads the BIRD text-to-SQL leaderboard.
The research team has not announced a public release of the model.
There is no paper available for Gemini-SQL2 yet.

Source: The Decoder

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.