
AI Search Agents Rely on Memory, Not Web Research

A study finds that leading AI search agents often confirm existing knowledge rather than research the web, even as models such as GPT-5.4 and Kimi K2.6 post high scores on static benchmarks.


A new study highlights a critical limitation of AI search agents: they frequently rely on internal knowledge instead of actively researching the web. Researchers from the Harbin Institute of Technology and Xiaohongshu tested eleven models and found that many perform well on static benchmarks such as BrowseComp without consulting any external sources. MiniMax M2.5 solved 44.5 percent of BrowseComp tasks from memory alone, while Kimi K2.6 reached 62 percent on the benchmark's Chinese variant. These results suggest that a significant portion of benchmark performance reflects pre-existing knowledge rather than real-time web research.

When the researchers removed every answer-supporting document from the search index, performance collapsed. MiniMax M2.5 fell from 44.5 percent, the score it reached from memory alone, to 8.0 percent, and Kimi K2.6 dropped from 25.5 to 2.3 percent. Since the agents could have fallen back on what they already knew, this indicates that searching can actively pull them away from correct answers. The study also found that more than half of all search queries came from the models' own reasoning rather than from results they had already retrieved. Even when relevant evidence turned up, agents incorporated it into their reasoning less than a third of the time.
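The index ablation described above can be pictured with a short sketch. It assumes a toy search index of documents with IDs and a known set of answer-supporting IDs; `Doc`, `ablate_index`, `accuracy` and `point_drop` are hypothetical names, and case-insensitive exact match is only an assumed stand-in for the study's scoring rule, which the article does not describe. Only the 44.5/8.0 and 25.5/2.3 figures come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """One entry in a toy search index (hypothetical structure)."""
    doc_id: str
    text: str


def ablate_index(index: list[Doc], supporting_ids: set[str]) -> list[Doc]:
    """Return a new index with every answer-supporting document removed."""
    return [doc for doc in index if doc.doc_id not in supporting_ids]


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Percentage of questions answered correctly, using case-insensitive exact match (assumed rule)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must have the same length")
    if not gold:
        return 0.0
    correct = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def point_drop(before: float, after: float) -> float:
    """Difference between two accuracy scores, in percentage points."""
    return round(before - after, 1)


if __name__ == "__main__":
    index = [
        Doc("d1", "unrelated page"),
        Doc("d2", "page that states the answer"),
        Doc("d3", "another unrelated page"),
    ]
    ablated = ablate_index(index, supporting_ids={"d2"})
    print([doc.doc_id for doc in ablated])  # ['d1', 'd3']
    print(point_drop(44.5, 8.0))  # 36.5 points for MiniMax M2.5 (figures from the study)
```

In a real harness the agent would be run once closed-book and once against the ablated index, and the two `accuracy` values compared; a large `point_drop` below the closed-book score is the pattern the study reports.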

To measure genuine search behavior, the researchers built LiveBrowseComp, a benchmark of 335 human-written questions that require up-to-date information from the past 90 days. It filters out commonly known facts and focuses on obscure but verifiable details. In the closed-book test, every model scored below two percent on LiveBrowseComp, confirming that static knowledge offers no shortcut here. Human testers solved a similar share of tasks as on BrowseComp, which suggests that the models' lower scores stem from the missing memory shortcuts rather than from harder questions.
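The article does not say how LiveBrowseComp screens out commonly known facts. One way to picture the two filters is a recency window plus a closed-book check: a candidate question is kept only if its supporting evidence is at most 90 days old and no model answers it correctly without search. This is a hypothetical sketch; the `Candidate` fields, the `closed_book_answers` mapping and the exact-match rule are assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """A draft benchmark question (hypothetical fields)."""
    question: str
    answer: str
    evidence_date: date  # publication date of the source that answers it (assumed field)


def is_recent(c: Candidate, today: date, window_days: int = 90) -> bool:
    """True if the supporting evidence is no older than window_days and not dated in the future."""
    return 0 <= (today - c.evidence_date).days <= window_days


def filter_questions(
    candidates: list[Candidate],
    closed_book_answers: dict[str, list[str]],
    today: date,
    window_days: int = 90,
) -> list[Candidate]:
    """Keep recent questions that no model answered correctly from memory alone.

    closed_book_answers maps each question to the answers models gave without search
    (a hypothetical input; how the study gathered such answers is not described).
    """
    kept = []
    for c in candidates:
        if not is_recent(c, today, window_days):
            continue  # evidence too old: the fact may already be in training data
        guesses = closed_book_answers.get(c.question, [])
        if any(g.strip().lower() == c.answer.strip().lower() for g in guesses):
            continue  # answerable from memory: treated here as a "commonly known fact"
        kept.append(c)
    return kept


if __name__ == "__main__":
    today = date(2025, 6, 1)
    pool = [
        Candidate("q1", "a1", date(2025, 5, 1)),   # recent, unknown to models -> kept
        Candidate("q2", "a2", date(2024, 1, 1)),   # too old -> dropped
        Candidate("q3", "a3", date(2025, 5, 20)),  # recent but guessed closed-book -> dropped
    ]
    print([c.question for c in filter_questions(pool, {"q3": ["A3"]}, today)])  # ['q1']
```

Running the closed-book check across several models, rather than one, makes the filter stricter; that choice is also an assumption of this sketch.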

Key points

Leading AI search agents often confirm existing knowledge rather than research the web.
Models like GPT-5.4 and Kimi K2.6 scored high on static benchmarks like BrowseComp.
MiniMax M2.5 solved 44.5 percent of BrowseComp tasks from memory alone.
Kimi K2.6 achieved 62 percent on the Chinese variant of the benchmark.
Removing answer-supporting documents from the search index caused model performance to drop sharply.
More than half of all search queries came from the models' own reasoning rather than from previously retrieved results.
Agents often fail to incorporate relevant evidence into their reasoning, using it less than a third of the time.

Source: The Decoder

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.
