Tencent Researchers Argue AI Must Finish Tasks, Not Just Answer Questions

A new survey paper by Tencent's Youtu Lab and Chinese universities argues AI systems must shift from generating answers to completing tasks reliably, with a focus on reusable skills and persistent work environments.

A new survey paper by Tencent's Youtu Lab and several Chinese universities argues that AI systems will not become reliable coworkers until they shift from generating answers to completing tasks. The researchers emphasize reusable 'skills' and persistent work environments as the conditions for real-world task execution. They describe this as a transition from chatbots to digital colleagues: models that turn intent into finished work.

The study traces the evolution of large language models (LLMs) through five stages, from basic chatbots to autonomous digital colleagues. In the chatbot era, models generated text quickly by following the most likely continuation, without checking intermediate steps or searching for solutions. Thinking LLMs, in contrast, invest extra compute at inference time: they explore solution paths, verify intermediate steps, and correct errors before producing a final answer. The paper frames this shift as a move from fast, intuitive 'System 1' thinking to slow, deliberate 'System 2' reasoning, borrowing psychologist Daniel Kahneman's framework.
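To make the System 1 versus System 2 contrast concrete, here is a minimal Python sketch. It is a toy illustration, not the paper's method or any model's actual decoding procedure: the function names, the ranked guess list, and the arithmetic task are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional


def fast_answer(candidates: Iterable[int]) -> int:
    """'System 1' style: return the top-ranked candidate without checking it."""
    return next(iter(candidates))


def deliberate_answer(
    candidates: Iterable[int], verify: Callable[[int], bool]
) -> Optional[int]:
    """'System 2' style: spend extra compute trying candidates and verifying each."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None  # no candidate passed verification


# Hypothetical toy task: find x such that x * x + x == 42.
ranked_guesses = [7, 5, 6]  # assumed ordering by "likelihood", made up for illustration


def check(x: int) -> bool:
    return x * x + x == 42


fast = fast_answer(ranked_guesses)               # unverified first guess
slow = deliberate_answer(ranked_guesses, check)  # first guess that passes the check

print(fast, slow)
```

The fast path returns its first guess even though it fails the check, while the deliberate path pays for extra verification steps and returns a guess that passes.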

The researchers highlight four structural bottlenecks in first-generation agents: limited environmental perception, lack of lasting state from tool calls, unexpected behavior, and infrequent task completion. In the OpenClaw era, models operate in persistent, secure workspaces with files, terminals, and reusable skills, and they run verification loops until a task is verifiably complete. The paper cites OpenHands and SWE-agent as examples of agents embedded in controlled development environments.
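The idea of a persistent workspace, reusable skills, and a verification loop can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how OpenClaw, OpenHands, or SWE-agent are implemented: the skill registry, the `report.txt` file, the `DONE` marker, and every function name are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Hypothetical skill type: a named, reusable function that acts on the workspace.
Skill = Callable[[Path], None]


def write_draft(workspace: Path) -> None:
    (workspace / "report.txt").write_text("draft")


def finalize(workspace: Path) -> None:
    report = workspace / "report.txt"
    report.write_text(report.read_text() + "\nDONE")


SKILLS: Dict[str, Skill] = {"write_draft": write_draft, "finalize": finalize}


def is_complete(workspace: Path) -> bool:
    """Verification step: completion is checked against files, not claimed."""
    report = workspace / "report.txt"
    return report.exists() and report.read_text().endswith("DONE")


def choose_skill(workspace: Path) -> str:
    """Pick the next skill from state that persists in the workspace."""
    return "finalize" if (workspace / "report.txt").exists() else "write_draft"


def run_until_verified(workspace: Path, max_steps: int = 5) -> int:
    """Apply skills until the verifier passes or the step budget runs out."""
    for step in range(1, max_steps + 1):
        SKILLS[choose_skill(workspace)](workspace)
        if is_complete(workspace):
            return step
    raise RuntimeError("task not verified within step budget")


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    steps_used = run_until_verified(workspace)
    final_text = (workspace / "report.txt").read_text()

print(steps_used, repr(final_text))
```

Because each step reads and writes files in the workspace, state survives between tool calls, and the loop stops only when the check on those files passes.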

Source: The Decoder

WRITTEN BY

Maya Chen

AI Research & Breakthroughs

Maya breaks down the latest AI research papers, benchmarks, and technical breakthroughs into plain language.