Research
Review Paper Argues Code Is How AI Agents Think and Act
A new review paper by Meta, Stanford, and the University of Illinois Urbana-Champaign highlights code as the foundation for AI agents’ reasoning and actions, citing real-world examples like Claude Code and Codex.
A new review paper from researchers at the University of Illinois Urbana-Champaign, Meta, and Stanford argues that code is the foundation on which AI agents reason, act, and coordinate. The paper emphasizes that the real bottleneck for autonomous systems lies in the software layer surrounding the model, which the authors call the "harness." This layer includes tools, interfaces, sandboxed environments, memory, testing, permission boundaries, execution loops, and feedback channels. Without it, a language model remains stateless. With it, the model becomes a working agent capable of executing tasks over extended periods.

The paper outlines how code serves as an executable, testable, and stateful layer between the model and its environment, enabling reliable and continuous operation. It also describes three layers that organize the field: the model's capabilities, the infrastructure, and the code the agent writes on the fly, including test scripts, helper tools, and reusable skills.

Commercial systems like Claude Code and OpenAI's Codex already operate on this principle, but the authors warn that current software tests are often incomplete and can obscure risks. They stress the need for more transparent evaluation mechanisms.

*Source: [The Decoder](https://the-decoder.com/new-review-paper-argues-code-is-how-ai-agents-think-and-act-not-just-what-they-produce/)*
Key points
- A review by Meta, Stanford, and the University of Illinois Urbana-Champaign finds that code increasingly serves as the foundation on which AI agents reason, act, and coordinate with each other.
- The authors call the software layer surrounding the model the "harness," which includes tools, interfaces, sandboxed environments, memory, testing, permission boundaries, execution loops, and feedback channels.
- Commercial systems like Claude Code and OpenAI's Codex already operate on this principle, but the authors caution that incomplete software tests can obscure risks and lead to misplaced trust.
- The paper splits long-running agent systems into three parts: the model's own capabilities, the infrastructure the system provides, and the code the agent writes on the fly, including test scripts and reusable skills.
- The authors say self-generated artifacts like test scripts and helper tools haven't received nearly enough research attention.
- The paper highlights that code acts as an executable, testable, and stateful layer between the model and its environment, enabling reliable and continuous operation.
- The boundary between agent and environment is itself becoming a layer that learns, a shift that shows up across five domains ranging from coding assistants to robotics.
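
To make the harness idea concrete, here is a minimal Python sketch of how a few of its parts (tools, a permission boundary, memory, an execution loop, and a feedback channel) might fit together. It is an illustration only and is not taken from the paper: the `Harness` class, the `scripted_model` stand-in for a language model, and the `add` and `delete` tools are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools the harness exposes to the model.
def add(arg: str) -> str:
    return str(sum(int(x) for x in arg.split()))

def delete(arg: str) -> str:
    return f"deleted {arg}"  # registered, but never permitted in this sketch

@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]]
    allowed: set[str]                                 # permission boundary
    memory: list[str] = field(default_factory=list)   # state the model lacks

    def run(self, model, task: str, max_steps: int = 5) -> str:
        self.memory.append(f"task: {task}")
        for _ in range(max_steps):                    # execution loop
            action, arg = model(self.memory)
            if action == "done":
                return arg
            if action not in self.allowed or action not in self.tools:
                feedback = f"denied: {action}"
            else:
                feedback = self.tools[action](arg)
            # Feedback channel: the result is written back into memory.
            self.memory.append(f"{action}({arg}) -> {feedback}")
        return "step limit reached"

def scripted_model(memory: list[str]) -> tuple[str, str]:
    """Assumed stand-in for an LLM call: picks an action from the last entry."""
    last = memory[-1]
    if last.startswith("task:"):
        return ("delete", "all files")                # will be denied
    if last.startswith("delete"):
        return ("add", "2 3")
    if last.startswith("add"):
        return ("done", last.rsplit("-> ", 1)[1])
    return ("done", "")

harness = Harness(tools={"add": add, "delete": delete}, allowed={"add"})
print(harness.run(scripted_model, "add 2 and 3"))
print(harness.memory)
```

In this sketch the model function is stateless between calls; everything it knows about earlier steps, including the denied action, reaches it through the harness's memory, which mirrors the article's point that the surrounding software is what turns a model into a working agent.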