OpenAI has introduced a new method called Deployment Simulation to better predict how its GPT-5 series models might behave in real-world scenarios before release. The technique involves replaying previous conversations with a candidate model to study its responses in realistic contexts. This allows the company to identify potential risks and undesired behaviors that could emerge once the model is deployed to users. Deployment Simulation aims to enhance the safety review process by providing a more accurate preview of model behavior in deployment-like settings.
The method works by taking recent conversations from deployment, removing the original assistant response, and regenerating it with a new candidate model. This enables the evaluation of completions for new failure modes and provides estimates of undesired behavior frequency based on a deployment-like distribution. OpenAI noted that Deployment Simulation addresses key limitations of traditional evaluations, such as coverage and selection biases, by using a representative sample of recent usage. It also mitigates concerns about models recognizing they are being tested, as the simulated conversations closely resemble real deployment traffic.
According to OpenAI, Deployment Simulation has already been used during model development to identify blind spots in traditional evaluations and inform deployment decisions. The company also applied the method to complex agent settings involving tool use, demonstrating its versatility beyond standard chat scenarios. The technique is expected to play a larger role in future model development as the pipeline becomes easier to run.
Source: openai