

OpenAI has introduced GPT-5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release represents a significant advance in AI-powered software development, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT-5.1-Codex-Max now replaces GPT-5.1-Codex as the default model on all Codex-integrated surfaces.
The new model is intended to serve as a persistent, context-aware software development agent capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.
It follows on the heels of Google's release of its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks:
On SWE-Bench Verified, GPT-5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, exceeding Gemini 3 Pro's 76.2%.
It also led on Terminal-Bench 2.0 with 58.1% accuracy versus 54.2% for Gemini, and matched Gemini's score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark.
Even measured against Gemini 3 Pro's most advanced configuration, the Deep Think mode, Codex-Max holds a slight lead on agentic coding benchmarks.
GPT-5.1-Codex-Max demonstrates measurable improvements over GPT-5.1-Codex in a number of standard software engineering benchmarks.
It achieved 79.9% accuracy on SWE-Lancer IC SWE, a significant increase from GPT-5.1-Codex's 66.3%. On SWE-Bench Verified (n=500) it reached 77.9% accuracy at extra-high reasoning effort, exceeding GPT-5.1-Codex's 73.7%.
Performance on Terminal Bench 2.0 (n=89) showed more modest improvements, with GPT-5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT-5.1-Codex.
All evaluations were carried out with compaction enabled and extra-high reasoning effort.
These results suggest that the new model provides a higher upper bound on both benchmark correctness and practical usability under extended reasoning loads.
A key architectural improvement in GPT-5.1-Codex-Max is a mechanism OpenAI calls "compaction."
As the model approaches its context-window limit, compaction lets it retain important contextual information while discarding irrelevant details, effectively enabling continuous work across millions of tokens without sacrificing performance.
The model has been internally observed to complete tasks that take more than 24 hours, including multi-stage refactors, test-driven iteration, and autonomous debugging.
Compaction also improves token efficiency. At medium reasoning effort, GPT-5.1-Codex-Max used approximately 30% fewer reasoning tokens than GPT-5.1-Codex for comparable or better accuracy, with implications for both cost and latency.
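In spirit, compaction works like a rolling summarizer over the conversation history. The sketch below is purely illustrative and not OpenAI's actual algorithm: the `compact_history` helper, the word-count tokenizer, and the `summarize` callback are all hypothetical stand-ins.

```python
# Hypothetical sketch of context compaction: when the history approaches the
# context-window limit, the oldest entries are collapsed into a compact
# summary so recent, relevant context is kept verbatim.
# (Illustrative only; not OpenAI's actual implementation.)

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())

def compact_history(history: list[str], limit: int, summarize) -> list[str]:
    """Summarize the oldest entries until the history fits within `limit` tokens."""
    total = sum(count_tokens(m) for m in history)
    dropped = []
    while total > limit and len(history) > 1:
        oldest = history.pop(0)          # trim from the oldest entry first
        dropped.append(oldest)
        total -= count_tokens(oldest)
    if dropped:
        # In a real system this would be an LLM-generated digest of `dropped`.
        history.insert(0, summarize(dropped))
    return history

# Toy usage: ten 52-token log entries (~520 tokens) squeezed under a 200-token cap.
history = [f"step {i}: " + "log " * 50 for i in range(10)]
compacted = compact_history(
    history, limit=200,
    summarize=lambda msgs: f"[summary of {len(msgs)} earlier steps]")
```

The key property this preserves is the one the release emphasizes: the most recent working context survives untouched, while older material is reduced to a cheap summary instead of being lost outright.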
GPT‑5.1-Codex-Max is currently available in several Codex-based environments that reference OpenAI’s own built-in tools and interfaces designed specifically for code-focused AI agents. This includes:
Codex CLI: the official command-line tool from OpenAI (@openai/codex), where GPT-5.1-Codex-Max is already live.
IDE extensions: likely developed or maintained by OpenAI, although no specific third-party IDE integrations were mentioned.
Interactive coding environments: for example, those used to demonstrate front-end simulation apps such as CartPole or the Snell's Law Explorer.
Internal code review tools: used by OpenAI's engineering teams.
GPT-5.1-Codex-Max is not yet available via the public API, although OpenAI states that it will be soon. Users who want to work with the model in terminal environments today can do so by installing and using the Codex CLI.
It is currently unconfirmed if or how the model will be integrated into third-party IDEs, unless they are based on the CLI or a future API.
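For readers who want to try the model from the terminal today, the CLI can be installed from npm under the package name given above. Exact flags and model identifiers may vary by CLI version, so check `codex --help` after installing:

```shell
# Install the official Codex CLI globally from npm
npm install -g @openai/codex

# Start an interactive session (sign in when prompted)
codex
```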
The model is able to interact with live tools and simulations. Examples shown in the press release include:
An interactive CartPole policy-gradient simulator that visualizes reinforcement-learning training and activations.
A Snell’s Law optics explorer supporting dynamic ray tracing across refractive indices.
These interfaces demonstrate the model’s ability to reason in real time while maintaining an interactive development session – effectively connecting computation, visualization, and implementation in a single loop.
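The optics demo boils down to Snell's law, n1·sin(θ1) = n2·sin(θ2). As a minimal sketch of the calculation such an explorer performs (the `refract` helper is hypothetical, not code from the release):

```python
import math

def refract(theta1_deg: float, n1: float, n2: float):
    """Return the refraction angle in degrees per Snell's law,
    or None on total internal reflection."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None  # beyond the critical angle: the ray is fully reflected
    return math.degrees(math.asin(s))

print(refract(30.0, 1.0, 1.5))   # air -> glass: the ray bends toward the normal
print(refract(45.0, 1.5, 1.0))   # glass -> air past the critical angle: None
```

A dynamic ray tracer like the one demonstrated would evaluate this relation per ray segment, switching to reflection whenever the refraction branch returns no solution.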
Although GPT-5.1-Codex-Max does not meet OpenAI’s “high” cybersecurity capability threshold under its Preparedness Framework, it is currently the most capable cybersecurity model that OpenAI has deployed. It supports use cases such as automatic vulnerability detection and remediation, but with strict sandboxing and network access disabled by default.
OpenAI does not report an increase in scaled malicious usage, but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated in a local workspace unless developers opt for broader access, mitigating risks such as immediate untrusted content injection.
GPT-5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu and Enterprise plans. It also becomes the new default in Codex-based environments, replacing the more general-purpose GPT-5.1-Codex.
OpenAI says 95% of its internal engineers use Codex weekly, and since launch, those engineers have sent about 70% more pull requests on average – underscoring the tool’s impact on internal development speed.
Despite its autonomy and endurance, OpenAI emphasizes that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test results, and tool-call outputs to support transparency in the generated code.
GPT-5.1-Codex-Max represents a significant evolution in OpenAI's strategy for agentic development tools, providing greater reasoning depth, token efficiency, and interactive capability across software development tasks. By extending its context-management and compaction strategies, the model can handle tasks at the scale of entire repositories rather than individual files or snippets.
With a continued focus on agent workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max is setting the stage for the next generation of AI-powered programming environments – while underscoring the importance of oversight in increasingly autonomous systems.