GPT-5.3 Codex: What’s New

OpenAI continues to develop the GPT-5 lineup. The new GPT-5.3 Codex is positioned as the company’s best model for programming tasks. According to OpenAI, the model shows significant improvements in:

  • Code generation across all popular languages
  • Debugging and refactoring existing codebases
  • Code explanation — the model better “sees” project architecture
  • Test generation — unit test generation has become noticeably more accurate

Benchmarks

Independent test results:

Test GPT-5.3 Codex Claude Opus 4.6 Claude Sonnet 4.6
SWE-Bench Verified 78.4% 76.1% 82.1%
HumanEval+ 95.8% 94.3% 96.2%
MBPP+ 88.2% 87.1% 89.7%
Codeforces Rating 1847 1792 1801

GPT-5.3 Codex confidently outperforms Claude Opus 4.6 in coding tasks but still falls short of Claude Sonnet 4.6 on SWE-Bench.

Key Improvements

Expanded Code Context

GPT-5.3 Codex features 128K tokens of context optimized for code files. OpenAI claims the model can hold the structure of a project with several hundred files in “memory.”

Improved Function Calling

For developers using the API, function calling has become more reliable. The model generates JSON call schemas more accurately and less frequently “invents” nonexistent parameters.

Codex Agent Mode

OpenAI introduced the Codex Agent mode, in which the model can:

  • Execute commands sequentially in a terminal
  • Read and modify files
  • Run tests and iterate on results

This is a direct response to Claude Code from Anthropic and similar agent products.

Pricing

GPT-5.3 Codex is available via API at the following prices:

  • Input: $8 / 1M tokens
  • Output: $24 / 1M tokens
  • Cached input: $2 / 1M tokens

This places the model in the mid-price segment — more expensive than DeepSeek but cheaper than Claude Opus.

What to Choose for Trading Bots?

For developers of algorithmic trading systems, the choice between GPT-5.3 and Claude depends on the task:

  • For writing strategies from scratch — Claude Sonnet 4.6 shows the best results
  • For integration with existing APIs — GPT-5.3 Codex wins due to accurate function calling
  • For market data analysis — both options work well, but GPT-5.3 is faster at streaming generation

Competition between models is only intensifying, and that’s great news for end users.