GPT-5.3 Codex: What's New

OpenAI continues to develop the GPT-5 lineup. The new GPT-5.3 Codex is positioned as the company's best model for programming tasks. According to OpenAI, the model shows significant improvements in:

  • Code generation across all popular languages
  • Debugging and refactoring existing codebases
  • Code explanation: the model better "sees" project architecture
  • Test generation: generated unit tests have become noticeably more accurate

Benchmarks

Independent test results:

Test                  GPT-5.3 Codex   Claude Opus 4.6   Claude Sonnet 4.6
SWE-Bench Verified    78.4%           76.1%             82.1%
HumanEval+            95.8%           94.3%             96.2%
MBPP+                 88.2%           87.1%             89.7%
Codeforces Rating     1847            1792              1801

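GPT-5.3 Codex leads Claude Opus 4.6 on all four benchmarks: by 1.1 to 2.3 percentage points on the three pass-rate tests and by 55 points on Codeforces rating. It still trails Claude Sonnet 4.6 on SWE-Bench Verified, HumanEval+, and MBPP+, and leads it only on Codeforces rating.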

Key Improvements

Expanded Code Context

GPT-5.3 Codex has a 128K-token context window optimized for code files. OpenAI claims the model can hold the structure of a project with several hundred files in "memory."
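To get a feel for what a 128K-token budget means in practice, here is a minimal sketch that estimates how many source files fit into one request. It assumes roughly 4 characters per token, which is a common rule of thumb and not the model's real tokenizer. The function names and the 8,000-token reply reserve are hypothetical choices for illustration.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # context size quoted in the article
CHARS_PER_TOKEN = 4              # ASSUMPTION: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def select_files_within_budget(
    files: dict[str, str],
    budget: int = CONTEXT_WINDOW_TOKENS,
    reserve: int = 8_000,  # ASSUMPTION: tokens kept free for the model's reply
) -> tuple[list[str], int]:
    """Greedily pick files, smallest first, until the next one would not fit."""
    available = budget - reserve
    chosen: list[str] = []
    used = 0
    for name, text in sorted(files.items(), key=lambda item: len(item[1])):
        cost = estimate_tokens(text)
        if used + cost > available:
            break  # files are sorted by size, so no later file fits either
        chosen.append(name)
        used += cost
    return chosen, used
```

For an exact count, the heuristic in estimate_tokens would need to be replaced with the tokenizer that matches the model.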

Improved Function Calling

For developers using the API, function calling has become more reliable. The model produces JSON function calls that follow the declared schema more accurately and "invents" nonexistent parameters less often.
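Even with a more reliable model, it is worth checking the arguments before executing a call. The sketch below shows one way to catch invented or missing parameters. The place_order tool and the check_arguments helper are hypothetical, and the tool definition only follows the general JSON Schema style that function-calling APIs use. The check covers parameter names only, not types or enum values.

```python
import json

# Hypothetical tool definition in the JSON Schema style used for function calling.
PLACE_ORDER_TOOL = {
    "name": "place_order",
    "description": "Place an order on an exchange (illustrative only).",
    "parameters": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string"},
            "side": {"type": "string", "enum": ["buy", "sell"]},
            "quantity": {"type": "number"},
        },
        "required": ["symbol", "side", "quantity"],
        "additionalProperties": False,
    },
}


def check_arguments(tool: dict, raw_arguments: str) -> list[str]:
    """Return a list of problems found in the model-produced JSON arguments."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tool["parameters"]
    allowed = set(schema["properties"])
    # Parameters the model "invented": present in the call, absent from the schema.
    problems = [f"unknown parameter: {key}" for key in sorted(args.keys() - allowed)]
    # Required parameters the model left out.
    problems += [f"missing parameter: {key}" for key in schema["required"] if key not in args]
    return problems
```

An empty list means the call names only declared parameters and includes every required one. A non-empty list can be sent back to the model as an error message so it can retry.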

Codex Agent Mode

OpenAI introduced the Codex Agent mode, in which the model can:

  • Execute commands sequentially in a terminal
  • Read and modify files
  • Run tests and iterate on results

This is a direct response to Claude Code from Anthropic and similar agent products.
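The run-tests-and-iterate cycle described above can be sketched as a simple loop. This is not how Codex Agent is implemented. It is a generic illustration in which propose_edit is a hypothetical stand-in for a model call that maps test output to new file contents. The sketch has no sandboxing, so it should only be pointed at a throwaway directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable


def run_command(cmd: list[str], cwd: Path) -> tuple[int, str]:
    """Run one command and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def agent_loop(
    workdir: Path,
    test_cmd: list[str],
    propose_edit: Callable[[str], dict[str, str]],  # hypothetical model call
    max_steps: int = 5,
) -> bool:
    """Run the tests; while they fail, apply proposed edits and try again."""
    for _ in range(max_steps):
        exit_code, output = run_command(test_cmd, workdir)
        if exit_code == 0:
            return True
        # The stand-in "model" reads the failure output and returns
        # {relative path: new file content} for the files it wants to change.
        for rel_path, content in propose_edit(output).items():
            (workdir / rel_path).write_text(content)
    # Check the result of the last round of edits.
    return run_command(test_cmd, workdir)[0] == 0
```

A call such as agent_loop(Path("project"), [sys.executable, "-m", "pytest"], propose_edit) would run pytest up to six times in total, assuming pytest is installed in the project environment.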

Pricing

GPT-5.3 Codex is available via API at the following prices:

  • Input: $8 / 1M tokens
  • Output: $24 / 1M tokens
  • Cached input: $2 / 1M tokens

This places the model in the mid-price segment: more expensive than DeepSeek but cheaper than Claude Opus.
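To see what these rates mean per request, here is a small cost calculator built on the prices listed above. It assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate. The function name is hypothetical.

```python
# Prices in USD per 1M tokens, as quoted in the article.
PRICE_INPUT = 8.00
PRICE_OUTPUT = 24.00
PRICE_CACHED_INPUT = 2.00


def request_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Return the cost of one request in USD.

    ASSUMPTION: cached_input_tokens is the part of input_tokens that is
    billed at the cached rate instead of the full input rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_input_tokens * PRICE_INPUT
        + cached_input_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000
```

For a request with 100,000 input tokens and 5,000 output tokens, this gives $0.92 without caching and $0.44 when 80,000 of the input tokens are served from cache.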

What to Choose for Trading Bots?

For developers of algorithmic trading systems, the choice between GPT-5.3 and Claude depends on the task:

  • For writing strategies from scratch: Claude Sonnet 4.6 shows the best results
  • For integration with existing APIs: GPT-5.3 Codex wins due to accurate function calling
  • For market data analysis: both options work well, but GPT-5.3 is faster at streaming generation

Competition between models is only intensifying, and that's great news for end users.