GPT-5.3 Codex: OpenAI Updates Its Flagship Model
GPT-5.3 Codex: Whatโs New
OpenAI continues to develop the GPT-5 lineup. The new GPT-5.3 Codex is positioned as the companyโs best model for programming tasks. According to OpenAI, the model shows significant improvements in:
- Code generation across all popular languages
- Debugging and refactoring existing codebases
- Code explanation โ the model better โseesโ project architecture
- Test generation โ unit test generation has become noticeably more accurate
Benchmarks
Independent test results:
| Test | GPT-5.3 Codex | Claude Opus 4.6 | Claude Sonnet 4.6 |
|---|---|---|---|
| SWE-Bench Verified | 78.4% | 76.1% | 82.1% |
| HumanEval+ | 95.8% | 94.3% | 96.2% |
| MBPP+ | 88.2% | 87.1% | 89.7% |
| Codeforces Rating | 1847 | 1792 | 1801 |
GPT-5.3 Codex confidently outperforms Claude Opus 4.6 in coding tasks but still falls short of Claude Sonnet 4.6 on SWE-Bench.
Key Improvements
Expanded Code Context
GPT-5.3 Codex features 128K tokens of context optimized for code files. OpenAI claims the model can hold the structure of a project with several hundred files in โmemory.โ
Improved Function Calling
For developers using the API, function calling has become more reliable. The model generates JSON call schemas more accurately and less frequently โinventsโ nonexistent parameters.
Codex Agent Mode
OpenAI introduced the Codex Agent mode, in which the model can:
- Execute commands sequentially in a terminal
- Read and modify files
- Run tests and iterate on results
This is a direct response to Claude Code from Anthropic and similar agent products.
Pricing
GPT-5.3 Codex is available via API at the following prices:
- Input: $8 / 1M tokens
- Output: $24 / 1M tokens
- Cached input: $2 / 1M tokens
This places the model in the mid-price segment โ more expensive than DeepSeek but cheaper than Claude Opus.
What to Choose for Trading Bots?
For developers of algorithmic trading systems, the choice between GPT-5.3 and Claude depends on the task:
- For writing strategies from scratch โ Claude Sonnet 4.6 shows the best results
- For integration with existing APIs โ GPT-5.3 Codex wins due to accurate function calling
- For market data analysis โ both options work well, but GPT-5.3 is faster at streaming generation
Competition between models is only intensifying, and thatโs great news for end users.
Discussion
Join the discussion in our Telegram chat!