OpenAI Unveils GPT-5.3-Codex-Spark — An Ultra-Fast AI for Real-Time Coding

13 February, 2026

OpenAI today announced a research preview of GPT-5.3-Codex-Spark, a highly optimized coding model designed to deliver near-instant responses for developers working in real time.

GPT-5.3-Codex-Spark is a smaller, latency-focused version of the company’s recently released GPT-5.3-Codex model. It is engineered to run on specialized hardware — notably the Cerebras Wafer Scale Engine 3 — enabling inference speeds exceeding 1,000 tokens per second. This allows developers to interact with the AI model with almost no perceptible lag, a shift that could redefine real-time coding experiences.
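
To put the 1,000 tokens per second figure in perspective, the short sketch below estimates how long a response of a given length would take to stream. It is a back-of-the-envelope illustration, not a measurement: the function name `generation_seconds`, the sample response sizes, and the slower 100 tokens per second comparison rate are all assumptions made for this example, and the calculation ignores network, queueing, and prompt-processing time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decoding rate.

    Ignores network, queueing, and prompt-processing delay.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Lower bound on inference speed cited in the article.
SPARK_RATE = 1_000.0
# Hypothetical slower rate, chosen only for comparison (an assumption).
HYPOTHETICAL_SLOWER_RATE = 100.0

if __name__ == "__main__":
    # Illustrative response sizes: a one-line fix, a function edit, a full file.
    for tokens in (50, 300, 2_000):
        fast = generation_seconds(tokens, SPARK_RATE)
        slow = generation_seconds(tokens, HYPOTHETICAL_SLOWER_RATE)
        print(f"{tokens:>5} tokens: {fast:.2f} s vs {slow:.2f} s")
```

Under these assumptions, a 300-token edit would stream in roughly 0.3 seconds at 1,000 tokens per second, against about 3 seconds at the hypothetical slower rate.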

According to the official update on OpenAI’s website, Codex-Spark is optimized for scenarios “where latency matters as much as intelligence,” meaning developers can interrupt or redirect the model mid-generation and receive rapid feedback. The model’s working style is deliberately lightweight: it makes targeted edits and avoids performing broader operations like running tests unless explicitly instructed.

Benchmarks suggest that the new model remains highly capable on real-world coding tasks while completing them in a fraction of the time the larger GPT-5.3-Codex needs. The speed-focused design aims to make AI suggestions and code corrections feel immediate, approaching the responsiveness of a live collaborator rather than a background tool.

In public comments tied to the new release, OpenAI highlighted the expanding partnership with Cerebras as a core technical enabler.

“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible: new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning,” said Sean Lie, CTO and co-founder of Cerebras.

Industry analysts say Codex-Spark could shift how AI tools integrate into developer environments. By drastically reducing latency, the model could make tools feel more like live coding partners — where suggestions, structure, and code logic evolve in step with a programmer’s actions. Early reports also highlight performance gains on industry benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, indicating that the model balances speed with solid engineering capabilities.

Material by Yana Petrova