GPT-5.5 Cuts Prices 50% While Doubling Context to 1M Tokens

OpenAI's GPT-5.5 ships with 82.7% Terminal-Bench score, 1M usable context, $5/$30 API pricing, and a 'High' cyber capability tier for enterprises.

By Rajesh Beri·April 24, 2026·14 min read
Share:

THE DAILY BRIEF

GPT-5.5OpenAIfrontier modelslong contextagentic codingTerminal-BenchFrontierMathCodexAPI pricingAI cybersecurityTrusted Access for Cyberenterprise AI

GPT-5.5 Cuts Prices 50% While Doubling Context to 1M Tokens

OpenAI's GPT-5.5 ships with 82.7% Terminal-Bench score, 1M usable context, $5/$30 API pricing, and a 'High' cyber capability tier for enterprises.

By Rajesh Beri·April 24, 2026·14 min read

On April 23, OpenAI shipped GPT-5.5 to ChatGPT and Codex. On April 24, it shipped to the API. The model that the prediction markets had been pricing in for two months — the first frontier model trained end-to-end on Stargate compute — landed not as GPT-6, not at a premium price, and not as a routine point release. It landed at $5 per million input tokens and $30 per million output tokens, with a one-million-token context window, an 82.7% score on Terminal-Bench 2.0, and a "High" classification in OpenAI's own cybersecurity preparedness framework.

For enterprise AI leaders, this is the second shoe dropping in a 72-hour window. On April 22, OpenAI retired Custom GPTs in favor of Workspace Agents. On April 23, Google merged Vertex AI into the Gemini Enterprise Agent Platform. On April 24, OpenAI doubled the per-token rate of its previous flagship while simultaneously delivering the kind of step-function agentic capability that resets every coding-assistant procurement decision in the Fortune 500.

This is the story of what GPT-5.5 actually is, where the benchmark numbers matter and where they don't, what the pricing change signals about OpenAI's enterprise strategy, and what AI engineering teams need to do this week.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What OpenAI Actually Shipped

GPT-5.5 is the first fully retrained base model since GPT-4.5 — not a fine-tuned variant of GPT-5, but a new pre-training run on the Stargate cluster in Abilene, Texas. The model ships in two tiers: GPT-5.5 (general purpose, $5/$30 per million tokens for input/output) and GPT-5.5 Pro ($30/$180 per million tokens, aimed at the hardest reasoning workloads). Both are available in the API as of April 24 with a one-million-token context window. Batch and Flex pricing are offered at half the standard rate; Priority processing runs at 2.5x.

The product framing is "agentic by default." OpenAI describes the model as designed to write and debug code, research online, analyze data, create documents and spreadsheets, operate software, and move across tools until a task is finished. When paired with Codex's computer-use skills, the model can see what is on screen, click, type, and navigate interfaces with measurably better precision than its predecessor. This is the same direction Anthropic took with Claude's computer use and Google took with Gemini's agent tooling, but GPT-5.5 ships with the deepest integration into a hosted runtime — Codex for engineers, Workspace Agents for knowledge workers — that any frontier vendor currently has in market.

The rollout sequence is worth noting. GPT-5.5 went to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex first. The API followed one day later. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in parallel. There is no "free" tier yet — OpenAI is metering the launch through paid surfaces while the API capacity scales.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Benchmark Numbers That Matter

The benchmark story is where this launch separates itself from the noise of routine model updates. Three numbers stand out.

Terminal-Bench 2.0: 82.7%. This is the agentic-coding benchmark that tests planning, iteration, and tool coordination across long command-line workflows. GPT-5.5 scores 82.7%, a 7.6-point jump over GPT-5.4 (75.1%). Anthropic's Claude Opus 4.7 scores 69.4%. Google's Gemini 3.1 Pro scores 68.5%. This is the largest gap GPT-5.5 opens against any frontier competitor, and it is precisely the benchmark that maps to the work enterprise AI engineering teams actually do — wiring up agents that have to plan a multi-step task, call a tool, react to its output, and continue.

FrontierMath Tier 4: 35.4%. On the hardest tier of FrontierMath — the benchmark that asks for novel research-grade mathematical reasoning — GPT-5.5 scores 35.4%. Claude Opus 4.7 scores 22.9%. Gemini 3.1 Pro scores 16.7%. This matters less for typical enterprise workflows and more for the leading-indicator question: when models start solving problems mathematicians struggle with, downstream technical work — proof verification, formal methods, financial modeling, scientific simulation — moves into the addressable surface for AI agents.

MRCR v2 long context: 74.0%. This is the number that quietly resets enterprise architecture. On the multi-round coreference resolution benchmark at context lengths of 512K to 1M tokens, GPT-5.5 scores 74.0%. GPT-5.4 scored 36.6%. On Graphwalks BFS at one million tokens, GPT-5.5 scores 45.4% versus GPT-5.4's 9.4%. The headline-grabbing one-million-token context was largely theoretical in earlier models because comprehension collapsed past 200K tokens. GPT-5.5 is the first OpenAI model where the context window and the comprehension window are roughly the same size.

The benchmark story is not uniformly favorable. Claude Opus 4.7 still beats GPT-5.5 on SWE-Bench Pro, the real-world GitHub issue resolution benchmark, scoring 64.3% against GPT-5.5's 58.6%. Both Claude and Gemini score higher on certain tool-use benchmarks. The honest summary is that GPT-5.5 has decisively pulled ahead in agentic terminal workflows, advanced reasoning, and long-context comprehension, while Claude retains its edge on real-world software engineering tasks and Gemini retains its edge on certain integration patterns. This is not a model that obsoletes the competition. It is a model that forces every multi-vendor enterprise to re-run its routing strategy.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Pricing Move No One Predicted

The pricing surprise cuts in two directions, and both matter for enterprise budgets.

GPT-5.5 lands at $5 per million input tokens and $30 per million output tokens. This is double the per-token rate of GPT-5.4, which sat at $2.50 input and $15 output. By that comparison, the model is more expensive. But against GPT-5 — the previous flagship the new model functionally replaces in enterprise agentic workloads — the rate is actually three times cheaper on input ($5 vs. $15) and half-price on output ($30 vs. $60). Whether you read this as "OpenAI doubled prices" or "OpenAI cut prices in half" depends entirely on which prior model you were running.

The structural read is that OpenAI is repricing the curve. GPT-5.4 was the cheap workhorse for high-volume workflows. GPT-5 was the expensive flagship for hard problems. GPT-5.5 collapses both into a single tier priced in the middle, with the GPT-5.5 Pro variant ($30/$180) absorbing the truly hardest reasoning loads. For enterprise FinOps, this means existing token budgets need to be rebuilt from scratch. Workflows currently routed to GPT-5.4 because of cost will see their per-call costs double if they migrate; workflows routed to GPT-5 for capability will see their costs drop by half to two-thirds for the same migration.

The Pro pricing of $30/$180 is the more aggressive number. At six times the standard rate, it is a deliberate signal: OpenAI thinks there is a class of enterprise workloads where customers will accept a 6x cost premium for the marginal accuracy on the hardest problems. The bet is that quant funds, biotech research teams, and elite engineering organizations will do the math and conclude that a $180-per-million-output-tokens model that one-shots a problem is cheaper than a $30 model that requires three iterations. Whether that holds is the open question of Q2.

Against the competitive set, $5/$30 makes GPT-5.5 cost-competitive with Claude Opus 4.7 ($15/$75 list, with negotiated enterprise rates lower) and a premium over Gemini 3.1 Pro ($1.25/$10 list). This is the clearest signal that OpenAI is pricing for share against Anthropic in the enterprise reasoning segment, while accepting that Google will continue to win price-sensitive workloads on Gemini.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The 1M Context Window Is the Architecture Reset

For enterprise AI engineering teams, the single most consequential change is the one-million-token effective context window.

Most production AI systems built in 2024 and 2025 use retrieval-augmented generation (RAG) precisely because frontier models could not reliably reason across long documents. The standard pattern is: chunk the corpus, embed it, retrieve top-k matches at query time, stuff into a 128K context window, generate. RAG was the workaround for context limits, and the entire vector-database industry — Pinecone, Weaviate, Chroma, MongoDB Atlas Vector Search, the pgvector extension that ships in every Postgres deployment — was built on the assumption that long-context comprehension would remain expensive and unreliable.

GPT-5.5 does not eliminate RAG. It does eliminate the architectural premise that RAG is the only viable pattern for long documents. A one-million-token context window with 74% MRCR comprehension can hold roughly 750,000 words — the entirety of The Lord of the Rings trilogy with room left for The Hobbit. For enterprise document workflows — contract review, policy analysis, due-diligence packets, audit walkthroughs, regulatory filings — the question shifts from "how do we chunk and retrieve" to "do we just put the whole thing in context."

The economics matter. At $5 per million input tokens, processing a 500K-token contract costs $2.50. The same workflow with a RAG pipeline costs nearly nothing per query but requires upfront investment in embedding generation, vector storage, retrieval tuning, and the engineering staff to maintain it. For workflows that touch a document once and need maximum fidelity — M&A due diligence, regulatory submissions, litigation discovery — the long-context route is now cheaper than RAG when you amortize the engineering cost. For high-frequency workflows over a stable corpus, RAG remains the right answer.

The other architecture reset is for agent loops. An agent that maintains 200K tokens of conversation history, tool outputs, and intermediate reasoning will hit context exhaustion in a long-running session under GPT-5. Under GPT-5.5, the same agent can run for an order of magnitude longer before hitting context limits. The Workspace Agents product OpenAI shipped on April 22 is now sitting on a model that can support the multi-hour autonomous workflows the launch materials promised.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


Cybersecurity Classification and Trusted Access for Cyber

The piece of the launch that will produce the most CISO email traffic is the cybersecurity classification.

OpenAI has classified GPT-5.5 as "High" capability in the cybersecurity domain — below "Critical" but above prior models. The rationale is straightforward: GPT-5.5's improvements in agentic coding, long-context comprehension, and tool use also improve its capability for offensive security work. Vulnerability research that was previously gated by a model's inability to hold a large codebase in context, plan a multi-step exploit, and execute against tooling is now more automatable. UK AISI's evaluation concluded the model's autonomous cyberattack capability "may indicate risk against at least small-scale enterprise networks with weak security posture" — networks without active defenses, monitoring, or fast response.

OpenAI's response to its own classification is Trusted Access for Cyber (TAC), an identity-gated access pathway for higher-risk dual-use cyber capabilities. The framing is that legitimate defenders, enterprise security teams, and verified researchers can access the unfettered capability surface, while general API access ships with safeguards calibrated to reduce misuse. The tighter controls, restrictions on sensitive cybersecurity requests, and protections against repeated misuse attempts that OpenAI debuted with GPT-5.2 are now expanded.

For enterprise security teams, this creates a procurement question with no precedent. Internal red teams will benefit materially from the new capability — automated reconnaissance, vulnerability assessment, and exploit development that previously required senior offensive security engineers can now be partially automated. But the same capability is available to adversaries who get access. The defensive playbook for the next 90 days needs to assume that a competent attacker has GPT-5.5-class capability and is using it against your perimeter.

The practical implications for AI engineering teams are narrower but immediate. Any agent your team builds that interacts with internal infrastructure — that has shell access, deployment privileges, or production database credentials — needs to be evaluated against the new threat model. A prompt injection that tricks a GPT-5.5-powered agent into exfiltrating a codebase or executing arbitrary code is a more dangerous outcome than the same injection on GPT-5.4 because the model is more capable of completing the resulting attack chain.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What to Do This Week

For AI engineering leaders, four moves are time-sensitive.

Re-route agentic coding workloads. Any internal workflow that uses Codex, GitHub Copilot, Cursor, or Claude Code for multi-step agentic work should be re-benchmarked against GPT-5.5 this week. The Terminal-Bench gap is large enough that production workflows will see measurable accuracy improvements for the same prompt scaffolding. The migration is one model-name change in most agent frameworks; the upside is an immediate quality lift on the workloads where you currently lose the most engineering hours to agent failure.

Audit your RAG investments. For each production RAG pipeline, ask: is the document corpus stable enough that retrieval is genuinely cheaper than long-context inference at $5 per million input tokens? For workflows that touch documents once or where retrieval tuning has been a persistent maintenance burden, GPT-5.5's long-context economics may now favor a "stuff the whole document in context" architecture. Don't migrate the working pipelines — but stop building new RAG infrastructure for use cases the new context window now handles natively.

Recalculate token budgets. Existing enterprise contracts with OpenAI assume a pricing curve that no longer exists. If your team budgets by model tier, those budgets are now stale. Pull the last 30 days of token usage by model, project the cost under the new pricing, and identify the workflows where the change is most consequential. The high-volume GPT-5.4 workflows are the ones most likely to surprise you on the next bill.

Update your threat model. Internal security teams need to know that GPT-5.5 is in the wild. Both your defenders and your adversaries have access to a more capable agentic coding model than they did 72 hours ago. Phishing pretext quality, vulnerability scanning sophistication, and post-compromise lateral movement are all improving on the offensive side; your defenders need the same uplift. If your SOC is not already running GPT-5.5-tier models in its detection and response loops, the gap is widening.

The broader pattern across the last two weeks — Workspace Agents, Gemini Enterprise Agent Platform, GPT-5.5 — is that the major frontier vendors are converging on a unified enterprise agent stack at exactly the moment they are also opening up a step-function in raw model capability. The CIOs and AI engineering leaders who treat this as a routine refresh cycle will find themselves twelve months behind by Q3. The ones who treat it as the architecture reset it actually is will spend the next quarter rebuilding budgets, reroute decisions, and security postures around capability they did not have last week.

GPT-5.5 is not the end state. The same model family is on a six-week release cadence; GPT-5.6 will probably ship before July. The right posture is not to chase every release — it is to build the internal evaluation muscle that can absorb a step-function model upgrade in a week, not a quarter. That muscle is the actual moat in enterprise AI in 2026, and the teams building it now are the ones who will spend the back half of the year executing while their competitors are still in procurement.


Rajesh Beri is Head of AI Engineering at Zscaler. He writes about enterprise AI strategy, security, and the gap between what vendors ship and what the Fortune 500 can absorb.


Continue Reading

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

GPT-5.5 Cuts Prices 50% While Doubling Context to 1M Tokens

Photo by Tara Winstead on Pexels

On April 23, OpenAI shipped GPT-5.5 to ChatGPT and Codex. On April 24, it shipped to the API. The model that the prediction markets had been pricing in for two months — the first frontier model trained end-to-end on Stargate compute — landed not as GPT-6, not at a premium price, and not as a routine point release. It landed at $5 per million input tokens and $30 per million output tokens, with a one-million-token context window, an 82.7% score on Terminal-Bench 2.0, and a "High" classification in OpenAI's own cybersecurity preparedness framework.

For enterprise AI leaders, this is the second shoe dropping in a 72-hour window. On April 22, OpenAI retired Custom GPTs in favor of Workspace Agents. On April 23, Google merged Vertex AI into the Gemini Enterprise Agent Platform. On April 24, OpenAI doubled the per-token rate of its previous flagship while simultaneously delivering the kind of step-function agentic capability that resets every coding-assistant procurement decision in the Fortune 500.

This is the story of what GPT-5.5 actually is, where the benchmark numbers matter and where they don't, what the pricing change signals about OpenAI's enterprise strategy, and what AI engineering teams need to do this week.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What OpenAI Actually Shipped

GPT-5.5 is the first fully retrained base model since GPT-4.5 — not a fine-tuned variant of GPT-5, but a new pre-training run on the Stargate cluster in Abilene, Texas. The model ships in two tiers: GPT-5.5 (general purpose, $5/$30 per million tokens for input/output) and GPT-5.5 Pro ($30/$180 per million tokens, aimed at the hardest reasoning workloads). Both are available in the API as of April 24 with a one-million-token context window. Batch and Flex pricing are offered at half the standard rate; Priority processing runs at 2.5x.

The product framing is "agentic by default." OpenAI describes the model as designed to write and debug code, research online, analyze data, create documents and spreadsheets, operate software, and move across tools until a task is finished. When paired with Codex's computer-use skills, the model can see what is on screen, click, type, and navigate interfaces with measurably better precision than its predecessor. This is the same direction Anthropic took with Claude's computer use and Google took with Gemini's agent tooling, but GPT-5.5 ships with the deepest integration into a hosted runtime — Codex for engineers, Workspace Agents for knowledge workers — that any frontier vendor currently has in market.

The rollout sequence is worth noting. GPT-5.5 went to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex first. The API followed one day later. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in parallel. There is no "free" tier yet — OpenAI is metering the launch through paid surfaces while the API capacity scales.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Benchmark Numbers That Matter

The benchmark story is where this launch separates itself from the noise of routine model updates. Three numbers stand out.

Terminal-Bench 2.0: 82.7%. This is the agentic-coding benchmark that tests planning, iteration, and tool coordination across long command-line workflows. GPT-5.5 scores 82.7%, a 7.6-point jump over GPT-5.4 (75.1%). Anthropic's Claude Opus 4.7 scores 69.4%. Google's Gemini 3.1 Pro scores 68.5%. This is the largest gap GPT-5.5 opens against any frontier competitor, and it is precisely the benchmark that maps to the work enterprise AI engineering teams actually do — wiring up agents that have to plan a multi-step task, call a tool, react to its output, and continue.

FrontierMath Tier 4: 35.4%. On the hardest tier of FrontierMath — the benchmark that asks for novel research-grade mathematical reasoning — GPT-5.5 scores 35.4%. Claude Opus 4.7 scores 22.9%. Gemini 3.1 Pro scores 16.7%. This matters less for typical enterprise workflows and more for the leading-indicator question: when models start solving problems mathematicians struggle with, downstream technical work — proof verification, formal methods, financial modeling, scientific simulation — moves into the addressable surface for AI agents.

MRCR v2 long context: 74.0%. This is the number that quietly resets enterprise architecture. On the multi-round coreference resolution benchmark at context lengths of 512K to 1M tokens, GPT-5.5 scores 74.0%. GPT-5.4 scored 36.6%. On Graphwalks BFS at one million tokens, GPT-5.5 scores 45.4% versus GPT-5.4's 9.4%. The headline-grabbing one-million-token context was largely theoretical in earlier models because comprehension collapsed past 200K tokens. GPT-5.5 is the first OpenAI model where the context window and the comprehension window are roughly the same size.

The benchmark story is not uniformly favorable. Claude Opus 4.7 still beats GPT-5.5 on SWE-Bench Pro, the real-world GitHub issue resolution benchmark, scoring 64.3% against GPT-5.5's 58.6%. Both Claude and Gemini score higher on certain tool-use benchmarks. The honest summary is that GPT-5.5 has decisively pulled ahead in agentic terminal workflows, advanced reasoning, and long-context comprehension, while Claude retains its edge on real-world software engineering tasks and Gemini retains its edge on certain integration patterns. This is not a model that obsoletes the competition. It is a model that forces every multi-vendor enterprise to re-run its routing strategy.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Pricing Move No One Predicted

The pricing surprise cuts in two directions, and both matter for enterprise budgets.

GPT-5.5 lands at $5 per million input tokens and $30 per million output tokens. This is double the per-token rate of GPT-5.4, which sat at $2.50 input and $15 output. By that comparison, the model is more expensive. But against GPT-5 — the previous flagship the new model functionally replaces in enterprise agentic workloads — the rate is actually three times cheaper on input ($5 vs. $15) and half-price on output ($30 vs. $60). Whether you read this as "OpenAI doubled prices" or "OpenAI cut prices in half" depends entirely on which prior model you were running.

The structural read is that OpenAI is repricing the curve. GPT-5.4 was the cheap workhorse for high-volume workflows. GPT-5 was the expensive flagship for hard problems. GPT-5.5 collapses both into a single tier priced in the middle, with the GPT-5.5 Pro variant ($30/$180) absorbing the truly hardest reasoning loads. For enterprise FinOps, this means existing token budgets need to be rebuilt from scratch. Workflows currently routed to GPT-5.4 because of cost will see their per-call costs double if they migrate; workflows routed to GPT-5 for capability will see their costs drop by half to two-thirds for the same migration.

The Pro pricing of $30/$180 is the more aggressive number. At six times the standard rate, it is a deliberate signal: OpenAI thinks there is a class of enterprise workloads where customers will accept a 6x cost premium for the marginal accuracy on the hardest problems. The bet is that quant funds, biotech research teams, and elite engineering organizations will do the math and conclude that a $180-per-million-output-tokens model that one-shots a problem is cheaper than a $30 model that requires three iterations. Whether that holds is the open question of Q2.

Against the competitive set, $5/$30 makes GPT-5.5 cost-competitive with Claude Opus 4.7 ($15/$75 list, with negotiated enterprise rates lower) and a premium over Gemini 3.1 Pro ($1.25/$10 list). This is the clearest signal that OpenAI is pricing for share against Anthropic in the enterprise reasoning segment, while accepting that Google will continue to win price-sensitive workloads on Gemini.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The 1M Context Window Is the Architecture Reset

For enterprise AI engineering teams, the single most consequential change is the one-million-token effective context window.

Most production AI systems built in 2024 and 2025 use retrieval-augmented generation (RAG) precisely because frontier models could not reliably reason across long documents. The standard pattern is: chunk the corpus, embed it, retrieve top-k matches at query time, stuff into a 128K context window, generate. RAG was the workaround for context limits, and the entire vector-database industry — Pinecone, Weaviate, Chroma, MongoDB Atlas Vector Search, the pgvector extension that ships in every Postgres deployment — was built on the assumption that long-context comprehension would remain expensive and unreliable.

GPT-5.5 does not eliminate RAG. It does eliminate the architectural premise that RAG is the only viable pattern for long documents. A one-million-token context window with 74% MRCR comprehension can hold roughly 750,000 words — the entirety of The Lord of the Rings trilogy with room left for The Hobbit. For enterprise document workflows — contract review, policy analysis, due-diligence packets, audit walkthroughs, regulatory filings — the question shifts from "how do we chunk and retrieve" to "do we just put the whole thing in context."

The economics matter. At $5 per million input tokens, processing a 500K-token contract costs $2.50. The same workflow with a RAG pipeline costs nearly nothing per query but requires upfront investment in embedding generation, vector storage, retrieval tuning, and the engineering staff to maintain it. For workflows that touch a document once and need maximum fidelity — M&A due diligence, regulatory submissions, litigation discovery — the long-context route is now cheaper than RAG when you amortize the engineering cost. For high-frequency workflows over a stable corpus, RAG remains the right answer.

The other architecture reset is for agent loops. An agent that maintains 200K tokens of conversation history, tool outputs, and intermediate reasoning will hit context exhaustion in a long-running session under GPT-5. Under GPT-5.5, the same agent can run for an order of magnitude longer before hitting context limits. The Workspace Agents product OpenAI shipped on April 22 is now sitting on a model that can support the multi-hour autonomous workflows the launch materials promised.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


Cybersecurity Classification and Trusted Access for Cyber

The piece of the launch that will produce the most CISO email traffic is the cybersecurity classification.

OpenAI has classified GPT-5.5 as "High" capability in the cybersecurity domain — below "Critical" but above prior models. The rationale is straightforward: GPT-5.5's improvements in agentic coding, long-context comprehension, and tool use also improve its capability for offensive security work. Vulnerability research that was previously gated by a model's inability to hold a large codebase in context, plan a multi-step exploit, and execute against tooling is now more automatable. UK AISI's evaluation concluded the model's autonomous cyberattack capability "may indicate risk against at least small-scale enterprise networks with weak security posture" — networks without active defenses, monitoring, or fast response.

OpenAI's response to its own classification is Trusted Access for Cyber (TAC), an identity-gated access pathway for higher-risk dual-use cyber capabilities. The framing is that legitimate defenders, enterprise security teams, and verified researchers can access the unfettered capability surface, while general API access ships with safeguards calibrated to reduce misuse. The tighter controls, restrictions on sensitive cybersecurity requests, and protections against repeated misuse attempts that OpenAI debuted with GPT-5.2 are now expanded.

For enterprise security teams, this creates a procurement question with no precedent. Internal red teams will benefit materially from the new capability — automated reconnaissance, vulnerability assessment, and exploit development that previously required senior offensive security engineers can now be partially automated. But the same capability is available to adversaries who get access. The defensive playbook for the next 90 days needs to assume that a competent attacker has GPT-5.5-class capability and is using it against your perimeter.

The practical implications for AI engineering teams are narrower but immediate. Any agent your team builds that interacts with internal infrastructure — that has shell access, deployment privileges, or production database credentials — needs to be evaluated against the new threat model. A prompt injection that tricks a GPT-5.5-powered agent into exfiltrating a codebase or executing arbitrary code is a more dangerous outcome than the same injection on GPT-5.4 because the model is more capable of completing the resulting attack chain.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What to Do This Week

For AI engineering leaders, four moves are time-sensitive.

Re-route agentic coding workloads. Any internal workflow that uses Codex, GitHub Copilot, Cursor, or Claude Code for multi-step agentic work should be re-benchmarked against GPT-5.5 this week. The Terminal-Bench gap is large enough that production workflows will see measurable accuracy improvements for the same prompt scaffolding. The migration is one model-name change in most agent frameworks; the upside is an immediate quality lift on the workloads where you currently lose the most engineering hours to agent failure.

Audit your RAG investments. For each production RAG pipeline, ask: is the document corpus stable enough that retrieval is genuinely cheaper than long-context inference at $5 per million input tokens? For workflows that touch documents once or where retrieval tuning has been a persistent maintenance burden, GPT-5.5's long-context economics may now favor a "stuff the whole document in context" architecture. Don't migrate the working pipelines — but stop building new RAG infrastructure for use cases the new context window now handles natively.

Recalculate token budgets. Existing enterprise contracts with OpenAI assume a pricing curve that no longer exists. If your team budgets by model tier, those budgets are now stale. Pull the last 30 days of token usage by model, project the cost under the new pricing, and identify the workflows where the change is most consequential. The high-volume GPT-5.4 workflows are the ones most likely to surprise you on the next bill.

Update your threat model. Internal security teams need to know that GPT-5.5 is in the wild. Both your defenders and your adversaries have access to a more capable agentic coding model than they did 72 hours ago. Phishing pretext quality, vulnerability scanning sophistication, and post-compromise lateral movement are all improving on the offensive side; your defenders need the same uplift. If your SOC is not already running GPT-5.5-tier models in its detection and response loops, the gap is widening.

The broader pattern across the last two weeks — Workspace Agents, Gemini Enterprise Agent Platform, GPT-5.5 — is that the major frontier vendors are converging on a unified enterprise agent stack at exactly the moment they are also opening up a step-function in raw model capability. The CIOs and AI engineering leaders who treat this as a routine refresh cycle will find themselves twelve months behind by Q3. The ones who treat it as the architecture reset it actually is will spend the next quarter rebuilding budgets, reroute decisions, and security postures around capability they did not have last week.

GPT-5.5 is not the end state. The same model family is on a six-week release cadence; GPT-5.6 will probably ship before July. The right posture is not to chase every release — it is to build the internal evaluation muscle that can absorb a step-function model upgrade in a week, not a quarter. That muscle is the actual moat in enterprise AI in 2026, and the teams building it now are the ones who will spend the back half of the year executing while their competitors are still in procurement.


Rajesh Beri is Head of AI Engineering at Zscaler. He writes about enterprise AI strategy, security, and the gap between what vendors ship and what the Fortune 500 can absorb.


Continue Reading

Share:

THE DAILY BRIEF

GPT-5.5OpenAIfrontier modelslong contextagentic codingTerminal-BenchFrontierMathCodexAPI pricingAI cybersecurityTrusted Access for Cyberenterprise AI

GPT-5.5 Cuts Prices 50% While Doubling Context to 1M Tokens

OpenAI's GPT-5.5 ships with 82.7% Terminal-Bench score, 1M usable context, $5/$30 API pricing, and a 'High' cyber capability tier for enterprises.

By Rajesh Beri·April 24, 2026·14 min read

On April 23, OpenAI shipped GPT-5.5 to ChatGPT and Codex. On April 24, it shipped to the API. The model that the prediction markets had been pricing in for two months — the first frontier model trained end-to-end on Stargate compute — landed not as GPT-6, not at a premium price, and not as a routine point release. It landed at $5 per million input tokens and $30 per million output tokens, with a one-million-token context window, an 82.7% score on Terminal-Bench 2.0, and a "High" classification in OpenAI's own cybersecurity preparedness framework.

For enterprise AI leaders, this is the second shoe dropping in a 72-hour window. On April 22, OpenAI retired Custom GPTs in favor of Workspace Agents. On April 23, Google merged Vertex AI into the Gemini Enterprise Agent Platform. On April 24, OpenAI doubled the per-token rate of its previous flagship while simultaneously delivering the kind of step-function agentic capability that resets every coding-assistant procurement decision in the Fortune 500.

This is the story of what GPT-5.5 actually is, where the benchmark numbers matter and where they don't, what the pricing change signals about OpenAI's enterprise strategy, and what AI engineering teams need to do this week.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What OpenAI Actually Shipped

GPT-5.5 is the first fully retrained base model since GPT-4.5 — not a fine-tuned variant of GPT-5, but a new pre-training run on the Stargate cluster in Abilene, Texas. The model ships in two tiers: GPT-5.5 (general purpose, $5/$30 per million tokens for input/output) and GPT-5.5 Pro ($30/$180 per million tokens, aimed at the hardest reasoning workloads). Both are available in the API as of April 24 with a one-million-token context window. Batch and Flex pricing are offered at half the standard rate; Priority processing runs at 2.5x.

The product framing is "agentic by default." OpenAI describes the model as designed to write and debug code, research online, analyze data, create documents and spreadsheets, operate software, and move across tools until a task is finished. When paired with Codex's computer-use skills, the model can see what is on screen, click, type, and navigate interfaces with measurably better precision than its predecessor. This is the same direction Anthropic took with Claude's computer use and Google took with Gemini's agent tooling, but GPT-5.5 ships with the deepest integration into a hosted runtime — Codex for engineers, Workspace Agents for knowledge workers — that any frontier vendor currently has in market.

The rollout sequence is worth noting. GPT-5.5 went to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex first. The API followed one day later. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in parallel. There is no "free" tier yet — OpenAI is metering the launch through paid surfaces while the API capacity scales.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Benchmark Numbers That Matter

The benchmark story is where this launch separates itself from the noise of routine model updates. Three numbers stand out.

Terminal-Bench 2.0: 82.7%. This is the agentic-coding benchmark that tests planning, iteration, and tool coordination across long command-line workflows. GPT-5.5 scores 82.7%, a 7.6-point jump over GPT-5.4 (75.1%). Anthropic's Claude Opus 4.7 scores 69.4%. Google's Gemini 3.1 Pro scores 68.5%. This is the largest gap GPT-5.5 opens against any frontier competitor, and it is precisely the benchmark that maps to the work enterprise AI engineering teams actually do — wiring up agents that have to plan a multi-step task, call a tool, react to its output, and continue.

FrontierMath Tier 4: 35.4%. On the hardest tier of FrontierMath — the benchmark that asks for novel research-grade mathematical reasoning — GPT-5.5 scores 35.4%. Claude Opus 4.7 scores 22.9%. Gemini 3.1 Pro scores 16.7%. This matters less for typical enterprise workflows and more for the leading-indicator question: when models start solving problems mathematicians struggle with, downstream technical work — proof verification, formal methods, financial modeling, scientific simulation — moves into the addressable surface for AI agents.

MRCR v2 long context: 74.0%. This is the number that quietly resets enterprise architecture. On the multi-round coreference resolution benchmark at context lengths of 512K to 1M tokens, GPT-5.5 scores 74.0%. GPT-5.4 scored 36.6%. On Graphwalks BFS at one million tokens, GPT-5.5 scores 45.4% versus GPT-5.4's 9.4%. The headline-grabbing one-million-token context was largely theoretical in earlier models because comprehension collapsed past 200K tokens. GPT-5.5 is the first OpenAI model where the context window and the comprehension window are roughly the same size.

The benchmark story is not uniformly favorable. Claude Opus 4.7 still beats GPT-5.5 on SWE-Bench Pro, the real-world GitHub issue resolution benchmark, scoring 64.3% against GPT-5.5's 58.6%. Both Claude and Gemini score higher on certain tool-use benchmarks. The honest summary is that GPT-5.5 has decisively pulled ahead in agentic terminal workflows, advanced reasoning, and long-context comprehension, while Claude retains its edge on real-world software engineering tasks and Gemini retains its edge on certain integration patterns. This is not a model that obsoletes the competition. It is a model that forces every multi-vendor enterprise to re-run its routing strategy.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The Pricing Move No One Predicted

The pricing surprise cuts in two directions, and both matter for enterprise budgets.

GPT-5.5 lands at $5 per million input tokens and $30 per million output tokens. This is double the per-token rate of GPT-5.4, which sat at $2.50 input and $15 output. By that comparison, the model is more expensive. But against GPT-5 — the previous flagship the new model functionally replaces in enterprise agentic workloads — the rate is actually three times cheaper on input ($5 vs. $15) and half-price on output ($30 vs. $60). Whether you read this as "OpenAI doubled prices" or "OpenAI cut prices in half" depends entirely on which prior model you were running.

The structural read is that OpenAI is repricing the curve. GPT-5.4 was the cheap workhorse for high-volume workflows. GPT-5 was the expensive flagship for hard problems. GPT-5.5 collapses both into a single tier priced in the middle, with the GPT-5.5 Pro variant ($30/$180) absorbing the truly hardest reasoning loads. For enterprise FinOps, this means existing token budgets need to be rebuilt from scratch. Workflows currently routed to GPT-5.4 because of cost will see their per-call costs double if they migrate; workflows routed to GPT-5 for capability will see their costs drop by half to two-thirds for the same migration.

The Pro pricing of $30/$180 is the more aggressive number. At six times the standard rate, it is a deliberate signal: OpenAI thinks there is a class of enterprise workloads where customers will accept a 6x cost premium for the marginal accuracy on the hardest problems. The bet is that quant funds, biotech research teams, and elite engineering organizations will do the math and conclude that a $180-per-million-output-tokens model that one-shots a problem is cheaper than a $30 model that requires three iterations. Whether that holds is the open question of Q2.

Against the competitive set, $5/$30 makes GPT-5.5 cost-competitive with Claude Opus 4.7 ($15/$75 list, with negotiated enterprise rates lower) and a premium over Gemini 3.1 Pro ($1.25/$10 list). This is the clearest signal that OpenAI is pricing for share against Anthropic in the enterprise reasoning segment, while accepting that Google will continue to win price-sensitive workloads on Gemini.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


The 1M Context Window Is the Architecture Reset

For enterprise AI engineering teams, the single most consequential change is the one-million-token effective context window.

Most production AI systems built in 2024 and 2025 use retrieval-augmented generation (RAG) precisely because frontier models could not reliably reason across long documents. The standard pattern is: chunk the corpus, embed it, retrieve top-k matches at query time, stuff into a 128K context window, generate. RAG was the workaround for context limits, and the entire vector-database industry — Pinecone, Weaviate, Chroma, MongoDB Atlas Vector Search, the pgvector extension that ships in every Postgres deployment — was built on the assumption that long-context comprehension would remain expensive and unreliable.

GPT-5.5 does not eliminate RAG. It does eliminate the architectural premise that RAG is the only viable pattern for long documents. A one-million-token context window with 74% MRCR comprehension can hold roughly 750,000 words — the entirety of The Lord of the Rings trilogy with room left for The Hobbit. For enterprise document workflows — contract review, policy analysis, due-diligence packets, audit walkthroughs, regulatory filings — the question shifts from "how do we chunk and retrieve" to "do we just put the whole thing in context."

The economics matter. At $5 per million input tokens, processing a 500K-token contract costs $2.50. The same workflow with a RAG pipeline costs nearly nothing per query but requires upfront investment in embedding generation, vector storage, retrieval tuning, and the engineering staff to maintain it. For workflows that touch a document once and need maximum fidelity — M&A due diligence, regulatory submissions, litigation discovery — the long-context route is now cheaper than RAG when you amortize the engineering cost. For high-frequency workflows over a stable corpus, RAG remains the right answer.

The other architecture reset is for agent loops. An agent that maintains 200K tokens of conversation history, tool outputs, and intermediate reasoning will hit context exhaustion in a long-running session under GPT-5. Under GPT-5.5, the same agent can run for an order of magnitude longer before hitting context limits. The Workspace Agents product OpenAI shipped on April 22 is now sitting on a model that can support the multi-hour autonomous workflows the launch materials promised.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


Cybersecurity Classification and Trusted Access for Cyber

The piece of the launch that will produce the most CISO email traffic is the cybersecurity classification.

OpenAI has classified GPT-5.5 as "High" capability in the cybersecurity domain — below "Critical" but above prior models. The rationale is straightforward: GPT-5.5's improvements in agentic coding, long-context comprehension, and tool use also improve its capability for offensive security work. Vulnerability research that was previously gated by a model's inability to hold a large codebase in context, plan a multi-step exploit, and execute against tooling is now more automatable. UK AISI's evaluation concluded the model's autonomous cyberattack capability "may indicate risk against at least small-scale enterprise networks with weak security posture" — networks without active defenses, monitoring, or fast response.

OpenAI's response to its own classification is Trusted Access for Cyber (TAC), an identity-gated access pathway for higher-risk dual-use cyber capabilities. The framing is that legitimate defenders, enterprise security teams, and verified researchers can access the unfettered capability surface, while general API access ships with safeguards calibrated to reduce misuse. The tighter controls, restrictions on sensitive cybersecurity requests, and protections against repeated misuse attempts that OpenAI debuted with GPT-5.2 are now expanded.

For enterprise security teams, this creates a procurement question with no precedent. Internal red teams will benefit materially from the new capability — automated reconnaissance, vulnerability assessment, and exploit development that previously required senior offensive security engineers can now be partially automated. But the same capability is available to adversaries who get access. The defensive playbook for the next 90 days needs to assume that a competent attacker has GPT-5.5-class capability and is using it against your perimeter.

The practical implications for AI engineering teams are narrower but immediate. Any agent your team builds that interacts with internal infrastructure — that has shell access, deployment privileges, or production database credentials — needs to be evaluated against the new threat model. A prompt injection that tricks a GPT-5.5-powered agent into exfiltrating a codebase or executing arbitrary code is a more dangerous outcome than the same injection on GPT-5.4 because the model is more capable of completing the resulting attack chain.

Calculate your potential AI savings: Try our AI ROI Calculator to see projected cost reductions and payback timelines for your organization.


What to Do This Week

For AI engineering leaders, four moves are time-sensitive.

Re-route agentic coding workloads. Any internal workflow that uses Codex, GitHub Copilot, Cursor, or Claude Code for multi-step agentic work should be re-benchmarked against GPT-5.5 this week. The Terminal-Bench gap is large enough that production workflows will see measurable accuracy improvements for the same prompt scaffolding. The migration is one model-name change in most agent frameworks; the upside is an immediate quality lift on the workloads where you currently lose the most engineering hours to agent failure.

Audit your RAG investments. For each production RAG pipeline, ask: is the document corpus stable enough that retrieval is genuinely cheaper than long-context inference at $5 per million input tokens? For workflows that touch documents once or where retrieval tuning has been a persistent maintenance burden, GPT-5.5's long-context economics may now favor a "stuff the whole document in context" architecture. Don't migrate the working pipelines — but stop building new RAG infrastructure for use cases the new context window now handles natively.

Recalculate token budgets. Existing enterprise contracts with OpenAI assume a pricing curve that no longer exists. If your team budgets by model tier, those budgets are now stale. Pull the last 30 days of token usage by model, project the cost under the new pricing, and identify the workflows where the change is most consequential. The high-volume GPT-5.4 workflows are the ones most likely to surprise you on the next bill.

Update your threat model. Internal security teams need to know that GPT-5.5 is in the wild. Both your defenders and your adversaries have access to a more capable agentic coding model than they did 72 hours ago. Phishing pretext quality, vulnerability scanning sophistication, and post-compromise lateral movement are all improving on the offensive side; your defenders need the same uplift. If your SOC is not already running GPT-5.5-tier models in its detection and response loops, the gap is widening.

The broader pattern across the last two weeks — Workspace Agents, Gemini Enterprise Agent Platform, GPT-5.5 — is that the major frontier vendors are converging on a unified enterprise agent stack at exactly the moment they are also opening up a step-function in raw model capability. The CIOs and AI engineering leaders who treat this as a routine refresh cycle will find themselves twelve months behind by Q3. The ones who treat it as the architecture reset it actually is will spend the next quarter rebuilding budgets, reroute decisions, and security postures around capability they did not have last week.

GPT-5.5 is not the end state. The same model family is on a six-week release cadence; GPT-5.6 will probably ship before July. The right posture is not to chase every release — it is to build the internal evaluation muscle that can absorb a step-function model upgrade in a week, not a quarter. That muscle is the actual moat in enterprise AI in 2026, and the teams building it now are the ones who will spend the back half of the year executing while their competitors are still in procurement.


Rajesh Beri is Head of AI Engineering at Zscaler. He writes about enterprise AI strategy, security, and the gap between what vendors ship and what the Fortune 500 can absorb.


Continue Reading

THE DAILY BRIEF

Enterprise AI insights for technology and business leaders, twice weekly.

thedailybrief.com

Subscribe at thedailybrief.com/subscribe for weekly AI insights delivered to your inbox.

LinkedIn: linkedin.com/in/rberi  |  X: x.com/rajeshberi

© 2026 Rajesh Beri. All rights reserved.

Newsletter

Stay Ahead of the Curve

Weekly enterprise AI insights for technology leaders. No spam, no vendor pitches—unsubscribe anytime.

Subscribe

Related Articles

Latest Articles

View All →