GPT-5.4: Is the "Agent-Native" Large Model Here?

Just two days after the rumors, on March 5th local time, OpenAI officially launched GPT-5.4. This model update focuses on the hottest AI Agent trend right now.

Before GPT-5.4, the capabilities of large models could be summarized in one sentence: they can tell you how to do something, but they can’t do it themselves.

If you ask it to analyze competitors, it will give you a lengthy written report; if you ask it to organize an Excel sheet, it will generate some Python code for you to run; if you ask it to book a flight, it will guide you step-by-step on which website to visit and which buttons to click.

The barrier between knowing how and actually doing is called “computer operation.”

GPT-5.4 is OpenAI’s first general model to break down that barrier.

Compared to previous models, GPT-5.4 has improved｜Image source: OpenAI

It can recognize screen content through screenshots, send mouse and keyboard commands, and execute multi-step workflows across different applications. In OpenAI’s own words, this is their “most powerful and efficient frontier model for professional work to date.”

More technically, GPT-5.4 supports a context window of up to 1 million tokens and can call libraries like Playwright to directly control browsers and desktop applications.

This means it’s no longer just about “conversing about tasks,” but about “performing the tasks themselves.”
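The screenshot-then-act cycle described above can be sketched as a simple loop. This is a minimal illustration, not OpenAI's implementation: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names invented here, the "screenshot" is a text summary rather than an image, and the policy is a hard-coded stand-in for the model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One low-level UI command (hypothetical schema, not OpenAI's)."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; it records what the agent did."""
    focused: Optional[str] = None
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; text keeps the sketch runnable.
        return f"focused={self.focused} typed={self.typed!r}"

    def click(self, x: int, y: int) -> None:
        # Assumption: any click lands on the search box.
        self.focused = "search_box"
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        # Keystrokes only reach the page when something has focus.
        if self.focused is not None:
            self.typed += text
        self.log.append(f"type({text!r})")


def scripted_policy(observation: str) -> Action:
    """Placeholder for the model: choose the next action from the screen state."""
    if "focused=None" in observation:
        return Action("click", x=200, y=40)
    if "typed=''" in observation:
        return Action("type", text="flights to Tokyo")
    return Action("done")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; return how many actions were executed."""
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    raise RuntimeError("step budget exhausted before the task finished")


desktop = FakeDesktop()
steps = run_agent(desktop, scripted_policy)
print(steps, desktop.log)
```

In a browser harness, the three desktop methods would plausibly map onto Playwright's `page.screenshot()`, `page.mouse.click(x, y)`, and `page.keyboard.type(text)`. The step budget is a deliberate design choice: an agent that never reports "done" should fail loudly rather than keep clicking.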

01 OpenAI’s groundwork

If you’ve been following OpenAI’s moves over the past few months, you’ll see that GPT-5.4 didn’t appear out of nowhere; it is a deliberate step along a clear trajectory.

Just two weeks ago, OpenAI released GPT-5.3-Codex, upgrading Codex from “a code-writing agent” to “an agent capable of almost everything a developer does on a computer,” setting new industry benchmarks on SWE-Bench Pro and Terminal-Bench.

Meanwhile, OpenAI launched the enterprise-focused “Frontier” platform, with early users like HP, Intuit, and Uber.

GPT-5.4 is obviously smarter at filling out spreadsheets than 5.2｜Image source: OpenAI

Earlier, on March 2nd, OpenAI and AWS expanded their existing $38 billion partnership to over $100 billion across 8 years, with AWS becoming the exclusive third-party cloud provider for the OpenAI Frontier platform. The sheer scale of the commitment signals a major move.

The latest $110 billion funding round, supported by Amazon, SoftBank, and Nvidia with hundreds of millions of dollars each, was also announced around the same time.

This isn’t a company just “developing good products”; it’s a company sprinting to “win the enterprise AI agent market.”

The native computer operation capabilities of GPT-5.4 are the key weapon in this race.

02 Is it really useful?

Demo videos at launch events always look impressive, but the real test is in actual performance.

Financial tech firm Walleye Capital reported that, in its internal testing, GPT-5.4 improved accuracy by 30 percentage points in Excel financial modeling and significantly sped up automated scenario analysis.

Talent assessment platform Mercor’s CEO called it “the best model we’ve tested,” highlighting its strong performance on long-cycle tasks like slide creation, financial modeling, and legal analysis.

An independent developer who uses Codex daily gave a more down-to-earth review: “GPT-5.4 is my new daily driver for Codex. Its thinking is closer to humans, and it’s not as obsessed with technical details as 5.3.” But he also issued a warning — “Be careful, I’ve encountered several cases where the model executed tasks incorrectly but concealed the errors.”

GPT-5.4’s improvements in operation and visual capabilities｜Image source: OpenAI

This detail is worth noting.
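One way to guard against the silent failures the developer describes is to check the outcome independently instead of trusting the agent's own report. The sketch below illustrates that idea under stated assumptions: `AgentReport`, `verify`, and the spreadsheet cells are hypothetical, and the agent's claim is hard-coded rather than produced by a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AgentReport:
    """What the agent says it did (hypothetical structure)."""
    claimed_success: bool
    summary: str


def verify(report: AgentReport, postcondition: Callable[[], bool]) -> str:
    """Classify a run by comparing the agent's claim with an independent check."""
    actually_ok = postcondition()
    if report.claimed_success and not actually_ok:
        return "concealed_failure"  # the dangerous case: claims success, result is wrong
    if not report.claimed_success and actually_ok:
        return "false_alarm"
    return "verified_success" if actually_ok else "reported_failure"


# Toy sheet state after the agent ran: B4 is supposed to hold B2 + B3.
sheet: Dict[str, float] = {"B2": 120.0, "B3": 80.0, "B4": 150.0}


def total_is_correct() -> bool:
    # Recompute the expected value instead of reading the agent's summary.
    return abs(sheet["B4"] - (sheet["B2"] + sheet["B3"])) < 1e-9


report = AgentReport(claimed_success=True, summary="Filled in the total in B4.")
result = verify(report, total_is_correct)
print(result)  # → concealed_failure
```

The key point is that the postcondition reads the actual state of the work product, so a confident but wrong summary cannot pass the check.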

Benchmark data also confirms this capability boost. Reports indicate GPT-5.4 outperforms 83% of average office workers on the GDPval benchmark. That number sounds impressive, but the real question isn’t “how many people it can surpass,” but “which tasks it can replace humans in.”

However, Dr. Jeff Dalton from the University of Edinburgh’s School of Informatics pointed out a practical issue — current demonstrations lack enough detailed evaluation evidence to support such grand claims. The capabilities are real, but the boundaries still need more independent validation.

03 The Agent battlefield has no safe zone

If GPT-5.4 represents OpenAI’s ambition for Agents, competitors aren’t sitting still.

Anthropic introduced its “Computer Use” feature with Claude 3.5 Sonnet in October 2024, and followed it in February 2025 with Claude 3.7 Sonnet, a hybrid reasoning model designed for complex tasks.

Google’s Gemini 2.0 series continues to develop “Agentic” capabilities, with Project Mariner already able to autonomously perform multi-step operations within Chrome.

But the fundamental difference between GPT-5.4 and its competitors is that it’s OpenAI’s first product to embed computer operation capabilities directly into a general-purpose model — not a separate tool, not an API call, but a built-in feature of the model itself.

This “native” aspect, in engineering terms, means lower latency, more natural task transitions, and less “glue code.” For enterprises eager to deploy Agent applications quickly, this difference directly impacts deployment costs.

OpenAI also announced that GPT-5.4 can directly connect to Microsoft Excel and Google Sheets, performing granular analysis and automation at the cell level. This step clearly targets the core of enterprise decision-making processes.

In the Agent arena, it’s never about who runs fastest, but who can embed themselves earliest into enterprise workflows, becoming an indispensable part.

Tech launches are always exciting, but the real test comes on Day 91 — when the hype fades, and users start applying these tools in real work scenarios. Can it reliably handle screenshots, accurately click buttons, quietly complete tasks, and deliver results?

The developer’s comment about “concealing errors” is the most cautionary note I’ve seen in this report so far.

The ceiling of AI Agent capabilities is never “what it can do,” but “whether you dare to trust it to do it.”

Trust is the real currency in this Agent war.
