GPT-5.4, "Agent Native" large model is coming?
OpenAI finally figured it out.
Just two days after rumors began circulating, OpenAI officially launched GPT-5.4 on March 5th, local time. The update focuses squarely on the hottest direction in AI right now: agents.
Before GPT-5.4, the capabilities of large models could be summarized in one sentence: they can tell you “how to do it,” but they can’t do it themselves.
If you ask it to analyze competitors, it will give you a lengthy report; if you ask it to organize an Excel sheet, it will write some Python code for you to run; if you ask it to book a flight, it will tell you step-by-step which website to go to and which buttons to click.
The wall in the middle is called “computer operation.”
GPT-5.4 is OpenAI’s first general model to break down this wall.
GPT-5.4 compared to previous models | Image source: OpenAI
It can recognize screen content through screenshots, send mouse and keyboard commands, and execute multi-step workflows across different applications. In OpenAI’s own words, this is their “most powerful and efficient frontier model for professional work to date.”
More technically, GPT-5.4 supports up to 1 million tokens in the context window and can call libraries like Playwright to directly control browsers and desktop applications.
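The screenshot-in, mouse-and-keyboard-out behavior described above boils down to a perceive–act loop. The sketch below is illustrative only: the class and function names are invented for this article and are not OpenAI's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    kind: str      # e.g. "click", "type", or "done"
    payload: dict  # coordinates, text to type, etc.

def run_agent(take_screenshot: Callable[[], bytes],
              decide: Callable[[bytes], Action],
              execute: Callable[[Action], None],
              max_steps: int = 20) -> int:
    """Minimal computer-use loop: capture the screen, let the model
    choose one action, execute it, repeat until the model says 'done'.
    Returns the number of steps taken."""
    for step in range(1, max_steps + 1):
        screenshot = take_screenshot()   # perceive: current screen state
        action = decide(screenshot)      # model picks the next action
        if action.kind == "done":
            return step
        execute(action)                  # act: send mouse/keyboard input
    return max_steps
```

In a real deployment, `execute` would dispatch to something like Playwright's `page.click()` or OS-level input APIs; here it is left abstract so the loop structure is visible.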
This means the model no longer just talks about tasks; it performs them.
01 OpenAI’s groundwork
If you’ve been following OpenAI’s recent moves over the past few months, you’ll see that GPT-5.4 isn’t an abrupt new product but a clear step along a strategic path.
Just two weeks ago, OpenAI released GPT-5.3-Codex, upgrading Codex from a “code-writing agent” to an “agent capable of almost everything a developer does on a computer,” setting new industry benchmarks on SWE-Bench Pro and Terminal-Bench.
Meanwhile, OpenAI launched the enterprise-focused “Frontier” platform, with HP, Intuit, and Uber as early users.
GPT-5.4 clearly outperforms GPT-5.2 in spreadsheet filling | Image source: OpenAI
Earlier, on March 2nd, OpenAI and AWS expanded their existing $3.8 billion partnership to over $100 billion, lasting 8 years, with AWS becoming the exclusive third-party cloud provider for the OpenAI Frontier platform. The scale of this investment itself is a signal.
The latest $110 billion funding round, supported by Amazon, SoftBank, and Nvidia, also closed around the same time.
This isn’t a company just “developing good products”; it’s a company sprinting to “win the enterprise AI agent market.”
GPT-5.4’s native computer operation capabilities are the key weapon in this sprint.
02 Is it really useful?
Demo videos at launch events always look impressive, but the real test is actual performance.
Financial tech company Walleye Capital reported in internal testing that GPT-5.4 improved accuracy in Excel financial modeling assessments by 30 percentage points, significantly speeding up automated scenario analysis.
Talent assessment platform Mercor’s CEO called it “the best model we’ve tested,” showing outstanding performance in long-term tasks like slide creation, financial modeling, and legal analysis.
An independent developer who uses Codex daily gave a more down-to-earth review: “GPT-5.4 is my new daily driver for Codex. Its thinking is closer to humans, and it’s not as obsessed with technical details as 5.3.” But he also added a caution — “be careful, I’ve encountered several cases where the model misexecuted tasks and concealed it.”
GPT-5.4’s improvements in operation and visual capabilities | Image source: OpenAI
This detail is worth noting.
Benchmark data also confirms this capability boost. Reports indicate that GPT-5.4 outperforms 83% of average office workers on the GDPval benchmark. This number sounds impressive, but the real question isn’t “how many people it can surpass,” but “which tasks it can replace humans in.”
However, Dr. Jeff Dalton from the University of Edinburgh’s School of Informatics pointed out a practical issue — in current demos, there is hardly enough detailed evidence to support such grand claims. The capabilities are real, but the boundaries still need more independent validation.
03 The Agent battlefield has no safe zone
If GPT-5.4 represents OpenAI’s ambition for Agents, competitors are not idle.
Anthropic’s Claude 3.7 Sonnet launched the “Computer Use” feature as early as February this year, positioning it as a hybrid reasoning model designed for complex tasks.
Google’s Gemini 2.0 series continues to develop “Agentic” capabilities, with Project Mariner already able to perform multi-step operations autonomously within Chrome.
But the fundamental difference between GPT-5.4 and its competitors is that it is OpenAI’s first product to embed computer operation capabilities directly into a general model — not a separate tool, not an API that needs to be called, but a built-in feature of the model itself.
This “native” aspect, in engineering terms, means lower latency, more natural task transitions, and less “glue code.” For enterprises eager to deploy Agent applications quickly, this difference directly impacts deployment costs.
OpenAI also announced that GPT-5.4 can directly connect to Microsoft Excel and Google Sheets, performing granular analysis and automation at the cell level. This step clearly targets the core of enterprise decision-making processes.
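To make "cell-level" concrete, here is a toy sketch of what cell-granular automation means: individual cell references are read and written rather than whole files. The sheet representation and command format below are invented for illustration and are not OpenAI's actual Excel or Google Sheets integration.

```python
import re

def set_cell(sheet: dict, ref: str, value) -> None:
    """Write a value to a single cell reference like 'B2'."""
    # Validate an A1-style reference: column letters, then a row number.
    if not re.fullmatch(r"[A-Z]+[1-9][0-9]*", ref):
        raise ValueError(f"bad cell reference: {ref}")
    sheet[ref] = value

def sum_range(sheet: dict, refs: list) -> float:
    """Sum the numeric values of the given cells; empty cells count as 0."""
    return sum(float(sheet.get(r, 0)) for r in refs)
```

The point of operating at this granularity, rather than regenerating an entire spreadsheet, is that each edit is auditable and reversible, which is exactly what enterprise decision workflows require.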
In the Agent arena, it’s never about who runs faster, but who can embed themselves into enterprise workflows first, becoming an indispensable part.
Tech launches are always passionate, but the real test comes on day 91 — when the hype fades, and users start applying this tool in real work scenarios. Will it reliably handle screenshots, accurately click buttons, quietly complete tasks, and deliver results?
That developer’s warning about concealed errors is the most cautionary note in this entire launch.
The ceiling of AI Agent capabilities has never been “what it can do,” but “whether you dare to trust it to do it.”
Trust is the real currency in this Agent war.