OpenAI is anxious: Anthropic is knocking it off its pedestal in AI programming

Text | Xiao Jing

Editor | Xu Qingyang

The release of ChatGPT made OpenAI a legend overnight, and everyone assumed the company would keep winning. In the AI programming race, however, it is not OpenAI that has taken the lead.

In February 2025, competitor Anthropic quietly launched Claude Code. This AI agent, capable of directly controlling computers and autonomously completing programming tasks, generated over $2.5 billion in annualized revenue for Anthropic within months.

In comparison, OpenAI's similar product Codex was earning about $1 billion annualized over the same period, a gap of more than two to one.

What’s more embarrassing for OpenAI is that Anthropic’s core founding team was actually formed by people who left OpenAI a few years earlier.

OpenAI’s new headquarters in Mission Bay, San Francisco, is a modern glass-walled building. The reception area displays promotional materials about the company’s development history, and the walls of the staircase are covered with posters commemorating milestones: GPT series, DALL·E, ChatGPT—each marking a highlight of the company’s recent years.

But none of these highlight AI programming.

01 From Codex to Copilot: OpenAI's Missed First-Mover Advantage

OpenAI actually started exploring AI programming quite early.

In 2021, OpenAI co-founder Greg Brockman and Sam Altman, then deep in the GPT-3 era, showed Wired magazine a project called Codex at the company's old office in San Francisco's Mission District. It was a branch of GPT-3 trained on billions of lines of open-source code from GitHub: a user typed a natural-language description, and it generated the corresponding code.

“It can perform operations in the computer world on your behalf,” Brockman said at the time. “You have a command-executing system.”

But this early technical foundation ultimately didn’t translate into sustained product investment.

Microsoft saw the potential in Codex. At that time, the software giant was developing GitHub Copilot, a tool embedded in programmers’ editors that offers code completion. An early OpenAI employee recalled that Codex “couldn’t do much beyond auto-completion,” but Microsoft already regarded it as a key future product direction.

In June 2022, GitHub Copilot was officially launched, attracting hundreds of thousands of users within months.

By all logic, OpenAI should have doubled down on this area. Instead, what happened next became a decision the people behind Codex would come to regret.

The original Codex team was disbanded. Some members moved to work on DALL·E 2 image generation, others participated in training GPT-4. At that time, the company’s primary goal was achieving AGI, and AI programming wasn’t seen as a separate field requiring dedicated focus.

A former team member said that for the next few years, OpenAI had no dedicated team building AI programming products. "It felt like this area was already covered by GitHub Copilot": Microsoft would keep iterating on that product with OpenAI's models, so OpenAI didn't need to worry about it.

A few months later, ChatGPT launched, surpassing 100 million users in two months. OpenAI’s attention was fully diverted by this success.

In 2023 and 2024, OpenAI focused its main resources on developing multimodal models, aiming to enable AI to understand images, videos, and audio, and operate cursors and keyboards like humans. During that time, products like Midjourney were rising, and industry consensus believed that large language models needed multimodal capabilities to reach higher levels of intelligence.

This strategic choice was not inherently wrong. But during this period, the AI programming track was quietly growing, while OpenAI’s focus was elsewhere.

02 Focusing on the Programming Track: Anthropic’s Differentiated Breakthrough

Anthropic chose a different development path.

The company also worked on multimodal models and chatbots, but one area it never let up on was programming ability.

Brockman later mentioned in a podcast that “from early on, Anthropic was very focused on programming.” They trained models not only with algorithm competition problems but also incorporated real-world project code, including messy, unstructured code typical of everyday developers. “That’s something we didn’t realize was so important at the time,” he said.

In June 2024, Anthropic released Claude 3.5 Sonnet. Many developers who tried it found its programming ability genuinely outstanding.

A startup called Cursor was among the first to benefit. Its founders, a group of twenty-somethings, had built a product in which users describe what they want in natural language inside a code editor and the AI modifies the code directly. After Cursor integrated 3.5 Sonnet, its user numbers began growing rapidly. According to people familiar with Cursor, within a few months Anthropic began internally testing a standalone product of its own, later called Claude Code.

After Cursor gained popularity, OpenAI attempted to acquire the company but was rejected. The founders wanted to stay independent, believing the programming track had huge potential.

The acquisition fell through, and internally, OpenAI also began to explore AI programming. By the end of 2024, several small teams had started.

One team was led by Andrey Mishchenko, the research lead for Codex, and Thibault Sottiaux, a former Google DeepMind researcher. Their initial motivation was pragmatic: use AI to manage training jobs and monitor GPU clusters, freeing researchers for more creative work and so accelerating AI research itself.

Another team was led by Alexander Embiricos, who previously worked on multimodal intelligent agents. He created a demo called Jam, which attracted considerable internal attention.

Jam was fundamentally different from the 2021 Codex. Codex produced code for a human to run; Jam could enter the command line and run code itself. Watching the interface he had built to track Jam's operations update on its own, Embiricos could hardly believe what he was seeing.

“I used to think multimodal interaction might be the path to AGI, maybe we’d just be sharing screens with AI all day,” he said. “But I gradually realized that letting models access computers directly through programming might be a more effective approach.”

After a few months, the teams merged. When OpenAI trained o3, a model tuned more heavily for programming tasks, in early 2025, they finally had the technical foundation for a product.

But by then, Claude Code was already ready for public release.

03 Acquisition Blocked and Internal Sprint: OpenAI’s Dual Strategy

In February 2025, Claude Code was first introduced as a “limited research preview,” and by May, it was fully open for use.

This product broke with the prevailing "vibe coding" mode, a human-led, AI-assisted style in which the human makes the decisions and the AI carries out specific actions. Claude Code, by contrast, worked directly in the command line with access to all of a user's files and applications, letting developers genuinely hand tasks over to the AI.

OpenAI also accelerated its pace.

In March, Sottiaux assembled a "sprint team," merging several internal groups to ship a competing product within weeks. Meanwhile, Altman went looking for acquisition targets and settled on a startup called Windsurf, in a deal valued at $3 billion. An acquisition would bring in a product, a team, and enterprise customers all at once.

But the deal was put on hold by Microsoft for several months.

According to The Wall Street Journal, Microsoft wanted rights to Windsurf's intellectual property. The cloud giant had been powering GitHub Copilot with OpenAI's models since 2021 and regularly touted it on earnings calls. But after Cursor, Windsurf, and Claude Code arrived in quick succession, Copilot's product form began to look dated, and OpenAI's new coding product made Microsoft's position more complicated.

The Windsurf deal coincided with OpenAI and Microsoft renegotiating their partnership agreement. OpenAI sought more autonomy from Microsoft, hoping not to have its products and computing resources overly controlled. The deal ultimately became a casualty of this power struggle. By July, the negotiations collapsed. Later, Google recruited Windsurf’s founders, and the remaining team was absorbed by another startup, Cognition.

"I was really hoping to close that deal," Altman said. "But not every deal is within your control."

He also said the Codex team's performance surprised him. Through those months of negotiations, Sottiaux and Embiricos never stopped iterating on the product. By August, OpenAI was shipping its own products faster.

04 From 5% to 40%: Codex's Rapid Market-Share Gains

Brockman has a testing method he calls the "Reverse Turing Test," built around a program he wrote himself years ago. The rules: two people sit at two computers, each facing two chat windows, one connected to the other person and one to an AI. The goal is to identify which window is the AI while convincing the other person that you are the AI.

For most of last year, OpenAI's best model needed hours and step-by-step guidance to code up this game. By December, with GPT-5.2 powering Codex and a well-structured prompt, it could generate a playable game in one shot.

It wasn’t just Brockman noticing the change. Developer communities began discussing the rapid improvement of AI programming agents, spreading from Silicon Valley to wider circles. Even those without programming backgrounds started experimenting with these tools for simple projects.

Both Anthropic and OpenAI competed fiercely for users. Some developers reported paying $200 a month for Codex or Claude Code subscriptions while consuming compute worth well over $1,000. Both companies used generous usage limits to pull users into their workflows, then charged by actual usage once the habit had formed.

Data shows OpenAI is indeed narrowing the gap.

By September 2025, Codex’s usage was about 5% of Claude Code’s. By January 2026, the ratio had risen close to 40%.

Simon Last, co-founder of Notion, said he and his team switched from Claude Code to Codex after GPT-5.2 was released, mainly because Codex was more stable. "I found that Claude Code sometimes gave inaccurate information," he said. "It would claim to be working on a task while making no progress."

Katy Shi, who leads Codex behavior research at OpenAI, said some users find Codex's responses a bit "dry," but more and more are coming to accept the straightforward style. "In engineering work you have to be able to take critical feedback; you can't get offended just because it's delivered bluntly."

Enterprise clients are gradually coming on board. Fidji Simo, who runs OpenAI's applications business, said: "ChatGPT has become a flagship product in AI, which is a clear advantage in the B2B market. Most companies prefer to use familiar technology." OpenAI's main strategy for selling Codex is to bundle it into the ChatGPT enterprise suite.

Jeetu Patel, president of Cisco, told employees not to worry too much about the costs generated by using Codex, emphasizing that familiarity with the tool is more important. When asked if using it might lead to job loss, he replied: “No, but not using it might make you less competitive over time.”

Some developers believe OpenAI’s channel advantages in the enterprise market are playing a role. Many companies have purchased the enterprise version of ChatGPT, adding Codex functionality at little extra cost.

Some analysts also point out that Codex's recent gains track directly with GPT-5.2's improved reasoning. The training method behind the o-series models has them iterate through trial and error on verifiable programming tasks, receiving feedback at each step, which markedly improves code quality. Programming is a domain with unambiguous feedback signals: code either runs or it doesn't, which makes it unusually well suited to this kind of iteration.
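The "verifiable feedback" idea can be sketched in a few lines: a generated solution is scored by actually executing it against test cases, so the reward signal is unambiguous. The sketch below is a toy illustration of that principle, not OpenAI's actual training pipeline; the function name `run_candidate` and the sample `add` task are invented for the example.

```python
def run_candidate(source: str, func_name: str, tests: list[tuple]) -> float:
    """Execute generated code in a scratch namespace and return the
    fraction of test cases it passes (the 'reward')."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # run the model's generated code
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that doesn't even load scores zero
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failures
    return passed / len(tests)

# Example: score a (hypothetical) model-generated solution.
candidate = "def add(a, b):\n    return a + b\n"
reward = run_candidate(candidate, "add", [((1, 2), 3), ((0, 0), 0)])  # -> 1.0
```

Because the reward comes from execution rather than from a human rating, many candidate solutions can be scored automatically at scale, which is what makes programming such a convenient domain for this style of training.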

05 Otman’s Dilemma: Speed vs. Loss of Control

The impact of AI programming agents is no longer limited to developer communities.

The Wall Street Journal last month attributed part of a $1 trillion sell-off in tech stocks to Claude Code, citing investor concerns that the value of software itself might be compressed. Soon after, Anthropic announced that Claude Code could modernize IBM's legacy COBOL systems, and IBM's stock suffered its largest single-day drop in 25 years.

OpenAI is also increasing its investment. This year’s Super Bowl ad featured Codex instead of ChatGPT.

At OpenAI’s headquarters, Codex is now widely used. Several engineers mentioned that they rarely write code manually anymore; their daily work mainly involves interacting with Codex.

An engineer who took part in an internal hackathon described roughly 100 people spending four hours building working demos with Codex, many of them aimed at helping engineers use Codex better. Some teams built tools that summarize Slack messages into weekly reports; others generated internal knowledge-base guides. Tasks that once took days were finished in a single afternoon.

Kevin Weil, the former Instagram executive who now leads OpenAI for Science, said Codex now handles some of his projects overnight, so he only needs to check progress in the morning, a routine he shares with hundreds of colleagues. One of OpenAI's goals for 2026 is an AI intern capable of autonomous research.

Simo said that ultimately, Codex will be integrated into ChatGPT and all product lines, not just for programming but to assist with various tasks.

Altman has said he wants to release a general-purpose version of Codex but still has safety concerns. At the end of January, a non-technical friend asked him to help install OpenClaw; he declined, thinking "it's not the right time yet," since the agent might accidentally delete important files. In recent weeks, though, OpenAI hired OpenClaw's creator.

Many developers believe the gap between Codex and Claude Code is narrowing, but some organizations remain cautious about OpenAI’s progress. A nonprofit called Midas Project released a report criticizing that OpenAI has not fully disclosed cybersecurity risks related to GPT-5.3-Codex, and the safety commitments are not transparent enough. Amelia Glaese, OpenAI’s head of alignment, denied that safety was sacrificed for advancing Codex, stating that Midas misunderstood the company’s commitments.

Brockman remains optimistic about progress toward AGI, believing “the project is on track.” But in the minds of many Silicon Valley engineers, he’s always been the kind of leader who checks every detail in the codebase just before product launch.

The current situation is different. Brockman now faces hundreds of thousands of AI agents executing specific tasks and projects. He says this new way of working “feels a bit more relaxed because, before, it was necessary to remember many details.” But sometimes, “you’re not quite sure how those issues were actually resolved.”

He notes that this change can make you “feel less sharp about perceiving problems than before.”

Special contributor Jin Lu also contributed to this article.
