Original source: Xinzhiyuan
Image source: Generated by Unbounded AI
“Who will get how much H100 and when will be the hottest topic in Silicon Valley.”
Andrej Karpathy, a founding member of OpenAI, recently published a post sharing his views on the shortage of Nvidia GPUs.
Recently, a widely circulated chart titled “How many GPUs do we need” has sparked discussion among many netizens.
According to the figure:
GPT-4 was likely trained on roughly 10,000-25,000 A100s
Meta: about 21,000 A100s
Tesla: about 7,000 A100s
Stability AI: about 5,000 A100s
Falcon-40B: trained on 384 A100s
Inflection: used 3,500 H100s to train a model comparable to GPT-3.5
In addition, according to Musk, GPT-5 may need 30,000-50,000 H100s.
Previously, Morgan Stanley had claimed that GPT-5 was being trained on 25,000 GPUs and had been in training since February, but Sam Altman later clarified that training of GPT-5 had not yet begun.
However, Altman previously stated,
We have a very short supply of GPUs, the fewer people using our products the better.
We’d be happy if people used less, because we don’t have enough GPUs.
The article, titled “Nvidia H100 GPU: Supply and Demand”, offers an in-depth analysis of how much GPU capacity technology companies currently use and need.
The article speculates that large-scale H100 cluster capacity at both small and large cloud providers is about to run out, and that demand for the H100 will continue at least through the end of 2024.
So, is GPU demand really a bottleneck?
At present, the explosion of generative AI has not slowed down, and it has put forward higher requirements for computing power.
Some startups are using Nvidia’s expensive and extremely high-performance H100 to train models.
GPUs are harder to come by than drugs at this point, Musk said.
Sam Altman says that OpenAI is GPU limited, which delays their short-term plans (fine-tuning, dedicated capacity, 32k context windows, multimodality).
Karpathy’s comments come as annual reports from major tech companies even discuss issues related to GPU access.
Last week, Microsoft released its annual report, highlighting to investors that GPUs are a “key raw material” for its rapidly growing cloud business, and that failing to obtain the needed infrastructure is a risk factor that could lead to data center outages.
The article was purportedly written by the author of the HK post.
He guessed that OpenAI may need 50,000 H100s, Inflection 22,000, and Meta 25,000, while each of the large cloud providers (Azure, Google Cloud, AWS, Oracle) may need about 30,000.
Lambda, CoreWeave, and other private clouds might need a total of 100,000. He wrote that Anthropic, Helsing, Mistral, and Character might each need 10,000.
The author says these are all rough estimates and guesses, and some involve double-counting both the cloud and the end customers renting equipment from that cloud.
Overall, companies worldwide need about 432,000 H100s. At roughly $35,000 per H100, that works out to about $15 billion in GPUs.
This does not include Chinese internet companies, which need large numbers of H800s.
There are also well-known financial firms such as Jane Street, JPMorgan, and Two Sigma, each deploying GPUs, starting with hundreds of A100s/H100s and expanding to thousands.
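The article's arithmetic can be sanity-checked with a quick sketch. All per-company figures below are the article's guesses, not confirmed orders; note the itemized lines sum to 357,000, so the 432,000 headline evidently includes demand not broken out here:

```python
# Rough H100 demand estimates from the article (units: GPUs).
# These are the article's guesses, not confirmed orders.
demand = {
    "OpenAI": 50_000,
    "Inflection": 22_000,
    "Meta": 25_000,
    "large clouds (4 x 30k)": 120_000,
    "private clouds (Lambda, CoreWeave, ...)": 100_000,
    "Anthropic/Helsing/Mistral/Character (4 x 10k)": 40_000,
}

itemized_total = sum(demand.values())   # sum of the named estimates
headline_total = 432_000                # the article's overall figure
price_per_h100 = 35_000                 # ~$35k per H100, per the article
spend = headline_total * price_per_h100

print(f"Itemized estimates: {itemized_total:,} GPUs")
print(f"Headline total:     {headline_total:,} GPUs -> ${spend / 1e9:.2f}B")
```

Running this gives an itemized subtotal of 357,000 GPUs and confirms the headline figure of 432,000 H100s at ~$35k each works out to roughly $15 billion.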
All large labs including OpenAI, Anthropic, DeepMind, Google, and X.ai are training large language models, and Nvidia’s H100 is irreplaceable.
The H100 is more popular than the A100 as a first choice, partly due to its lower cache latency and FP8 compute.
Its efficiency is up to 3x that of the A100, while its cost is only about 1.5-2x. Factoring in overall system cost, the H100 delivers much more performance per dollar.
In terms of technical details, compared to the A100, the H100 is about 3.5x faster at 16-bit inference and about 2.3x faster at 16-bit training.
[Figures: A100 vs. H100 speed; H100 MoE training; H100 massive acceleration]
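Taking the article's rough ratios at face value, performance per dollar can be sketched as follows (all numbers are the article's approximations, not measured benchmarks):

```python
# Rough performance-per-dollar comparison, H100 vs. A100,
# using the article's approximate ratios (not measured benchmarks).
H100_INFER_SPEEDUP = 3.5   # ~3.5x the A100 at 16-bit inference
H100_TRAIN_SPEEDUP = 2.3   # ~2.3x the A100 at 16-bit training

for cost_ratio in (1.5, 2.0):  # the H100 costs ~1.5-2x an A100
    train_ppd = H100_TRAIN_SPEEDUP / cost_ratio
    infer_ppd = H100_INFER_SPEEDUP / cost_ratio
    print(f"At {cost_ratio}x cost: training {train_ppd:.2f}x, "
          f"inference {infer_ppd:.2f}x throughput per dollar vs. A100")
```

Even at the high end of the price range, both ratios stay above 1, which is the article's point: per dollar of system cost, the H100 comes out ahead of the A100.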
Most companies buy the H100 for both training and inference, while using the A100 mainly for inference.
But some companies hesitate to switch because of cost, capacity limits, the risk of deploying and tuning new hardware, and the fact that their existing software is already optimized for the A100.
An Nvidia executive said the problem isn’t a shortage of GPUs, but how those GPUs get to market.
Nvidia is producing GPUs at full capacity, but the executive said GPU output is limited mainly by the supply chain.
The chip itself may be in adequate supply, but shortages of other components can severely limit GPU output.
The production of these components relies on other suppliers throughout the world.
But the demand is predictable, so now the problem is gradually being solved.
GPU chip production capacity
First of all, Nvidia produces the H100 exclusively with TSMC; all of Nvidia’s 5nm GPUs are fabbed only at TSMC.
Nvidia might partner with Intel or Samsung in the future, but not in the short term, which constrains H100 production.
According to the source, TSMC has four production nodes offering 5nm-class capacity: N5, N5P, N4, and N4P.
The H100 is produced only on 4N, an enhanced 5nm node.
Nvidia needs to share the capacity of this node with Apple, Qualcomm and AMD.
The TSMC fab needs to plan the production capacity of each customer 12 months in advance.
If Nvidia and TSMC underestimated the demand for H100 before, then the production capacity will be limited now.
According to the source, it takes about half a year for an H100 to go from production start to delivery.
The source also quoted a retired semiconductor industry professional as saying that the fab is not TSMC’s production bottleneck; rather, CoWoS (2.5D) packaging is the gate on TSMC’s capacity.
H100 memory capacity
Another key H100 component, its memory, may also face capacity problems.
HBM (High Bandwidth Memory), integrated with the GPU through advanced packaging, is critical to GPU performance.
The source quoted an industry insider as saying:
The main problem is HBM. Making it is a nightmare. Since HBM is difficult to produce, supplies are very limited. Both production and design must follow its rhythm.
For HBM3 memory, Nvidia uses SK Hynix products almost exclusively; there may be some Samsung parts, and there should be no Micron parts.
Nvidia wants SK Hynix to increase capacity, and they are doing so, but both Samsung and Micron have limited capacity.
Moreover, many other materials and processes used in GPU manufacturing, including rare earth elements, could also become factors limiting GPU production capacity.
Nvidia’s statement
Nvidia only revealed that they will be able to supply more GPUs in the second half of the year, but did not provide any quantitative information.
We are processing supply for the quarter today, but we are also procuring a significant amount of supply for the second half of the year. We believe that the supply in the second half of the year will be much higher than that in the first half.
– Nvidia CFO Colette Kress on February-April 2023 earnings call
What’s next?
The GPU supply issue is now a vicious cycle where scarcity causes GPU ownership to be seen as a moat, which leads to more GPUs being hoarded, exacerbating scarcity.
– A person in charge of a private cloud disclosed
**When will the next generation of the H100 appear?**
According to Nvidia’s previous roadmap, the next generation of the H100 will not be announced until late 2024 to early 2025.
Until that point in time, the H100 will be Nvidia’s flagship product.
However, Nvidia will launch a 120GB water-cooled version of the H100 during this period.
According to industry insiders the source interviewed, the H100 will be sold out through the end of 2023!
As the Nvidia executive mentioned earlier, the compute provided by H100 GPUs ultimately reaches the industry through various cloud computing providers. So the H100 shortage is driven on one side by GPU production,
and on the other by how cloud providers obtain H100s from Nvidia and deliver that compute to the customers who need it.
The process is simply:
Cloud providers purchase H100s from OEMs, build cloud compute services on top of them, and sell that capacity to AI companies, giving end users access to H100 compute.
Various factors in this process contribute to the current H100 shortage, and the source article provides plenty of industry information for reference.
**Who can I buy H100 boards from?**
OEMs such as Dell, Lenovo, HPE, Supermicro and Quanta will sell both the H100 and the HGX H100.
Cloud providers like CoreWeave and Lambda buy GPUs from OEMs and lease them to startups.
Hyperscalers (Azure, GCP, AWS, Oracle) work more directly with Nvidia, but also buy from OEMs. This is similar to how gamers buy graphics cards: even to buy a DGX system, users must order through an OEM rather than directly from Nvidia.
Delivery time
Lead times for 8-GPU HGX servers are terrible, while lead times for 4-GPU HGX servers are fine.
But every customer wants the 8-GPU servers!
Do startups buy from OEMs and resellers?
A startup that wants H100 compute generally does not end up buying H100s and installing them in a GPU cluster of its own.
They usually rent computing power from large clouds such as Oracle, private clouds such as Lambda and CoreWeave, or providers that work with OEMs and data centers such as FluidStack.
If you want to build your own data center, you need to consider the time to build the data center, whether you have the personnel and experience in hardware, and whether the capital expenditure can be afforded.
Renting and hosting servers has just gotten easier. If users want to build their own data centers, a dark fiber line must be laid to connect to the Internet - $10,000 per kilometer. Much of the infrastructure has already been built and paid for during the dot-com boom. Just rent it, it’s cheap.
– Person in charge of a private cloud
The spectrum from leasing to fully self-built runs roughly: on-demand cloud services (pure rental), reserved cloud services, managed hosting (purchasing servers and partnering with a provider to host and manage them), and full self-hosting (buying and hosting servers yourself).
Most startups that need H100 compute choose reserved cloud services or managed hosting.
Comparison between large cloud computing platforms
For many startups, the cloud services provided by large cloud computing companies are the ultimate source of their H100.
The choice of cloud platform also ultimately determines whether they can obtain stable H100 computing power.
The overall take: Oracle is not as reliable as the big three clouds, but it offers more technical support.
The main differences among the other large cloud computing companies are:
Networking: most startups seeking large A100/H100 clusters want InfiniBand, but AWS and Google Cloud have been slower to adopt it because they provision services with their own approaches.
Availability: Most of Microsoft Azure’s H100 is dedicated to OpenAI. Google has had a harder time acquiring the H100.
This may be because Nvidia seems inclined to allocate more H100 quota to clouds that have no plans to develop competing machine-learning chips. (This is all speculation, not hard fact.)
Of the major cloud companies, all but Microsoft are developing machine-learning chips, and Nvidia alternatives from AWS and Google are already on the market, taking some share.
In terms of the relationship with Nvidia, it might go like this: Oracle and Azure > GCP and AWS. But that’s just guesswork.
Smaller cloud computing power providers will be cheaper, but in some cases, some cloud computing providers will exchange computing power for equity.
Nvidia will provide each customer with a quota of H100.
But if Azure says, “Hey, we want 10,000 H100s, all for Inflection,” it gets a different quota than if it says, “Hey, we want 10,000 H100s for the Azure cloud.”
Nvidia cares who the end customer is, so if Nvidia is interested in that end customer, the cloud platform gets more H100s.
Nvidia wants to know as much as possible about the end customer, and prefers customers with strong brands or startups with strong pedigrees.
Yes, that seems to be the case. NVIDIA likes to guarantee GPU access to emerging AI companies (many of which have close ties to them). See Inflection - an AI company they invest in - testing a huge H100 cluster on CoreWeave, which they also invest in.
– Person in charge of a private cloud
The current thirst for GPUs is both froth and hype, but it does exist objectively.
There are companies like OpenAI, whose products like ChatGPT are gaining traction, that still cannot get enough GPUs.
Other companies are buying and hoarding GPUs for future use, or to train large language models the market may never use, which inflates a bubble on top of the GPU shortage.
But no matter how you look at it, Nvidia, the green team, remains king of the castle.