Google has launched TurboQuant, a quantization algorithm that cuts model memory usage by a factor of 6 and boosts inference speed by up to 8 times, sparking a sell-off in memory stocks and debate over shifts in demand structure.
Google introduced the TurboQuant algorithm, which compresses the memory footprint of large language models by at least 6 times while boosting inference speed by up to 8 times, without sacrificing model accuracy. The market quickly read the technology as a “demand-side disruption”: memory and storage stocks declined in unison following the announcement, with SanDisk (SNDK) down 3.5%, Micron Technology (MU) down 3.4%, and Western Digital (WDC) down 1.63%; in the Asian supply chain, Samsung Electronics fell 4.71% and SK Hynix 6.23%.
According to Google’s research team, TurboQuant is a quantization algorithm designed for large language models and vector search systems. It focuses on compressing the structures that consume the most memory during AI inference: the “key-value (KV) cache” and high-dimensional vector data. In tests, the technique reduced memory usage by at least 6 times and increased inference speed by up to 8 times without compromising model accuracy.
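For a rough sense of why the KV cache dominates inference memory, and what a 6x compression means in practice, the following back-of-envelope calculation uses hypothetical model dimensions (the layer counts, head counts, and context lengths below are illustrative assumptions, not TurboQuant specifics):

```python
# Back-of-envelope KV-cache sizing for a hypothetical decoder-only LLM.
# All figures are illustrative assumptions, not details of TurboQuant.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem):
    # Keys and values are each shaped [batch, kv_heads, seq_len, head_dim]
    # per layer, hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class configuration serving long contexts.
fp16 = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      seq_len=32_768, batch=16, bytes_per_elem=2)

# Compressing cache entries by 6x (e.g., low-bit quantization plus packing)
# shrinks the same cache proportionally.
low_bit = fp16 / 6

print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")       # 160.0 GiB
print(f"6x-compressed cache: {low_bit / 2**30:.1f} GiB")  # ~26.7 GiB
```

At these (assumed) dimensions the cache alone exceeds the capacity of a single accelerator's HBM stack, which is why cache compression translates directly into fewer memory modules per served request.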
This breakthrough targets a key bottleneck in current AI infrastructure. Generative AI inference depends heavily on high-bandwidth memory (HBM) to hold model weights and large-scale KV caches and to avoid memory stalls during decoding. TurboQuant achieves its compression by combining methods such as PolarQuant and Quantized Johnson-Lindenstrauss (QJL), with almost zero additional memory overhead, enabling the same or greater throughput on fewer hardware resources.
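The Johnson-Lindenstrauss-plus-quantization principle behind QJL-style methods can be illustrated with a 1-bit random projection: project vectors through a shared Gaussian matrix, keep only the sign bits, and recover approximate similarities from the fraction of agreeing signs. This is a minimal toy sketch of that general idea, not Google's actual implementation:

```python
import numpy as np

# 1-bit quantized random projection (SimHash-style sign sketch),
# illustrating the JL + quantization principle behind QJL-type methods.
# Toy example only; not the TurboQuant algorithm itself.

rng = np.random.default_rng(0)
d, m = 256, 4096                  # original dim, sketch bits
S = rng.standard_normal((m, d))   # shared random projection matrix

def sketch(x):
    # Store only m sign bits instead of d full-precision floats.
    return np.sign(S @ x)

def est_cosine(bx, by):
    # For Gaussian projections, P(signs agree) = 1 - angle/pi,
    # so angle ~= pi * (1 - agreement rate).
    agree = np.mean(bx == by)
    return np.cos(np.pi * (1.0 - agree))

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)  # a vector correlated with x

true_cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
approx = est_cosine(sketch(x), sketch(y))
print(f"true cosine {true_cos:.3f}, 1-bit estimate {approx:.3f}")
```

The sketch replaces each stored vector with a compact bit string yet still supports approximate similarity queries, which is the property that lets quantized caches and vector indexes run in a fraction of the original memory.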
The underlying logic of the sell-off is straightforward: if AI models’ memory requirements during inference can be compressed several-fold, future demand growth for DRAM, HBM, and even NAND storage in data centers may face structural downward revisions. The marginal impact of such efficiency optimizations is amplified as the AI industry shifts from “training-oriented” to “inference-oriented” workloads.
However, some analysts believe TurboQuant is more likely to change “resource utilization efficiency” than to simply weaken demand. As costs and latency fall, AI application scenarios may expand further, driving total computational demand higher even as per-unit memory demand shrinks: less demand per unit, more demand in total. With many large memory manufacturers having already sold out this year’s capacity, perhaps the question the market should be asking is what the true ceiling for AI growth is.