Google Announces KV Cache Compression Technology, Storage Demand Likely to Be Affected
Large language models have long faced a scalability problem. As the context window grows, the memory needed to store the key-value (KV) cache grows in proportion, consuming GPU memory and slowing inference. To address this, Google has introduced three compression algorithms: TurboQuant, PolarQuant, and Quantized Johnson-Lindenstrauss (QJL). They aim to compress the cache efficiently without compromising model output quality.
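To make the scaling concrete, here is a rough back-of-the-envelope sketch of how KV cache size grows with context length. The model shape used (32 layers, 8 KV heads, head dimension 128) is a hypothetical configuration chosen for illustration, not one taken from Google's announcement, and the function name `kv_cache_bytes` is likewise an assumption.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Estimate KV cache size in bytes for a transformer decoder.

    The factor of 2 accounts for storing both keys and values.
    All shape parameters are illustrative assumptions.
    """
    total_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return total_values * bits_per_value // 8


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for context in (8_192, 32_768, 131_072):
    size_16bit = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context, 16)
    print(f"{context:>7} tokens: {size_16bit / 2**30:.1f} GiB at 16 bits per value")
```

Under these assumptions the cache size is linear in the number of tokens, so doubling the context window doubles the memory required. Lowering `bits_per_value` shrinks the total by the same ratio at every context length.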
US Storage Stocks Decline Across the Board
Google’s new compression technology has raised market concerns about the outlook for storage demand. Following the news, shares of memory manufacturer SanDisk dropped as much as 9.2% on Wednesday, while Micron fell as much as 6.3%.
TurboQuant, the new memory compression technology, can compress a large model’s key-value cache down to 3 bits, achieving a sixfold reduction in memory and up to an eightfold speedup.
TurboQuant reportedly reduces the cache memory footprint of large models significantly without loss of accuracy. On NVIDIA’s H100 GPU, 4-bit TurboQuant is up to eight times faster than 32-bit unquantized keys when computing attention logits. PolarQuant achieves nearly lossless retrieval in “needle in a haystack” search tasks.
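As a rough illustration of what storing cache entries in a few bits means, the sketch below applies plain uniform (min/max) scalar quantization to a list of random values. This is a generic textbook technique, not TurboQuant, PolarQuant, or QJL, whose internals are not described here. The function names and the synthetic Gaussian "keys" are assumptions made for the example.

```python
import random


def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] on a uniform min/max grid.

    Generic uniform quantization for illustration; not Google's algorithm.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


random.seed(0)
keys = [random.gauss(0.0, 1.0) for _ in range(1024)]  # synthetic stand-in for cache entries

codes, lo, scale = quantize(keys, bits=3)
restored = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(keys, restored))

print(f"distinct codes used: {len(set(codes))} (at most 8 with 3 bits)")
print(f"max reconstruction error: {max_err:.4f} (bound: {scale / 2:.4f})")
```

Each value is stored as one of at most eight codes, and the reconstruction error is bounded by half the grid step. Naive schemes like this one lose accuracy quickly at low bit widths, which is the problem more sophisticated quantizers are designed to address.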
Morgan Stanley analysts pointed out that Google’s new compression technology applies only during inference and does not reduce hardware requirements. Instead, it may lower deployment costs and enable more AI applications.