Huang Renxun announces 8 new products in a 1.5-hour keynote: NVIDIA goes all in on AI inference and physical AI

Author | ZeR0 Junda, TechWeb

Editor | MoYing

TechWeb, Las Vegas, January 5 report: NVIDIA founder and CEO Huang Renxun has just delivered the first keynote of 2026 at CES 2026. Dressed in leather as always, Huang Renxun announced 8 major releases within 1.5 hours, covering chips, racks, and network design, and gave an in-depth introduction to the entire new-generation platform.

In the fields of accelerated computing and AI infrastructure, NVIDIA announced the NVIDIA Vera Rubin POD AI supercomputer, NVIDIA Spectrum-X Ethernet co-packaged optics, the NVIDIA inference context memory storage platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

The NVIDIA Vera Rubin POD uses six self-developed NVIDIA chips covering the CPU, GPU, scale-up and scale-out networking, storage, and data processing. All components are co-designed to meet the needs of advanced models and reduce computing costs.

Among them, the Vera CPU adopts a custom Olympus core architecture; the Rubin GPU introduces a Transformer engine with NVFP4 inference performance of up to 50 PFLOPS and per-GPU NVLink bandwidth of 3.6 TB/s; and the platform supports third-generation confidential computing (the first rack-level TEE), providing a complete trusted execution environment across the CPU and GPU domains.

All these chips have been taped out, and NVIDIA has validated the entire NVIDIA Vera Rubin NVL72 system. Partners have also begun running their internal AI models and algorithms, and the entire ecosystem is preparing for Vera Rubin deployment.

Among the other releases, NVIDIA Spectrum-X Ethernet co-packaged optics significantly improve power efficiency and application uptime; the NVIDIA inference context memory storage platform redefines the storage stack to cut redundant computation and improve inference efficiency; and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72 reduces the token cost of large MoE models to 1/10.

Regarding open models, NVIDIA announced an expansion of its open-source model ecosystem, releasing new models, datasets, and libraries, including the NVIDIA Nemotron open-source model series with new Agentic RAG models, safety models, and speech models, plus a new open model for all types of robots. However, Huang Renxun did not go into these in detail during the speech.

In physical AI, the "ChatGPT moment" of physical AI has arrived: NVIDIA's full-stack technology enables the global ecosystem to transform industries through AI-driven robotics; NVIDIA's extensive AI toolkit, including the new Alpamayo open-source model suite, allows the global transportation industry to move quickly toward safe L4 driving; and the NVIDIA DRIVE autonomous driving platform is now in production, shipping in all new Mercedes-Benz CLA models for L2++ AI-defined driving.

01. New AI Supercomputer: 6 self-developed chips, single-rack computing power reaches 3.6 EFLOPS

Huang Renxun believes that every 10 to 15 years the computer industry undergoes a comprehensive reshaping, but this time two platform revolutions are happening simultaneously: from CPU to GPU, and from "programming software" to "training software." Accelerated computing and AI are reconstructing the entire computing stack, and the computing industry, worth some $10 trillion over the past decade, is being modernized.

Meanwhile, the demand for computing power is soaring: model sizes grow roughly 10x a year, the number of tokens used for "thinking" grows about 5x a year, and the price per token falls to roughly 1/10 each year.
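As a quick, purely illustrative compounding check (the per-year multipliers are only the figures quoted above; the assumption that compute demand scales with model size times thinking tokens is ours, not NVIDIA's), these trends stack up dramatically over just a few years:

```python
# Compounding the per-year multipliers quoted above (illustrative arithmetic only;
# assumes inference compute scales with model size times thinking tokens).
years = 3
model_growth = 10        # model size multiplier per year (from the text)
token_growth = 5         # "thinking" token multiplier per year (from the text)
price_drop = 10          # per-token price falls to 1/10 per year (from the text)

compute_demand = (model_growth * token_growth) ** years
price_per_token = (1 / price_drop) ** years

print(f"Relative compute demand after {years} years: {compute_demand:,}x")
print(f"Relative price per token after {years} years: {price_per_token:.1%} of today")
# Relative compute demand after 3 years: 125,000x
# Relative price per token after 3 years: 0.1% of today
```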

To meet this demand, NVIDIA has decided to release new computing hardware every year. Huang Renxun revealed that Vera Rubin has now fully entered production.

NVIDIA’s new AI supercomputer, the NVIDIA Vera Rubin POD, uses six self-developed chips: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 (CX9) SuperNIC, BlueField-4 DPU, and Spectrum-X 102.4T CPO.

Vera CPU: Designed for data movement and agent processing, featuring 88 NVIDIA custom Olympus cores, 176-thread NVIDIA spatial multithreading, 1.8TB/s NVLink-C2C supporting CPU:GPU unified memory, system memory up to 1.5TB (3 times that of Grace CPU), SOCAMM LPDDR5X memory bandwidth of 1.2TB/s, supporting rack-level confidential computing, doubling data processing performance.

Rubin GPU: Introduces a Transformer engine, with NVFP4 inference performance up to 50 PFLOPS, 5 times that of Blackwell GPU, backward compatible, maintaining inference accuracy while boosting BF16/FP4 performance; NVFP4 training performance reaches 35 PFLOPS, 3.5 times Blackwell.

Rubin is also the first platform supporting HBM4, with bandwidth of 22TB/s, 2.8 times that of the previous generation, capable of providing the performance needed for demanding MoE models and AI workloads.

NVLink 6 Switch: Single-lane rate increased to 400Gbps, using SerDes technology for high-speed signal transmission; each GPU achieves 3.6TB/s full interconnect bandwidth, twice that of the previous generation, with total bandwidth of 28.8TB/s, FP8 in-network compute performance up to 14.4 TFLOPS, supporting 100% liquid cooling.

NVIDIA ConnectX-9 SuperNIC: Each GPU provides 1.6Tb/s bandwidth, optimized for large-scale AI, fully software-defined, programmable, and accelerated data paths.

NVIDIA BlueField-4: 800Gbps DPU for smart NICs and storage processors, equipped with 64-core Grace CPU, combined with ConnectX-9 SuperNIC for offloading network and storage-related computations, enhancing network security, with computing performance 6 times that of the previous generation, memory bandwidth tripled, and GPU data access speed doubled.

NVIDIA Vera Rubin NVL72: Integrates all the above components into a single rack processing system, with 2 trillion transistors, NVFP4 inference performance of 3.6 EFLOPS, NVFP4 training performance of 2.5 EFLOPS.

The system’s LPDDR5X memory capacity reaches 54TB, 2.5 times that of the previous generation; total HBM4 memory is 20.7TB, 1.5 times the previous generation; HBM4 bandwidth is 1.6PB/s, 2.8 times the previous generation; and total scale-up bandwidth reaches 260TB/s, surpassing the aggregate bandwidth of the global internet.
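The rack-level figures follow from the per-GPU numbers quoted above. A quick sanity check, assuming the "72" in NVL72 denotes 72 Rubin GPU packages per rack:

```python
# Sanity-check of the rack-level figures from the per-GPU numbers quoted above,
# assuming 72 Rubin GPU packages per NVL72 rack (the "72" in the product name).
gpus = 72

nvfp4_inference_pflops = 50   # per GPU, from the text
nvfp4_training_pflops = 35    # per GPU, from the text
nvlink_tb_s = 3.6             # per GPU, from the text
hbm4_bw_tb_s = 22             # per GPU, from the text
hbm4_total_tb = 20.7          # whole rack, from the text

print(f"Rack NVFP4 inference: {gpus * nvfp4_inference_pflops / 1000:.1f} EFLOPS")  # 3.6
print(f"Rack NVFP4 training:  {gpus * nvfp4_training_pflops / 1000:.1f} EFLOPS")   # ~2.5
print(f"Rack scale-up bandwidth: {gpus * nvlink_tb_s:.0f} TB/s")                   # ~259 (quoted as 260)
print(f"Rack HBM4 bandwidth: {gpus * hbm4_bw_tb_s / 1000:.2f} PB/s")               # ~1.58 (quoted as 1.6)
print(f"HBM4 per GPU: {hbm4_total_tb * 1000 / gpus:.1f} GB")                       # ~287.5 (about 288)
```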

Based on the third-generation MGX rack design, the compute tray adopts a modular, hostless, cableless, fanless design, making assembly and maintenance 18 times faster than GB200: a process that originally took 2 hours now takes about 5 minutes. Whereas about 80% of the previous system was liquid-cooled, this one is 100% liquid-cooled. A single system weighs about 2 tons, or up to 2.5 tons with coolant.

NVLink Switch tray modules enable zero-downtime maintenance and fault tolerance, allowing the rack to operate even when trays are removed or partially deployed. The second-generation RAS engine supports zero-downtime health checks.

These features improve system uptime and throughput, further reduce training and inference costs, and meet data center requirements for high reliability and maintainability.

More than 80 MGX partners are ready to support Rubin NVL72 deployments at hyperscale.

02. Three major new products dramatically improve AI inference efficiency: new CPO switches, a new context storage layer, and a new DGX SuperPOD

Meanwhile, NVIDIA announced three important new products: NVIDIA Spectrum-X Ethernet co-packaged optics, the NVIDIA inference context memory storage platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

1. NVIDIA Spectrum-X Ethernet co-packaged optics

NVIDIA Spectrum-X Ethernet co-packaged optics are built on the Spectrum-X architecture with a 2-chip design and 200Gbps SerDes, with each ASIC providing 102.4Tb/s of bandwidth.

The switching platform includes a 512-port high-density system and a 128-port compact system, with each port at 800Gb/s.
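A simple port-count-times-port-speed check (one direction, no oversubscription assumed; how the larger system is partitioned across ASICs is not stated here) shows how the port counts line up with the 102.4Tb/s ASIC figure:

```python
# Aggregate switch bandwidth implied by the port counts quoted above
# (simple port-count x port-speed arithmetic, one direction).
port_speed_gbps = 800

for ports in (128, 512):
    total_tbps = ports * port_speed_gbps / 1000
    asic_equiv = total_tbps / 102.4          # relative to one 102.4 Tb/s ASIC
    print(f"{ports} ports x {port_speed_gbps} Gb/s = {total_tbps:.1f} Tb/s "
          f"(~{asic_equiv:.0f}x a 102.4 Tb/s ASIC)")
# 128 ports -> 102.4 Tb/s (~1x); 512 ports -> 409.6 Tb/s (~4x)
```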

The CPO (co-packaged optics) switching systems offer 5x better energy efficiency, 10x higher reliability, and 5x higher application uptime.

This means handling more tokens daily, further lowering the total cost of ownership (TCO) for data centers.

2. NVIDIA inference context memory storage platform

The NVIDIA inference context memory storage platform is POD-level, AI-native storage infrastructure for holding the KV Cache. Built on BlueField-4 and Spectrum-X Ethernet acceleration and tightly coupled with NVIDIA Dynamo and NVLink, it enables coordinated context scheduling across memory, storage, and network.

This platform treats context as a first-class data type, achieving 5x inference performance and 5x better efficiency.

This is crucial for improving multi-turn dialogue, RAG, Agentic multi-step reasoning, and other long-context applications, which rely heavily on efficient storage, reuse, and sharing of context across the system.

AI is evolving from chatbots to Agentic AI, capable of reasoning, calling tools, and maintaining long-term states, with context windows expanded to millions of tokens. These contexts are stored in KV Cache, and re-computing at each step wastes GPU time and causes huge delays, hence the need for storage.

But GPU VRAM, while fast, is scarce, and traditional network storage is inefficient for short-lived context. The bottleneck in AI inference is shifting from computation to context storage, so a new memory layer optimized for inference is needed between GPU memory and bulk storage.

This layer is no longer an afterthought but must be co-designed with network storage to move context data with minimal overhead.

As a new storage hierarchy, NVIDIA inference context memory storage platform does not directly exist within the host system but connects outside the compute device via BlueField-4. Its key advantage is more efficient scaling of storage pools, avoiding redundant KV Cache computation.
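To make the "reuse context instead of recomputing it" idea concrete, here is a minimal, purely illustrative sketch of an external KV-cache tier keyed by the prompt prefix. The class and method names are hypothetical, not the API of NVIDIA's platform; the point is only that a shared off-GPU tier lets later turns skip the expensive prefill of an already-seen prefix:

```python
import hashlib

class ToyModel:
    """Stand-in for an LLM; prefill is the expensive step we want to avoid repeating."""
    def prefill(self, prefix_tokens):
        return {"prefix_len": len(prefix_tokens)}      # pretend this dict is the KV cache
    def decode(self, kv_cache, new_tokens):
        return f"generated {len(new_tokens)} tokens on top of {kv_cache['prefix_len']} cached ones"

class ContextMemoryTier:
    """Toy external KV-cache store keyed by a hash of the prompt prefix.
    Purely illustrative; not the API of NVIDIA's platform."""
    def __init__(self):
        self._store = {}                               # prefix hash -> KV-cache blob
    @staticmethod
    def _key(prefix_tokens):
        return hashlib.sha256(str(prefix_tokens).encode()).hexdigest()
    def get(self, prefix_tokens):
        return self._store.get(self._key(prefix_tokens))
    def put(self, prefix_tokens, kv_blob):
        self._store[self._key(prefix_tokens)] = kv_blob

def generate(model, tier, prefix_tokens, new_tokens):
    """Reuse the stored KV cache for a shared prefix when available; otherwise
    pay the full prefill cost once and store the result for the next turn."""
    kv = tier.get(prefix_tokens)
    if kv is None:                                     # cache miss: recompute the whole prefix
        kv = model.prefill(prefix_tokens)
        tier.put(prefix_tokens, kv)
    return model.decode(kv, new_tokens)                # only the new tokens are computed

tier, model = ContextMemoryTier(), ToyModel()
history = ["system prompt", "turn 1", "turn 2"]
print(generate(model, tier, history, ["turn 3"]))      # first call: full prefill, cache stored
print(generate(model, tier, history, ["turn 4"]))      # second call: cached prefix reused
```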

NVIDIA is working closely with storage partners to integrate NVIDIA inference context memory storage into the Rubin platform, enabling customers to deploy it as part of a complete integrated AI infrastructure.

3. NVIDIA DGX SuperPOD built on Vera Rubin

At the system level, the NVIDIA DGX SuperPOD serves as a blueprint for large-scale AI factory deployment: it combines 8 DGX Vera Rubin NVL72 systems, an NVLink 6 scale-up network, a Spectrum-X Ethernet scale-out network, and a built-in NVIDIA inference context memory storage platform, all engineering-validated as a whole.

Managed by NVIDIA Mission Control software, it achieves extreme efficiency. Customers can deploy it as a turnkey platform, completing training and inference tasks with fewer GPUs.

Thanks to the optimized design across 6 chips, trays, racks, pods, data centers, and software layers, the Rubin platform significantly reduces training and inference costs. Compared to Blackwell, training the same scale MoE model requires only 1/4 of the GPUs; for large MoE models, token costs are reduced to 1/10 at the same latency.

An NVIDIA DGX SuperPOD built on DGX Rubin NVL8 systems was also released.

Leveraging the Vera Rubin architecture, NVIDIA is working with partners and customers to build the world’s largest, most advanced, and cost-effective AI systems, accelerating mainstream AI adoption.

The Rubin infrastructure will be available through cloud service providers (CSPs) and system integrators in the second half of this year, with Microsoft and others among the first adopters.

03. Expanding the Open Model Universe: new models, datasets, and open-source ecosystem contributions

On the software and model front, NVIDIA continues to increase open-source investments.

Statistics from mainstream platforms like OpenRouter show that AI model usage has grown 20-fold over the past year, with about a quarter of tokens coming from open-source models.

In 2025, NVIDIA was the largest contributor of open-source models, data, and recipes on Hugging Face, releasing 650 open-source models and 250 open datasets.

NVIDIA’s open models rank highly on multiple leaderboards. Developers can not only use these open models but also learn from them, continue training, expand datasets, and build AI systems using open-source tools and documented techniques.

Taking inspiration from Perplexity, Huang Renxun observed that agents should be multi-model, multi-cloud, and hybrid-cloud; this is also the fundamental architecture of Agentic AI systems and has been adopted by nearly all startups.

With NVIDIA’s open models and tools, developers can now customize AI systems and utilize cutting-edge models. Currently, NVIDIA has integrated these frameworks into a “blueprint” and embedded them into SaaS platforms, enabling rapid deployment.

In live demos, the system automatically determines, based on user intent, whether a task should be handled by local private models or by cloud frontier models; it can call external tools (such as email APIs, robot control interfaces, and calendar services) and perform multimodal fusion, unifying the processing of text, speech, images, and robot sensor signals.
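The routing behavior described in the demo can be sketched in a few lines. The names, keywords, and heuristics below are hypothetical and are not taken from NVIDIA's blueprint; this is only a minimal illustration of intent-based dispatch between a local private model and a cloud frontier model, plus a tool call:

```python
# Hypothetical sketch of intent-based routing between a local model and a
# cloud frontier model, plus an external tool call. Names and heuristics are illustrative.

SENSITIVE_KEYWORDS = {"payroll", "patient", "internal"}   # toy data-governance policy
TOOLS = {
    "send_email": lambda args: f"email sent to {args['to']}",
    "schedule":   lambda args: f"meeting booked at {args['time']}",
}

def route(query):
    # Keep sensitive or short queries on the local private model,
    # send everything else to a cloud frontier model.
    if any(k in query.lower() for k in SENSITIVE_KEYWORDS) or len(query) < 40:
        return "local_model"
    return "cloud_frontier_model"

def handle(query, tool=None, tool_args=None):
    target = route(query)
    answer = f"[{target}] answer to: {query}"             # stand-in for a real model call
    if tool in TOOLS:
        answer += " | " + TOOLS[tool](tool_args or {})
    return answer

print(handle("Summarize the latest public research on co-packaged optics for data centers"))
print(handle("Draft the payroll update for next month", tool="send_email",
             tool_args={"to": "hr@example.com"}))
```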

These complex capabilities were unimaginable in the past but are now trivial. Similar capabilities are available on enterprise platforms like ServiceNow and Snowflake.

04. Open-sourcing the Alpamayo model, enabling autonomous vehicles to "think"

NVIDIA believes that physical AI and robotics will eventually become the largest segment of consumer electronics worldwide. All movable things will ultimately achieve full autonomy driven by physical AI.

AI has gone through perception AI, generative AI, and now Agentic AI stages, and is entering the era of physical AI, where intelligent systems understand physical laws and generate actions directly from perceptions of the physical world.

To achieve this goal, physical AI must learn common sense about the world, such as object permanence, gravity, and friction. Acquiring these abilities relies on three computers: training computers (DGX) to develop AI models, inference computers (robot and vehicle chips) for real-time execution, and simulation computers (Omniverse) for generating synthetic data and verifying physical logic.

The core model among these is the Cosmos world foundation model, aligning language, images, 3D, and physical laws, supporting the entire chain from simulation to training data generation.

Physical AI will appear in three types of entities: buildings (factories, warehouses), robots, and autonomous vehicles.

Huang Renxun believes that autonomous driving will be the first large-scale application scenario of physical AI. Such systems need to understand the real world, make decisions, and execute actions, with high requirements for safety, simulation, and data.

In response, NVIDIA has released Alpamayo, a complete system composed of open-source models, simulation tools, and physical AI datasets, to accelerate the development of safe, reasoning-based physical AI.

Its product suite provides the foundational modules for automakers, suppliers, startups, and researchers worldwide to build L4 autonomous driving systems.

Alpamayo is the industry’s first truly "thinking" model for autonomous vehicles, and it has been open-sourced. It decomposes problems into steps, reasons about all possibilities, and chooses the safest path.

This reasoning-based task-action model enables autonomous driving systems to handle previously unseen complex edge scenarios, such as traffic light failures at busy intersections.

Alpamayo has 10 billion parameters: enough to handle autonomous driving tasks, yet lightweight enough to run on workstations designed for autonomous vehicle research.

It can accept text, surround-view cameras, vehicle history states, and navigation inputs, and output driving trajectories and reasoning processes, helping passengers understand why the vehicle took certain actions.
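Based only on the inputs and outputs described above, a hypothetical interface sketch (not Alpamayo's actual API) makes the data flow explicit: multimodal inputs go in, and a trajectory plus a human-readable explanation come out:

```python
from dataclasses import dataclass, field

# Hypothetical types describing the inputs and outputs listed above;
# this is not Alpamayo's actual API, only an illustration of the data flow.

@dataclass
class DrivingInput:
    instruction: str        # free-form text, e.g. a navigation hint
    camera_frames: list     # surround-view camera images
    vehicle_history: list   # past ego states (pose, speed, ...)
    route: list             # navigation waypoints

@dataclass
class DrivingOutput:
    trajectory: list = field(default_factory=list)   # planned waypoints
    reasoning: str = ""                              # human-readable explanation

def plan(inp: DrivingInput) -> DrivingOutput:
    """Stand-in for the model call: consume multimodal input, emit a
    driving trajectory plus the reasoning behind it."""
    return DrivingOutput(
        trajectory=[(0.0, 0.0), (1.0, 0.2)],         # dummy waypoints
        reasoning="Yield to the pedestrian on the right, then merge left.",
    )

out = plan(DrivingInput("take the next left", camera_frames=[], vehicle_history=[], route=[]))
print(out.reasoning)
```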

In the promotional video, vehicles driven by Alpamayo avoid pedestrians, anticipate left-turning vehicles, and change lanes without human intervention.

Huang Renxun stated that the Mercedes-Benz CLA equipped with Alpamayo has already entered mass production and was recently rated the safest car in the world by NCAP; every line of code, chip, and system has passed safety certification. The system will launch in the US market, and more advanced driving capabilities, including hands-free highway driving and end-to-end urban autonomous driving, will be introduced later this year.

NVIDIA also released datasets for training Alpamayo, along with Alpha-Sim, an open-source simulation framework for evaluating reasoning models. Developers can fine-tune Alpamayo with their own data, or generate synthetic data using Cosmos, and then train and test autonomous driving applications on a mix of real and synthetic data. Additionally, NVIDIA announced that the NVIDIA DRIVE platform is now in production.

NVIDIA states that leading global robotics companies, including Boston Dynamics, Franka Robotics, surgical robot makers, LG Electronics, NEURA, XRLabs, and Zhiyuan Robotics, are all building on NVIDIA Isaac and GR00T.

Huang Renxun also announced a new collaboration with Siemens. Siemens is integrating NVIDIA CUDA-X, AI models, and Omniverse into its EDA, CAE, and digital twin tools and platforms. Physical AI will be widely used throughout the entire process from design and simulation to manufacturing and operations.

05. Conclusion: Embrace open source with the left hand, make hardware systems irreplaceable with the right

As AI infrastructure shifts focus from training to large-scale inference, platform competition has evolved from single-point compute power to system engineering covering chips, racks, networks, and software, aiming to deliver maximum inference throughput at the lowest TCO. AI is entering a new stage of “factory-like operation.”

NVIDIA emphasizes system-level design; Rubin achieves performance and cost improvements in both training and inference, and can serve as a plug-and-play replacement for Blackwell, enabling seamless transition.

In platform positioning, NVIDIA still considers training crucial because only by rapidly training the most advanced models can inference platforms truly benefit. Therefore, NVFP4 training has been integrated into Rubin GPUs to further boost performance and reduce TCO.

Meanwhile, this AI computing giant continues to significantly strengthen network communication through its scale-up and scale-out architectures, treats context as a key bottleneck, and pursues co-designed storage, network, and compute solutions.

NVIDIA is simultaneously open-sourcing extensively and making hardware, interconnects, and system designs increasingly “irreplaceable.” This continuous demand expansion, token consumption incentives, inference scale-up, and high-cost-performance infrastructure strategy form a closed loop, building an even more formidable moat for NVIDIA.
