NVIDIA is developing new AI chips specifically for China

Over the past three years, A100 series compute cards, based on the Ampere architecture, have been widely adopted in high-performance computing (HPC) clusters. Last year, NVIDIA launched the next-generation H100 series, based on the Hopper architecture, to further increase compute performance. These GPUs are used extensively for artificial intelligence and deep learning workloads. After export restrictions were imposed last year, NVIDIA worked around them by introducing the A800 and H800 series compute cards exclusively for the Chinese market. Compared with the original A100/H100 series, the A800/H800 series keeps essentially identical specifications; the most significant difference is the reduced transfer rate of the NVLink interconnect.

Under the more stringent rules introduced recently, the A800/H800 series compute cards are now restricted as well, and even consumer products such as the GeForce RTX 4090 gaming graphics card have been affected. According to a report by Chinastarmarket, NVIDIA is returning to its earlier strategy and is developing new modified chips tailored for the Chinese market, including the HGX H20, L20 PCIe, and L2 PCIe.

It is understood that the HGX H20, L20 PCIe, and L2 PCIe are all modified designs based on the H100 compute card. NVIDIA is expected to announce details after the 16th of this month, and Chinese manufacturers could receive the products within the next few days. “Science and Innovation Board Daily” has contacted NVIDIA to confirm the report, but NVIDIA had not responded as of press time.

The H100 uses the GH100 chip, whose full configuration comprises 8 GPCs, 72 TPCs, 144 SMs, and a total of 18,432 FP32 CUDA cores. It uses fourth-generation Tensor Cores, 576 in total, and has 60MB of L2 cache. Not all of these units are enabled in shipping products, however: the SXM5 version enables 132 SMs, for 16,896 FP32 CUDA cores, 528 Tensor Cores, and 50MB of L2 cache, whereas the PCIe 5.0 version enables only 114 SMs, for 14,592 FP32 CUDA cores. The former has a TDP of 700W, the latter 350W.
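
The core counts quoted for each H100 variant follow from the number of enabled SMs. As a rough check, the short Python sketch below derives the per-SM figures from the full GH100 numbers and recomputes the totals for the SXM5 and PCIe versions. It assumes every SM carries the same number of FP32 CUDA cores and Tensor Cores; the function name `enabled_units` is made up for illustration, and the Tensor Core figure it yields for the PCIe version is a derived value that the report itself does not state.

```python
# Full GH100 figures as quoted in the article.
FULL_SMS = 144
FULL_FP32_CORES = 18_432
FULL_TENSOR_CORES = 576

# Assumption: units are spread evenly across SMs.
FP32_PER_SM = FULL_FP32_CORES // FULL_SMS        # 128 FP32 cores per SM
TENSOR_PER_SM = FULL_TENSOR_CORES // FULL_SMS    # 4 Tensor Cores per SM


def enabled_units(active_sms):
    """Return (FP32 CUDA cores, Tensor Cores) for a given count of enabled SMs."""
    return active_sms * FP32_PER_SM, active_sms * TENSOR_PER_SM


if __name__ == "__main__":
    variants = [("GH100 (full)", 144), ("H100 SXM5", 132), ("H100 PCIe", 114)]
    for name, sms in variants:
        fp32, tensor = enabled_units(sms)
        print(f"{name}: {sms} SMs, {fp32} FP32 CUDA cores, {tensor} Tensor Cores")
```

The recomputed FP32 totals for 132 and 114 SMs match the 16,896 and 14,592 figures given above, which suggests the quoted specifications are internally consistent.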

Additionally, the H100 supports NVIDIA’s fourth-generation NVLink interface, offering bandwidth of up to 900 GB/s. It is also the first GPU to support the PCIe 5.0 standard and the first to use HBM3, supporting up to six HBM3 stacks with a bandwidth of 3TB/s, 1.5 times that of the A100’s HBM2E. The default memory capacity is 80GB.