Nvidia to launch H800 series computing cards with limited interconnect rates for export to China

Based on the Ampere architecture, the A100 series computing cards have been widely adopted by high-performance computing (HPC) clusters over the past three years. Last year, Nvidia launched the new-generation H100 series computing cards, based on the Hopper architecture, further improving computational power. These GPUs are heavily used for artificial intelligence and deep learning workloads.

To work around the export restrictions imposed last year, Nvidia launched the A800 series computing cards specifically for the Chinese market. The A800's specifications are largely identical to those of the original A100 series, the main difference being the NVLink interconnect speed, which is limited to 400 GB/s versus 600 GB/s on the A100.
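For a sense of scale, that cap leaves the A800 with about two-thirds of the A100's interconnect bandwidth. A minimal sketch (the 10 GB payload is an arbitrary illustration, not a benchmark, and real transfers carry latency and protocol overhead that this ignores):

```python
# NVLink bandwidth cap on the A800 versus the A100 (figures from the article).
a100_nvlink_gbps = 600  # GB/s
a800_nvlink_gbps = 400  # GB/s

print(f"A800 retains {a800_nvlink_gbps / a100_nvlink_gbps:.0%} "
      "of the A100's NVLink bandwidth")  # ~67%

# Illustrative only: time to move a hypothetical 10 GB payload between GPUs,
# assuming the transfer is purely bandwidth-bound.
payload_gb = 10
for name, bw in [("A100", a100_nvlink_gbps), ("A800", a800_nvlink_gbps)]:
    print(f"{name}: {payload_gb / bw * 1000:.1f} ms")
```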

According to Reuters, Nvidia has taken the same approach this year, cutting the interconnect speed of the standard H100 PCIe model by approximately half and introducing the H800 series computing cards for the Chinese market. Compared with a normal H100, the restricted interconnect will increase the time needed for certain large-scale model training tasks, since such workloads must shuttle large amounts of data between GPUs.
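The mechanism behind the slowdown is straightforward to sketch: data-parallel training synchronizes gradients across GPUs every step, and the bandwidth-bound portion of a ring all-reduce scales inversely with interconnect speed. A rough model, assuming the H800 runs at about 450 GB/s (half of the H100's 900 GB/s; Nvidia has not published the exact figure) and ignoring latency and overlap with compute:

```python
# Rough, bandwidth-only estimate of per-step gradient synchronization time.
# A ring all-reduce moves about 2*(n-1)/n times the gradient size per GPU.
# H100 NVLink: 900 GB/s (from the article). H800: assumed ~450 GB/s,
# i.e. "approximately half" per the Reuters report.

def allreduce_seconds(grad_gb: float, n_gpus: int, bw_gbps: float) -> float:
    volume = 2 * (n_gpus - 1) / n_gpus * grad_gb  # GB moved per GPU
    return volume / bw_gbps

grad_gb = 28  # e.g. fp16 gradients of a hypothetical 14B-parameter model
n_gpus = 8
for name, bw in [("H100", 900), ("H800 (assumed)", 450)]:
    t = allreduce_seconds(grad_gb, n_gpus, bw)
    print(f"{name}: {t * 1000:.0f} ms per all-reduce")
```

Under these assumptions the synchronization step simply takes twice as long, which is why bandwidth-hungry large-model training feels the restriction far more than single-GPU inference does.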

Media outlets have contacted Nvidia to ask about the differences between the H100 and H800, but the company has declined to explain them, stating only that the H800 series computing cards fully comply with export control regulations.

The complete GH100 chip comprises 8 GPCs, 72 TPCs, and 144 SMs, for a total of 18,432 FP32 CUDA cores. It uses fourth-generation Tensor Cores, 576 in total, and carries 60MB of L2 cache. Not all of these units are enabled in shipping products, however. The SXM5 version enables 132 SMs, for 16,896 FP32 CUDA cores, 528 Tensor Cores, and 50MB of L2 cache, while the PCIe 5.0 version enables 114 SMs, for only 14,592 FP32 CUDA cores. The SXM5 version has a TDP of 700W, versus 350W for the PCIe version.
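These per-variant totals all follow from fixed per-SM counts, which can be read off the full-chip figures: 18,432 / 144 = 128 FP32 cores and 576 / 144 = 4 Tensor Cores per SM. A quick check (the article does not quote a Tensor Core count for the PCIe part; the 456 produced below is just the same per-SM arithmetic):

```python
# Derive per-SM unit counts from the full GH100 figures, then recompute
# each variant's totals from its enabled SM count.
FULL_SMS = 144
FP32_PER_SM = 18_432 // FULL_SMS   # 128 FP32 CUDA cores per SM
TENSOR_PER_SM = 576 // FULL_SMS    # 4 fourth-gen Tensor Cores per SM

variants = {"Full GH100": 144, "H100 SXM5": 132, "H100 PCIe": 114}
for name, sms in variants.items():
    print(f"{name}: {sms} SMs -> {sms * FP32_PER_SM:,} FP32 cores, "
          f"{sms * TENSOR_PER_SM} Tensor Cores")
```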

In addition, the H100 supports Nvidia's fourth-generation NVLink interface, which provides up to 900 GB/s of bandwidth. It is also the first GPU to support the PCIe 5.0 standard and the first to adopt HBM3, supporting up to six HBM3 stacks for 3TB/s of bandwidth, 1.5 times that of the A100 with HBM2E. The default VRAM capacity is 80GB.
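The memory figures are internally consistent: 3 TB/s spread across six stacks works out to 500 GB/s per stack, and the stated 1.5x ratio implies roughly 2 TB/s for the A100's HBM2E. A quick sanity check (the A100 figure is inferred from the ratio, not quoted in the article):

```python
h100_bw_tbps = 3.0   # HBM3 bandwidth, from the article
hbm3_stacks = 6      # maximum stack count, from the article
ratio_vs_a100 = 1.5  # H100 vs. A100 bandwidth ratio, from the article

print(f"Per-stack HBM3 bandwidth: {h100_bw_tbps / hbm3_stacks * 1000:.0f} GB/s")
print(f"Implied A100 HBM2E bandwidth: {h100_bw_tbps / ratio_vs_a100:.1f} TB/s")
```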