NVIDIA releases the H100 GPU, based on the Hopper GPU architecture

At GTC 2022, NVIDIA announced the new NVIDIA H100 Tensor Core GPU, based on the new NVIDIA Hopper GPU architecture. According to NVIDIA, the H100 is designed for supercomputers with a focus on AI performance, and architectural updates plus process improvements lift its performance and efficiency to new levels.

With 80 billion transistors, the NVIDIA H100 delivers, by NVIDIA's claims, up to six times the performance of the previous-generation A100 and twice the MMA (matrix multiply-accumulate) throughput. The GPU is a single monolithic die in a CoWoS 2.5D package, manufactured on TSMC's 4nm process, but in a version customized for NVIDIA that differs from the standard N4 node.
NVIDIA has not officially announced the core counts and clocks of the H100. It is understood that the full GH100 die carries 8 GPCs, 72 TPCs, and 144 SMs, for a total of 18,432 FP32 CUDA cores. It uses fourth-generation Tensor Cores, 576 in total, and is equipped with 60MB of L2 cache. Not all of these units are enabled in shipping products: the SXM5 version enables 132 SMs, for 16,896 FP32 CUDA cores, 528 Tensor Cores, and 50MB of L2 cache, while the PCIe 5.0 version enables 114 SMs, with only 14,592 FP32 CUDA cores. In addition, the TDP of the SXM5 version reaches 700W, while the PCIe 5.0 version draws 350W.
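These totals are consistent with a fixed per-SM layout; assuming 128 FP32 CUDA cores and 4 Tensor Cores per SM (an inference from the quoted GH100 totals, not an NVIDIA statement), the numbers above can be reproduced with a quick arithmetic check:

```python
# Assumed per-SM layout, inferred from the quoted GH100 totals
# (144 SMs -> 18,432 FP32 cores and 576 Tensor Cores).
FP32_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4

def totals(sm_count: int) -> tuple[int, int]:
    """Return (FP32 CUDA cores, Tensor Cores) for a given SM count."""
    return sm_count * FP32_CORES_PER_SM, sm_count * TENSOR_CORES_PER_SM

print(totals(144))  # full GH100: (18432, 576)
print(totals(132))  # SXM5:       (16896, 528)
print(totals(114))  # PCIe 5.0:   (14592, 456)
```

The same arithmetic implies 456 Tensor Cores for the PCIe 5.0 part, a figure the announcement does not state explicitly.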

NVIDIA says the H100's FP64/FP32 computing performance is 60 TFLOPS, its TF32 performance is 1,000 TFLOPS, and its FP16 performance is 2,000 TFLOPS, each three times that of the A100. In addition, the Hopper architecture adds support for FP8 operations, reaching 4,000 TFLOPS, six times that of the A100.
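The quoted Tensor Core peak rates double with each halving of precision, which is easy to verify from the figures above (a consistency check on the announced numbers, not an independent benchmark):

```python
# Peak throughput figures (TFLOPS) as quoted by NVIDIA for the H100.
h100_tflops = {"TF32": 1000, "FP16": 2000, "FP8": 4000}

# Each halving of operand precision doubles peak Tensor Core throughput.
assert h100_tflops["FP16"] == 2 * h100_tflops["TF32"]
assert h100_tflops["FP8"] == 2 * h100_tflops["FP16"]
print("precision scaling consistent")
```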

According to NVIDIA, the H100 supports NVIDIA's fourth-generation NVLink interface, providing up to 900 GB/s of bandwidth. For systems without NVLink, the H100 falls back to PCIe 5.0, with 128 GB/s of bandwidth. NVIDIA says the H100 is the first GPU to support the PCIe 5.0 standard and the first to use HBM3: it supports up to six HBM3 stacks delivering 3TB/s of bandwidth, 1.5 times that of the HBM2E-based A100. The default memory capacity is 80GB.

NVIDIA has also added new DPX instructions designed to accelerate dynamic programming, which underlies a wide range of algorithms including route optimization and genomics. NVIDIA says these algorithms run seven times faster than on its previous-generation GPUs and forty times faster than on CPUs. The Hopper architecture also improves security and partitioning: second-generation Multi-Instance GPU (MIG) technology provides approximately 3x more compute capacity and nearly 2x more memory bandwidth per GPU instance compared to the A100.
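Dynamic programming, the pattern DPX targets, fills a table of subproblem solutions from a recurrence. A classic genomics example is Smith-Waterman local alignment; the sketch below is a plain CPU illustration of that recurrence (not NVIDIA's implementation, and the scoring values are illustrative):

```python
# Minimal Smith-Waterman local alignment score: a dynamic-programming
# recurrence of the kind the DPX instructions are designed to accelerate.
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP table, zero-initialized
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "ACGT"))  # → 8 (four matches at +2 each)
```

Each table cell depends only on its three neighbors through min/max and add operations, which is exactly the inner-loop shape that fused instructions like DPX aim to speed up.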
The NVIDIA H100 can be deployed in every type of data center and will be available from cloud service providers and computer manufacturers worldwide, with availability expected to begin in the third quarter of this year.