Nvidia introduces Grace CPU Superchip design

Nvidia launched the Grace CPU Superchip at GTC 2022 last year. It builds on the original Grace CPU design by packaging two Grace dies together, and it is based on Arm’s Neoverse N2 platform, which supports features such as PCIe 5.0, DDR5, HBM3, CCIX 2.0, and CXL 2.0. Recently, Nvidia officially detailed the design, performance, and energy efficiency of the Grace CPU Superchip.

The Grace CPU Superchip is manufactured using TSMC’s 4N process and has a total of 144 Arm v9 architecture CPU cores.

Each core has four 128-bit SIMD units that implement two vector instruction sets, SVE2 and NEON, along with 64KB of L1 instruction cache, 64KB of L1 data cache, and 1MB of L2 cache; all cores share 234MB of L3 cache. NVIDIA’s Scalable Coherency Fabric (SCF), a mesh interconnect with a distributed cache architecture, provides 3.2TB/s of bandwidth for communication between the cores, NVLink-C2C, memory, and I/O.
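
As a rough illustration of that 128-bit vector width, the sketch below adds two FP64 lanes with a single NEON instruction on any AArch64 toolchain. The compiler flags are assumptions, and SVE2 code on the same core would look different (predicated and vector-length agnostic); this is only a minimal example, not Grace-specific code.

```c
/* Minimal sketch: one 128-bit NEON FP64 add, matching the 128-bit vector
 * width described above. Build on an AArch64 toolchain, e.g.
 *   gcc -O2 simd_add.c -o simd_add    (flags are an assumption) */
#include <arm_neon.h>
#include <stdio.h>

int main(void) {
    double a[2] = {1.0, 2.0};
    double b[2] = {3.0, 4.0};
    double c[2];

    float64x2_t va = vld1q_f64(a);      /* load two FP64 lanes (128 bits) */
    float64x2_t vb = vld1q_f64(b);
    float64x2_t vc = vaddq_f64(va, vb); /* one 128-bit vector add */
    vst1q_f64(c, vc);

    printf("%f %f\n", c[0], c[1]);      /* prints 4.0 6.0 */
    return 0;
}
```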

The Grace CPU Superchip supports LPDDR5X memory with ECC, delivering up to 1TB/s of bandwidth and a maximum capacity of 960GB. It provides eight PCIe 5.0 x16 interfaces with a total bandwidth of 1TB/s, plus additional low-speed PCIe lanes for management. The two dies are connected via Nvidia’s latest NVLink-C2C interconnect, which offers 900GB/s of bandwidth with low latency and cache coherency between the chips, allowing connected devices to work on the same memory pool. The chip also supports Arm’s AMBA CHI protocol, so accelerators can remain fully coherent and secure with other interconnected processors. TDP is 500W.
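
To make the shared-memory-pool point concrete, the hedged sketch below uses libnuma on Linux to query the node topology and place an allocation. Treating each Grace die as one NUMA node is an assumption about how firmware and the OS expose the system; the article only states that the pool is coherent across NVLink-C2C.

```c
/* Hedged sketch: on a coherent two-die system such as the Grace CPU
 * Superchip, the dies are expected to appear as NUMA nodes within one
 * shared address space. Uses libnuma; link with -lnuma.
 * (The one-node-per-die assumption is ours, not from the article.) */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    printf("configured NUMA nodes: %d\n", numa_num_configured_nodes());

    /* Allocate 1 MiB on node 0; cores on either die can touch it because
     * the interconnect keeps the whole memory pool cache-coherent. */
    size_t sz = 1 << 20;
    void *buf = numa_alloc_onnode(sz, 0);
    if (buf) {
        memset(buf, 0, sz);
        numa_free(buf, sz);
    }
    return 0;
}
```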

According to Nvidia, the FP64 peak computing performance of the Grace CPU Superchip reaches 7.1TFlops. Compared with a dual-socket system built on AMD’s Zen 3-based EPYC 7763 processor (64 cores each), the Grace CPU Superchip delivers 1.5 to 2.5 times the performance and 2 to 3.5 times the energy efficiency.
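
For context, the back-of-the-envelope arithmetic below shows how a peak figure of that order follows from the core count and vector width described earlier. The clock frequency used is an assumption (the article does not give one), chosen only to show that the math lands near the quoted number; the 7.1TFlops figure itself is Nvidia’s.

```c
/* Back-of-the-envelope check of the FP64 peak figure quoted above.
 * Core count and vector width come from the article; the clock speed is
 * an assumption, not a disclosed specification. */
#include <stdio.h>

int main(void) {
    const double cores        = 144;  /* total cores on the Superchip   */
    const double simd_pipes   = 4;    /* 4 x 128-bit units per core     */
    const double fp64_lanes   = 2;    /* 128 bits / 64 bits per lane    */
    const double flops_per_op = 2;    /* fused multiply-add = 2 FLOPs   */
    const double clock_ghz    = 3.1;  /* assumed, not from the article  */

    double peak_tflops =
        cores * simd_pipes * fp64_lanes * flops_per_op * clock_ghz / 1000.0;
    printf("theoretical FP64 peak: %.2f TFLOPS\n", peak_tflops); /* ~7.1 */
    return 0;
}
```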

NVIDIA said that the Grace CPU Superchip is designed for AI and high-performance computing applications and can run all NVIDIA software stacks and platforms, including NVIDIA RTX, HPC, NVIDIA AI, and NVIDIA Omniverse. With NVLink-C2C technology, integrated products can be built from different types of chiplets, such as CPUs, GPUs, DPUs, NICs, and SoCs. Since Nvidia also supports the latest UCIe specification, future custom chips will be able to use either UCIe or NVLink-C2C for interconnection.