At Intel Architecture Day 2021, Intel disclosed details about Alder Lake, Sapphire Rapids, Ponte Vecchio, and Alchemist. According to Wccftech, at the subsequent Hot Chips 33 conference Intel shared more information about the Sapphire Rapids-SP processor and also presented some chip packaging details for Ponte Vecchio.
Intel said that Sapphire Rapids uses new cores and accelerator engines to set the standard for next-generation data center processors. At its heart is a modular, tiled SoC architecture. Thanks to EMIB interconnect packaging technology and an advanced mesh architecture, it scales well while retaining the advantages of a monolithic CPU interface.
Sapphire Rapids will be based on Intel 7 process technology, supporting PCIe Gen5, CXL 1.1 (Compute Express Link), eight-channel DDR5 memory, and HBM technology.
Sapphire Rapids supports the Intel Accelerator Interfacing Architecture (AIA) instruction set and Intel Advanced Matrix Extensions (AMX). AIA supports efficient dispatch, synchronization, and signaling to accelerators and devices. AMX significantly accelerates the tensor processing at the core of deep learning algorithms, performing about 2,000 INT8 operations or 1,000 BF16 operations per cycle. Sapphire Rapids also includes the Intel Data Streaming Accelerator (DSA), which is designed to offload the most common data movement tasks and so improve overall workload performance.
As earlier reports suggested, Sapphire Rapids-SP will come in two package variants: a standard configuration and an HBM configuration. The standard configuration is a chiplet design composed of four XCC dies, each about 400 square millimeters in area, interconnected by EMIB. The EMIB pitch is 55 µm and the core pitch is 100 µm. The standard Sapphire Rapids-SP has 10 EMIB interconnects, and the entire package measures 4446 square millimeters. The HBM configuration has 14 EMIB interconnects, because the HBM2E memory must also be connected to the cores, and its package measures 5700 square millimeters. By comparison, AMD’s EPYC processor codenamed Genoa has a package area of 5428 square millimeters, which is larger than the standard Sapphire Rapids-SP configuration and slightly smaller than the HBM configuration.
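The package comparison above is simple arithmetic, and a short Python sketch makes the gaps explicit. The figures are the ones quoted in this article; the helper name `relative_difference` is only illustrative. On these numbers, the standard package is roughly 18% smaller than Genoa and the HBM package roughly 5% larger.

```python
# Package areas in square millimeters, as quoted in the article.
PACKAGE_AREA_MM2 = {
    "Sapphire Rapids-SP (standard)": 4446,
    "Sapphire Rapids-SP (HBM)": 5700,
    "EPYC Genoa": 5428,
}

XCC_DIE_AREA_MM2 = 400  # approximate area of one XCC die, per the article
XCC_DIE_COUNT = 4       # dies in the standard configuration


def relative_difference(a: float, b: float) -> float:
    """Return how much larger a is than b, as a fraction of b."""
    return (a - b) / b


genoa = PACKAGE_AREA_MM2["EPYC Genoa"]
for name, area in PACKAGE_AREA_MM2.items():
    delta = relative_difference(area, genoa)
    print(f"{name}: {area} mm^2 ({delta:+.1%} vs. Genoa)")

# Total XCC silicon; the rest of the package is substrate,
# EMIB bridges and, in the HBM variant, memory stacks.
total_xcc_area = XCC_DIE_AREA_MM2 * XCC_DIE_COUNT
print(f"XCC silicon: about {total_xcc_area} mm^2")
```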
Intel said that, compared with a standard package design, the EMIB link provides twice the bandwidth and four times the power efficiency. Ponte Vecchio, based on the Xe HPC architecture, also benefits from this. It brings together Intel’s most advanced current technologies, has more than 100 billion transistors, and is designed specifically for HPC and AI workloads. It contains a total of 47 units (tiles), including compute units, Rambo cache units, Foveros packaging units, base units, HBM units, Xe Link units, and EMIB units, built on 5 different manufacturing processes.
Like the Alchemist graphics cards, Ponte Vecchio’s Xe HPC architecture is built on the new Xe core, but the structure differs: each Xe core has 8 512-bit vector engines and 8 4096-bit matrix engines (Xe Matrix eXtension, XMX). Although the number of engines is halved compared with the Xe HPG architecture, their widths are twice and four times those of Xe HPG, respectively. Each Xe core also has 512KB of L1 cache, and the Xe HPC architecture supports additional data types such as TF32 (Tensor Float 32). Sixteen Xe cores form a slice and four slices form a stack, for a total of 64 Xe cores per stack. Each stack also contains L2 cache, 4 HBM2E memory controllers, 1 media engine, 8 Xe Link PCIe controllers, and so on. A multi-stack design, combined with EMIB interconnect packaging technology and stack-to-stack interconnects, allows further scaling. Ponte Vecchio, for example, uses a dual-stack design.
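The core, slice, and stack hierarchy described above can be tallied in a few lines of Python. This is only a sketch built from the figures in this article; the class and field names are illustrative and do not come from Intel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeHpcTopology:
    """Counts quoted in the article; all names here are illustrative."""
    vector_engines_per_core: int = 8
    vector_engine_bits: int = 512
    matrix_engines_per_core: int = 8
    matrix_engine_bits: int = 4096
    cores_per_slice: int = 16
    slices_per_stack: int = 4
    stacks: int = 2  # Ponte Vecchio is described as a dual-stack design

    @property
    def cores_per_stack(self) -> int:
        return self.cores_per_slice * self.slices_per_stack

    @property
    def total_cores(self) -> int:
        return self.cores_per_stack * self.stacks

    @property
    def vector_bits_per_core(self) -> int:
        return self.vector_engines_per_core * self.vector_engine_bits

    @property
    def matrix_bits_per_core(self) -> int:
        return self.matrix_engines_per_core * self.matrix_engine_bits


pvc = XeHpcTopology()
print(f"Xe cores per stack: {pvc.cores_per_stack}")
print(f"Xe cores in a dual-stack part: {pvc.total_cores}")
print(f"Vector width per core: {pvc.vector_bits_per_core} bits")
print(f"Matrix width per core: {pvc.matrix_bits_per_core} bits")
```

With the defaults, this gives 64 Xe cores per stack and 128 for the dual-stack design, with 4096 bits of vector width and 32768 bits of matrix width per Xe core.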
Intel also detailed the Ponte Vecchio package and the sizes of the chips inside it. The Ponte Vecchio package measures 4843.7 square millimeters, and the 3D Foveros packaging it uses has a pitch of 36 µm. Ponte Vecchio’s compute unit is built on TSMC’s 5nm process, and each unit has 8 Xe cores and 4MB of L1 cache. The base unit, which is responsible for I/O and high-bandwidth components, is built on Intel 7, has an area of 640 square millimeters, and carries 144MB of L2 cache.
The Xe Link unit is built on TSMC’s 7nm process and handles the interconnection between GPUs. Ponte Vecchio is currently at the A0 silicon stage and delivers at least 45 TFLOPS of FP32 throughput.
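The per-unit cache figure can be cross-checked against the per-core figure quoted earlier. The sketch below uses only numbers from this article; the count of compute units is derived here from those numbers and is not a figure the article states directly.

```python
# Figures quoted in the article.
L1_PER_XE_CORE_KB = 512
XE_CORES_PER_COMPUTE_UNIT = 8
XE_CORES_PER_STACK = 64
STACKS = 2  # dual-stack design

# 8 cores x 512 KB should match the 4 MB of L1 quoted per compute unit.
l1_per_unit_mb = L1_PER_XE_CORE_KB * XE_CORES_PER_COMPUTE_UNIT / 1024

# Derived, not quoted: how many 8-core compute units
# a dual-stack part would need to hold all of its Xe cores.
compute_units = (XE_CORES_PER_STACK * STACKS) // XE_CORES_PER_COMPUTE_UNIT

print(f"L1 per compute unit: {l1_per_unit_mb:g} MB")
print(f"Compute units implied by the core counts: {compute_units}")
```

The 4MB per compute unit follows from 8 Xe cores at 512KB each, and the core counts imply 16 compute units across the two stacks.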
Intel also has several next-generation advanced packaging technologies in development, such as Foveros Omni and Foveros Direct.