AMD launches EPYC 9704 with Zen 4c architecture
At its recent data center and AI technology conference, AMD unveiled a raft of new products, including the EPYC Bergamo, built on the Zen 4c architecture, and the EPYC Genoa-X, which employs 3D V-Cache technology. The former is designed for the cloud computing sector, which demands high throughput and heavy multithreading, while the latter suits high-performance computing workloads that need large amounts of cache. A further product, codenamed Siena and also built on Zen 4c, targets the telecommunications infrastructure and edge computing markets and is slated for release later this year.
The Zen 4c architecture used in the EPYC 9704 Bergamo series shares the same Instruction Set Architecture (ISA) as Zen 4; it is essentially a streamlined, power-efficient version of the Zen 4 core. It delivers the same Instructions Per Cycle (IPC) at better power efficiency, and a Zen 4c core is significantly smaller than a conventional Zen 4 core. This allows each Zen 4c CCD to pack 16 cores, compared to 8 in a Zen 4 CCD, and with a maximum of 8 CCDs per processor, the EPYC 9704 can house up to 128 cores.
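As a quick sanity check on the core math, here is a minimal Python sketch using only the figures quoted above; it simply shows how the 128-core ceiling falls out of the per-CCD core count and the CCD count.

    # Core-count arithmetic for EPYC 9704 "Bergamo", using the figures quoted above.
    cores_per_ccd = 16        # a Zen 4c CCD packs 16 cores (vs. 8 in a Zen 4 CCD)
    max_ccds = 8              # up to 8 CCDs per package
    threads_per_core = 2      # with SMT enabled

    max_cores = cores_per_ccd * max_ccds        # 128
    max_threads = max_cores * threads_per_core  # 256

    print(max_cores, max_threads)  # 128 256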
Zen 4c, like Zen 4, is built on TSMC's 5nm process. Each Zen 4c CCD houses two CCXs, each with 8 cores and 16MB of L3 cache. The L1 and L2 caches per core are identical to Zen 4: each core has 32KB each of L1 data and instruction cache plus 1MB of L2 cache. The IFOP design is the same as Zen 4's, including two GMI3 links; however, it appears that only one of them is used, so signals from the two CCXs are multiplexed through a single link to communicate with the IOD, much as in the Zen 2 architecture.
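For readers who want to confirm the cache hierarchy on a running system, the sketch below reads Linux's standard sysfs cache descriptors for CPU 0. The paths and field names are the generic kernel interface, not anything Bergamo-specific; on a Zen 4c part they should report the 32KB L1, 1MB L2 and 16MB-per-CCX L3 described above.

    import glob
    import os

    # Read Linux's generic sysfs cache descriptors for CPU 0. On a Bergamo system
    # this should report 32K L1d/L1i and 1024K L2 per core, plus a 16384K L3
    # shared by the 8 cores of a CCX.
    for index_dir in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cache/index*")):
        def read(name):
            with open(os.path.join(index_dir, name)) as f:
                return f.read().strip()

        level = read("level")              # cache level: 1, 2 or 3
        ctype = read("type")               # Data, Instruction or Unified
        size = read("size")                # e.g. "32K", "1024K", "16384K"
        shared = read("shared_cpu_list")   # logical CPUs sharing this cache
        print(f"L{level} {ctype}: {size}, shared by CPUs {shared}")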
A single Zen 4 core measures 3.84mm², while a Zen 4c core is only 2.48mm², a 35.4% reduction in core area and clearly far more compact. Both feature 1MB of L2 cache, so the L2 SRAM cells occupy the same area; AMD shrank the L2 region by making the L2 control logic more compact. Excluding the L2 and its related circuitry, the core area is reduced by a whopping 44.1%. The front-end and execution areas are nearly halved, but the Floating Point Unit (FPU) shrinks less, likely for heat dissipation reasons, as the FPU is typically the hottest part of the core. The SRAM within the core is also laid out more densely, its area shrinking by 32.6%.
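For reference, the headline 35.4% figure is just the ratio of the two quoted core areas; a one-line check:

    # Per-core area reduction, computed from the figures quoted above (mm^2).
    zen4_core_mm2 = 3.84
    zen4c_core_mm2 = 2.48

    reduction = (zen4_core_mm2 - zen4c_core_mm2) / zen4_core_mm2
    print(f"{reduction:.1%}")  # 35.4%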
Thus, although Zen 4c doubles the core count of a single CCD from 8 to 16, the die is still only 72.7mm², less than 10% larger than Zen 4's 66.3mm². Due to layout constraints, however, the total number of CCDs per processor drops from 12 to 8, and the total transistor count falls from 90 billion to 82 billion. Even so, EPYC Bergamo remains a very large processor. The AMD EPYC 9704 series includes three models: the 128-core, 256-thread EPYC 9754; the EPYC 9754S, which has SMT disabled and therefore only 128 threads; and the 112-core, 224-thread EPYC 9734.
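Running the same arithmetic at the CCD level shows how favorable the trade is. A small sketch using the die sizes quoted above; note that the roughly 1.8x core-density figure is derived here for illustration, not an AMD-quoted number.

    # CCD-level comparison, computed from the figures quoted above.
    zen4_ccd_mm2, zen4_ccd_cores = 66.3, 8
    zen4c_ccd_mm2, zen4c_ccd_cores = 72.7, 16

    area_growth = zen4c_ccd_mm2 / zen4_ccd_mm2 - 1
    density_gain = (zen4c_ccd_cores / zen4c_ccd_mm2) / (zen4_ccd_cores / zen4_ccd_mm2)

    print(f"CCD area growth: {area_growth:.1%}")       # ~9.7%, i.e. under 10%
    print(f"Core density: {density_gain:.2f}x Zen 4")  # ~1.82x cores per mm^2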
The top-of-the-line EPYC 9754 has 32 more cores than the top EPYC Genoa part, the EPYC 9654. The TDP stays at 360W, configurable up to 400W, although clocks are somewhat lower: the base frequency drops from 2.4GHz to 2.25GHz and the maximum boost from 3.7GHz to 3.1GHz. Total L3 cache also decreases from 384MB to 256MB.
In terms of performance, AMD compared the EPYC 9754 with the competitor's strongest Sapphire Rapids part, the 4th Gen Xeon Platinum 8490H. With only 60 cores, the latter is thoroughly outmatched, so the comparison is rather lopsided. In reality, Bergamo is aimed at higher-core-count Arm server chips from the likes of Ampere, Amazon, and Google, as well as Intel's upcoming all-E-core Xeon, Sierra Forest, set to launch next year.
Next up is the EPYC Genoa-X with 3D V-Cache technology. This was an expected product as AMD released the Milan-X with 3D V-Cache last year. The Genoa-X EPYC 9084X series is simply a product line extension of this technology.
Structurally, the CCD used in the EPYC 9084X is exactly the same as the one in the desktop Ryzen 7000X3D: a standard Zen 4 CCD with an additional 64MB SRAM die stacked on top, lifting the L3 cache of a single CCD from 32MB to 96MB. On Genoa-X, all 12 CCDs carry 3D V-Cache, bringing the total L3 capacity to roughly 1.1GB.
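The total-cache figure follows directly from the per-CCD numbers quoted above; a quick check:

    # Total L3 on Genoa-X, from the per-CCD figures quoted above (MB).
    base_l3_per_ccd = 32    # standard Zen 4 CCD
    stacked_sram = 64       # 3D V-Cache die bonded on top
    ccds = 12

    l3_per_ccd = base_l3_per_ccd + stacked_sram  # 96 MB
    total_l3 = l3_per_ccd * ccds                 # 1152 MB, i.e. the ~1.1GB quoted
    print(l3_per_ccd, total_l3)                  # 96 1152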
There are three models in the EPYC 9084X series: the EPYC 9684X with 96 cores and 192 threads, clocked at 2.55-3.7GHz, with 1152MB of L3 cache and a 400W TDP; the EPYC 9384X with 32 cores and 64 threads, clocked at 3.1-3.9GHz; and the EPYC 9184X with 16 cores and 32 threads, clocked at 3.55-4.2GHz. The latter two both feature 768MB of L3 cache and a 320W TDP.
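Laid out as data, the specs above also show how the lower-core-count models trade cores for clocks and per-core cache. The per-core L3 figures printed by the sketch below are derived from the quoted totals for illustration, not AMD-published numbers.

    # Genoa-X SKUs from the specs above: (cores, base GHz, boost GHz, L3 MB, TDP W).
    genoa_x = {
        "EPYC 9684X": (96, 2.55, 3.7, 1152, 400),
        "EPYC 9384X": (32, 3.1, 3.9, 768, 320),
        "EPYC 9184X": (16, 3.55, 4.2, 768, 320),
    }

    for name, (cores, base, boost, l3_mb, tdp_w) in genoa_x.items():
        # Lower-core-count parts trade cores for higher clocks and more L3 per core.
        print(f"{name}: {cores}C/{cores * 2}T, {base}-{boost}GHz, "
              f"{l3_mb}MB L3 ({l3_mb // cores}MB per core), {tdp_w}W")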
AMD provided two performance comparisons, one between the top-of-the-line EPYC 9684X and Xeon Platinum 8490H, and another between the 32-core EPYC 9384X and the Xeon Platinum 8462Y+. The performance benefits of the larger cache are quite evident.
Both the AMD EPYC 9704 and EPYC 9084X series processors are available now. They drop into the same SP5 socket as the existing EPYC 9004 series and are platform-compatible, and AMD is already shipping them in volume to customers.