Tesla D1 chip has 50 billion transistors

At Tesla's recent AI Day event, Elon Musk and several engineers presented the progress of Tesla's vision-only FSD system, neural-network training for Autopilot, the D1 chip, the Dojo supercomputer, and other related work. The D1, an AI training chip developed in-house by Tesla, has attracted particular interest. It will be used in the supercomputer Tesla is currently building and is designed to deliver higher performance with lower power consumption and a smaller footprint.

According to ComputerBase, the D1 is a custom chip manufactured on a 7 nm process, with 50 billion transistors and a die area of 645 mm², smaller than NVIDIA's A100 (826 mm²) and AMD's Arcturus (750 mm²). It contains 354 training nodes and supports a range of numeric formats for AI training, including FP32, BF16, CFP8, INT32, INT16, and INT8.
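As a rough illustration of why reduced-precision formats such as BF16 are attractive for training, the minimal Python sketch below emulates bfloat16 by truncating the lower mantissa bits of an FP32 value. The `to_bfloat16` helper is purely illustrative, not a Tesla or NumPy API.

```python
import numpy as np

def to_bfloat16(x):
    """Emulate bfloat16 by zeroing the low 16 bits of FP32 values:
    same 8-bit exponent (dynamic range) as FP32, but only 7 mantissa bits."""
    bits = np.asarray(x, dtype=np.float32).reshape(-1).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

print(np.float32(np.pi))      # 3.1415927 (FP32, 23 mantissa bits)
print(to_bfloat16(np.pi)[0])  # 3.140625  (BF16 keeps the range, loses precision)
```

Halving the storage per value roughly doubles the arithmetic throughput and memory bandwidth available per watt, which is why the chip's BF16/CFP8 peak is far higher than its FP32 peak.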

Image: Tesla

The D1 delivers 22.6 TFLOPS of single-precision (FP32) floating-point performance, peaks at 362 TFLOPS for BF16/CFP8, and has a thermal design power (TDP) of no more than 400 W. Because scalability is critical for AI training, the chip interconnects with its neighbors in all four directions through a low-latency switch fabric with 10 TB/s of bandwidth. An I/O ring around the D1 provides 576 channels, each offering 112 Gbit/s of bandwidth.
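As a quick sanity check on the I/O figures, the sketch below (plain Python, using only the numbers quoted above and decimal units) totals the aggregate bandwidth of the 576 channels; note this off-chip I/O figure is separate from the 10 TB/s on-chip fabric bandwidth.

```python
# Aggregate off-chip I/O bandwidth of the D1's 576 x 112 Gbit/s channels.
channels = 576
gbit_per_channel = 112                 # Gbit/s per channel
total_gbit = channels * gbit_per_channel
total_tbyte = total_gbit / 8 / 1000    # Gbit/s -> GB/s -> TB/s (decimal)
print(f"{total_gbit} Gbit/s ≈ {total_tbyte:.1f} TB/s aggregate I/O")
# 64512 Gbit/s ≈ 8.1 TB/s aggregate I/O
```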

In turn, 25 D1 chips can be combined into a training module with 36 TB/s of bandwidth and a BF16/CFP8 peak of 9 PFLOPS.
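The module-level figure follows directly from the per-chip peak; a minimal check using the numbers above:

```python
# Peak BF16/CFP8 throughput of one training module (25 D1 chips).
chips_per_module = 25
tflops_per_chip = 362                               # BF16/CFP8 peak per D1
module_pflops = chips_per_module * tflops_per_chip / 1000
print(f"{module_pflops:.2f} PFLOPS per module")     # 9.05 PFLOPS ≈ 9 PFLOPS
```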

If 120 training modules (3,000 D1 chips in total) are deployed across several cabinets, they form an ExaPOD. According to Tesla, this will be the world's leading AI training supercomputer, with more than 1 million training nodes and a BF16/CFP8 peak of 1.1 ExaFLOPS. Compared with Tesla's current NVIDIA-based supercomputer, at the same cost it is claimed to deliver 4 times the performance and 1.3 times the performance per watt while occupying only one-fifth of the floor space.
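The ExaPOD totals also follow from the figures above; a short sketch that scales the module numbers up to the full pod:

```python
# ExaPOD totals, built from 120 training modules of 25 D1 chips each.
modules = 120
chips = modules * 25                     # 3,000 D1 chips
nodes = chips * 354                      # 354 training nodes per chip
exaflops = modules * 9.05 / 1000         # module peak from the calculation above
print(f"{chips} chips, {nodes:,} training nodes, ~{exaflops:.2f} EFLOPS BF16/CFP8")
# 3000 chips, 1,062,000 training nodes, ~1.09 EFLOPS BF16/CFP8
```

The node count of 1,062,000 is where the "more than 1 million training nodes" claim comes from, and 120 × 9.05 PFLOPS rounds to the quoted 1.1 ExaFLOPS.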