Samsung’s next-generation DDR5 and HBM3 memory will integrate an AI engine

At the beginning of February this year, it was reported that Samsung’s new HBM2 memory would integrate an AI engine. The HBM-PIM (Aquabolt-XL) chip provides up to 1.2 TFLOPS of embedded computing capability, allowing the memory chip itself to perform operations that would otherwise fall to a CPU, GPU, ASIC, or FPGA. At Hot Chips 33, Samsung recently presented a larger roadmap that extends PIM (processing-in-memory) technology to DDR4, DDR5, LPDDR5X, GDDR6, and HBM3 memory.
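For readers unfamiliar with the idea, the short Python sketch below illustrates the processing-in-memory principle in the abstract: a small compute unit sitting next to each memory bank applies an operation locally, so only a tiny result crosses the memory interface instead of every operand. The MemoryBank class and the scale-and-accumulate workload are hypothetical illustrations of the concept, not Samsung’s Aquabolt-XL programming model.

```python
# Illustrative model of the processing-in-memory (PIM) idea.
# NOTE: the classes and workload below are hypothetical; they sketch the
# concept only and do not reflect Samsung's actual programming model.

class MemoryBank:
    """A memory bank with a tiny compute unit sitting next to the cells."""

    def __init__(self, data):
        self.data = list(data)          # values stored in this bank

    def pim_scale_accumulate(self, scale):
        """Compute sum(scale * x) inside the bank and return one scalar.

        Only this single result crosses the memory interface, instead of
        every element of `data`.
        """
        return sum(scale * x for x in self.data)


def host_side(banks, scale):
    """Conventional path: every element is read out to the CPU first."""
    elements_moved = sum(len(b.data) for b in banks)
    total = sum(scale * x for b in banks for x in b.data)
    return total, elements_moved


def pim_side(banks, scale):
    """PIM path: each bank reduces locally, only per-bank results move."""
    partials = [b.pim_scale_accumulate(scale) for b in banks]
    return sum(partials), len(partials)


if __name__ == "__main__":
    banks = [MemoryBank(range(i, i + 1024)) for i in range(16)]
    host_total, host_moved = host_side(banks, 0.5)
    pim_total, pim_moved = pim_side(banks, 0.5)
    assert abs(host_total - pim_total) < 1e-6
    print(f"host path moves {host_moved} values, PIM path moves {pim_moved}")
```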

Samsung adds the HBM-PIM chips directly into the memory stack, and the result can be used in conjunction with a JEDEC-standard HBM2 memory controller. Samsung tested the design on a Xilinx Alveo FPGA, where the AI accelerator delivered a 2.5x performance improvement while reducing energy consumption by more than 60%. Samsung says the current PIM technology is compatible with any standard memory controller, but if CPU manufacturers add optimizations, it can bring even greater performance improvements in some cases.

Samsung has already tested HBM2-PIM with a CPU manufacturer for use in future products. Since Intel’s Sapphire Rapids, AMD’s Genoa, and Arm’s Neoverse platforms all support HBM memory, the potential range of applications is quite large.

PIM technology is an obvious fit for the data center, but Samsung is also bringing it to more mainstream memory. This time Samsung demonstrated AXDIMM, a module that builds on existing memory: the AI engine is integrated into the buffer chip so that large volumes of data no longer have to shuttle between the CPU and DRAM, improving efficiency. An AXDIMM module plugs into a standard DDR4 server memory slot and can be used without major changes to the existing hardware architecture.
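A workload pattern often cited for this kind of near-memory acceleration is gathering and summing rows of a large embedding table. The back-of-envelope sketch below estimates how much data would cross the interface between the CPU and DRAM with and without reducing inside the DIMM’s buffer chip; all sizes are assumptions chosen for illustration rather than Samsung figures.

```python
# Back-of-envelope estimate of CPU/DRAM traffic for an embedding
# gather-and-sum, with and without reducing inside the DIMM's buffer chip.
# All sizes below are assumptions for illustration, not Samsung figures.

EMBEDDING_DIM = 128          # floats per embedding row (assumed)
BYTES_PER_FLOAT = 4
LOOKUPS_PER_QUERY = 80       # rows gathered per query (assumed)
QUERIES = 1_000_000

def traffic_host_reduce():
    """Every gathered row is shipped to the CPU, which does the summing."""
    return QUERIES * LOOKUPS_PER_QUERY * EMBEDDING_DIM * BYTES_PER_FLOAT

def traffic_buffer_reduce():
    """The buffer chip sums the rows; only one result vector per query moves."""
    return QUERIES * EMBEDDING_DIM * BYTES_PER_FLOAT

if __name__ == "__main__":
    host = traffic_host_reduce()
    pim = traffic_buffer_reduce()
    print(f"host-side reduction: {host / 1e9:.1f} GB moved")
    print(f"buffer-chip reduction: {pim / 1e9:.1f} GB moved")
    print(f"traffic reduced by {host / pim:.0f}x")
```

Under these assumed sizes the traffic shrinks by a factor equal to the number of rows summed per query, which is why moving the reduction next to the memory can save both time and power.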

Samsung says some manufacturers have already tested AXDIMM memory on customers’ servers. In specific AI applications its performance increased by 1.8x while power consumption dropped by more than 40%. With memory about to transition to DDR5, Samsung will likely follow up quickly, and the technology is expected to reach the market in the near future.

Samsung’s PIM technology is not limited to relatively high-end areas; it can also move into products closer to consumers, such as laptops, tablets, and even mobile phones. Samsung plans to introduce PIM technology into LPDDR5 memory, although this work is still in its early stages. In simulation tests of LPDDR5-6400 PIM memory, speech recognition performance increased by 2.3x, transformer-based translation by 1.8x, and GPT-2 text generation by 2.4x, while power consumption was reduced by more than 60%.


Although Samsung’s PIM technology is progressing rapidly and works with standard memory controllers, it has not yet been standardized by JEDEC, and this remains the main obstacle to widespread adoption. Samsung hopes the PIM specification can be standardized and expanded across its memory portfolio, for example by adding it to the HBM3 standard. JEDEC has not yet officially released the HBM3 specification; it is still under development. Samsung also said it will move from FP16 support in HBM2-PIM to FP64 support in HBM3-PIM, meaning the chips will gain expanded capabilities. FP16 and FP32 will be reserved for data center use, while INT8 and INT16 will serve LPDDR5, DDR5, and GDDR6 memory to better suit different application scenarios.
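To give a sense of why the lower-precision integer formats matter for the more consumer-oriented memory types, here is a minimal sketch of symmetric INT8 quantization, a common way inference workloads trade a little accuracy for a 4x reduction in data size versus FP32. It is a generic illustration of the format trade-off, not code for Samsung’s PIM units.

```python
# Minimal symmetric INT8 quantization sketch: a generic illustration of the
# format trade-off (INT8 uses a quarter of the bytes of FP32, half of FP16).
# This is not Samsung PIM code; it only illustrates why INT8 suits inference.

def quantize_int8(values):
    """Map floats to int8 codes plus one float scale (symmetric scheme)."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    weights = [0.82, -1.57, 0.03, 2.40, -0.66]
    codes, scale = quantize_int8(weights)
    approx = dequantize_int8(codes, scale)
    # Each weight now needs 1 byte instead of 4 (FP32) or 2 (FP16).
    print("codes :", codes)
    print("approx:", [f"{a:.3f}" for a in approx])
```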

Samsung will also bring PIM technology to other types of memory, such as GDDR6, which will broaden its range of applications. Samsung released the industry’s first CXL memory module this year, a new memory product based on the Compute Express Link standard, and PIM technology may be introduced there as well in the future. With the rise of AI, PIM technology may change the rules of the game more than most people realize. Perhaps one day GPU memory will help process some workloads itself, improving performance while reducing power consumption.