Intel unveils new MLPerf results for its Habana Gaudi 2 accelerator

Three months ago, Intel published MLPerf results for its Habana Gaudi 2 accelerator, showing strong performance on advanced vision and language models. Now Intel has released a fresh round of results from the benchmark, suggesting that the Gaudi 2 AI accelerator can credibly challenge NVIDIA's H100 GPU and underscoring Intel's competitiveness in AI inference.

The new data shows a marked improvement in the Gaudi 2's performance, which now surpasses NVIDIA's A100 GPU. Intel also argues that the Gaudi 2 is priced more competitively than comparable products on the market; given the A100's retail price, the claim is plausible.

The Gaudi 2's inference results on the GPT-J model make the strongest case for its competitiveness. Against NVIDIA's H100, the gap is narrow: the H100 leads by only 1.09x in the server scenario and 1.28x in the offline scenario, which means the Gaudi 2 delivers roughly 92% of the H100's server throughput. Against the A100, the Gaudi 2 leads by 2.4x in the server scenario and 2x offline. Intel's submission used the FP8 data type and still reached 99.9% accuracy with the new format.
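Intel does not detail its FP8 recipe here, but as a rough illustration of what an 8-bit floating-point format trades away, the NumPy sketch below rounds float32 values to the E4M3 variant of FP8 commonly used for inference. The function name and the flush-to-zero handling of tiny values are simplifying assumptions of this sketch, not Intel's implementation.

```python
import numpy as np

def quantize_fp8_e4m3(x):
    """Round float32 values to the nearest FP8 E4M3-representable value.

    E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    Largest finite magnitude is 448; smallest normal is 2**-6.
    Subnormals are flushed to zero here for brevity (a simplification).
    """
    x = np.asarray(x, dtype=np.float32)
    x = np.clip(x, -448.0, 448.0)                 # saturate to the E4M3 range
    x = np.where(np.abs(x) < 2.0 ** -6, 0.0, x)   # flush tiny values to zero
    mant, exp = np.frexp(x)                        # x = mant * 2**exp, |mant| in [0.5, 1)
    # 3 stored mantissa bits plus the implicit leading bit give 4
    # significant bits, i.e. steps of 2**-4 on frexp's [0.5, 1) mantissa.
    mant = np.round(mant * 16.0) / 16.0
    return np.ldexp(mant, exp).astype(np.float32)

# Example: the rounding error FP8 introduces on random activations.
vals = np.random.default_rng(0).normal(size=10_000).astype(np.float32)
q = quantize_fp8_e4m3(vals)
rel_err = np.abs(q - vals) / np.maximum(np.abs(vals), 1e-8)
print(f"median relative error: {np.median(rel_err):.4f}")  # typically a few percent
```

With only four significant bits, individual values carry a few percent of rounding error, which is why holding 99.9% of the reference accuracy at FP8 is a meaningful result rather than a given.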

With Gaudi 2 software updates shipping every six to eight weeks, Intel expects to keep demonstrating performance gains in MLPerf benchmarks, along with steadily broader model coverage.

Intel also disclosed results for its fourth-generation Xeon Scalable processors and the Xeon CPU Max series, including submissions on the GPT-J model. The fourth-generation Xeon results showed strong performance across vision, language processing, audio, and speech-translation models, as well as on the larger DLRM v2 deep learning recommendation model and GPT-J, for general-purpose AI workloads.

The fourth-generation Xeon Scalable processor is a strong choice for building and deploying general AI workloads with popular AI frameworks and libraries. On the benchmark's summarization task, condensing a press release of roughly 1,000-1,500 words into a 100-word summary, the processor produces two summaries per second in offline mode and one per second in the real-time server mode.
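For a concrete sense of that workload, here is a minimal sketch of GPT-J summarization on a CPU using the Hugging Face transformers library. The checkpoint name, prompt format, and generation settings are illustrative assumptions, not Intel's benchmark harness.

```python
# A sketch of CPU-side GPT-J summarization (illustrative, not Intel's setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-j-6B"  # public GPT-J checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 maps onto Xeon AMX/AVX-512 acceleration
)
model.eval()

def summarize(press_release: str) -> str:
    # GPT-J is a plain causal LM, so the summary is produced as a
    # continuation of an instruction-style prompt (prompt format assumed).
    prompt = f"{press_release}\n\nSummarize the above in about 100 words:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=150,              # roughly 100 words of output
            do_sample=False,                 # deterministic, benchmark-style decoding
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens, keeping only the generated summary.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
```

In the MLPerf scenarios, offline mode measures raw throughput on a batch of such requests, while server mode measures throughput under per-query latency constraints, which is why the server figure is lower.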

For the first time, Intel submitted MLPerf results for the Intel Xeon CPU Max series, which provides up to 64GB of high-bandwidth memory (HBM). On GPT-J, it was the only CPU to reach 99.9% accuracy, which matters for applications that demand the highest precision.

Intel submitted these results in collaboration with OEM partners, further demonstrating the scalability of its AI performance and the availability of general-purpose Xeon-based servers that can meet customer service level agreements (SLAs).

Sandra Rivera, Intel's Executive Vice President and General Manager of the Data Center and AI Division, said: "As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers' needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models – from the smallest to the largest – with leading price/performance."