NVIDIA partners with open-source community to bring GPU acceleration to Spark 3.0
NVIDIA has announced a collaboration with the open-source community to bring end-to-end GPU acceleration to Apache Spark 3.0. Apache Spark is an analytics engine for big data processing that is currently used by more than 500,000 data scientists worldwide.
With the planned release of Spark 3.0 later this spring, data scientists and machine learning engineers will, for the first time, be able to apply revolutionary GPU acceleration to ETL (extract, transform, and load) data processing workloads, which commonly rely on SQL database operations.
In addition, AI model training can run on the same Spark cluster instead of as a separate workload on separate infrastructure. This enables high-performance data analytics across the entire data science pipeline, accelerating tens to thousands of terabytes of data, from data preparation through model training, without changing the existing code of Spark applications running on premises or in the cloud.
Through its strategic AI partnership with NVIDIA, Adobe was one of the first companies to run a preview release of Spark 3.0 on Databricks. Adobe uses GPU-accelerated data analytics in Adobe Experience Cloud for product development and to support features that power digital businesses. In preliminary tests, it achieved a 7x performance improvement and 90% cost savings.