The R language is most commonly used for data analysis tools and statistical applications. To provide additional support for the R programming language on the Google Cloud Platform (GCP), Google announced the beta release of SparkR on Cloud Dataproc. According to Google, the rise of cloud computing has opened up new opportunities for R.
“Using GCP for R lets you avoid the infrastructure barriers that used to impose limits on understanding your data, such as choosing which datasets to sample because of compute or data size limits. With GCP, you can build large-scale models to analyze datasets of sizes that previously would have required huge upfront investments in high-performance computing infrastructures,” machine learning expert Mikhail Chrestkha wrote in a blog post.
Cloud Dataproc is a managed cloud service for running Apache Spark and Apache Hadoop clusters on GCP, and SparkR is a lightweight package that provides a front end for using Apache Spark from R, the company explained.
Crosbie and Chrestkha wrote, “This integration lets R developers use dplyr-like operations on datasets of nearly any size stored in Cloud Storage. SparkR also supports distributed machine learning using MLlib. You can use this integration to process against large Cloud Storage datasets or perform computationally intensive work.”