LinkedIn open-sources TonY, bringing native TensorFlow support to Hadoop

LinkedIn's open-source TensorFlow on YARN (TonY) project lets users build YARN-based TensorFlow applications on anything from a single node to a sizeable Hadoop cluster. Just as MapReduce serves as the engine for running Pig and Hive scripts on Hadoop, TonY provides first-class support for running TensorFlow jobs. TonY consists of three main components: the Client, the ApplicationMaster, and the TaskExecutor. Its four main features are GPU scheduling, fine-grained resource requests, TensorBoard support, and fault tolerance. TonY is available on GitHub.
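For concreteness, below is a minimal sketch of the kind of distributed TensorFlow (1.x-era) training script that a TaskExecutor might launch. It is a generic example rather than TonY's bundled code, and the flag names used to pass the cluster layout (--ps_hosts, --worker_hosts, --job_name, --task_index) are illustrative assumptions; TonY injects this information through its own mechanism.

# Generic distributed TensorFlow 1.x script, of the kind TonY would run.
# The command-line flags below are illustrative assumptions, not TonY's API.
import argparse
import tensorflow as tf

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", required=True)      # e.g. "host1:2222"
    parser.add_argument("--worker_hosts", required=True)  # e.g. "host2:2222,host3:2222"
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, required=True)
    args = parser.parse_args()

    # Describe the cluster and start this process's server.
    cluster = tf.train.ClusterSpec({
        "ps": args.ps_hosts.split(","),
        "worker": args.worker_hosts.split(","),
    })
    server = tf.train.Server(cluster, job_name=args.job_name,
                             task_index=args.task_index)

    if args.job_name == "ps":
        server.join()  # parameter servers only serve variables
        return

    # Workers build the graph; variables are placed on the parameter servers.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % args.task_index,
            cluster=cluster)):
        x = tf.random_normal([32, 10])
        w = tf.get_variable("w", shape=[10, 1])
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=global_step)

    # MonitoredTrainingSession handles initialization and checkpoint recovery.
    with tf.train.MonitoredTrainingSession(
            master=server.target,
            is_chief=(args.task_index == 0)) as sess:
        while not sess.should_stop():
            _, step = sess.run([train_op, global_step])
            if step >= 1000:
                break

if __name__ == "__main__":
    main()

In this setup, the Client packages the script and its Python environment, the ApplicationMaster negotiates containers with YARN, and each TaskExecutor starts one copy of the script as a parameter-server or worker task.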

With nearly 600 million members on the LinkedIn platform and the rapid progress of deep learning technology, LinkedIn's artificial intelligence engineers strive to apply AI to many features, such as summaries and replies, many of which were built with TensorFlow, the deep learning framework open-sourced by Google. In the beginning, LinkedIn's internal TensorFlow users ran small applications on unmanaged bare-metal machines. Over time, however, they became increasingly aware of the need to connect TensorFlow to the computing and storage resources of the Hadoop big data platform. LinkedIn's Hadoop clusters hold hundreds of petabytes of data, making them ideal for developing deep learning applications.


In addition to running basic distributed TensorFlow on Hadoop, TonY implements features that support large-scale training. TonY supports GPU scheduling, requesting GPU resources from the cluster through Hadoop's API. It also supports fine-grained resource requests: because TonY requests the resources for the different task types (such as parameter servers and workers) as separate components, users can specify different resources for each task type, giving them control over what their application consumes. This also helps cluster administrators avoid wasting hardware resources.
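As an illustration, a per-job configuration for such requests might look like the Hadoop-style XML sketch below. The property names (tony.ps.instances, tony.worker.gpus, and so on) follow the convention shown in TonY's documentation but should be verified against the TonY version in use, and the values here are arbitrary.

<configuration>
  <!-- One parameter server with modest memory and no GPU. -->
  <property>
    <name>tony.ps.instances</name>
    <value>1</value>
  </property>
  <property>
    <name>tony.ps.memory</name>
    <value>2g</value>
  </property>

  <!-- Four workers, each with more memory and one GPU. -->
  <property>
    <name>tony.worker.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>tony.worker.memory</name>
    <value>8g</value>
  </property>
  <property>
    <name>tony.worker.gpus</name>
    <value>1</value>
  </property>
</configuration>

In this sketch, the parameter server gets a small CPU-only container while each worker gets more memory plus a GPU, so no GPU is tied up by a task that cannot use it.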