Uber Open-Sources Manifold, a Visual Debugging Tool for Machine Learning

Debugging machine learning models is hard. According to Uber, practitioners typically spend 20% of their effort building an initial working model and 80% improving its performance. To help with this, Uber developed and open-sourced Manifold, a visual debugging tool that can be used to diagnose and debug problems in machine learning models.

Manifold can help explain the underlying causes of poor model performance by showing how feature distributions differ between the data subsets on which a model performs well and those on which it performs poorly. It can also show how the prediction accuracy of several candidate models differs on each data subset, which provides a basis for more advanced treatments such as model ensembling.
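
The idea of comparing feature distributions between better- and worse-performing subsets can be illustrated with a minimal pandas sketch. This is not Manifold's API or its actual algorithm. The function name `compare_subsets`, the column names, and the toy loss values are all hypothetical, and the comparison is reduced to a difference of feature means for numerical features only.

```python
import pandas as pd


def compare_subsets(features: pd.DataFrame, loss: pd.Series,
                    quantile: float = 0.5) -> pd.DataFrame:
    """Split rows by per-instance loss and compare feature means.

    Hypothetical helper for illustration only; it is not part of Manifold.
    """
    threshold = loss.quantile(quantile)
    worse = features[loss > threshold]    # rows the model predicts poorly
    better = features[loss <= threshold]  # rows the model predicts well

    summary = pd.DataFrame({
        "mean_better": better.mean(),
        "mean_worse": worse.mean(),
    })
    summary["difference"] = summary["mean_worse"] - summary["mean_better"]

    # Put the features whose means differ most between the subsets first.
    order = summary["difference"].abs().sort_values(ascending=False).index
    return summary.loc[order]


# Toy data (made up): prediction loss is mostly high on long trips.
features = pd.DataFrame({
    "trip_distance": [1, 2, 3, 4, 10, 12, 11, 13],
    "hour": [8, 9, 8, 9, 8, 9, 8, 9],
})
loss = pd.Series([0.1, 0.2, 0.1, 0.2, 2.0, 2.5, 0.3, 3.0])

summary = compare_subsets(features, loss)
print(summary)
```

In this toy data, `trip_distance` separates the two subsets while `hour` does not, which is the kind of signal that points to a possible cause of poor performance. A raw difference of means ignores feature scale, so a real comparison would need normalized or distribution-level measures.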

Several features were added to Manifold, including:

  • Model-agnostic support for general binary classification and regression model debugging. Users will be able to analyze and compare models of various algorithm types, enabling them to discern performance differences with regard to diverse data slices.
  • Visualization support for tabular feature input including numerical, categorical, and geospatial feature types. Using the feature value distribution information of each data slice, users can better understand the potential cause of certain performance issues, for instance, whether there is any correlation between the model’s prediction loss and the geo-location and distribution of its data points.
  • Integration with Jupyter Notebook. Through this integration, Manifold accepts data input as Pandas DataFrame objects and renders a visualization of this data within Jupyter. Since Jupyter Notebook is one of the most widely adopted data science platforms for data scientists and ML engineers, this integration enables users to analyze their models without breaking their normal workflows.
  • Interactive data slicing and performance comparisons based on per-instance prediction loss and other feature values. Users will be able to slice and query data based on prediction loss, ground truth, or other features of interest. This functionality will enable users to quickly validate or reject their hypotheses through versatile data slicing logic.
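
The slicing and per-slice model comparison described in the list above can be approximated by hand on a Pandas DataFrame. The sketch below does not use Manifold itself. The columns (`city`, `label`, `model_a`, `model_b`), the predicted probabilities, and the helper `log_loss_per_instance` are hypothetical, and binary log loss is assumed as the per-instance prediction loss.

```python
import numpy as np
import pandas as pd


def log_loss_per_instance(y_true, p_pred, eps=1e-12):
    """Binary log loss for each row (hypothetical helper, not Manifold's)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


# Toy data (made up): ground truth plus predicted probabilities
# from two candidate binary classifiers.
data = pd.DataFrame({
    "city": ["sf", "sf", "nyc", "nyc", "sf", "nyc"],
    "label": [1, 0, 1, 0, 1, 1],
    "model_a": [0.9, 0.2, 0.6, 0.4, 0.8, 0.3],
    "model_b": [0.7, 0.3, 0.9, 0.1, 0.6, 0.8],
})

for model in ["model_a", "model_b"]:
    data[f"loss_{model}"] = log_loss_per_instance(
        data["label"].to_numpy(), data[model].to_numpy()
    )

# Slice by a feature of interest and compare mean loss per model.
per_slice = data.groupby("city")[["loss_model_a", "loss_model_b"]].mean()
best_per_slice = per_slice.idxmin(axis=1)  # lowest-loss column in each slice

# Slice by prediction loss: rows where model_a performs poorly.
high_loss = data.query("loss_model_a > 0.5")

print(per_slice)
print(best_per_slice)
print(high_loss[["city", "label", "model_a"]])
```

Here each model has the lower mean loss on a different city, and every high-loss row for `model_a` falls in one slice. A hypothesis such as "model_a struggles in one region" can be checked quickly this way.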