Apache Kudu 1.10.0 release, Hadoop data storage system

Apache Kudu

Apache Kudu is Open Source software. A Kudu cluster stores tables that look just like the tables you’re used to from relational (SQL) databases. A table can be as simple as a binary key and value, or as complex as a few hundred different strongly-typed attributes.

Just as in SQL, every table has a PRIMARY KEY made up of one or more columns. This might be a single column like a unique user identifier, or a compound key such as a (host, metric, timestamp) tuple for a machine time series database. Rows can be efficiently read, updated, or deleted by their primary key.
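To make the compound-key idea concrete, here is a toy in-memory model in Python. It is not Kudu's client API or storage format. Rows are addressed by a hypothetical (host, metric, timestamp) key, and the keys are kept sorted so that all points of one host's metric sit next to each other and can be read back as one contiguous key range. The class and method names are illustrative assumptions.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple

# Hypothetical compound primary key: (host, metric, timestamp).
Key = Tuple[str, str, int]


class ToyKeyedTable:
    """Toy table keyed by a compound primary key (illustration only, not Kudu).

    A dict gives point access by full key; a sorted key list keeps rows that
    share a (host, metric) prefix adjacent, so one series is a contiguous range.
    """

    def __init__(self) -> None:
        self._rows: Dict[Key, float] = {}
        self._keys: List[Key] = []

    def upsert(self, key: Key, value: float) -> None:
        # Insert a new key in sorted position; an existing key is updated in place.
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key: Key) -> Optional[float]:
        return self._rows.get(key)

    def delete(self, key: Key) -> bool:
        if key not in self._rows:
            return False
        del self._rows[key]
        self._keys.pop(bisect_left(self._keys, key))
        return True

    def scan_prefix(self, host: str, metric: str) -> List[Tuple[int, float]]:
        """Return (timestamp, value) pairs for one (host, metric) series in key order."""
        start = bisect_left(self._keys, (host, metric, -(2**63)))
        out = []
        for key in self._keys[start:]:
            if key[:2] != (host, metric):
                break  # left the prefix range: the series is complete
            out.append((key[2], self._rows[key]))
        return out


t = ToyKeyedTable()
t.upsert(("web01", "cpu", 1000), 0.5)
t.upsert(("web01", "cpu", 2000), 0.7)
t.upsert(("web01", "mem", 1000), 0.3)
t.upsert(("web02", "cpu", 1000), 0.9)
t.upsert(("web01", "cpu", 2000), 0.8)  # same key, so this updates the row
print(t.scan_prefix("web01", "cpu"))  # [(1000, 0.5), (2000, 0.8)]
```

Because tuples compare element by element, (host, metric) behaves as a key prefix here; this is a sketch of why the column order of a compound primary key matters for range reads, though Kudu's real layout is considerably more involved.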

Kudu’s simple data model makes it a breeze to port legacy applications or build new ones: no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data.


Apache Kudu 1.10.0 released

Changelog:

New features

  • Kudu now supports both full and incremental table backups via a job implemented using Apache Spark. It can also restore tables from full and incremental backups via a restore job, likewise implemented using Apache Spark. See the backup documentation for more details.
  • Kudu can now synchronize its internal catalog with the Apache Hive Metastore, automatically updating the corresponding Hive Metastore table entries whenever tables are created, deleted, or altered in Kudu. See the HMS synchronization documentation for more details.
  • Kudu now supports native fine-grained authorization via integration with Apache Sentry. Kudu may now enforce access control policies defined for Kudu tables and columns, as well as policies defined on Hive servers and databases that may store Kudu tables. See the authorization documentation for more details.
  • Kudu’s web UI now supports SPNEGO, a protocol for securing HTTP requests with Kerberos by passing negotiation through HTTP headers. To enable it, set the --webserver_require_spnego command line flag.
  • Column comments can now be stored in Kudu tables, and can be updated using the AlterTable API (see KUDU-1711).
  • The Java scan token builder can now create multiple tokens per tablet. To use this functionality, call setSplitSizeBytes() to specify how many bytes of data each token should scan. The same API is also available in Kudu’s Spark integration, where it can be used to spawn multiple Spark tasks per scanned tablet (see KUDU-2670).
  • Experimental Kudu Docker images are now published on Docker Hub.
  • Kudu now has an experimental Kubernetes StatefulSet manifest and Helm chart, which can be used to define and provision Kudu clusters using Kubernetes (see KUDU-2398).
  • The Kudu CLI now has rudimentary YAML-based configuration file support, which can be used to provide cluster connection information via cluster name instead of keying in comma-separated lists of master addresses. See the cluster name documentation for more details.
  • The new kudu perf table_scan command scans a table and displays its row count as well as the time the scan took.
  • The new kudu table copy command copies data from one table to another, within the same cluster or across clusters. Note that this implementation uses a single client, so it may not be suitable for large tables.
  • Tablet history retention time can now be configured on a table-by-table basis (see KUDU-2514).
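The relationship between full and incremental backups can be illustrated with a toy restore in Python. The representation is made up for illustration and is not Kudu's backup format: a full backup is a snapshot mapping keys to rows, and each incremental backup lists the rows changed since the previous backup, with None marking a deleted row. Restoring means starting from the full backup and replaying the incrementals oldest first.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representations (not Kudu's on-disk backup format):
# a full backup is a snapshot {key: row}; an incremental backup is a list of
# (key, row) changes since the previous backup, where row=None means "deleted".
Snapshot = Dict[int, str]
Changes = List[Tuple[int, Optional[str]]]


def restore(full: Snapshot, incrementals: List[Changes]) -> Snapshot:
    """Rebuild table contents from a full backup plus incrementals, applied oldest first."""
    table = dict(full)  # copy so the full backup itself is left untouched
    for changes in incrementals:
        for key, row in changes:
            if row is None:
                table.pop(key, None)  # row was deleted after the previous backup
            else:
                table[key] = row  # row was inserted or updated
    return table


full_backup = {1: "a", 2: "b"}
incremental_1 = [(2, "b2"), (3, "c")]  # row 2 updated, row 3 inserted
incremental_2 = [(1, None)]            # row 1 deleted
print(restore(full_backup, [incremental_1, incremental_2]))  # {2: 'b2', 3: 'c'}
```

Replaying the incrementals in a different order can produce a different table, which is why a restore has to follow the backup chain in sequence.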
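As a rough mental model of setSplitSizeBytes(), the Python sketch below estimates how many scan tokens, and therefore Spark tasks, each tablet would yield. It assumes a tablet of S bytes is cut into about ceil(S / split size) key ranges, with at least one token per tablet. This is a simplification: the actual split points are chosen by Kudu itself, so real token counts may differ. The function name, tablet names, and sizes are hypothetical.

```python
from typing import Dict


def tokens_per_tablet(tablet_bytes: Dict[str, int], split_size_bytes: int) -> Dict[str, int]:
    """Estimate scan tokens per tablet under a simplified splitting model.

    Assumption: a tablet of `size` bytes is cut into ceil(size / split_size_bytes)
    key ranges, and every tablet (even an empty one) yields at least one token.
    """
    if split_size_bytes <= 0:
        raise ValueError("split_size_bytes must be positive")
    return {
        # -(-a // b) is integer ceiling division, avoiding float rounding.
        tablet: max(1, -(-size // split_size_bytes))
        for tablet, size in tablet_bytes.items()
    }


GiB = 2**30
estimate = tokens_per_tablet(
    {"tablet-a": 3 * GiB, "tablet-b": 512 * 2**20, "tablet-c": 0},
    split_size_bytes=1 * GiB,
)
print(estimate, "total tasks:", sum(estimate.values()))
# {'tablet-a': 3, 'tablet-b': 1, 'tablet-c': 1} total tasks: 5
```

Under this model, a 1 GiB split size lets a 3 GiB tablet be scanned by three tasks instead of one, which is the kind of per-tablet parallelism the feature is meant to provide.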
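The effect of a per-table retention window can be sketched with a toy MVCC model in Python. This illustrates the general idea only and does not reflect Kudu's implementation or configuration names. A table-level setting overrides a cluster-wide default, and history garbage collection may discard row versions older than "now minus the retention window", keeping only the newest version at or before that cutoff so the row can still be read as of the cutoff. Function names, timestamps in seconds, and the numbers used are assumptions.

```python
from typing import List, Optional, Tuple

# (commit_timestamp_sec, value) pairs for one row, sorted oldest first.
Version = Tuple[int, str]


def effective_retention(table_override_sec: Optional[int], cluster_default_sec: int) -> int:
    """Hypothetical rule: a per-table setting wins, otherwise use the cluster-wide default."""
    return cluster_default_sec if table_override_sec is None else table_override_sec


def retained_versions(versions: List[Version], now_sec: int, retention_sec: int) -> List[Version]:
    """Versions a toy MVCC store would keep after history GC.

    Every version newer than the cutoff is kept, plus the newest version at or
    before the cutoff, so the row can still be reconstructed as of the cutoff.
    Assumes `versions` is sorted oldest first.
    """
    cutoff = now_sec - retention_sec
    older = [v for v in versions if v[0] <= cutoff]
    newer = [v for v in versions if v[0] > cutoff]
    return older[-1:] + newer


history = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]
now = 450
print(retained_versions(history, now, effective_retention(None, 900)))  # all four versions
print(retained_versions(history, now, effective_retention(200, 900)))   # [(200, 'b'), (300, 'c'), (400, 'd')]
```

A shorter window lets history be reclaimed sooner, while a longer one keeps older versions readable for longer; configuring this per table avoids forcing one trade-off on every table in the cluster.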
