September 22, 2020

Apache Kudu 1.12 releases, Hadoop data storage system

2 min read

Apache Kudu is Open Source software. A Kudu cluster stores tables that look just like tables you’re used to from relational (SQL) databases. A table can be as simple as an binary keyand value, or as complex as a few hundred different strongly-typed attributes.

Just like SQL, every table has a PRIMARY KEY made up of one or more columns. This might be a single column like a unique user identifier, or a compound key such as a (host, metric, timestamp) tuple for a machine time series database. Rows can be efficiently read, updated, or deleted by their primary key.

Kudu’s simple data model makes it breeze to port legacy applications or build new ones: no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data.

Apache Kudu

Apache Kudu 1.12 released


New features

  • Kudu now supports native fine-grained authorization via integration with Apache Ranger. Kudu may now enforce access control policies defined for Kudu tables and columns stored in Ranger. See the authorization documentation for more details.
  • Kudu’s web UI now supports proxying via Apache Knox. Kudu may be deployed in a firewalled state behind a Knox Gateway which will forward HTTP requests and responses between clients and the Kudu web UI.
  • Kudu’s web UI now supports HTTP keep-alive. Operations that access multiple URLs will now reuse a single HTTP connection, improving their performance.
  • The kudu tserver quiesce tool is added to quiesce tablet servers. While a tablet server is quiescing, it will stop hosting tablet leaders and stop serving new scan requests. This can be used to orchestrate a rolling restart without stopping on-going Kudu workloads.
  • Introduced auto time source for HybridClock timestamps. With --time_source=auto in AWS and GCE cloud environments, Kudu masters and tablet servers use the built-in NTP client synchronized with dedicated NTP servers available via host-only networks. With --time_source=auto in environments other than AWS/GCE, Kudu masters and tablet servers rely on their local machine’s clock synchronized by NTP. The default setting for the HybridClock time source (--time_source=system) is backward-compatible, requiring the local machine’s clock to be synchronized by the kernel’s NTP discipline.
  • The kudu cluster rebalance tool now supports moving replicas away from specific tablet servers by supplying the --ignored_tservers and --move_replicas_from_ignored_tservers arguments (see KUDU-2914 for more details).
  • The kudu table create tool is added to allow users to specify table creation options using JSON.
  • Kudu now supports DATE and VARCHAR data types. See the schema design documentation for more details.