Apache Kudu 1.12 released, a data storage system for Hadoop
Apache Kudu is open source software. A Kudu cluster stores tables that look just like the tables you’re used to from relational (SQL) databases. A table can be as simple as a binary key and value, or as complex as a few hundred different strongly-typed attributes.
Just like SQL, every table has a PRIMARY KEY made up of one or more columns. This might be a single column like a unique user identifier, or a compound key such as a (host, metric, timestamp) tuple for a machine time series database. Rows can be efficiently read, updated, or deleted by their primary key.
Kudu’s simple data model makes it a breeze to port legacy applications or build new ones: no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data.
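To make the compound-key idea concrete, here is a minimal in-memory sketch in Python. It does not use the Kudu client API; the names MetricKey and MetricTable are hypothetical and exist only to illustrate how a (host, metric, timestamp) tuple can serve as a unique primary key for reads, updates, and deletes.

```python
from typing import Dict, NamedTuple, Optional


class MetricKey(NamedTuple):
    """Hypothetical compound primary key: (host, metric, timestamp)."""
    host: str
    metric: str
    timestamp: int


class MetricTable:
    """Toy in-memory stand-in for a table keyed by a compound primary key."""

    def __init__(self) -> None:
        self._rows: Dict[MetricKey, float] = {}

    def insert(self, key: MetricKey, value: float) -> None:
        # A primary key identifies exactly one row, so duplicates are rejected.
        if key in self._rows:
            raise KeyError(f"duplicate primary key: {key}")
        self._rows[key] = value

    def read(self, key: MetricKey) -> Optional[float]:
        # Point lookup by the full key; returns None when the row is absent.
        return self._rows.get(key)

    def update(self, key: MetricKey, value: float) -> None:
        # Updates only touch rows that already exist.
        if key not in self._rows:
            raise KeyError(f"no such row: {key}")
        self._rows[key] = value

    def delete(self, key: MetricKey) -> None:
        # Raises KeyError if the row does not exist.
        del self._rows[key]


if __name__ == "__main__":
    table = MetricTable()
    key = MetricKey("web-01", "cpu_load", 1588000000)
    table.insert(key, 0.42)
    table.update(key, 0.57)
    print(table.read(key))
    table.delete(key)
```

A real Kudu table would add typed columns and partitioning on top of this, but the access pattern by full primary key is the same idea.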
Apache Kudu 1.12 released
Changelog:
New features
- Kudu now supports native fine-grained authorization via integration with Apache Ranger. Kudu may now enforce access control policies defined for Kudu tables and columns stored in Ranger. See the authorization documentation for more details.
- Kudu’s web UI now supports proxying via Apache Knox. Kudu may be deployed in a firewalled state behind a Knox Gateway which will forward HTTP requests and responses between clients and the Kudu web UI.
- Kudu’s web UI now supports HTTP keep-alive. Operations that access multiple URLs will now reuse a single HTTP connection, improving their performance.
- The kudu tserver quiesce tool is added to quiesce tablet servers. While a tablet server is quiescing, it will stop hosting tablet leaders and stop serving new scan requests. This can be used to orchestrate a rolling restart without stopping ongoing Kudu workloads.
- Introduced the auto time source for HybridClock timestamps. With --time_source=auto in AWS and GCE cloud environments, Kudu masters and tablet servers use the built-in NTP client synchronized with dedicated NTP servers available via host-only networks. With --time_source=auto in environments other than AWS/GCE, Kudu masters and tablet servers rely on their local machine’s clock synchronized by NTP. The default setting for the HybridClock time source (--time_source=system) is backward-compatible, requiring the local machine’s clock to be synchronized by the kernel’s NTP discipline.
- The kudu cluster rebalance tool now supports moving replicas away from specific tablet servers by supplying the --ignored_tservers and --move_replicas_from_ignored_tservers arguments (see KUDU-2914 for more details).
- The kudu table create tool is added to allow users to specify table creation options using JSON.
- Kudu now supports DATE and VARCHAR data types. See the schema design documentation for more details.
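As an illustration of the new rebalancer arguments, the following Python sketch only assembles a command line for kudu cluster rebalance; it does not run anything. The helper name rebalance_command, the host names, and the UUID strings are hypothetical. It also assumes that master addresses are passed as a single comma-separated positional argument and that --ignored_tservers takes a comma-separated list of tablet server UUIDs, so check the tool's own help output before relying on it.

```python
from typing import List, Sequence


def rebalance_command(
    master_addresses: Sequence[str],
    ignored_tservers: Sequence[str] = (),
    move_replicas: bool = False,
) -> List[str]:
    """Build (but do not execute) an argv list for `kudu cluster rebalance`."""
    if not master_addresses:
        raise ValueError("at least one master address is required")
    if move_replicas and not ignored_tservers:
        # Moving replicas only makes sense when some servers are ignored.
        raise ValueError("move_replicas requires ignored_tservers")

    # Assumption: masters are given as one comma-separated positional argument.
    argv = ["kudu", "cluster", "rebalance", ",".join(master_addresses)]
    if ignored_tservers:
        # Assumption: the flag value is a comma-separated list of UUIDs.
        argv.append("--ignored_tservers=" + ",".join(ignored_tservers))
    if move_replicas:
        argv.append("--move_replicas_from_ignored_tservers")
    return argv


if __name__ == "__main__":
    # Placeholder addresses and UUIDs, for illustration only.
    cmd = rebalance_command(
        ["master-1:7051", "master-2:7051"],
        ignored_tservers=["ts-uuid-a", "ts-uuid-b"],
        move_replicas=True,
    )
    print(" ".join(cmd))
```

Returning a list instead of a shell string means the result can be handed to subprocess.run without any quoting concerns.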