Apache Kudu

Apache Kudu 1.6.0 release, Hadoop data storage system

Apache Kudu 1.6.0 Released

Apache Kudu is open-source software. A Kudu cluster stores tables that look just like the tables you’re used to from relational (SQL) databases. A table can be as simple as a binary key and value, or as complex as a few hundred different strongly-typed attributes.

Just as in SQL, every table has a PRIMARY KEY made up of one or more columns. This might be a single column, such as a unique user identifier, or a compound key, such as a (host, metric, timestamp) tuple for a machine time-series database. Rows can be efficiently read, updated, or deleted by their primary key.
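To make the compound-key model concrete, here is a minimal Python sketch. It assumes nothing about Kudu's client API: the `SortedKeyTable` class and its methods are hypothetical. Rows are kept sorted by a (host, metric, timestamp) key. A point lookup uses the full key, and a time-range scan over one series becomes a contiguous slice of the sorted keys.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple

# Compound primary key: (host, metric, timestamp). Tuples compare
# column by column, just like a multi-column key.
Key = Tuple[str, str, int]


class SortedKeyTable:
    """Hypothetical in-memory table kept sorted by its compound primary key."""

    def __init__(self) -> None:
        self._keys: List[Key] = []          # primary keys in sorted order
        self._rows: Dict[Key, float] = {}   # primary key -> metric value

    def upsert(self, key: Key, value: float) -> None:
        """Insert a new row or update the value of an existing one."""
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key: Key) -> Optional[float]:
        """Point lookup by full primary key."""
        return self._rows.get(key)

    def delete(self, key: Key) -> None:
        """Delete a row by primary key; missing keys are ignored."""
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect_left(self._keys, key))

    def scan(self, host: str, metric: str,
             start_ts: int, end_ts: int) -> List[Tuple[int, float]]:
        """Return (timestamp, value) pairs for one series with start_ts <= ts < end_ts."""
        lo = bisect_left(self._keys, (host, metric, start_ts))
        hi = bisect_left(self._keys, (host, metric, end_ts))
        return [(k[2], self._rows[k]) for k in self._keys[lo:hi]]
```

The key is compared column by column, so all rows for one (host, metric) pair sit next to each other. This ordering is what makes a compound key suit time-series range scans.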

Kudu’s simple data model makes it a breeze to port legacy applications or build new ones: there is no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data.


New features

  • Tablet servers’ tolerance of disk failures is now enabled by default and has been extended to handle data directory failures at runtime. If a disk fails at runtime, any tablets with data on the failed disk are shut down and restarted on another tablet server. The experimental --fs_target_data_dirs_per_tablet flag configures the tradeoff between a newly added tablet’s tolerance to disk failures and its ability to parallelize reads. Tablets spread across fewer disks are less likely to be affected by a disk failure, at the cost of reduced read parallelism. By default, tablets are striped across all available disks. Note that the first configured data directory and the WAL directory cannot currently tolerate disk failures; this will be improved in future Kudu releases.
  • Kudu servers can now adopt new data directories via the new kudu fs update_dirs tool. New directories will be used only by new tablet replicas. Note that removing directories is not yet supported (see KUDU-2202).
  • Kudu servers have two new flags to control web UI TLS/HTTPS settings: --webserver_tls_ciphers and --webserver_tls_min_protocol. These flags configure the advertised TLS ciphers and the minimum TLS protocol version, respectively. Additionally, the web server now excludes insecure legacy ciphers by default (see KUDU-2190).
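The disk-failure tradeoff can be quantified with a deliberately simplified model, sketched below in Python. The model makes four assumptions:

- A server has `total_dirs` data directories, each equally likely to fail.
- A tablet is striped across `dirs_per_tablet` of them.
- `failed` distinct directories fail at random.
- The first data directory and the WAL directory are ignored, even though, as noted in the list above, they cannot yet tolerate failures.

The function name and parameters are illustrative only, not Kudu settings.

```python
from fractions import Fraction
from math import comb


def prob_tablet_affected(total_dirs: int, dirs_per_tablet: int,
                         failed: int = 1) -> Fraction:
    """Probability that at least one of a tablet's data directories is among
    `failed` distinct directories chosen uniformly at random to fail.

    Simplified model: every directory is equally likely to fail, and the
    tablet's data is spread over `dirs_per_tablet` of `total_dirs`.
    """
    if not 1 <= dirs_per_tablet <= total_dirs:
        raise ValueError("dirs_per_tablet must be between 1 and total_dirs")
    if not 0 <= failed <= total_dirs:
        raise ValueError("failed must be between 0 and total_dirs")
    # Number of failure sets that avoid every directory holding this tablet's data.
    unaffected = comb(total_dirs - dirs_per_tablet, failed)
    return 1 - Fraction(unaffected, comb(total_dirs, failed))


if __name__ == "__main__":
    # 12 data directories, one random failure. Fewer dirs per tablet means
    # lower risk but also fewer disks available to serve reads in parallel.
    for k in (12, 6, 3, 1):
        print(f"striped across {k:2d} dirs: P(affected) = {prob_tablet_affected(12, k)}")
```

Take 12 directories and a single failure. Striping a tablet across all 12 (the default behavior) means the tablet is always affected. Limiting it to 3 directories lowers the probability to 1/4. In this model, the number of disks that can serve the tablet's reads concurrently drops from 12 to 3.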
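Adding a directory with the new tool might look like the Python sketch below. The sketch only builds the command line and does not run it. It assumes that `kudu fs update_dirs` takes the usual `--fs_wal_dir` and `--fs_data_dirs` flags, with the full comma-separated list of data directories. The `update_dirs_command` helper and all paths are hypothetical. Existing directories are kept first and in their original order, because this release cannot remove directories.

```python
import shlex
from typing import List


def update_dirs_command(wal_dir: str, existing_dirs: List[str],
                        added_dirs: List[str]) -> List[str]:
    """Build, but do not run, a `kudu fs update_dirs` argument list.

    Existing directories stay first and in order; new ones are appended.
    Nothing is ever dropped, since removing directories is not supported.
    """
    if not existing_dirs:
        raise ValueError("existing_dirs must list the currently configured data directories")
    data_dirs = list(existing_dirs)
    for d in added_dirs:
        if d not in data_dirs:  # skip directories that are already configured
            data_dirs.append(d)
    return [
        "kudu", "fs", "update_dirs",
        f"--fs_wal_dir={wal_dir}",
        "--fs_data_dirs=" + ",".join(data_dirs),
    ]


if __name__ == "__main__":
    cmd = update_dirs_command("/data/0/kudu-wal",
                              ["/data/1/kudu", "/data/2/kudu"],
                              ["/data/3/kudu"])
    print(shlex.join(cmd))
```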
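The two webserver flags correspond to familiar OpenSSL settings: a minimum protocol version and a cipher list. As an analogy only, the Python sketch below applies equivalent settings to a standard-library `ssl.SSLContext`. The protocol names in `_MIN_PROTOCOLS` are assumptions about the style of value the flag accepts. The example cipher string is illustrative and is not Kudu's default.

```python
import ssl

# Assumed protocol names, mapped to the standard library's TLSVersion values.
_MIN_PROTOCOLS = {
    "TLSv1": ssl.TLSVersion.TLSv1,
    "TLSv1.1": ssl.TLSVersion.TLSv1_1,
    "TLSv1.2": ssl.TLSVersion.TLSv1_2,
}


def make_server_context(min_protocol: str, ciphers: str) -> ssl.SSLContext:
    """Build a server-side TLS context with a minimum protocol version and an
    OpenSSL-format cipher list, analogous to the two webserver flags."""
    if min_protocol not in _MIN_PROTOCOLS:
        raise ValueError(f"unsupported minimum protocol: {min_protocol!r}")
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse handshakes below this protocol version.
    ctx.minimum_version = _MIN_PROTOCOLS[min_protocol]
    # Restrict the TLS 1.2-and-older cipher suites; raises ssl.SSLError
    # if the string matches no cipher at all.
    ctx.set_ciphers(ciphers)
    return ctx


if __name__ == "__main__":
    ctx = make_server_context("TLSv1.2", "ECDHE+AESGCM:!aNULL")
    print(ctx.minimum_version, len(ctx.get_ciphers()), "ciphers enabled")
```

Excluding legacy ciphers amounts to choosing a cipher string that leaves them out. Raising the minimum protocol version rejects older clients outright.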

Optimizations and improvements

  • Kudu servers can now tolerate short interruptions in NTP clock synchronization. NTP synchronization is still required when any Kudu daemon starts up. If NTP synchronization is not available, diagnostic information is now logged to help pinpoint the issue (see KUDU-1578).
  • Tablet server startup time has been improved significantly on servers containing large numbers of blocks.
  • The log block manager now deletes data from disk in batches. This optimization can significantly reduce the time taken to delete a tablet’s data.
  • The usage of the sensitive data redaction flag has changed slightly. Setting --redact=log disables redaction in the web UI but retains it in server logs, while --redact=none disables redaction completely.
  • The Spark DataSource integration can now take advantage of scan locality for better scan performance: scans are performed at the closest replica instead of going to the leader.
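One common way to ride out a short NTP outage is to keep using the local clock while widening its error bound by a worst-case drift rate. The clock fails only once that bound grows too large. The Python sketch below illustrates this general idea. It is not a description of Kudu's actual clock code: the names, the 500 ppm drift rate, and the error threshold are all assumptions.

```python
from dataclasses import dataclass

# Assumed worst-case frequency error of the local oscillator, in parts per
# million. 500 ppm is the tolerance commonly cited for NTP; illustrative only.
MAX_DRIFT_PPM = 500


@dataclass(frozen=True)
class ClockState:
    last_sync_us: int      # local time of the last good NTP sync (microseconds)
    error_at_sync_us: int  # maximum clock error reported at that sync (microseconds)


def current_error_bound_us(state: ClockState, now_us: int) -> int:
    """Widen the last known error bound by the worst-case drift since the sync."""
    elapsed_us = now_us - state.last_sync_us
    if elapsed_us < 0:
        raise ValueError("now_us is earlier than the last sync")
    return state.error_at_sync_us + elapsed_us * MAX_DRIFT_PPM // 1_000_000


def can_serve_timestamps(state: ClockState, now_us: int, max_error_us: int) -> bool:
    """Keep working through an NTP outage until the error bound is too large."""
    return current_error_bound_us(state, now_us) <= max_error_us
```

Suppose the last sync reported a 50 ms error and the limit is a hypothetical 10 s. A 60 s outage only widens the bound to 80 ms, while an outage of roughly 5.5 hours would push it past the limit.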
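A toy cost model in Python shows why batching deletions can help. It assumes, hypothetically, that blocks live in container files and that each update to a container carries a fixed cost, such as a metadata write and sync. Grouping deletions by container then pays that cost once per container rather than once per block. The function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A block is identified here as (container_name, block_id).
Block = Tuple[str, int]


def syncs_if_unbatched(blocks: Sequence[Block]) -> int:
    """One container update (and sync) per deleted block."""
    return len(blocks)


def batch_by_container(blocks: Sequence[Block]) -> Dict[str, List[int]]:
    """Group block IDs by container so each container is updated once."""
    batches: Dict[str, List[int]] = defaultdict(list)
    for container, block_id in blocks:
        batches[container].append(block_id)
    return dict(batches)


def syncs_if_batched(blocks: Sequence[Block]) -> int:
    """One container update (and sync) per distinct container touched."""
    return len(batch_by_container(blocks))
```

Deleting a tablet with thousands of blocks packed into a handful of containers drops, in this model, from thousands of synced updates to a handful.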
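The values of the redaction flag can be summarized as a small lookup table, sketched in Python below. The `log` and `none` entries follow the description in the list above. The `all` entry (redaction everywhere) is assumed to be the default; this announcement does not state it. The `Redaction` type and `redaction_for` helper are hypothetical.

```python
from typing import NamedTuple


class Redaction(NamedTuple):
    web_ui: bool  # redact sensitive data shown in the web UI
    logs: bool    # redact sensitive data written to server logs


# "log" and "none" follow the release notes; "all" is an assumed default.
REDACT_MODES = {
    "all": Redaction(web_ui=True, logs=True),
    "log": Redaction(web_ui=False, logs=True),
    "none": Redaction(web_ui=False, logs=False),
}


def redaction_for(flag_value: str) -> Redaction:
    """Map a --redact value to where redaction applies."""
    try:
        return REDACT_MODES[flag_value]
    except KeyError:
        raise ValueError(f"unknown --redact value: {flag_value!r}") from None
```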
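Replica selection for locality-aware scans can be sketched in Python as follows. The `Replica` type and `pick_replica` function are hypothetical. The sketch treats "closest" as "on the same host as the client". As a simplifying assumption, it falls back to the leader when no replica is local; a real client may choose differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    host: str        # tablet server hosting this replica
    is_leader: bool  # whether this replica is the current leader


def pick_replica(replicas: Sequence[Replica], client_host: str,
                 closest: bool = True) -> Replica:
    """Choose the replica a scan should read from.

    closest=False: always scan the leader.
    closest=True:  prefer a replica on the client's own host; otherwise
                   fall back to the leader (a simplifying assumption).
    """
    if closest:
        for r in replicas:
            if r.host == client_host:
                return r
    for r in replicas:
        if r.is_leader:
            return r
    raise LookupError("no local replica and no known leader")
```

Suppose a Spark executor runs on a tablet server that hosts a replica of the tablet. A closest-replica scan then reads locally instead of crossing the network to the leader.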
