Apache Kudu 1.6.0 Released
Apache Kudu is Open Source software. A Kudu cluster stores tables that look just like the tables you’re used to from relational (SQL) databases. A table can be as simple as a binary key and value, or as complex as a few hundred different strongly-typed attributes.
Just as in SQL, every table has a PRIMARY KEY made up of one or more columns. This might be a single column like a unique user identifier, or a compound key such as a (host, metric, timestamp) tuple for a machine time series database. Rows can be efficiently read, updated, or deleted by their primary key.
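To make the compound-key idea concrete, here is a minimal in-memory sketch of a table keyed by (host, metric, timestamp). It is a toy model, not the Kudu client API: the class ToyTable, its methods, and the sample host and metric names are hypothetical. It shows point reads, updates, and deletes by primary key, plus the fact that keeping keys sorted places all rows of one (host, metric) series next to each other.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple

# Hypothetical compound key (host, metric, timestamp); NOT the Kudu client API.
Key = Tuple[str, str, int]


class ToyTable:
    """In-memory model of a table with a compound primary key.

    A dict serves point reads, updates, and deletes by key; a sorted key list
    keeps rows that share a (host, metric) prefix next to each other.
    """

    def __init__(self) -> None:
        self._rows: Dict[Key, float] = {}
        self._keys: List[Key] = []

    def upsert(self, key: Key, value: float) -> None:
        if key not in self._rows:
            insort(self._keys, key)  # new key: insert at its sorted position
        self._rows[key] = value      # existing key: overwrite in place

    def get(self, key: Key) -> Optional[float]:
        return self._rows.get(key)

    def delete(self, key: Key) -> bool:
        if key not in self._rows:
            return False
        del self._rows[key]
        self._keys.pop(bisect_left(self._keys, key))
        return True

    def scan_series(self, host: str, metric: str) -> List[Tuple[int, float]]:
        """Return (timestamp, value) pairs for one host and metric, oldest first."""
        prefix = (host, metric)
        # A 2-tuple sorts before every 3-tuple it is a prefix of, so this
        # lands on the first key of the series.
        i = bisect_left(self._keys, prefix)
        out: List[Tuple[int, float]] = []
        while i < len(self._keys) and self._keys[i][:2] == prefix:
            key = self._keys[i]
            out.append((key[2], self._rows[key]))
            i += 1
        return out


t = ToyTable()
t.upsert(("web01", "cpu", 100), 0.5)
t.upsert(("web01", "cpu", 90), 0.4)
t.upsert(("web02", "cpu", 95), 0.9)
t.upsert(("web01", "cpu", 100), 0.6)  # same primary key: updates the row
print(t.scan_series("web01", "cpu"))  # [(90, 0.4), (100, 0.6)]
```

Ordering rows by the full key is what makes "all samples for one host and metric" a contiguous range rather than a search over the whole table.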
Kudu’s simple data model makes it a breeze to port legacy applications or build new ones: no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data.
- Tablet servers’ tolerance of disk failures is now enabled by default and has been extended to handle data directory failures at runtime. If a disk fails at runtime, any tablets with data on the failed disk are shut down and restarted on another tablet server. The experimental --fs_target_data_dirs_per_tablet flag controls the tradeoff between a newly added tablet’s tolerance to disk failures and its ability to parallelize reads: tablets spread across fewer disks are less likely to be affected by a disk failure, at the cost of reduced parallelism. By default, tablets are striped across all available disks. Note that the first configured data directory and the WAL directory cannot currently tolerate disk failures; this will be improved in future Kudu releases.
- Kudu servers can now adopt new data directories via the new kudu fs update_dirs tool. The new directory will be used only by new tablet replicas. Note that removing directories is not yet supported (see KUDU-2202).
- Kudu servers have two new flags to control web UI TLS/HTTPS settings, including --webserver_tls_min_protocol. These flags allow the advertised TLS ciphers and TLS protocol versions to be configured. Additionally, the webserver now excludes insecure legacy ciphers by default (see KUDU-2190).
- Kudu servers can now tolerate short interruptions in NTP clock synchronization. NTP synchronization is still required when any Kudu daemon starts up. If NTP synchronization is not available, diagnostic information is now logged to help pinpoint the issue (see KUDU-1578).
- Tablet server startup time has been improved significantly on servers containing large numbers of blocks.
- The log block manager now deletes on-disk data in batches. This optimization can significantly reduce the time taken to delete a tablet's data.
- The usage of the sensitive data redaction flag (--redact) has changed slightly. Setting --redact=log disables redaction in the web UI but retains it in server logs. Alternatively, --redact=none disables redaction completely.
- The Spark DataSource integration can now take advantage of scan locality for better scan performance: a scan runs against the closest replica instead of always going to the leader.
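The disk-failure tradeoff behind --fs_target_data_dirs_per_tablet can be quantified with a little arithmetic. The sketch below assumes a simplified model in which each tablet is striped across k of a server's n data directories, chosen uniformly at random, and exactly one disk fails. Under that assumption the chance that a given tablet has data on the failed disk is C(n-1, k-1) / C(n, k) = k / n, while reads can fan out over up to k disks. The function name and the 12-disk server are hypothetical, and Kudu's real placement logic may weigh other factors.

```python
from fractions import Fraction
from math import comb


def affected_fraction(num_disks: int, dirs_per_tablet: int) -> Fraction:
    """Probability that a tablet striped across `dirs_per_tablet` of
    `num_disks` directories (uniformly random subset: an assumption of this
    sketch) has data on one specific failed disk.

    Counts the subsets containing the failed disk, C(n-1, k-1), over all
    subsets, C(n, k); this simplifies to k / n.
    """
    if not 1 <= dirs_per_tablet <= num_disks:
        raise ValueError("need 1 <= dirs_per_tablet <= num_disks")
    return Fraction(comb(num_disks - 1, dirs_per_tablet - 1),
                    comb(num_disks, dirs_per_tablet))


# Hypothetical tablet server with 12 data directories.
for k in (1, 3, 12):
    print(f"dirs_per_tablet={k}: P(affected)={affected_fraction(12, k)}, "
          f"reads can use up to {k} disks")
```

With 12 disks, striping across all of them (the default) means every tablet is hit by any single disk failure, whereas striping across 3 leaves three quarters of tablets untouched but caps read parallelism at 3 disks.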
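The note that a directory adopted through kudu fs update_dirs "will be used only by new tablet replicas" can be illustrated with a toy model of directory assignment. This is a hypothetical sketch, not Kudu's implementation: the ToyDirManager class, its methods, and the /data/... paths are invented for illustration. Existing replicas keep the directories they were created with, and only replicas created afterwards are striped onto the added directory. Directory removal (KUDU-2202) and --fs_target_data_dirs_per_tablet are deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDirManager:
    """Hypothetical model of data-directory assignment (not Kudu internals).

    Existing tablet replicas keep the directories they were created with;
    only replicas created after a directory is added can be placed on it.
    """
    data_dirs: List[str]
    assignments: Dict[str, List[str]] = field(default_factory=dict)

    def add_dir(self, path: str) -> None:
        # Adopting a directory moves no existing data; it only widens the
        # set of directories available to future replicas.
        if path not in self.data_dirs:
            self.data_dirs.append(path)

    def create_replica(self, tablet_id: str) -> List[str]:
        # Stripe the new replica across every directory known right now
        # (mirrors the default of striping across all available disks).
        # Store a copy so later add_dir calls don't change old assignments.
        self.assignments[tablet_id] = list(self.data_dirs)
        return self.assignments[tablet_id]


mgr = ToyDirManager(["/data/1", "/data/2"])
mgr.create_replica("t1")
mgr.add_dir("/data/3")
mgr.create_replica("t2")
print(mgr.assignments)
```

Copying the directory list at creation time is what captures the "new replicas only" rule: t1 stays on its original two directories, while t2 also uses /data/3.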
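The two --redact settings described in the redaction note amount to a small lookup from flag value to where redaction applies. The helper below only encodes the two values that the release note describes, log and none. The names RedactionScope and redaction_scope are hypothetical, and any other values the real flag accepts are intentionally left out of this sketch rather than guessed.

```python
from typing import NamedTuple


class RedactionScope(NamedTuple):
    web_ui: bool  # True if sensitive data is redacted in the web UI
    logs: bool    # True if sensitive data is redacted in server logs


# Only the two settings described in the release note; other values of the
# real --redact flag are not modeled here.
_SCOPES = {
    "log": RedactionScope(web_ui=False, logs=True),
    "none": RedactionScope(web_ui=False, logs=False),
}


def redaction_scope(redact_flag: str) -> RedactionScope:
    """Map a --redact value to where redaction applies (hypothetical helper)."""
    try:
        return _SCOPES[redact_flag]
    except KeyError:
        raise ValueError(f"--redact value not modeled: {redact_flag!r}") from None


for value in ("log", "none"):
    print(value, redaction_scope(value))
```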
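The scan-locality change for the Spark DataSource can be pictured as a replica-selection rule. The sketch below is a simplified model under stated assumptions: "closest" is taken to mean a replica on the same host as the Spark executor, and when no local replica exists it falls back to the leader. Both rules, the Replica type, the pick_scan_replica function, and the host names are hypothetical and are not the Kudu client's actual selection logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    host: str
    is_leader: bool


def pick_scan_replica(replicas: List[Replica], client_host: str) -> Replica:
    """Simplified 'closest replica' choice (assumptions of this sketch):
    prefer a replica on the client's own host; otherwise use the leader."""
    for replica in replicas:
        if replica.host == client_host:
            return replica  # local read: no network hop to another server
    for replica in replicas:
        if replica.is_leader:
            return replica  # fallback: the pre-1.6 behavior of reading the leader
    raise ValueError("no local replica and no replica marked as leader")


replicas = [Replica("ts1", True), Replica("ts2", False), Replica("ts3", False)]
print(pick_scan_replica(replicas, "ts2"))           # executor co-located with ts2
print(pick_scan_replica(replicas, "spark-exec-9"))  # no local replica: leader
```

Reading from a co-located follower avoids shipping scan results across the network; one tradeoff to keep in mind is that a follower may lag slightly behind the leader.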