Presto 0.190 released: Facebook's big data query engine
Presto 0.190 has been released. Presto is Facebook's open source data query engine. It supports fast interactive analysis over more than 250 PB of data, with query performance on the level of a commercial data warehouse. The engine is said to be more than 10 times faster than Hive.
Presto can query data stored in Hive, Cassandra, and even some commercial data storage products. A single Presto query can combine data from multiple data sources for unified analysis.
General Changes
- Fix correctness issue for array_min() and array_max() when arrays contain NaN.
- Fix planning failure for queries involving GROUPING that require implicit coercions in expressions containing aggregate functions.
- Fix potential workload imbalance when using topology-aware scheduling.
- Fix performance regression for queries containing DISTINCT aggregates over the same column.
- Fix a memory leak that occurs on workers.
- Improve error handling when a HAVING clause contains window functions.
- Avoid unnecessary data redistribution when writing when the target table has the same partition property as the data being written.
- Ignore case when sorting the output of SHOW FUNCTIONS.
- Improve rendering of the BingTile type.
- The approx_distinct() function now supports a standard error in the range of [0.0040625, 0.26000].
- Add support for ORDER BY in aggregation functions.
- Add dictionary processing for joins which can improve join performance up to 50%. This optimization can be disabled using the dictionary-processing-joins-enabled config property or the dictionary_processing_join session property.
- Add support for casting to INTERVAL types.
- Add ST_Buffer() geospatial function.
- Allow treating decimal literals as values of the DECIMAL type rather than DOUBLE. This behavior can be enabled by setting the parse-decimal-literals-as-double config property or the parse_decimal_literals_as_double session property to false.
- Add JMX counter to track the number of submitted queries.
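The bounds of the approx_distinct() standard error range are consistent with the usual HyperLogLog relation, in which the standard error is roughly 1.04 / sqrt(m) for m buckets: 16 buckets gives 0.26 and 65536 buckets gives 0.0040625. The release notes do not state this formula, so the following Python sketch treats it as an assumption. The function names are hypothetical and are not part of Presto.

```python
import math

# Assumption: the usual HyperLogLog relation between bucket count and
# standard error. The release notes do not spell this formula out.
HLL_CONSTANT = 1.04


def standard_error(buckets: int) -> float:
    """Approximate standard error of a HyperLogLog sketch with `buckets` buckets."""
    return HLL_CONSTANT / math.sqrt(buckets)


def buckets_for(max_standard_error: float) -> int:
    """Smallest power-of-two bucket count in 16..65536 that meets the bound."""
    for exponent in range(4, 17):  # 2**4 = 16 up to 2**16 = 65536
        buckets = 1 << exponent
        if standard_error(buckets) <= max_standard_error:
            return buckets
    raise ValueError("requested standard error is below the supported range")


if __name__ == "__main__":
    for error in (0.26, 0.05, 0.01, 0.0040625):
        print(error, buckets_for(error))
```

Under this assumption, a tighter error bound costs memory quadratically: halving the standard error requires four times as many buckets.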
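Why the type of a decimal literal matters can be illustrated outside Presto. Binary floating point cannot represent most decimal fractions exactly, while a decimal type can. The Python sketch below uses float as a stand-in for DOUBLE and decimal.Decimal as a stand-in for DECIMAL. Both are analogies chosen for illustration, not Presto's implementation.

```python
from decimal import Decimal

# Stand-in for DOUBLE: binary floating point accumulates representation error.
as_double = 0.1 + 0.2

# Stand-in for DECIMAL: exact decimal arithmetic on the written digits.
as_decimal = Decimal("0.1") + Decimal("0.2")

print(as_double == 0.3)              # False
print(as_decimal == Decimal("0.3"))  # True
```

A query that compares or sums literals such as 0.1 and 0.2 can therefore give different results depending on which of the two types the parser picks.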
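The release notes do not describe how dictionary processing for joins is implemented. The general idea behind dictionary-aware processing is that a dictionary-encoded column stores each distinct value once plus a list of small integer ids, so work such as a hash lookup can be done once per distinct value instead of once per row. The Python sketch below illustrates that idea only. All names in it are hypothetical.

```python
from collections import defaultdict


def build_index(build_keys):
    """Map each join key on the build side to the rows that contain it."""
    index = defaultdict(list)
    for row, key in enumerate(build_keys):
        index[key].append(row)
    return index


def probe_dictionary_encoded(dictionary, ids, index):
    """Join a dictionary-encoded probe column against the build index.

    `dictionary` holds the distinct values and `ids[i]` is the position in
    `dictionary` of the value for probe row i.
    """
    # One index lookup per distinct dictionary entry, not one per probe row.
    matches_per_entry = [index.get(value, []) for value in dictionary]
    # Each probe row reuses the cached matches of its dictionary entry.
    return [
        (probe_row, build_row)
        for probe_row, entry in enumerate(ids)
        for build_row in matches_per_entry[entry]
    ]


if __name__ == "__main__":
    index = build_index(["us", "de", "us"])
    print(probe_dictionary_encoded(["de", "us", "fr"], [1, 1, 0, 2, 1], index))
```

The saving grows with the ratio of rows to distinct values: a column with millions of rows but only a few hundred distinct keys needs only a few hundred lookups.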
Resource Groups Changes
- Add priority column to the DB resource group selectors.
- Add exact match source selector to the DB resource group selectors.
CLI Changes
- Add support for setting client tags.
JDBC Driver Changes
- Add getPeakMemoryBytes() to QueryStats.
Accumulo Changes
- Improve table scan parallelism.
Hive Changes
- Fix query failures for the file-based metastore implementation when partition column values contain a colon.
- Improve performance for writing to bucketed tables when the data being written is already partitioned appropriately (e.g., the output is from a bucketed join).
- Add config property hive.max-outstanding-splits-size for the maximum amount of memory used to buffer splits for a single table scan. Additionally, the default value is substantially higher than the previous hard-coded limit, which can prevent certain queries from failing.
Thrift Connector Changes
- Make Thrift retry configurable.
- Add JMX counters for Thrift requests.
SPI Changes
- Remove the RecordSink interface, which was difficult to use correctly and had no advantages over the PageSink interface.