Tue. Feb 25th, 2020

Apache Kylin v3.0 released, Open source distributed analytics engine

Apache Kylin is an open-source distributed analytics engine, originally contributed by eBay Inc., designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets.

Apache Kylin lets you query massive datasets at sub-second latency in 3 steps.

  1. Identify a Star Schema on Hadoop.
  2. Build Cube from the identified tables.
  3. Query with ANSI SQL and get results in under a second, via ODBC, JDBC, or the RESTful API.
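As a rough illustration of step 3, the sketch below builds a query request for Kylin's REST query endpoint (`POST /kylin/api/query`) using only the Python standard library. The host, port, credentials, project name, and table name are assumptions based on a default sample installation and should be replaced with your own values. `build_query_request` and `run_query` are hypothetical helper names, not part of Kylin.

```python
import base64
import json
import urllib.request


def build_query_request(sql, project,
                        host="http://localhost:7070",  # assumed default host/port
                        user="ADMIN", password="KYLIN",  # assumed default credentials
                        limit=50000):
    """Build (but do not send) an HTTP request for Kylin's query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{host}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request):
    """Send the request to a running Kylin server and return the parsed JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Assumed sample project and table; adjust to a cube that exists on your server.
request = build_query_request(
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    project="learn_kylin",
)
```

Calling `run_query(request)` against a live server should return a JSON document containing the result rows. Building the request separately from sending it makes the payload easy to inspect without a running cluster.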

WHAT IS KYLIN?

– Extremely Fast OLAP Engine at Scale: 

Kylin is designed to reduce query latency on Hadoop for datasets of 10+ billion rows.

– ANSI SQL Interface on Hadoop: 

Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions

– Interactive Query Capability: 

Users can interact with Hadoop data via Kylin at sub-second latency, faster than Hive queries on the same dataset.

– MOLAP Cube:

Users can define a data model and pre-build it in Kylin over more than 10 billion raw data records.
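To make the idea of pre-building concrete, here is a toy sketch of MOLAP-style pre-aggregation: every combination of dimensions (a "cuboid") is aggregated once up front, so a later GROUP BY query becomes a dictionary lookup. This is only a conceptual illustration under simplifying assumptions (in-memory rows, a single SUM measure). It is not Kylin's build algorithm, and `build_cube` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the given dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


# Hypothetical fact rows standing in for a star schema's fact table.
rows = [
    {"region": "EU", "year": 2019, "amount": 10.0},
    {"region": "EU", "year": 2020, "amount": 5.0},
    {"region": "US", "year": 2019, "amount": 7.0},
]

cube = build_cube(rows, ("region", "year"), "amount")

# "SELECT region, SUM(amount) ... GROUP BY region" is now a lookup, not a scan.
sales_by_region = cube[("region",)]
```

The number of cuboids grows as 2^n with the number of dimensions, which is why a real engine needs ways to limit which combinations are materialized.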

– Seamless Integration with BI Tools:

Kylin currently integrates with BI tools such as Tableau, Power BI, and Excel. Integration with MicroStrategy is coming soon.

– Other Highlights: 

– Job Management and Monitoring
– Compression and Encoding Support
– Incremental Refresh of Cubes
– Leverages HBase Coprocessor to reduce query latency
– Both approximate and precise Query Capabilities for Distinct Count
– Approximate Top-N Query Capability
– Easy Web interface to manage, build, monitor and query cubes
– Security capability to set ACL at Cube/Project Level
– Support LDAP and SAML Integration
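The trade-off between approximate and precise distinct counts mentioned in the highlights can be sketched with a small HyperLogLog estimator, the family of algorithm commonly used for approximate distinct counting. The code below is a minimal textbook version written for illustration, not Kylin's implementation. The class name, the register count (`p=12`), and the choice of SHA-1 as the hash are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch: fixed memory, approximate distinct count."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Take a 64-bit hash of the value.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                  # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
exact = set()
for i in range(50000):
    user = f"user-{i % 25000}"          # every user appears twice
    hll.add(user)
    exact.add(user)

approximate = hll.count()               # close to, but not exactly, len(exact)
precise = len(exact)                    # exact, but memory grows with cardinality
```

The estimator uses a fixed 4096 registers regardless of how many values are added, while the exact set grows with the number of distinct values. That is the cost a precise distinct count has to pay.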

Apache Kylin v3.0 was released.

Changelog

New Feature

  • [KYLIN-4098] – Add cube auto merge api
  • [KYLIN-3883] – Kylin supports column count aggregation

Improvement

  • [KYLIN-565] – Unsupported SQL Functions
  • [KYLIN-1772] – Highlight segment at HBase tab page of cube admin view when the segment is not healthy.
  • [KYLIN-1850] – Show Kylin Version on GUI
  • [KYLIN-2431] – StorageCleanupJob will remove intermediate tables created by other kylin instances
  • [KYLIN-3756] – Support check-port-availability script for mac os x
  • [KYLIN-3865] – Centralize the zookeeper related info
  • [KYLIN-3906] – ExecutableManager is spelled as ExecutableManger
  • [KYLIN-3907] – Sort the cube list by create time in descending order.
  • [KYLIN-3917] – Add max segment merge span to cleanup intermediate data of cube building
  • [KYLIN-4010] – Auto adjust offset according to query server’s timezone for time derived column
  • [KYLIN-4096] – Make cube metadata validator rules configurable
  • [KYLIN-4097] – Throw exception when too many dict slice eviction in AppendTrieDictionary
  • [KYLIN-4163] – CreateFlatHiveTableStep has not yarn app url when hive job running
  • [KYLIN-4167] – Refactor streaming coordinator
  • [KYLIN-4175] – Support secondary hbase storage config for hbase cluster migration
  • [KYLIN-4178] – Job scheduler support safe mode
  • [KYLIN-4180] – Prevent abnormal CPU usage by limiting flat filters length
  • [KYLIN-4187] – Building dimension dictionary using spark
  • [KYLIN-4193] – More user-friendly page for loading streaming tables
  • [KYLIN-4198] – “bin/system-cube.sh cron” will overwrite user’s crontab
  • [KYLIN-4201] – Allow users to delete unused receivers from streaming page
  • [KYLIN-4208] – RT OLAP kylin.stream.node configure optimization support all receiver can have the same config
  • [KYLIN-4257] – Build historical data by layer in real time Lambda cube
  • [KYLIN-4258] – Real-time OLAP may return incorrect result for some case
  • [KYLIN-4273] – Make cube planner works for real-time streaming job
  • [KYLIN-4283] – FileNotFound error in “Garbage Collection” step should not break cube building.

Bug Fix

  • [KYLIN-1716] – leave executing query page action stop bug
  • [KYLIN-3730] – TableMetadataManager.reloadSourceTableQuietly is wrong
  • [KYLIN-3741] – when the sql result is empty and limit is 0 , should not have “load more” bar
  • [KYLIN-3842] – kylinProperties.js Unable to get the public configuration of the first line in the front end
  • [KYLIN-3881] – Calcite isolating expression with its condition may throw ‘Division Undefined’ exception
  • [KYLIN-3887] – Query with decimal sum measure of double complied failed after KYLIN-3703
  • [KYLIN-3933] – Currently replica set related operation need refresh current front-end page
  • [KYLIN-4135] – Real time streaming segment build task discard but can’t be rebuilt
  • [KYLIN-4147] – User has project’s admin permission but doesn’t have permission to see the Storage/Planner/streaming tab in Model page
  • [KYLIN-4162] – After drop the build task on the monitor page, subsequent segments cannot be constructed.
  • [KYLIN-4165] – RT OLAP building job on “Save Cube Dictionaries” step concurrency error
  • [KYLIN-4169] – Too many logs while DataModelManager init, cause the first RESTful API hang for a long time
  • [KYLIN-4172] – Can’t rename field when map streaming schema to table
  • [KYLIN-4176] – Filter the intermediate tables when loading table metadata from tree
  • [KYLIN-4183] – Clicking ‘Submit’ button is unresponsive, when the segment is not selected.
  • [KYLIN-4190] – hiveproducer write() function throws an exception because the hive metrics table location path prefix differs from the default fs when hdfs uses router-based federation
  • [KYLIN-4194] – Throw KylinConfigCannotInitException at STEP “Extract Fact Table Distinct Columns” with spark
  • [KYLIN-4203] – Disable a real time cube and then enable it ,this cube may can’t submit build job anymore
  • [KYLIN-4229] – String index out of range -1
  • [KYLIN-4242] – Usage instructions in ‘PasswordPlaceholderConfigurer’ doesn’t work
  • [KYLIN-4244] – ClassNotFoundException while use org.apache.kylin.engine.mr.common.CubeStatsReader in bash
  • [KYLIN-4246] – Wrong results from real-time streaming when an optional field is used as a dimension
  • [KYLIN-4248] – When adding a user, the prompt message is incorrect when the user name is empty.
  • [KYLIN-4254] – The result exporting from Insight with CSV format is empty, when sql contains Chinese
  • [KYLIN-4262] – pid in GC filename inconsistent with real pid
  • [KYLIN-4265] – SQL tab of cube failed when filter is not empty

Download