Sun. Jun 7th, 2020

Apache Kylin v3.0.2 released, Open source distributed analytics engine

Apache Kylin is an open-source distributed analytics engine, originally contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets.

Apache Kylin lets you query massive datasets at sub-second latency in 3 steps.

  1. Identify a Star Schema on Hadoop.
  2. Build Cube from the identified tables.
  3. Query with ANSI SQL and get results in under a second, via ODBC, JDBC or the RESTful API.
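
As a rough illustration of step 3, the sketch below builds a request for Kylin's REST query endpoint (`POST /kylin/api/query`) using only the Python standard library. The host and port, the `ADMIN`/`KYLIN` credentials, the `learn_kylin` project and the `kylin_sales` table are assumptions borrowed from a default sample installation, not details from this release; substitute your own values. The helper name `build_kylin_query_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_kylin_query_request(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are assumptions taken from a default sample setup.
request = build_kylin_query_request(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)

# To run the query against a live server, uncomment:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

The request is only constructed here, so the snippet runs without a Kylin server; sending it requires a cube that can answer the query.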

WHAT IS KYLIN?

– Extremely Fast OLAP Engine at Scale: 

Kylin is designed to reduce query latency on Hadoop for datasets of more than 10 billion rows

– ANSI SQL Interface on Hadoop: 

Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions

– Interactive Query Capability: 

Users can interact with Hadoop data via Kylin at sub-second latency, faster than Hive queries on the same dataset

– MOLAP Cube:

Users can define a data model and pre-build cubes in Kylin from more than 10 billion raw data records

– Seamless Integration with BI Tools:

Kylin currently integrates with BI tools such as Tableau, Power BI, and Excel. Integration with MicroStrategy is coming soon

– Other Highlights: 

– Job Management and Monitoring
– Compression and Encoding Support
– Incremental Refresh of Cubes
– Leverages HBase coprocessors to reduce query latency
– Both approximate and precise Query Capabilities for Distinct Count
– Approximate Top-N Query Capability
– Easy Web interface to manage, build, monitor and query cubes
– Security capability to set ACL at Cube/Project Level
– Support LDAP and SAML Integration
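
The MOLAP cube idea listed above (pre-building aggregates so that queries become lookups) can be sketched with a toy example. This is only a conceptual illustration under simplifying assumptions: everything is held in memory, the only measure is a sum, and the function names `build_cube` and `query_cube` are hypothetical. It is not how Kylin implements cube building.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of dimensions.

    Each subset of dimensions is one "cuboid"; n dimensions give 2**n cuboids.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query_cube(cube, group_by):
    """Answer a GROUP BY from the matching cuboid instead of scanning raw rows.

    group_by must list dimensions in the same order used to build the cube.
    """
    return cube[tuple(group_by)]


# Made-up sample rows, for illustration only.
rows = [
    {"region": "EU", "year": 2019, "price": 10},
    {"region": "EU", "year": 2020, "price": 5},
    {"region": "US", "year": 2020, "price": 7},
]
cube = build_cube(rows, dimensions=("region", "year"), measure="price")
by_region = query_cube(cube, ["region"])   # totals per region
grand_total = query_cube(cube, [])         # total over all rows
```

The trade-off this shows is the usual MOLAP one: build time and storage grow with the number of dimension combinations, while each query reduces to a dictionary lookup.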

Apache Kylin v3.0.2 was released.

Changelog

Improvement

  • [KYLIN-3628] – Query with lookup table always use latest snapshot
  • [KYLIN-4132] – Kylin needn’t use “org.apache.directory.api.util.Strings” to import api-util.jar
  • [KYLIN-4388] – Refine the Dockerfile
  • [KYLIN-4390] – Update tomcat to 7.0.100
  • [KYLIN-4400] – Use beeline as hive client in system-cube.sh
  • [KYLIN-4437] – Should replace deprecated “mapred.job.name”

Bug Fix

  • [KYLIN-4119] – The admin of project can’t operate the action of Hybrids
  • [KYLIN-4206] – Build kylin on EMR 5.23. The kylin version is 2.6.4. When building the cube, the hive table cannot be found
  • [KYLIN-4340] – Fix bug of get value of isSparkFactDistinctEnable for cube not correct
  • [KYLIN-4353] – Realtime Segment is not closed in expected duration
  • [KYLIN-4354] – Prune segment not using given filter when using jdbc preparestatement
  • [KYLIN-4370] – Spark job failing with JDBC source on 8th step with error : org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Table or view not found: default.`kylin_intermediate table’
  • [KYLIN-4372] – Docker entrypoint delete file too later cause ZK started by HBase crash
  • [KYLIN-4379] – Calculate column cardinality cannot use kylin config overwrite cause job failed
  • [KYLIN-4383] – Kylin Integrated Issue with Amazon EMR and AWS Glue in HiveMetaStoreClientFactory.java
  • [KYLIN-4385] – KYLIN system cube failing to update table when run on EMR with S3 as storage and EMRFS
  • [KYLIN-4396] – File Descriptor Leakage in MR Build Engine
  • [KYLIN-4397] – Use newLinkedHashMap in AssignmentUtil.java
  • [KYLIN-4405] – Internal exception when trying to build cube whose model has null PartitionDesc
  • [KYLIN-4425] – Refactor Diagnosis Tool
  • [KYLIN-4426] – Refine CliCommandExecutor
  • [KYLIN-4433] – When uhc step is turned on, Build Dimension Dictionary job cannot get correct configuration
  • [KYLIN-4438] – Null password may cause RuntimeException when starting up
  • [KYLIN-4470] – The user cannot log in kylin normally after being assigned to a group
  • [KYLIN-4481] – Project-level ACL lookups not working for non-admin SAML-federated users
