Apache Kylin is an open-source distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. It was originally contributed by eBay Inc.
Apache Kylin lets you query massive datasets with sub-second latency in three steps:
- Identify a star schema on Hadoop.
- Build a cube from the identified tables.
- Query with ANSI SQL via ODBC, JDBC or the RESTful API and get results in under a second.
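As an illustration of the third step, the snippet below prepares a query for Kylin's RESTful query endpoint (`POST /kylin/api/query`) using HTTP Basic authentication. It is a sketch under stated assumptions. The host, port 7070, the default `ADMIN`/`KYLIN` account, the `learn_kylin` sample project and its `kylin_sales` table are all placeholders to adapt to your own deployment. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

# Assumed connection details: adjust host, port and credentials for a real deployment.
KYLIN_URL = "http://localhost:7070/kylin/api/query"
USER, PASSWORD = "ADMIN", "KYLIN"


def build_query_request(sql: str, project: str, limit: int = 50000) -> urllib.request.Request:
    """Build (but do not send) a POST request for Kylin's query REST endpoint."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    # HTTP Basic auth: base64("user:password")
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        KYLIN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
    )


req = build_query_request(
    "SELECT part_dt, SUM(price) AS total FROM kylin_sales GROUP BY part_dt",
    project="learn_kylin",
)
# Against a live instance, the request could then be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"])
```

The same SQL could equally be issued through the ODBC or JDBC drivers. REST is used here only because it needs nothing beyond the Python standard library.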
WHAT IS KYLIN?
– Extremely Fast OLAP Engine at Scale
– ANSI SQL Interface on Hadoop
– Interactive Query Capability
– MOLAP Cube
– Seamless Integration with BI Tools
– Other Highlights:
– Compression and Encoding Support
– Incremental Refresh of Cubes
– Leverage HBase Coprocessor to reduce query latency
– Both approximate and precise Query Capabilities for Distinct Count
– Approximate Top-N Query Capability
– Easy Web interface to manage, build, monitor and query cubes
– Security capability to set ACLs at Cube/Project Level
– Support for LDAP and SAML Integration
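A MOLAP cube gets its speed from pre-aggregation. Measures are computed ahead of time for every combination of dimensions, and each combination is called a cuboid. A GROUP BY query therefore becomes a lookup rather than a scan. The toy Python sketch below shows only that idea, using hypothetical in-memory data. It is not Kylin's build engine, which runs distributed and persists cuboids to storage such as HBase or, in 4.0, Parquet.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, category, year, sales)
DIMENSIONS = ("region", "category", "year")
ROWS = [
    ("US", "books", 2020, 10),
    ("US", "games", 2020, 5),
    ("EU", "books", 2021, 7),
    ("US", "books", 2021, 3),
]


def build_cube(rows, dims):
    """Pre-aggregate SUM(sales) for every subset of dimensions (one dict per cuboid)."""
    cube = {}
    for k in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for row in rows:
                # Group key = the row's values for this cuboid's dimensions.
                agg[tuple(row[i] for i in idx)] += row[-1]
            cube[tuple(dims[i] for i in idx)] = dict(agg)
    return cube


def query(cube, group_by):
    """Answer SUM(sales) ... GROUP BY with a dictionary lookup instead of a table scan."""
    key = tuple(d for d in DIMENSIONS if d in group_by)  # normalize to cube order
    return cube[key]


cube = build_cube(ROWS, DIMENSIONS)
print(len(cube))                # 8
print(query(cube, ["region"]))  # {('US',): 18, ('EU',): 7}
```

With N dimensions there are 2^N cuboids. This is why real cube designs usually restrict which dimension combinations get materialized.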
Apache Kylin v4.0.0-alpha has been released.
This is a major release following 3.1.0, with 35 new features/improvements and 22 bug fixes.
New Features and Improvements:
- [KYLIN-4188] – Parquet as Cube storage V2
- [KYLIN-4213] – The new build engine with Spark-SQL
- [KYLIN-4452] – Kylin on Parquet with Docker
- [KYLIN-4462] – Support Count Distinct, TopN and Percentile by Kylin on Parquet
- [KYLIN-4659] – Prepare a technical preview version for Parquet Storage
- [KYLIN-4449] – A running build job keeps running after being canceled from the front end
- [KYLIN-4450] – Add the ability to adjust Spark driver memory adaptively
- [KYLIN-4456] – Temporary files generated by UT or Integration Tests need to be deleted
- [KYLIN-4458] – FilePruner prunes shards
- [KYLIN-4459] – Warning log "DFSInputStream has been closed already" is printed continuously
- [KYLIN-4467] – Support TopN by Kylin on Parquet
- [KYLIN-4468] – Support Percentile by Kylin on Parquet
- [KYLIN-4474] – Support window function for Kylin on Parquet
- [KYLIN-4475] – Support intersect count for Kylin on Parquet
- [KYLIN-4541] – kylin.log outputs error information during the build job
- [KYLIN-4542] – After downloading Spark with bin/download-spark.sh, SPARK_HOME still needs to be set manually
- [KYLIN-4621] – Avoid annoying log messages when building cubes and querying
- [KYLIN-4625] – Debug the code of Kylin on Parquet without a Hadoop environment
- [KYLIN-4631] – Set the default build engine type to Spark for Kylin on Parquet
- [KYLIN-4644] – New tool to clean up intermediate files for Kylin 4.0
- [KYLIN-4680] – Avoid annoying log messages of unit test and integration test
- [KYLIN-4695] – Automatically start the sparder (query) application when starting a Kylin instance
- [KYLIN-4699] – Delete the job_tmp path after a successful build/merge
- [KYLIN-4713] – Support using different Spark scheduler pools for different queries
- [KYLIN-4722] – Add more statistics to the query results
- [KYLIN-4723] – Move the shard-by configurations to cube level
- [KYLIN-4744] – Add tracking URL for Spark build jobs on YARN
- [KYLIN-4746] – Improve build performance by reducing the number of calls to the ‘count()’ function
- [KYLIN-4747] – Use the first dimension column as sort column within a partition
Bug Fixes:
- [KYLIN-4444] – Error when refreshing a segment
- [KYLIN-4451] – ClassCastException when querying on cluster with binary package
- [KYLIN-4453] – Query on refreshed cube failed with FileNotFoundException
- [KYLIN-4454] – Query snapshot table failed
- [KYLIN-4455] – Query fails when calcite.debug=true is set
- [KYLIN-4457] – Query cube result doesn’t match Spark SQL
- [KYLIN-4461] – Querying a measure whose return type is decimal throws a type cast exception
- [KYLIN-4465] – Method findDirectParentCandidates returns ancestor cuboids as well as direct parents
- [KYLIN-4466] – Cannot unload table which is loaded from CSV source
- [KYLIN-4469] – Cannot clone model
- [KYLIN-4471] – Cannot query SQL with a left join
- [KYLIN-4482] – Too much segment info logged in the CubeBuildJob step
- [KYLIN-4483] – Avoid building global dictionaries with an empty ColumnDesc collection
- [KYLIN-4632] – No such element exception: spark.driver.cores
- [KYLIN-4681] – Use KylinSession instead of SparkSession for some test cases
- [KYLIN-4694] – Fix ‘NoClassDefFoundError: Lcom/esotericsoftware/kryo/io/Output’ when query with sparder on yarn
- [KYLIN-4698] – Delete the segment storage path after merging segments, deleting a segment or dropping a cube
- [KYLIN-4721] – The default source type should be CSV, not Hive, in local debug mode
- [KYLIN-4732] – The cube size is wrong after disabling the cube
- [KYLIN-4733] – The cube size is inconsistent with the size of all segments
- [KYLIN-4734] – The duration keeps increasing after discarding the job
- [KYLIN-4742] – NullPointerException when auto-merging segments if discarded jobs exist
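Several items above, such as KYLIN-4465, refer to cuboids and their parents. Assume the common encoding of a cuboid as a bitmask with one bit per dimension. Under that encoding, a direct parent is a cuboid with exactly one extra dimension, while an ancestor may have several extra dimensions. The hypothetical helpers below (`direct_parents` and `ancestors`, which are not Kylin's code) make that distinction concrete.

```python
from typing import List


def direct_parents(cuboid: int, base: int) -> List[int]:
    """Return cuboids that add exactly one dimension of `base` to `cuboid`."""
    missing = base & ~cuboid  # dimensions in the base cuboid but not in `cuboid`
    parents = []
    bit = 1
    while bit <= missing:
        if missing & bit:
            parents.append(cuboid | bit)
        bit <<= 1
    return parents


def ancestors(cuboid: int, base: int) -> List[int]:
    """Return every strict superset of `cuboid` that is contained in `base`."""
    return [
        c for c in range(base + 1)
        if c != cuboid and (c & cuboid) == cuboid and (c | base) == base
    ]


BASE = 0b111  # three dimensions (hypothetical)
print([bin(c) for c in direct_parents(0b001, BASE)])  # ['0b11', '0b101']
print([bin(c) for c in ancestors(0b001, BASE)])       # ['0b11', '0b101', '0b111']
```

A child cuboid can be aggregated from any of its ancestors. A direct parent has the fewest extra dimensions, though, so it is usually the cheapest input. That is why a direct-parent search that also returns more distant ancestors is treated as a bug.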