Apache Phoenix is a SQL skin over HBase, delivered as a client-embedded JDBC driver, that targets low-latency queries over HBase data.
Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
To see a complete list of what is supported, go to our language reference. All standard SQL query constructs are supported, including SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, etc. It also supports a full set of DML commands as well as table creation and versioned incremental alterations through our DDL commands.
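As a rough illustration of the query and DML constructs mentioned above, the sketch below writes one row and then runs an aggregate query. The table `web_requests` and its columns are hypothetical names chosen for this example. Note that Phoenix writes rows with UPSERT rather than INSERT.

```sql
-- Hypothetical table and column names, for illustration only.

-- Insert the row, or update it if the primary key already exists.
UPSERT INTO web_requests (host_name, request_id, status_code, response_ms)
VALUES ('app1.example.com', 1001, 503, 250);

-- Count server errors per host, keeping only hosts with more than 10 errors.
SELECT host_name, COUNT(*) AS error_count, AVG(response_ms) AS avg_response_ms
FROM web_requests
WHERE status_code >= 500
GROUP BY host_name
HAVING COUNT(*) > 10
ORDER BY host_name;
```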
Here’s a list of what is currently not supported:
- Relational operators: INTERSECT and MINUS.
- Miscellaneous built-in functions. These are easy to add – read this blog for step-by-step instructions.
Apache Phoenix supports table creation and versioned incremental alterations through DDL commands. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema.
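A minimal sketch of an incremental alteration, assuming a hypothetical table `web_requests` already exists. The column name is made up for the example.

```sql
-- Hypothetical table and column names, for illustration only.

-- Add a column; rows written earlier have no value (NULL) for it.
ALTER TABLE web_requests ADD user_agent VARCHAR;

-- Drop a column that is no longer needed.
ALTER TABLE web_requests DROP COLUMN user_agent;
```

Each statement changes the table metadata, which is the information a snapshot query consults to find the schema that was in effect at its point in time.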
A Phoenix table is created through the CREATE TABLE command and can either be:
- built from scratch, in which case the HBase table and column families will be created automatically.
- mapped to an existing HBase table, by creating either a read-write TABLE or a read-only VIEW, with the caveat that the binary representation of the row key and key values must match that of the Phoenix data types (see Data Types reference for the detail on the binary representation).
- For a read-write TABLE, column families will be created automatically if they don’t already exist. An empty key value will be added to the first column family of each existing row to minimize the size of the projection for queries.
- For a read-only VIEW, all column families must already exist. The only change made to the HBase table will be the addition of the Phoenix coprocessors used for query processing. The primary use case for a VIEW is to transfer existing data into a Phoenix table, since data modification is not allowed on a VIEW and query performance will likely be lower than with a TABLE.
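The options above can be sketched as follows. All names are assumptions for illustration: `metrics` is a hypothetical new table, and `"t1"` with column family `"f1"` stands in for an HBase table that already exists. Phoenix upper-cases unquoted identifiers, so the double quotes are what let the names match a lower-case HBase table and column family exactly.

```sql
-- Option 1: built from scratch. Phoenix creates the HBase table and
-- column families. (metrics and its columns are hypothetical names.)
CREATE TABLE IF NOT EXISTS metrics (
    host_name   VARCHAR NOT NULL,
    sample_time DATE NOT NULL,
    cpu_pct     DECIMAL,
    CONSTRAINT pk PRIMARY KEY (host_name, sample_time)
);

-- Option 2a: read-only mapping onto an existing HBase table "t1"
-- whose column family "f1" holds a qualifier named val.
CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "f1".val VARCHAR);

-- Option 2b: read-write mapping onto the same HBase table.
-- Use this instead of 2a, not in addition to it.
CREATE TABLE "t1" (pk VARCHAR PRIMARY KEY, "f1".val VARCHAR);
```

In both mapped cases the declared types only work if the bytes already stored in HBase match the Phoenix serialization of those types, as noted in the caveat above.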
All schema changes are versioned (up to 1000 versions are kept). Snapshot queries over older data will pick up and use the schema that was in effect at the time of your connection, as set by the CurrentSCN property.
- Critical bug fixes to prevent snapshot creation of SYSTEM.CATALOG when
- Numerous bug fixes around handling of row deletion
- Improvements to statistics collection
- New COLLATION_KEY built-in function for linguistic sort