Apache HBase 2.3.3 released: distributed database
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Features
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables.
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server-side Filters.
- Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options.
- Extensible JRuby-based (JIRB) shell.
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
Apache HBase 2.3.3 released.
Changelog
- These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
- **Increase the timeout value for nightly jobs**
Increase the timeout value for nightly jobs to 16 hours. The new build machines are dedicated to the HBase project, so we are allowed to use them all the time.
- **[HBCK2] Add RecoveredEditsPlayer**
WALPlayer can replay the content of recovered.edits directories.
A side effect is that the timestamp in the WAL filename is now factored in when applying start/end times for WALInputFormat, i.e. the wal.start.time and wal.end.time values on a job context. Previously we looked at wal.end.time only; now we consider wal.start.time too. If a file's name carries a timestamp outside the wal.start.time to wal.end.time range, the file is bypassed. This change in behavior makes it easier for operators to craft timestamp filters when processing WALs.
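The filename check described above can be sketched roughly as follows. This is an illustration only, not the WALInputFormat implementation: the class and method names are hypothetical, and the sketch assumes that the timestamp is the run of digits after the last `.` in the file name, that both bounds are inclusive, and that files whose names cannot be dated are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of filename-based time filtering; not HBase code. */
final class WalTimeRangeFilter {
  private WalTimeRangeFilter() {}

  /** Assumption: the timestamp is the number after the last '.' in the name. */
  static OptionalLong timestampOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(fileName.substring(dot + 1)));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  /** True if the file should be read for the given start/end window. */
  static boolean accept(String fileName, long startTime, long endTime) {
    OptionalLong ts = timestampOf(fileName);
    if (!ts.isPresent()) {
      return true; // assumption: keep files that cannot be dated
    }
    long t = ts.getAsLong();
    return t >= startTime && t <= endTime; // assumption: inclusive bounds
  }

  /** Returns the names that fall inside the window, in input order. */
  static List<String> select(List<String> fileNames, long startTime, long endTime) {
    List<String> kept = new ArrayList<>();
    for (String name : fileNames) {
      if (accept(name, startTime, endTime)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Made-up file names for illustration.
    List<String> names = Arrays.asList("rs1.1000", "rs1.2000", "rs1.3000");
    System.out.println(select(names, 1500L, 2500L)); // prints [rs1.2000]
  }
}
```

With a window of 1500 to 2500, only the file stamped 2000 is read; the other two are bypassed before any of their edits are examined.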
- **Set java.io.tmpdir to project build directory to avoid writing std\*deferred files to /tmp**
Change java.io.tmpdir to project.build.directory in the maven-surefire-plugin configuration, to avoid writing std\*deferred files to /tmp, which may fill up the /tmp disk on our Jenkins build node.
- **Add MR Counters to WALPlayer; currently hard to tell if it is doing anything**
Adds WALPlayer counters to the MR Counter output:
org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter
CELLS\_READ=89574
CELLS\_WRITTEN=89572
DELETES=64
PUTS=5305
WALEDITS=4375
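To check from a script whether a WALPlayer run did any work, the counter block above can be read as plain NAME=value lines. The following is a minimal sketch under that assumption; the class and method names are hypothetical and are not part of HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that parses the counter block shown above. */
final class WalPlayerCounterParser {
  private WalPlayerCounterParser() {}

  /** Parses NAME=value lines; lines without '=' (such as the group header) are skipped. */
  static Map<String, Long> parse(String text) {
    Map<String, Long> counters = new LinkedHashMap<>();
    for (String line : text.split("\\R")) {
      String trimmed = line.trim();
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        continue;
      }
      String name = trimmed.substring(0, eq).trim();
      long value = Long.parseLong(trimmed.substring(eq + 1).trim());
      counters.put(name, value);
    }
    return counters;
  }

  public static void main(String[] args) {
    String output = String.join("\n",
        "org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter",
        "CELLS_READ=89574",
        "CELLS_WRITTEN=89572",
        "DELETES=64",
        "PUTS=5305",
        "WALEDITS=4375");
    Map<String, Long> counters = parse(output);
    long notWritten = counters.get("CELLS_READ") - counters.get("CELLS_WRITTEN");
    System.out.println("cells read but not written: " + notWritten); // prints 2
  }
}
```

A nonzero CELLS_WRITTEN value is a quick sign that the job replayed data, and the difference between cells read and cells written shows how many cells were dropped along the way.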