Apache HBase 2.3.3 released: a distributed database

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
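The Bigtable paper describes a table as a sparse, sorted map from (row key, column, timestamp) to an uninterpreted value. That is the model behind "distributed, versioned, non-relational" above. The following is a minimal, in-memory Java sketch of that model. It assumes string keys and values for readability. The class and method names (`VersionedTable`, `put`, `get`, `getAsOf`) are hypothetical and are not the HBase client API, which works with byte arrays.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of a Bigtable-style data model:
// row key -> column ("family:qualifier") -> timestamp -> value.
// Rows and columns are kept sorted; versions are ordered newest first.
// This is NOT the HBase client API, which stores keys and values as byte[].
class VersionedTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Writes one version of a cell; a version with the same timestamp is replaced.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Returns the newest version written at or before the given timestamp, or null.
    String getAsOf(String row, String column, long timestamp) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        if (versions == null) {
            return null;
        }
        // With reverse ordering, ceilingEntry finds the greatest timestamp <= the argument.
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    private NavigableMap<Long, String> versionsOf(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.put("com.example/index", "contents:html", 100L, "<html>v1</html>");
        table.put("com.example/index", "contents:html", 200L, "<html>v2</html>");
        System.out.println(table.get("com.example/index", "contents:html"));           // <html>v2</html>
        System.out.println(table.getAsOf("com.example/index", "contents:html", 150L)); // <html>v1</html>
    }
}
```

Storing versions in descending timestamp order makes the common "latest value" read a single `firstEntry()` call. Point-in-time reads become a single `ceilingEntry` lookup.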


Features

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables.
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • Easy-to-use Java API for client access.
  • Block cache and Bloom filters for real-time queries.
  • Query predicate push-down via server-side filters.
  • Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options.
  • Extensible JRuby-based (JIRB) shell.
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
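HBase shards a table into regions. Each region covers a contiguous, sorted range of row keys. Locating the region for a row therefore means finding the region with the greatest start key that is less than or equal to that row. The sketch below models that lookup with a `TreeMap` and assumes string row keys. `RegionIndex`, `addRegion`, and `locate` are hypothetical names for illustration only. They are not HBase's `RegionLocator` API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of range-based sharding: each region owns the half-open
// key range [startKey, next region's startKey). The first region starts at "".
class RegionIndex {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    RegionIndex(String firstServer) {
        // The first region initially covers the entire key space.
        serverByStartKey.put("", firstServer);
    }

    // Registers a region that starts at startKey (e.g. after a split at that key).
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // Returns the server for the region whose start key is the greatest key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        return entry.getValue(); // never null: "" sorts before every other string
    }

    public static void main(String[] args) {
        RegionIndex index = new RegionIndex("rs1");
        index.addRegion("g", "rs2");
        index.addRegion("p", "rs3");
        System.out.println(index.locate("apple")); // rs1
        System.out.println(index.locate("grape")); // rs2
        System.out.println(index.locate("zebra")); // rs3
    }
}
```

A sorted map keeps the lookup at O(log n) in the number of regions. Adding a split point only touches one entry, which is why this structure suits range sharding.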

Apache HBase 2.3.3 released.

Changelog

  These release notes cover new developer- and user-facing incompatibilities, important issues, features, and major improvements.
  • **Increase the timeout value for nightly jobs**

    Increases the timeout value for nightly jobs to 16 hours. The new build machines are dedicated to the HBase project, so the project can use them all the time.

  • **[HBCK2] Add RecoveredEditsPlayer**

    WALPlayer can replay the contents of recovered.edits directories.

    A side effect is that the timestamp in the WAL filename is now factored in when setting start and end times for WALInputFormat, that is, the wal.start.time and wal.end.time values in a job context. Previously, only wal.end.time was considered; now wal.start.time is considered too. If a file's filename timestamp falls outside the range from wal.start.time to wal.end.time, the file is bypassed. This change in behavior makes it easier for operators to craft timestamp filters when processing WALs.

  • **Set java.io.tmpdir to the project build directory to avoid writing std*deferred files to /tmp**

    Changes java.io.tmpdir to project.build.directory in the maven-surefire-plugin configuration. This avoids writing std*deferred files to /tmp, which could fill up the /tmp disk on our Jenkins build nodes.

  • **Add MR Counters to WALPlayer; currently hard to tell if it is doing anything**

    Adds WALPlayer counters to the MapReduce job's counter output, for example:

    org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter
    CELLS_READ=89574
    CELLS_WRITTEN=89572
    DELETES=64
    PUTS=5305
    WALEDITS=4375
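The RecoveredEditsPlayer entry above says WAL files are bypassed when their filename timestamp lies outside the range from wal.start.time to wal.end.time. The following is a minimal sketch of that rule, read literally from the release note. Several details are assumptions, not confirmed by the note or taken from WALInputFormat's source:

* the timestamp is the last dot-separated numeric segment of the name, ignoring an optional `.meta` suffix;
* both bounds are inclusive;
* names without a parseable timestamp are kept.

`WalTimeFilter`, `filenameTimestamp`, and `select` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper: keep only WAL files whose filename timestamp lies
// within [startTime, endTime]. The filename layout is an assumption.
class WalTimeFilter {
    // Parses the last dot-separated segment as a millisecond timestamp.
    static OptionalLong filenameTimestamp(String name) {
        String base = name.endsWith(".meta")
                ? name.substring(0, name.length() - ".meta".length())
                : name;
        int dot = base.lastIndexOf('.');
        if (dot < 0) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(base.substring(dot + 1)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    static List<String> select(List<String> names, long startTime, long endTime) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            OptionalLong ts = filenameTimestamp(name);
            // Assumption: names without a parseable timestamp are kept, not bypassed.
            if (!ts.isPresent() || (ts.getAsLong() >= startTime && ts.getAsLong() <= endTime)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "rs1%2C16020%2C1600000000000.1600000100000",      // before the window
                "rs1%2C16020%2C1600000000000.1600000200000",      // inside the window
                "rs1%2C16020%2C1600000000000.1600000300000.meta", // after the window
                "0000000000000001234");                           // no timestamp suffix
        System.out.println(select(names, 1600000150000L, 1600000250000L));
        // [rs1%2C16020%2C1600000000000.1600000200000, 0000000000000001234]
    }
}
```

Before relying on any of these rules, check them against the WALInputFormat source for your HBase version. In particular, check how it treats files that start before wal.start.time but contain later edits.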
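The sample counter output above reports one tally per WAL edit, per cell read, per cell written, and per Put or Delete. The following stdlib-only sketch mirrors those counter names in a hypothetical `WalCounter` enum and shows how replaying edits could update them. Two modeling choices are assumptions for illustration, not taken from WALPlayer's code: `SimpleCell` is a stand-in type, and a new Put or Delete starts whenever the row or cell type changes. The model also produces the pattern seen in the sample, where CELLS_WRITTEN can be lower than CELLS_READ when some cells are not replayed.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical mirror of the counter names shown above (alphabetical, as in the sample output).
enum WalCounter { CELLS_READ, CELLS_WRITTEN, DELETES, PUTS, WALEDITS }

// Simplified stand-in for a WAL cell (hypothetical): its row, whether it is a
// delete marker, and whether the replay filter accepts it.
class SimpleCell {
    final String row;
    final boolean delete;
    final boolean accepted;

    SimpleCell(String row, boolean delete, boolean accepted) {
        this.row = row;
        this.delete = delete;
        this.accepted = accepted;
    }
}

class CounterSketch {
    private final Map<WalCounter, Long> counters = new EnumMap<>(WalCounter.class);

    private void increment(WalCounter counter) {
        counters.merge(counter, 1L, Long::sum);
    }

    long get(WalCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    // Replays one WAL edit: every cell counts as read, accepted cells count as
    // written, and each change of row or cell type starts a new Put or Delete.
    void replay(List<SimpleCell> edit) {
        increment(WalCounter.WALEDITS);
        String lastRow = null;
        boolean lastWasDelete = false;
        for (SimpleCell cell : edit) {
            increment(WalCounter.CELLS_READ);
            if (!cell.accepted) {
                continue;
            }
            increment(WalCounter.CELLS_WRITTEN);
            if (!cell.row.equals(lastRow) || cell.delete != lastWasDelete) {
                increment(cell.delete ? WalCounter.DELETES : WalCounter.PUTS);
                lastRow = cell.row;
                lastWasDelete = cell.delete;
            }
        }
    }

    public static void main(String[] args) {
        CounterSketch sketch = new CounterSketch();
        sketch.replay(Arrays.asList(
                new SimpleCell("r1", false, true),
                new SimpleCell("r1", false, true),
                new SimpleCell("r2", true, true)));
        sketch.replay(Arrays.asList(
                new SimpleCell("r3", false, false), // read but not written
                new SimpleCell("r3", false, true)));
        // Prints CELLS_READ=5, CELLS_WRITTEN=4, DELETES=1, PUTS=2, WALEDITS=2 (one per line).
        for (WalCounter counter : WalCounter.values()) {
            System.out.println(counter + "=" + sketch.get(counter));
        }
    }
}
```

An `EnumMap` keeps the counters compact and iterates in enum declaration order. That order makes the printed output line up with the sample above.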

Download