About Hypertable
The Hypertable system consists of three components: Hyperspace, the Master, and the Range Servers. Hyperspace is a lock service, akin to Google’s Chubby, used mainly for synchronization, for detecting node failures, and for storing the top-level location information. The Master handles task allocation, future load balancing, and disaster recovery (automatically restoring service after a Range Server fails), among other functions. Range Servers are the actual workers of Hypertable, primarily responsible for serving the data in a range. A Range Server is also responsible for its own recovery, i.e. replaying its local log to restore the state it had before a failure. In addition, it communicates with Hypertable clients and the other components.
Introduction
Both Hypertable and HBase are scalable open source databases whose design is based on Google Bigtable. The main difference is that Hypertable is written in C++, while HBase is written in Java. The test environment consists of 16 servers connected through a Gigabit network.
Test Environment:
OS: CentOS 6.1
CPU: 2X AMD C32 Six Core Model 4170 HE 2.1 GHz
RAM: 24GB 1333MHz DDR3
Disk: 4X 2TB SATA Western Digital RE4-GP WD2002FYPS
For both Hypertable and HBase, the NameNode runs on test machine No.1, while the DataNodes run on test machines No.4 to No.5. The RangeServers and RegionServers run on the same set of machines and are configured to use all available memory. Three ZooKeeper and Hyperspace replicas run on test machines No.1 and No.3. In this test, the tables are configured to use Snappy compression, with Bloom filters on the row key.
Random Write Test
In the random write test, four tests are run against both Hypertable and HBase, each writing 5TB of data, with value sizes of 10000, 1000, 100 and 10 bytes, respectively. In every test, the key is fixed at 20 bytes and is a random integer formatted as a zero-padded string.
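The key format and the resulting key counts can be sketched in a few lines of Python. This is only an illustration, not the benchmark's actual load generator: the helper names, the key space passed to make_key, and the assumption that 5TB means 5 * 10^12 bytes of key plus value data are all hypothetical.

```python
import random

KEY_BYTES = 20            # fixed key size stated in the text
TOTAL_BYTES = 5 * 10**12  # assumption: 5TB is counted as 5 * 10^12 bytes

def make_key(rng, key_space):
    """Return a random integer below key_space as a zero-padded 20-character ASCII string."""
    return str(rng.randrange(key_space)).zfill(KEY_BYTES)

def key_count(total_bytes, value_bytes, key_bytes=KEY_BYTES):
    """Number of key/value pairs that fit in total_bytes, counting only key and value bytes."""
    return total_bytes // (key_bytes + value_bytes)

rng = random.Random(42)
print(make_key(rng, 10**12))  # key_space is a hypothetical choice for this demo

for value_bytes in (10000, 1000, 100, 10):
    print(value_bytes, key_count(TOTAL_BYTES, value_bytes))
```

Under these assumptions, the value sizes of 100 and 10 bytes give roughly 41.7 billion and 166.7 billion keys, which is consistent with the 41 billion and 167 billion key tests mentioned in this article.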
The following chart shows the test results:
The detailed performance test results:
HBase threw exceptions in the 41 billion and 167 billion key tests because of concurrent mode failures in the HBase RegionServers. Regardless of configuration, this failure occurs whenever a RegionServer generates garbage faster than the Java garbage collector can reclaim it. It may be possible to devise a garbage collection scheme that avoids the problem; however, it would come at a heavy cost in run-time performance.
Matthew Hertz and Emery D. Berger published “Quantifying the Performance of Garbage Collection vs. Explicit Memory Management” at the OOPSLA conference in 2005, which provides solid evidence for this cost.
Random Read Test
This test measures query throughput using a set of random read requests. Two tests are run on each system, one with a Zipfian distribution of keys and one with a uniform distribution. The inserted key/value pairs have a fixed size: 20 bytes for the key and 1KB for the value. The keys are integers formatted in ASCII. Each query returns a single key/value pair. Each distribution is tested twice on each system, once with 5TB of data loaded and once with 0.5TB, so that the experiment can measure performance at different ratios of memory to on-disk data. 4,901,960,784 keys are loaded in the 5TB test and 490,196,078 keys in the 0.5TB test. Each test client machine runs 128 processes (512 processes in total), keeping at most 512 queries outstanding at any time throughout the test. Each test issues 100 million queries.
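To make the difference between the two query distributions concrete, the following Python sketch draws key ranks from a uniform and from a Zipfian distribution and reports how many queries land on the hottest 1% of keys. It is not the benchmark's client: the function names, the small key count, and the Zipf exponent of 1.0 are assumptions chosen for illustration, since the article does not state the exponent used.

```python
import bisect
import itertools
import random

def zipf_cdf(n, s=1.0):
    """Unnormalized cumulative Zipfian weights for ranks 1..n (weight of rank r is 1 / r**s)."""
    return list(itertools.accumulate(1.0 / (rank ** s) for rank in range(1, n + 1)))

def sample_zipf(rng, cdf):
    """Draw a 0-based rank by inverting the cumulative weights."""
    u = rng.random() * cdf[-1]
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)

def hot_share(ranks, n, fraction=0.01):
    """Fraction of queries that hit the most popular `fraction` of the n keys."""
    hot = max(1, int(n * fraction))
    return sum(r < hot for r in ranks) / len(ranks)

N = 10_000        # hypothetical key count, far smaller than in the real test
QUERIES = 20_000  # hypothetical query count
rng = random.Random(1)

cdf = zipf_cdf(N)
zipf_ranks = [sample_zipf(rng, cdf) for _ in range(QUERIES)]
uniform_ranks = [rng.randrange(N) for _ in range(QUERIES)]

zipf_share = hot_share(zipf_ranks, N)
uniform_share = hot_share(uniform_ranks, N)
print("Zipfian share on hottest 1% of keys:", zipf_share)
print("Uniform share on hottest 1% of keys:", uniform_share)
```

With a skewed distribution, a small set of keys receives a large share of the queries, which is why caching matters much more in the Zipfian test than in the uniform one.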
Zipfian Distribution Test
The Hypertable query cache is set to 2GB, and HBase uses the default block cache and memstore settings in order to keep HBase performing well. See the following figure:
The detailed performance test results:
The main reason for the difference is that Hypertable provides a query cache. HBase could implement a query cache as well, but such a cache is a subsystem that generates a lot of garbage. Although it would improve HBase performance for this kind of workload, it would also bring disadvantages, especially for very large write workloads and for mixed workloads with large cell counts.
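The effect of a query cache on a skewed workload can be illustrated with a small least-recently-used (LRU) cache in Python. This is a stand-in written for this article, not Hypertable's query cache: the class name, the cache capacity, the key count, and the weighting used for the skewed workload are all assumptions.

```python
from collections import OrderedDict
import random

class QueryCache:
    """Tiny LRU cache keyed by row key; a hypothetical stand-in for a query cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)                     # cache miss: fetch from the store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

def hit_rate(keys, capacity):
    """Run the queries through a fresh cache and return the fraction served from it."""
    cache = QueryCache(capacity)
    for key in keys:
        cache.get(key, load=lambda k: b"value")  # placeholder for a disk read
    return cache.hits / (cache.hits + cache.misses)

rng = random.Random(7)
n, queries = 10_000, 50_000  # hypothetical sizes for the demo
uniform_keys = [rng.randrange(n) for _ in range(queries)]
weights = [1.0 / rank for rank in range(1, n + 1)]  # Zipf-like skew, exponent 1 assumed
skewed_keys = rng.choices(range(n), weights=weights, k=queries)

uniform_rate = hit_rate(uniform_keys, capacity=500)
skewed_rate = hit_rate(skewed_keys, capacity=500)
print("uniform hit rate:", uniform_rate)
print("skewed hit rate :", skewed_rate)
```

A cache holding only 5% of the keys serves a large share of the skewed queries but very few of the uniform ones. Note also that every miss allocates a new entry and may evict an old one; in a garbage-collected runtime these evicted entries become garbage, which is the cost described above.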
Uniform Distribution Test
See the following figure:
The detailed performance test results:
HBase performance is close to that of Hypertable in the uniform distribution test, presumably because disk I/O is the bottleneck. Some garbage is also produced during the test.
Conclusion
Over the past five years, the Hypertable community has been working to refine the product. Its aim is to make Hypertable a high-performance, highly scalable database solution for the big data field.