HBase, Region Servers and Related Issues

Introduction and Problem

What is HBase? What are Region Servers? How do they look as services in a real Hadoop and big data environment? What are the issues/use cases related to HBase?

Let's start by taking a look at HBase, Region Servers and one of the use cases/issues related to them.


What is HBase?

When we need real-time, random read/write access to data that already lives in Hadoop, we need Apache HBase.

This data can be migrated into Hadoop from any of the already existing systems, using tools like Sqoop, before it becomes a use case for HBase.

Moreover, through HBase (the Hadoop database) we can store very large tables, i.e. billions of rows and millions of columns, which is where it has an edge over relational databases, apart from real-time data retrieval.

What are RegionServers?

RegionServers are the daemons used to store and retrieve Hadoop data in HBase. Simple, right?

In Hadoop production/QA/test environments, each RegionServer is deployed on its own dedicated compute node.

Once we start using HBase, we create a table, much like in SQL/Hive (with a syntax difference), and then begin storing and retrieving our data.
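As a quick illustration, a minimal HBase shell session might look like the one below. The table name 'test_table' and column family 'cf' are just example names, not anything from a real cluster:-

  hbase shell                                     # launch the HBase shell, then run:
  create 'test_table', 'cf'                       # table with one column family 'cf'
  put 'test_table', 'row1', 'cf:city', 'Delhi'    # write a single cell
  get 'test_table', 'row1'                        # real-time random read of one row
  scan 'test_table'                               # read back the whole table
  disable 'test_table'                            # a table must be disabled before dropping
  drop 'test_table'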

Once an HBase table grows beyond a threshold limit, HBase automatically starts splitting the table and distributing the data load to other RegionServers.

The above process is called auto-sharding: HBase automatically scales as we add more data to the system, which is a huge benefit compared to most DBMSs, which require manual intervention to scale the overall system beyond a single server.
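If we want to watch this happening, the HBase shell can show how many regions each RegionServer is currently serving. The exact output format differs between HBase versions, so treat this as a sketch:-

  hbase shell                 # launch the HBase shell, then run:
  status 'simple'             # one line per RegionServer, including its region count
  status 'detailed'           # per-region breakdown for every RegionServer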

Also, scaling is automatic as long as we have another spare, already configured server in the rack.

Additionally, we do not need to put a size limit on tables and split them ourselves: HDFS is the underlying storage mechanism, hence all available disks in the HDFS cluster are available for storing our tables (replication factor not counted here).
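For example, to confirm how much HDFS capacity is actually available and how much of it HBase is already using, we can ask HDFS directly from any node with an HDFS client (paths are the usual defaults; adjust if your HBase root directory differs):-

  hdfs dfs -df -h /            # overall HDFS capacity, used and remaining space
  hdfs dfs -du -s -h /hbase    # total space currently used by the HBase root directory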

We should not limit ourselves to just one RegionServer to manage our tables when we have an entire cluster at our disposal. :-)

Below are a few screenshots of how the HBase and Region Server services look in a monitoring tool (shown here for Cloudera Manager):-

HBase Service in Cloudera Manager:-

[Screenshot: HBase service in Cloudera Manager]

Region Server Services in Cloudera Manager:-

[Screenshot: Region Server roles in Cloudera Manager]

Now that we know what HBase and Region Servers are and how they look as services in a real Hadoop cluster environment, let's talk about a use case, an issue related to Region Servers: many Region Servers are down in our Hadoop cluster, we have tried to restart a single Region Server, which failed, and we have then tried to restart the HBase service, which also failed. What to do in this tough situation? How do we bring the HBase service and the Region Servers back to a normal condition? We will talk about this issue/scenario in detail below:-

Issue:- Region servers down and not coming up even after restart.

Thought Process and Implementation Steps:- This kind of issue can have many different root causes. Today we will talk about one specific root cause which I have faced quite a few times and which may help you as well one day.

Actually, there is a location where HBase keeps its write-ahead log (WAL) files, which it uses to recover data, and many times either that location is fully utilized or one of the files in that directory is corrupted. That location is:-

“/hbase/WALs” or “/hbase/oldWALs”
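Before anything else, it is worth checking how big these directories are and what they contain. A quick sketch using the HDFS command line (run it as a user with access to the /hbase directories, e.g. the hdfs superuser):-

  hdfs dfs -du -s -h /hbase/WALs /hbase/oldWALs   # space used by current and archived WALs
  hdfs dfs -ls /hbase/WALs                        # one sub-directory per RegionServer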

We need to check the HBase logs under the “/var/log/hbase/” path, and in this specific scenario the error logs should contain errors like:-

“Failed to archive/delete all the files for /hbase/WALs/region:ph_sears_prcm_curr_future” OR “Failed to remove file for /hbase/WALs/region:ph_sears_prcm_curr_future”
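A quick way to look for these messages on the affected node (the log file name below is illustrative; the actual file names depend on your distribution and host):-

  grep -iE "Failed to (archive|delete|remove)" /var/log/hbase/*regionserver*.log* | tail -n 20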

If the error looks something like the above, then follow the steps below to resolve the issue (a command sketch follows the list):-

  1. Stop all roles on the affected node.
  2. Move the content of the HDFS location “/hbase/WALs/” that corresponds to the failed node to a backup location, e.g. /user/hdfs (see the command sketch after this list).
    e.g. if the issue is related to host “trphsw4-11.hadoop.searshc.com”, then move the directory “/hbase/WALs/trphsw4-11.hadoop.searshc.com,60020,1475072849247” to the backup location “/user/hdfs”.
  3. Restart the Region Server on the affected node.
  4. The Region Server role should now be successfully up and running.
  5. Start the other roles on the affected node accordingly.
  6. Monitor the logs for further investigation accordingly.
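As a sketch, step 2 can be done with the HDFS CLI. The hostname, port and timestamp below come from the example above, and the wal_backup sub-directory is just a made-up name, so substitute your own values; run the commands as a user that is allowed to modify /hbase (e.g. the hbase or hdfs user):-

  hdfs dfs -mkdir -p /user/hdfs/wal_backup        # hypothetical backup sub-directory
  hdfs dfs -mv /hbase/WALs/trphsw4-11.hadoop.searshc.com,60020,1475072849247 /user/hdfs/wal_backup/
  # then restart the Region Server role for this node from Cloudera Manager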

Backup Implementation Plan:-

If the Region Server still does not start even after performing the above steps, then perform the steps below (again, a command sketch follows the list):-

  1. Move the file back from the backup directory “/user/hdfs” to the “/hbase/WALs” directory.
  2. The file should be the same one that was backed up according to the implementation plan above.
  3. Monitor the logs for further investigation accordingly.
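A matching sketch for this rollback, moving the same directory back to its original location (same assumed names as in the sketch above):-

  hdfs dfs -mv /user/hdfs/wal_backup/trphsw4-11.hadoop.searshc.com,60020,1475072849247 /hbase/WALs/
  # then check the logs under /var/log/hbase/ on the node again for any new errors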

This blog was written by a team of senior big data and Hadoop developers from nexsoftsys.com. You can hire our developers to build advanced business development solutions and discover new possibilities in your data. We provide full-fledged custom big data development and consulting services.


We will talk about other HBase and Region Server issues in another blog. Happy reading!
