Saturday, December 20, 2014

Encountering Zookeeper

I have tried several NoSQL databases, but have yet to see any of them in a production environment. Well, except for the MongoDB that was part of the OpenShift Origin cluster we installed in our data center. Last week's events forced me to interact with Apache Zookeeper, which is hidden inside three EMC ViPR controller nodes.

Basic Facts

Apache Zookeeper has the following characteristics:
- in-memory database system
- data is modeled as a tree, like a filesystem
- built using the Java programming language
- usually runs as a cluster of at least 3 hosts
- usually listens on port 2181
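To make the "tree, like a filesystem" model concrete, here is a toy in-memory sketch in Python. The DataTree class and its methods are hypothetical illustrations, not Zookeeper's actual API. The one behaviour it borrows from Zookeeper is that a node can only be created under a parent that already exists.

```python
class DataTree:
    """Toy in-memory tree of nodes keyed by slash-separated paths.

    Hypothetical illustration only; this is not Zookeeper's API.
    """

    def __init__(self):
        # The root node "/" always exists.
        self.nodes = {"/": {"data": b"", "children": set()}}

    @staticmethod
    def parent_of(path):
        # "/a/b" -> "/a", "/a" -> "/"
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path, data=b""):
        parent = self.parent_of(path)
        if parent not in self.nodes:
            # Like Zookeeper, refuse to create a node whose parent is missing.
            raise KeyError(f"no parent node for {path}")
        if path in self.nodes:
            raise KeyError(f"node {path} already exists")
        self.nodes[path] = {"data": data, "children": set()}
        # Record the child's last path component under its parent.
        self.nodes[parent]["children"].add(path.rsplit("/", 1)[1])

    def get_data(self, path):
        return self.nodes[path]["data"]

    def get_children(self, path):
        return sorted(self.nodes[path]["children"])
```

For example, after tree.create("/app", b"config") and tree.create("/app/workers"), calling tree.get_children("/app") returns ['workers'], while tree.create("/missing/child") raises KeyError because "/missing" does not exist.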

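A Zookeeper cluster (called an ensemble) is designed to be resilient to failure: it keeps working as long as a majority of its hosts are up, so a 3-host ensemble survives the loss of one host. As an in-memory database, each host needs enough memory to hold the entire data tree.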
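The resilience comes from majority quorums: an update is committed once more than half of the ensemble has accepted it, and the ensemble keeps serving as long as such a majority is reachable. A small arithmetic sketch (the function names are my own) shows why ensembles are usually sized at an odd number of hosts:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of the ensemble: more than half of the hosts."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of hosts that can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(f"{n} hosts: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

With these formulas, 3 hosts tolerate one failure and 5 hosts tolerate two, while a 4th host adds no extra fault tolerance over 3.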

All changes to the database are strictly ordered and coordinated among all nodes in the ensemble. At any given time there must be one leader, and all other hosts become followers.
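The strict ordering is carried by a 64-bit transaction id (zxid) that the leader assigns to every change: the high 32 bits hold the leader's epoch, which increases each time a new leader is elected, and the low 32 bits are a counter within that epoch. The helper names below are hypothetical; the sketch only shows why comparing zxids as plain integers gives the commit order, even across a leader change.

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Pack a leader epoch and a per-epoch counter into one 64-bit id."""
    # High 32 bits: leader epoch. Low 32 bits: counter within that epoch.
    return ((epoch & 0xFFFFFFFF) << 32) | (counter & 0xFFFFFFFF)


def split_zxid(zxid: int) -> tuple:
    """Recover (epoch, counter) from a packed zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Transactions from an old leader (epoch 1) and a newly elected one (epoch 2).
txns = [make_zxid(2, 1), make_zxid(1, 500), make_zxid(1, 7)]
# Sorting by zxid yields epoch 1 changes first, then epoch 2.
ordered = sorted(txns)
print([split_zxid(z) for z in ordered])
```

Because the epoch occupies the high bits, the first change of a new leader always sorts after every change made under the previous leader.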

Checking a Zookeeper

Telnet to port 2181 and type the 'ruok' command; a healthy Zookeeper will reply with 'imok'. According to the Zookeeper Admin guide, the four-letter commands recognized by Zookeeper versions below 3.3 are:
- 'stat' : print server statistics, a summary of the server and its connected clients
- 'dump' : list the outstanding sessions and ephemeral nodes; only works on the leader
- 'envi' : print details of the running environment
- 'srst' : reset server statistics
- 'ruok' : check that the server is running in a non-error state
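Instead of typing into telnet, the same check can be scripted. This is a minimal sketch using only Python's standard socket module; the function name four_letter_word and the default timeout are my own choices. It sends the raw four-letter command and reads until the server closes the connection, which is how Zookeeper answers these commands.

```python
import socket


def four_letter_word(host: str, cmd: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. 'ruok', 'stat') and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (hypothetical host name): four_letter_word("zk1.example.com", "ruok")
# should return "imok" on a healthy server.
```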


We were recently hit by ZOOKEEPER-1573, in which Zookeeper is unable to load its database because a transaction in the log refers to a child of a data node that doesn't exist. The cause seems to be that Zookeeper snapshots are 'fuzzy': they are written while the tree is still being updated, so when the transaction log is replayed on top of a snapshot, some of the replayed transactions are already reflected in the snapshot and others are not. The fix seems to be either to upgrade Zookeeper to a version that ignores such an operation, or to delete the problematic database and let the host resynchronize from the other hosts in the ensemble.
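A toy model of this failure may help. The sketch is hypothetical and far simpler than Zookeeper's real snapshot and log format: a snapshot here is just a dict of paths, and the log is a list of (operation, path) pairs. It shows how a fuzzy snapshot that already reflects a later delete of /a makes a strict replay fail on "create /a/b", while a replay that skips such creates still ends in the correct final state. It illustrates the idea of ignoring the operation, not the actual ZOOKEEPER-1573 patch.

```python
def replay(snapshot: dict, log: list, tolerant: bool) -> dict:
    """Apply (op, path) log entries to a copy of a snapshot of path -> data."""
    tree = dict(snapshot)
    for op, path in log:
        parent = path.rsplit("/", 1)[0] or "/"
        if op == "create":
            if parent not in tree:
                if tolerant:
                    # The fuzzy snapshot already reflects a later delete of the parent.
                    continue
                raise KeyError(f"cannot create {path}: parent {parent} missing")
            tree[path] = b""
        elif op == "delete":
            # Deleting a node the snapshot no longer contains is harmless.
            tree.pop(path, None)
    return tree


# Transactions logged after the snapshot started; "create /a" came just before it.
LOG = [("create", "/a/b"), ("delete", "/a/b"), ("delete", "/a")]
# By the time the snapshot serialized that part of the tree, /a was already gone.
FUZZY_SNAPSHOT = {"/": b""}
```

Here replay(FUZZY_SNAPSHOT, LOG, tolerant=False) raises KeyError, while the tolerant replay returns {"/": b""}, the same tree you get by applying every logged transaction in order.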
