Monday, August 16, 2010

Implementing Multimaster Replication

Today I tried to implement multimaster replication on two Red Hat Enterprise Linux 5 servers.
I studied the top three google result for 'multimaster replication mysql' today :

I found these three resources complementary. At first I have no idea about the replication process. Initially followed instructions from Dev.MySQL site to setup one-way replication, then read the livejournal blog for better one-page summary of the process. Reading the ONLAMP Article, I understood the syntax to stop replication and ordered MySQL to skip one SQL statement when resuming replication.

I'm using replication with existing data in the database, so the steps I use is a little different from the three web resource above. And because I'm using out-of-the box RHEL setup, there is an issue with the firewall blocking port 3306.

Unblock the firewall

as root, edit (use vi or nano or something) /etc/sysconfig/iptables
> vi /etc/sysconfig/iptables

-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3306 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited

insert the line in red above.

Other steps is similar to what you might read in these three articles
  1. Add server-id, log-bin, sync_binlog, innodb_flush_at_trx_commit to /etc/my.cnf at server A
    • log-bin=mysql-bin
    • server-id=1
    • sync_binlog=1
    • innodb_flush_log_at_trx_commit=1

  2. service mysqld restart (at A)
  3. create repl user at A, grant it replication capabilities : (I omit the domain so I wouldn't be debugging hostname problems, it is safe for my purpose because it is an internal server)
    mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'slavepass';
    mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

  4. run mysql (client) in one terminal session : mysql> FLUSH TABLES WITH READ LOCK;
  5. in other terminal session, check latest binary log sequence number with mysql> SHOW MASTER STATUS;
  6. mysqldump --all-databases --lock-all-tables >dbdump.db
  7. back to first mysql session, unlock tables :
    mysql> UNLOCK TABLES;
  8. prepare server B's /etc/my.cnf with full blown options :
    • log-bin=mysql-bin
    • server-id=2
    • sync_binlog=1
    • innodb_flush_log_at_trx_commit=1
    • auto_increment_increment=2
    • auto_increment_offset=2
    • log-slave-updates

    • master-host = ipofserverA
    • master-user = repl
    • master-password = passwordOfRepl

    • log-slow-queries=/var/log/mysql-slow.log
    • long_query_time=1
    • log-queries-not-using-indexes=1

  9. start server B's mysql with --skip-slave-start option :
    /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/ --user=mysql --skip-slave-start
  10. import database dump at B:
    mysql <>
  11. change master on B to refer the A server :
    -> MASTER_HOST='master_host_name',
    -> MASTER_USER='replication_user_name',
    -> MASTER_PASSWORD='replication_password',
    -> MASTER_LOG_FILE='recorded_log_file_name',
    -> MASTER_LOG_POS=recorded_log_position;
  12. mysql> START SLAVE;
  13. change A's my.cnf file to resemble B's. Substitute a different server-id and auto increment offset.
  14. service mysqld restart
    (on A:)
After these steps, the system choked on first replicated statement from server B. The solution is to stop slave on A and do
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; before starting A's slave again.

Saturday, August 14, 2010

Feedback Cycles - key to timely process delivery

I've recently found some analogy in some events occured lately. Its all about feedbacks. Back then when I'm doing undergraduate study in Electronics Engineering, we learnt that feedback is an important element of any control systems. And now, it seems, it still an important element, if we want our process delivers in timely fashion. Allow me to describe the two events where the analogy is found.
First, some person were being requested to upload some data. But they aren't the one to upload it to the system, they only must provide the datafile to the uploader. And where I fits in, uploading the data. Because of belief that its not in my best interest to upload stuffs, I delegated the task. After the delegate finished with the task, I notify some people that the task is done. Days passed. The software where the uploads being done have a report to validate the data, but somehow the report returns far too much invalid data. Takes us another days to found out that some of the uploaded data's columns were switched. And also that a part of the upload logic has incorrect mapping. I think, things will be a lot faster if we shorten the feedback cycle - like for instance, if the data provider is the one who uploads the data and also verifies it in system. Then I begin to think that maybe if I were to be the one that upload it things will be different, but.. maybe not. For sure, too many people in the process chain increases process latency. Too late feedback make process go awry.
Second, a few programmers codes some functionality I have been given them in form of task list. But I didn't review them, the code were being written with a lot performance problems and some mismatch with the actual need. I think, if we shorten the feedback cycle things could be better - for example, letting the programmers view the profile log so they know how many unnecessary queries being made to the database. Or letting the user review each task done in the application, so the functionality mismatch is minimal. Waiting the user to actually need to use the functionality and complain about something missing is somehow suboptimal. I begin to think that I must adopt Bill Atkinson's philoshopy, that its better to have few functionality done right than a lot functionality done all wrong !
The analogy could go on and on, for example, when we found out that the reconfiguration of the server to accomodate hosting of a second app actually disables the first app - about two months after the reconfiguration (!). One factor to this incident that I have no logon to try out in the first app so I didn't bother trying to log on to the application. We found out the error - at the expense of two months application downtime and the need to reinitiate server reconfiguration procedure. There is very much bueraucracy around every operational server in here, partly to prevent this stuff from happening, but also makes recovery cycle far too long. And I also wondering why there is no automated script to test the application's logon functionality?
So, feedback in control systems, its very much the same in business processes, and also in software engineering processes. Feedback cycles are important element, no matter in which context you see it.