Friday, October 12, 2007

Making of Heartbeat2 Resources: Is There Something Wrong?

The Linux-HA project is a powerful and useful high-availability solution, yet it can be painful too, especially for those lost in its documentation. We have many stories to tell, but this one amuses me the most by far.

In short, Heartbeat2 can monitor individual resources, not just cluster nodes, through the Cluster Resource Manager (CRM). When it detects a failed resource, it tries to restart the resource on the same node. Our code hasn't been evil enough to put the following statement from the FAQ to the test, though: "The node will try to restart the resource, but if this fails, it will fail over to an other node. A feature that allows failover after N failures in a given period of time is planned."

To turn on the CRM resource monitoring feature:
  1. Modify /etc/ha.d/ha.cf to include the line crm yes

  2. Configure the resources to be managed in /etc/ha.d/haresources. Although Heartbeat2 no longer uses this file, we found it easier to write it anyway and go through the conversion path.

  3. Remove any old CRM configuration:
    rm -f /var/lib/heartbeat/crm/cib.xml*

  4. Generate the fresh configuration file /var/lib/heartbeat/crm/cib.xml:
    python /usr/lib64/heartbeat/haresources2cib.py

  5. Start the Heartbeat daemon:
    /etc/init.d/heartbeat start
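
Before step 5, it can be worth confirming that step 1 actually took effect. This is only a minimal sketch, assuming the directive sits on its own uncommented line in ha.cf. The function name crm_enabled is made up for illustration, and only the crm yes form used in this post is checked:

```shell
# Hypothetical helper: succeed if the given ha.cf contains an
# uncommented "crm yes" line. Other values are not handled here.
crm_enabled() {
    grep -Eq '^[[:space:]]*crm[[:space:]]+yes[[:space:]]*$' "$1"
}

# Usage (path from step 1):
#   crm_enabled /etc/ha.d/ha.cf && echo "CRM is enabled"
```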

If a resource cannot be started correctly, its init script is likely not LSB compliant. Test it and fix it accordingly. For example, the default httpd and mysqld init scripts that come with RHEL4 are not LSB compliant. A conservative approach is to copy the script from /etc/init.d/ to /etc/ha.d/resource.d/ and modify the copy.
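
What "test" means here can be sketched as a small script. The expected exit codes come from the LSB init script conventions: status returns 0 when the service is running and 3 when it is stopped, and starting a running service or stopping a stopped one still succeeds. The helper names expect_rc and check_lsb are made up for illustration, and the check really starts and stops the service, so try it on a test box first:

```shell
# Hypothetical helpers for checking an init script against the LSB
# exit-code conventions. WARNING: this really starts and stops the service.

# expect_rc LABEL EXPECTED CMD [ARGS...]: run CMD and compare its exit code.
expect_rc() {
    label="$1"; expected="$2"; shift 2
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq "$expected" ]; then
        echo "PASS: $label"
    else
        echo "FAIL: $label (exit code $rc, expected $expected)"
        return 1
    fi
}

# check_lsb SCRIPT: returns 0 only if every check passes.
check_lsb() {
    script="$1"; failures=0
    expect_rc "start"                0 "$script" start  || failures=$((failures + 1))
    expect_rc "status while running" 0 "$script" status || failures=$((failures + 1))
    expect_rc "start while running"  0 "$script" start  || failures=$((failures + 1))
    expect_rc "stop"                 0 "$script" stop   || failures=$((failures + 1))
    expect_rc "status while stopped" 3 "$script" status || failures=$((failures + 1))
    expect_rc "stop while stopped"   0 "$script" stop   || failures=$((failures + 1))
    [ "$failures" -eq 0 ]
}

# Usage: check_lsb /etc/init.d/httpd
```

Any FAIL line points at the operation that needs fixing in the copied script.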

Once all Heartbeat-managed resources are up and running correctly, failover scenarios can be simulated and tested by forcing node takeovers with crm_standby.

Any change to /var/lib/heartbeat/crm/cib.xml has to be made through cibadmin. For example:

cibadmin -U -o resources -X '<op id="test_8_mon" interval="1s" name="monitor" timeout="2s"/>'

changes the interval and timeout values for monitoring our test resource.

Still not too bad? Well, you're not finished yet if you start to notice this message in /var/log/messages (BTW, "ouput" is NOT my typo :)

WARN: There is something wrong: the first line isn't read in. Maybe the heartbeat does not ouput string correctly for status operation. Or the code (myself) is wrong.

It may not seem harmful when the predefined resources and services are all up and running and can migrate between nodes without any problem. However, we experienced mysterious Heartbeat behavior from time to time when such messages were flooding /var/log/messages. For example, when our failed test resource was detected but could not be restarted, Heartbeat simply started ignoring the resource completely. No further action was taken. This may not be related, but it hasn't occurred since the above error was eliminated.
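
To see how bad the flooding is, and later to confirm that a fix really silenced it, counting the warning lines is enough. A quick sketch; the function name count_status_warnings is made up, and the default log path is the one from this post:

```shell
# Hypothetical helper: print how many times the CRM status warning
# appears in a log file (default: /var/log/messages).
count_status_warnings() {
    grep -c 'There is something wrong: the first line' "${1:-/var/log/messages}"
}

# Usage: count_status_warnings            # checks /var/log/messages
#        count_status_warnings /tmp/copy  # checks another file
```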

So where is "something wrong"? The Resource Agent. Examining /var/lib/heartbeat/crm/cib.xml closely, we noticed that resources like httpd, mysqld, and our test resource are all defined as Heartbeat Resource Agents, which "are basically LSB init scripts - with slightly odd status operations". Odd where?
The status operation has to really report status correctly, AND, it has to print either OK or running when the resource is active, and it CANNOT print either of those when it's inactive. For the status operation, we ignore the return code.

This sounds quite odd, but it's a historical hangover for compatibility with earlier versions of Linux which didn't reliably give proper status exit codes, but they did print OK or running reliably.

Heartbeat calls the status operation in many places. We do it before starting any resource, and also (IIRC) when releasing resources.

After repeated stop failures, we will do a status on the resource. If the status reports that the resource is still running, then we will reboot the machine to make sure things are really stopped.

I.e., a running resource should literally print OK or running for the status operation, nothing more. Both httpd and mysqld use the status function in /etc/rc.d/init.d/functions, which prints echo $"${base} (pid $pid) is running..." instead. Ding~. Changing it to echo $"running" finally eliminates the annoying message from /var/log/messages. Of course, being conservative, I copied /etc/rc.d/init.d/functions to /etc/ha.d/resource.d/functions and modified the copy accordingly.
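
For a resource with its own script, the same rule can be built into the status operation directly. This is a rough sketch rather than what we actually run: it assumes the service writes a pid file, the name ha_status and the pid file path are hypothetical, and the "stopped" wording is just my choice (anything that contains neither OK nor running would do):

```shell
# Hypothetical status check for a Heartbeat-style resource agent.
# Prints exactly "running" when the process in PIDFILE is alive.
# Otherwise prints "stopped"; note that "not running" would be a bad
# choice because it still contains the word "running".
# Assumes it runs as root, like init scripts do, so kill -0 can
# signal-check any process.
ha_status() {
    pidfile="$1"
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"
        return 0
    fi
    echo "stopped"
    return 3   # LSB exit code for "program is not running"
}

# Usage inside an init script:
#   status) ha_status /var/run/myservice.pid ;;
```

Returning the LSB exit codes as well keeps the script usable outside Heartbeat, even though Heartbeat itself only looks at the printed text for status.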
