Sunday, October 21, 2007

STONITH with DRBD and Heartbeat

Node fencing in Heartbeat is implemented by STONITH. To cut the documentation chase short: suppose an external STONITH plugin already exists; how do you configure it with Heartbeat? Here comes our special feature for the week: what /var/log/messages + Google can do for you.

Before heading down the road, make sure the standalone stonith command works:

stonith -t external/myplugin -T reset -p "NODE IP_ADDR USER STONITH_PASSWD_FILE" nodename

Now configure either the stonith or the stonith_host directive in /etc/ha.d/ha.cf. Note that the two directives are mutually exclusive. Also note the set of directives automatically implied when crm is turned on.

For version 1 Heartbeat (i.e. crm no), this should be sufficient. However, during our test, after we simulated a failure with kill -9 heartbeat_master_process_id and the node was successfully STONITH-ed, resources did not migrate. /var/log/messages revealed the following:

ResourceManager: info: Running /etc/ha.d/resource.d/drbddisk mysql start
ResourceManager: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
ResourceManager: CRIT: Giving up resources due to failure of drbddisk::mysql
ResourceManager: info: Releasing resource group:

This means that one of our drbd resources failed to start on the new node, so heartbeat stopped all resources defined in the same group as the drbd resource. This turns out to be related to the heartbeat deadtime and the drbd ping time. Without much of a clue about how to tune the two parameters relative to each other, I simply followed the hack mentioned in a rather informative discussion and modified /etc/ha.d/resource.d/drbddisk. Simply increase the value of the variable try to 20 in:

case "$CMD" in
start)
# try several times, in case heartbeat deadtime
# was smaller than drbd ping time
#try=6
try=20

That fixed the problem.
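
To make the effect of the try counter concrete, here is a minimal Python sketch of the same retry idea. It is not the drbddisk script itself (which is a shell script); the function name promote_with_retries, the one-second pause between attempts, and the injectable run/sleep parameters are assumptions made for illustration. The exit code 20 is the one reported in the log above.

```python
import subprocess
import time


def promote_with_retries(command, tries=20, delay=1.0,
                         run=subprocess.call, sleep=time.sleep):
    """Run `command` until it exits 0 or `tries` attempts are used up.

    Returns 0 on success, or 20 (the return code seen in the log)
    once every attempt has failed.
    """
    for attempt in range(1, tries + 1):
        if run(command) == 0:
            return 0
        if attempt < tries:
            # Assumed pause; gives DRBD time to notice that the peer is gone.
            sleep(delay)
    return 20


# Hypothetical usage, with "mysql" being the DRBD resource from the log:
#   promote_with_retries(["drbdadm", "primary", "mysql"])
```

With tries=20 and a one-second pause, the loop keeps trying for roughly 20 seconds, which only helps if whatever calls the script is willing to wait that long.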

More tweaks are required for version 2 Heartbeat (i.e. crm yes), though. The simplest way to configure STONITH with crm is still to use /usr/lib64/heartbeat/haresources2cib.py to automatically generate /var/lib/heartbeat/crm/cib.xml from /etc/ha.d/ha.cf. However, /usr/lib64/heartbeat/haresources2cib.py has a typo on the line:

if option_details[0] == "stonith_enabled" and enable_stonith:

where "stonith_enabled" should have been "stonith-enabled" instead. This bug leaves STONITH disabled in the generated /var/lib/heartbeat/crm/cib.xml:

<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>

Fix either the Python script or the generated XML to enable STONITH with crm.
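
If you go the XML route, the change amounts to flipping one attribute. The following is a rough sketch using Python's standard xml.etree.ElementTree; the function name enable_stonith is made up for this example, and it only transforms an XML string. How and when to write the result back to cib.xml is deliberately left out.

```python
import xml.etree.ElementTree as ET


def enable_stonith(cib_xml):
    """Set every nvpair named "stonith-enabled" to "true".

    Returns a (new_xml, changed_count) tuple.
    """
    root = ET.fromstring(cib_xml)
    changed = 0
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") == "stonith-enabled" and nvpair.get("value") != "true":
            nvpair.set("value", "true")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

A changed count of 0 means the file either already has STONITH enabled or has no such nvpair at all, so it is worth checking which.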

Progressing further, we again hit a similar problem: resources didn't migrate after a successful STONITH. This time /var/log/messages was more descriptive:

lrmd: info: RA output: (drbddisk_2:start:stderr) ioctl(,SET_STATE,) failed:
lrmd: info: RA output: (drbddisk_2:start:stderr) Permission denied Partner is already primary
lrmd: info: RA output: (drbddisk_2:start:stderr) Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 20
lrmd: info: RA output: (drbddisk_2:start:stderr) drbdadm aborting

No sweat, it's the same old deadtime vs. ping time issue. These messages repeated 5 times in the log, indicating 5 attempts by /etc/ha.d/resource.d/drbddisk. But wait, didn't we already increase "try" to 20? Why only 5? Well, notice this in the log:

lrmd: WARN: on_op_timeout_expired: TIMEOUT: operation start on heartbeat::drbddisk::drbddisk_2 for client 22836, its parameters: CRM_meta_op_target_rc=[7] 1=[mysql] CRM_meta_timeout=[5000] crm_feature_set=[1.0.7] .
crmd: ERROR: process_lrm_event: LRM operation drbddisk_2_start_0 (18) Timed Out (timeout=5000ms)

Apparently the 5000ms timeout for starting this drbd resource is a bit too short: the operation times out before the script can use all 20 tries. The timeout can be changed in the corresponding primitive sections of /var/lib/heartbeat/crm/cib.xml by adding something like:

<op id="drbddisk_2_start" name="start" timeout="60s"/>

Be sure to extend the start timeout for both the heartbeat resource drbddisk and its corresponding OCF resource for the file system. More details about cib.xml can be found in /usr/lib64/heartbeat/crm.dtd. Some constraint examples are particularly helpful in understanding and configuring preferred stonithd resource locations using INFINITY/-INFINITY scores.
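
Since the same op element has to be added to more than one primitive, a small helper may save some typing. This is only a sketch: the function name set_start_timeout is invented here, and it assumes the op elements live in an operations element that comes first inside each primitive, so check that against crm.dtd before relying on it. Like any hand edit, it works on an XML string and leaves writing cib.xml back to you.

```python
import xml.etree.ElementTree as ET


def set_start_timeout(cib_xml, primitive_id, timeout="60s"):
    """Return cib_xml with a start op timeout set on the given primitive."""
    root = ET.fromstring(cib_xml)
    for primitive in root.iter("primitive"):
        if primitive.get("id") != primitive_id:
            continue
        operations = primitive.find("operations")
        if operations is None:
            # Assumption: <operations> is the first child of <primitive>.
            operations = ET.Element("operations")
            primitive.insert(0, operations)
        for op in operations.findall("op"):
            if op.get("name") == "start":
                op.set("timeout", timeout)  # reuse an existing start op
                break
        else:
            # No start op yet: add one, named like the example above.
            ET.SubElement(operations, "op", {
                "id": primitive_id + "_start",
                "name": "start",
                "timeout": timeout,
            })
        return ET.tostring(root, encoding="unicode")
    raise KeyError(primitive_id)
```

Calling it once per primitive id (the drbddisk resource and the file system resource) covers both timeouts mentioned above.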
