Sunday, October 21, 2007

STONITH with DRBD and Heartbeat

Node fencing in Heartbeat is implemented by STONITH. Cutting the documentation chase: suppose an external STONITH plugin already exists; how do you configure it with Heartbeat? Here comes our special feature for the week: what /var/log/messages + Google can do for you.

Before heading down the road, make sure the standalone stonith works:

stonith -t external/myplugin -T reset -p "NODE IP_ADDR USER STONITH_PASSWD_FILE" nodename

Now configure either the stonith or the stonith_host directive in /etc/ha.d/ha.cf. Note that the two directives are mutually exclusive. Also note the set of directives automatically implied when crm is turned on.
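A minimal sketch of the stonith_host form, reusing the parameters from the standalone test above (external/myplugin and its parameter list are our hypothetical example, so substitute your own):

stonith_host * external/myplugin NODE IP_ADDR USER STONITH_PASSWD_FILE

The * means any node may use this STONITH device; a specific node name can be given instead.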

For version 1 Heartbeat (i.e. crm no), this should be sufficient. However, during our test, after we simulated a failure with kill -9 heartbeat_master_process_id and the node was successfully STONITH-ed, resources did not migrate. /var/log/messages revealed the following:

ResourceManager: info: Running /etc/ha.d/resource.d/drbddisk mysql start
ResourceManager: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
ResourceManager: CRIT: Giving up resources due to failure of drbddisk::mysql
ResourceManager: info: Releasing resource group:

which means that one of our drbd resources failed to start on the new node, so heartbeat stopped all resources defined in the same group as the drbd resource. This turns out to be related to the heartbeat deadtime and the drbd ping time. Without much clue on how to tune the two parameters relative to each other, I simply followed the hack mentioned in a rather informative discussion and modified /etc/ha.d/resource.d/drbddisk. Simply increasing the value of the variable try (originally 6) to 20 in:

case "$CMD" in
start)
    # try several times, in case heartbeat deadtime
    # was smaller than drbd ping time
    try=20

fixed the problem.

More tweaks are required for version 2 Heartbeat (i.e. crm yes), though. The simplest way to configure STONITH with crm is still to use /usr/lib64/heartbeat/haresources2cib.py to automatically generate /var/lib/heartbeat/crm/cib.xml from /etc/ha.d/ha.cf. However, /usr/lib64/heartbeat/haresources2cib.py has a typo on the line:

if option_details[0] == "stonith_enabled" and enable_stonith:

where "stonith_enabled" should have been ""stonith-enabled"" instead. This bug would result STONITH disabled in the generated /var/lib/heartbeat/crm/cib.xml:

<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>

Fix either the Python or the XML to enable STONITH with crm.
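Either way, the nvpair in cib.xml should end up reading:

<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>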

Progressing further, we again experienced a similar problem: resources didn't migrate after a successful STONITH. This time /var/log/messages was more descriptive:

lrmd: info: RA output: (drbddisk_2:start:stderr) ioctl(,SET_STATE,) failed:
lrmd: info: RA output: (drbddisk_2:start:stderr) Permission denied Partner is already primary
lrmd: info: RA output: (drbddisk_2:start:stderr) Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 20
lrmd: info: RA output: (drbddisk_2:start:stderr) drbdadm aborting

No sweat, it's the same old deadtime vs. ping time issue. Such messages repeated 5 times in the log, indicating five attempts by /etc/ha.d/resource.d/drbddisk. But wait, hadn't we already increased try to 20? Why only 5? Well, did you notice this in the log?

lrmd: WARN: on_op_timeout_expired: TIMEOUT: operation start on heartbeat::drbddisk::drbddisk_2 for client 22836, its parameters: CRM_meta_op_target_rc=[7] 1=[mysql] CRM_meta_timeout=[5000] crm_feature_set=[1.0.7] .
crmd: ERROR: process_lrm_event: LRM operation drbddisk_2_start_0 (18) Timed Out (timeout=5000ms)

Apparently that 5000ms timeout to start this drbd resource is a bit too short. It can be changed in the corresponding primitive sections of /var/lib/heartbeat/crm/cib.xml by adding something like:
<op id="drbddisk_2_start" name="start" timeout="60s"/>

Be sure to extend the start timeout for both the heartbeat resource drbddisk and its corresponding ocf resource for the file system. More details about cib.xml can be found in /usr/lib64/heartbeat/crm.dtd. Some constraint examples are particularly helpful in understanding and configuring preferred stonithd resource locations using INFINITY/-INFINITY scores.
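The same edits can be pushed into a running cluster with cibadmin, just like the monitor example in the post below; the Filesystem op id here is a guess, so check the generated cib.xml for the real ones:

cibadmin -U -o resources -X '<op id="drbddisk_2_start" name="start" timeout="60s"/>'
cibadmin -U -o resources -X '<op id="filesystem_2_start" name="start" timeout="60s"/>'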

Sunday, October 14, 2007

Tunnel Light

It had been a reasonably quiet and productive week.

  • After enough messing around with credentials, certificates, and things like that, I witnessed some GridFTP portal displaying my home directories on both Data Capacitor and HPSS, how rewarding.

  • Another WYIN/ODI project meeting with astronomers, indicating yet another potential CIMA collaboration. The starting point of interest seems to be resuming the old RoboScope work for SpectraBot.

  • Application-level failover testing for CIMA. The system now seems capable of surviving most failures/outages except for one, which is rooted deep down on the instrument data collection side.

  • Boiled down the annoying "There is something wrong" problem with the heartbeat configuration; bless our poor souls that suffered this message for months.

  • Reviewed a paper for GCE07, and also got back "warm words" for the one with my name on it.


More failover testing awaits in the week ahead, mostly at the system level. If we can tune things up across distances by Friday, I'll see that light at the end of the tunnel better, I mean the tunnel of SC07 of course.

Friday, October 12, 2007

Making of Heartbeat2 Resources: Is There Something Wrong?

The Linux HA project is a rather powerful and useful high availability solution, yet it can be painful too, especially for those lost in its documentation. We have many stories to tell, but this one amuses me the most, by far.

In short, Heartbeat2 is capable of monitoring individual resources, in addition to cluster nodes, through its Cluster Resource Manager (CRM). When a failed resource is detected, it will try to restart the resource on the same node. Our code hasn't been evil enough to test the following statement from the FAQ though: "The node will try to restart the resource, but if this fails, it will fail over to an other node. A feature that allows failover after N failures in a given period of time is planned."

To turn on the CRM resource monitor feature:
  1. Modify /etc/ha.d/ha.cf to include the line crm yes

  2. Configure resources to be managed in /etc/ha.d/haresources. Though this file is no longer used in Heartbeat2, we found it easier to just go through the conversion path (see the sample line after these steps).

  3. Clean any old CRM configurations:
    rm -f /var/lib/heartbeat/crm/cib.xml*

  4. Generate the fresh configuration file /var/lib/heartbeat/crm/cib.xml:
    python /usr/lib64/heartbeat/haresources2cib.py

  5. Start Heartbeat daemon:
    /etc/init.d/heartbeat start
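For the DRBD/MySQL pair discussed in the post above, a haresources line might look like the following sketch (node name, IP, device, and mount point are all hypothetical):

node1 IPaddr::192.168.1.100 drbddisk::mysql Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysqld

haresources2cib.py then turns drbddisk into a heartbeat-class resource and Filesystem into an ocf-class one, which is why both kinds appear in the generated cib.xml.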

If some resource cannot be started correctly, its corresponding init script is likely not LSB compliant. Just test and fix accordingly. For example, the default httpd and mysqld that come with RHEL4 distributions are not LSB compliant. A conservative approach is to make a copy from /etc/init.d/ to /etc/ha.d/resource.d/, and modify from there.
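A quick sanity check on the exit codes, assuming a copied script named mysqld (per the LSB spec, status should exit 0 while the service is running and 3 once it is stopped):

/etc/ha.d/resource.d/mysqld status; echo "exit: $?"   # expect 0 while running
/etc/ha.d/resource.d/mysqld stop
/etc/ha.d/resource.d/mysqld status; echo "exit: $?"   # expect 3 once stopped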

Once all heartbeat-managed resources have been started and are running correctly, failover scenarios can be simulated and tested through forced node takeover, using crm_standby.
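If memory serves, putting a node into standby and bringing it back looks something like this (node name hypothetical):

crm_standby -U node1 -v on
crm_standby -U node1 -v off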

To modify anything in /var/lib/heartbeat/crm/cib.xml, you have to use cibadmin. For example:

cibadmin -U -o resources -X '<op id="test_8_mon" interval="1s" name="monitor" timeout="2s"/>'

changes the interval and timeout values for monitoring our test resource.

Still not too bad? Well, you're not finished yet if you start noticing this message in /var/log/messages: (BTW, "ouput" is NOT my typo. :)

WARN: There is something wrong: the first line isn't read in. Maybe the heartbeat does not ouput string correctly for status operation. Or the code (myself) is wrong.

It may seem harmless when predefined resources and services are all up and running, and can migrate between nodes without any problem. However, we experienced mysterious heartbeat behaviors from time to time while such messages were flooding /var/log/messages. For example, when our failed test resource was detected but could not be restarted, heartbeat simply ignored the resource completely; no further actions were taken. This may not be related, but it hasn't occurred since the error above was eliminated.

So where is "something wrong"? The Resource Agent. Examining /var/lib/heartbeat/crm/cib.xml closely, we noticed that resources like httpd, mysqld, and our test resource are all defined as Heartbeat Resource Agents, which "are basically LSB init scripts - with slightly odd status operations". Odd where?
The status operation has to really report status correctly, AND, it has to print either OK or running when the resource is active, and it CANNOT print either of those when it's inactive. For the status operation, we ignore the return code.

This sounds quite odd, but it's a historical hangover for compatibility with earlier versions of Linux which didn't reliably give proper status exit codes, but they did print OK or running reliably.

Heartbeat calls the status operation in many places. We do it before starting any resource, and also (IIRC) when releasing resources.

After repeated stop failures, we will do a status on the resource. If the status reports that the resource is still running, then we will reboot the machine to make sure things are really stopped.

I.e., a running resource should literally print OK or running for the status operation, nothing more. Both httpd and mysqld use the status function in /etc/rc.d/init.d/functions, which prints echo $"${base} (pid $pid) is running..." instead. Ding~. Changing it to echo $"running" finally eliminated the annoying message from /var/log/messages. Of course, being conservative, I copied /etc/rc.d/init.d/functions to /etc/ha.d/resource.d/functions and modified the copy.
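The change, in diff form against our copy in /etc/ha.d/resource.d/functions (the copied init scripts need to source this copy instead of /etc/rc.d/init.d/functions for it to take effect):

-	echo $"${base} (pid $pid) is running..."
+	echo $"running"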

Monday, October 8, 2007

Across Grids: got CA certificates?

I had a problem with GridFTP between gf1 and BigRed earlier. On gf1.ucs.indiana.edu:

$globus-url-copy -vb gsiftp://gridftp.bigred.iu.teragrid.org/N/dc/scratch/myusername/8GB gsiftp://gf1.ucs.indiana.edu/home/myusername/8GB

error: globus_ftp_control: gss_init_sec_context failed
OpenSSL Error: s3_clnt.c:842: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Could not verify credential: self signed certificate in certificate chain


It turns out that I was missing the CA certificates on gf1 in ~/.globus/certificates/. Copied them from /etc/grid-security/certificates/, all happy then.
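For the record, the copy was as simple as (run on gf1; this grabs every trusted CA in the host store, which is more than strictly necessary):

mkdir -p ~/.globus/certificates
cp /etc/grid-security/certificates/* ~/.globus/certificates/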

Around BigRed: Available File Systems

Refer to the IU Research Systems Disk Space Guide for more detailed descriptions. In particular, on BigRed general users can access several file systems as follows:

  • Home directory: /N/u/[username]/BigRed

  • Local scratch: /scratch/[username]

  • Shared scratch (GPFS): /N/gpfs/[username]

  • Data Capacitor scratch (Lustre): /N/dc/scratch/[username]


Note that Data Capacitor project space is accessible for project group members via /N/dc/projects/[projectname] .

Friday, October 5, 2007

Down the Rabbit Hole

Sometime in late September, I transferred to this new group focusing on portal and science gateway development, where everyone blogs everything. While down the rabbit hole, one might as well keep up with the trend. And in case you wonder, yes it's from Tales of Symphonia.

Meetings, meetings, Meetings.

It had been a couple of quite active weeks; some of us met even more than once:

  • Regular Data Capacitor group meetings: SC07 is the theme. A separate GridFTP server to DC is on the way too.

  • Portal group meetings: dissertation work review, especially for CIMA, in 10 slides or less to impress general audience for possible collaborations (SNS at ORNL, etc.); setting priorities for projects.

  • WYIN/ODI project meeting with astronomers: short presentations, getting acquainted, and background stories.

  • RT-ALL strategy meeting: long

  • Regular TeraGrid meetings: recoup

  • Astronomy department brown bag seminar: using Data Capacitor for ODI project

  • Tech Tuesday talk: preserving data objects

  • CS/PTL/UITS monthly meeting: infoshare


Progress

  • Got TeraGrid account, login, tried single-sign-on, tested gridftp between DC and HPSS.

  • Updated CIMA web service for Admin portlet to indicate number of movies associated with a given sample.

  • Oh yeah, of course wrote this blog, using correct HTML under right editor, you think that's easy?


Miscellaneous

  • Couldn't out-wait Apple on Leopard, hence went ahead and got the MBP. A beauty, and, (un)fortunately hot.

  • ORNL is renewing my badge, oh the number of emails and resumes it takes.

  • Made SC travel arrangement, at least that one was not bad at all.


To Do

  • More testing with GridFTP, BigRed, DC, HPSS, TeraGrid credentials. What exactly is that SC portal supposed to be or do?

  • Extensive CIMA failover testing, both at the application and system levels. Targeted service-up date is Oct. 19th.

  • A 10-15 minute SC07 presentation on CIMA failover; better start thinking about slides.

  • Read up on NVO before next meeting with astronomers, Tuesday Oct. 9th.

GridFTP with BigRed/Data Capacitor and HPSS - Idiot's Guide

Finally I received a username and password to open the magic door to TeraGrid. In another moment down went Alice after it, never once considering how in the world she was to get out again.

The simplest steps to GridFTP zeros between BigRed and HPSS follow:

  1. Refer to the general TeraGrid single sign-on guide. An example session from gf1.ucs.indiana.edu:
    $ export GLOBUS_LOCATION=/home/globus/nmi-8.0-rh9/
    $ source /home/globus/nmi-8.0-rh9/etc/globus-user-env.sh
    $ export MYPROXY_SERVER=myproxy.teragrid.org
    $ export MYPROXY_SERVER_PORT=7514
    $ mkdir -p ~/.globus/certificates
    $ cd ~/.globus/certificates
    $ wget http://ca.ncsa.uiuc.edu/4a6cd8b1.0
    $ wget http://ca.ncsa.uiuc.edu/4a6cd8b1.signing_policy
    $ wget http://ca.ncsa.uiuc.edu/4a6cd8b1.r0
    $ myproxy-logon -T -l myusername
    Enter MyProxy pass phrase:
    A credential has been received for user myusername in /tmp/x509up_u1209.

  2. Login to BigRed using gsissh
    $ gsissh tg-login.iu.teragrid.org

  3. Start transferring zeros:
    $globus-url-copy -vb gsiftp://gridftp.bigred.iu.teragrid.org/N/dc/scratch/myusername/8GB gsiftp://gridftp.mdss.iu.edu/hpss/m/y/username/
    8388608000 bytes 75.40 MB/sec avg 36.36 MB/sec inst

  4. Not bad, right? ;) Just make sure the absolute file paths of both source and destination are correct.

  5. Oh, and my ~/.soft on BigRed looks like:
    @bigred
    @teragrid-basic
    @globus-4.0
    @teragrid-dev

Kerberos, Gridsphere, and a bit more

At the beginning of the year I tried to develop a portal to enable easy access between Data Capacitor and HPSS for general users authenticating through Kerberos. Here are some old notes from back then, when I struggled to fit all the different pieces together.

I. Prerequisite Software, versions as of 03/12/2007
  1. Apache Ant 1.7.0
  2. Apache Tomcat 5.5.20
  3. Sun JDK 1.6
  4. GridSphere 2.2.8
  5. Apache2 web server
  6. Secure Perl web services require the packages SOAP::Lite and Crypt::SSLeay
  7. Axis 1.4 is needed for the wsdl2java tool to convert the web services WSDL document into Java code.

II. Installation Tweaks for authentication and security

a.) SSL configuration for Tomcat (Reference Tomcat Howto)
  1. Create a certificate keystore by executing the following and specify a password:
    $JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA

  2. Uncomment the "SSL HTTP/1.1 Connector" entry in $CATALINA_HOME/conf/server.xml and tweak as necessary, particularly defining the attribute "keystorePass" with the chosen password from the previous step.

    <!-- Define a SSL HTTP/1.1 Connector on port 8443 -->
    <Connector port="8443" maxhttpheadersize="8192" maxthreads="150"
    minsparethreads="25" maxsparethreads="75" enablelookups="false"
    disableuploadtimeout="true" acceptcount="100" scheme="https"
    secure="true" clientauth="false" sslprotocol="TLS"
    keystorepass="mykeystorepassword"/>



b.) Kerberos configuration for GridSphere
With the following configuration, existing GridSphere portal users can authenticate through a designated Kerberos server, assuming /etc/krb5.conf is valid.
  1. Modify the <auth-module> section for "GridSphere JAAS" in $CATALINA_HOME/webapps/gridsphere/WEB-INF/authmodules.xml, setting <active> to true. Note that the priority number in each <auth-module> section indicates the fallback order among multiple authentication schemes; smaller numbers have higher priority.

  2. Create a file $CATALINA_HOME/conf/jaas.conf as follows:

    Gridsphere {
    com.sun.security.auth.module.Krb5LoginModule required;
    };

  3. Modify $CATALINA_HOME/bin/catalina.sh to include the following:

    export JAVA_OPTS="-Djava.security.auth.login.config=$CATALINA_HOME/conf/jaas.conf"

c.) HTTPS configuration for Apache2 web server on Fedora (Reference Apache Installation and Configuration Guide on Fedora Core)
  1. Create a new CA certificate

    [root@localhost root]# cd /usr/share/ssl/misc
    [root@localhost misc]# ./CA -newca
  2. Create a Certificate Signing Request (CSR)

    [root@localhost misc]# ./CA -newreq
  3. Sign the CSR

    [root@localhost misc]# ./CA -sign

  4. Store certificates in a directory

    [root@localhost var]# mkdir myCA
    [root@localhost var]# cd myCA
    [root@localhost myCA]# cp /usr/share/ssl/misc/demoCA/cacert.pem .
    [root@localhost myCA]# cp /usr/share/ssl/misc/newcert.pem ./servercert.pem
    [root@localhost myCA]# cp /usr/share/ssl/misc/newreq.pem ./serverkey.pem
    [root@localhost myCA]# ls
    cacert.pem servercert.pem serverkey.pem
    [root@localhost myCA]# cd /var/myCA
    [root@localhost myCA]# cp servercert.pem /etc/httpd/conf/ssl.crt/server.crt
    cp: overwrite `/etc/httpd/conf/ssl.crt/server.crt'? y
    [root@localhost myCA]# cp serverkey.pem /etc/httpd/conf/ssl.key/server.key
    cp: overwrite `/etc/httpd/conf/ssl.key/server.key'? y

  5. Edit ssl.conf (optional): open ssl.conf for editing, and uncomment and edit the following directives. You may want to change DocumentRoot to point to another directory, such as /var/www/ssl, and place your SSL files there instead.

    DocumentRoot
    ServerName
    ServerAdmin

  6. Require SSL (Data Capacitor specific): edit httpd.conf, comment out the section that listens on port 80, and add SSLRequireSSL and Options ExecCGI to the CGI directory configuration, e.g.

    ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
    <Directory "/var/www/cgi-bin">
    SSLRequireSSL
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    </Directory>

  7. Disabling the passphrase on startup (optional): to start Apache automatically on boot without user intervention, the passphrase prompt can be disabled by simply decrypting the server key.

    # cd /etc/httpd/conf/ssl.key
    # cp server.key server.bak
    # openssl rsa -in server.bak -out server.key

d.) Java SSL configuration with self-signed certificates (Reference here)
When opening an SSL connection in Java to a host that uses a self-signed certificate, the following exception may be thrown:
 
javax.net.ssl.SSLHandshakeException: sun.security.validator.
ValidatorException: PKIX path building failed: sun.security.provider.certpath.
SunCertPathBuilderException: unable to find valid certification path to requested target.

To add the server's certificate to the KeyStore of trusted certificates, a simple solution is to compile and run the InstallCert program:

java InstallCert hostname

It displays the complete certificate and adds it to a Java KeyStore 'jssecacerts' in the current directory. Either configure JSSE to use it as the trust store, or copy it into the $JAVA_HOME/jre/lib/security directory. For all Java applications to recognize the certificate as trusted, and not just JSSE, you could also overwrite the cacerts file in that directory.
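Pointing JSSE at the generated trust store is a one-liner (the application class name is hypothetical):

java -Djavax.net.ssl.trustStore=jssecacerts MyApp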

e.) Secure web services configuration:
  1. Specify the https location in the <service> tag of the WSDL (a minimal sketch follows this list).

  2. To encode information in both soap header and body, reference WSDL specification, or Chapter 9 of "Programming Web Services with Perl".
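The sketch for item 1; the service, port, and binding names are all hypothetical:

<service name="MyWebService">
  <port name="MyWebServicePort" binding="tns:MyWebServiceBinding">
    <soap:address location="https://myhost.example.org/cgi-bin/mywebservice.cgi"/>
  </port>
</service>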

III. GridSphere Tips
  1. To share a variable among different portlets within an application, use setAttribute and getAttribute of PortletSession at "APPLICATION_SCOPE".

  2. For simple persistence of user information between logins, use setups for PortletPreferences.

  3. To forward username and password upon login to external secure web services, modify the login function in src/org/gridlab/gridsphere/services/core/user/impl/LoginServiceImpl.java

  4. To change logout behavior, modify the logout function in src/org/gridlab/gridsphere/servlets/GridSphereServlet.java

  5. Given a WSDL document, use the wsdl2java tool in the Axis package to generate the corresponding Java code; compile it with:

    javac -d . -classpath $CP *.java

    create a jar file with:

    jar -cf mywebservices.jar MyWebServices_pkg/

    and finally put the jar file in the corresponding lib directory of the GridSphere portlet application. Note that the jar files in the lib directory of Axis need to be on the classpath.