HA Linux Cluster On RackSpace Cloud Servers

By admin on March 27, 2011 in Uncategorized

Our goal is to set up a pair of RackSpace Cloud Servers in a redundant cluster using a shared IP address. We’ll use the “heartbeat” package from Linux-HA (http://www.linux-ha.org) for the cluster messaging layer and the “pacemaker” package from ClusterLabs (http://clusterlabs.org) for the cluster resource manager.

Before starting this procedure you’ll need to:

a. Create the two cloud servers. These instructions are specific to CentOS for the operating system.

b. Open a ticket with RackSpace Cloud support and request a public IP address to be shared between the servers.

You can use the instructions for other situations but you’ll need to make the appropriate adjustments.

1. Set up hosts file entries. On each server, edit /etc/hosts and add entries for each server’s public and private interfaces. You’ll also find it convenient to set up ssh keys between the servers for easy access.
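As a rough illustration of the hosts entries, the sketch below prints four lines, one per interface per server. The hostnames (node01, node02 and their "-priv" aliases) and all addresses are placeholders invented for this example, not values from a real account; substitute your own.

```shell
# Placeholder values (assumptions): replace with your servers' names and IPs.
HOST1=node01; HOST1_PUBLIC=203.0.113.11; HOST1_PRIVATE=10.0.0.11
HOST2=node02; HOST2_PUBLIC=203.0.113.12; HOST2_PRIVATE=10.0.0.12

# Print one /etc/hosts line per interface: the hostname for the public
# address and a hypothetical "-priv" alias for the private address.
hosts_entries() {
    printf '%s\t%s\n' "$HOST1_PUBLIC"  "$HOST1"
    printf '%s\t%s\n' "$HOST1_PRIVATE" "${HOST1}-priv"
    printf '%s\t%s\n' "$HOST2_PUBLIC"  "$HOST2"
    printf '%s\t%s\n' "$HOST2_PRIVATE" "${HOST2}-priv"
}

# Review the output first; as root it could then be appended with:
#   hosts_entries >> /etc/hosts
hosts_entries
```

Running the same snippet on both servers keeps the two hosts files identical.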

2. Now use yum to install some prerequisite packages:

yum install net-snmp perl-libwww-perl libesmtp perl-Net-SSLeay perl-MailTools ipvsadm OpenIPMI libibverbs librdmacm openhpi PyXML

Repeat this step on the second server.

Note: Several of these packages are not available on the standard RHEL yum channels. If you’re working on something other than a RackSpace Cloud server, you might need to add the EPEL repository. Just go to:

http://fedoraproject.org/wiki/EPEL

Then download and install the appropriate package to add EPEL.

3. The version of heartbeat available in the standard yum repositories is outdated. So we’ll install a more recent version of heartbeat, pacemaker and supporting components from:

http://www.clusterlabs.org/rpm

Start by creating a working folder:

mkdir /root/archive
cd /root/archive

Then use wget to download the latest version of each of the following packages:

cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.2.7-1.1.el5.x86_64.rpm
corosynclib-1.2.7-1.1.el5.x86_64.rpm
heartbeat-3.0.3-2.el5.x86_64.rpm
heartbeat-libs-3.0.3-2.el5.x86_64.rpm
openais-1.1.3-1.6.el5.x86_64.rpm
openaislib-1.1.3-1.6.el5.x86_64.rpm
pacemaker-1.0.10-1.4.el5.x86_64.rpm
pacemaker-libs-1.0.10-1.4.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm
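The download could be scripted along these lines. The directory under http://www.clusterlabs.org/rpm that holds the files is not spelled out here, so BASE_URL below is a placeholder assumption: browse the repository and set it to the directory matching your distribution and architecture. The file names are the versions listed above and may need updating if newer ones have been published.

```shell
# ASSUMPTION: placeholder path; point this at the real repository directory.
BASE_URL=${BASE_URL:-http://www.clusterlabs.org/rpm/REPLACE_WITH_REPO_DIR}

PACKAGES="cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.2.7-1.1.el5.x86_64.rpm
corosynclib-1.2.7-1.1.el5.x86_64.rpm
heartbeat-3.0.3-2.el5.x86_64.rpm
heartbeat-libs-3.0.3-2.el5.x86_64.rpm
openais-1.1.3-1.6.el5.x86_64.rpm
openaislib-1.1.3-1.6.el5.x86_64.rpm
pacemaker-1.0.10-1.4.el5.x86_64.rpm
pacemaker-libs-1.0.10-1.4.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm"

# Print the full download URL for every package in the list.
package_urls() {
    for pkg in $PACKAGES; do
        echo "$BASE_URL/$pkg"
    done
}

# Download each package into the current directory,
# stopping at the first failed download.
download_packages() {
    package_urls | while read -r url; do
        wget "$url" || exit 1
    done
}

# Usage (after setting BASE_URL): cd /root/archive && download_packages
```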

Finally install the packages:

rpm -i *.rpm

Repeat this step on the second server.

5. The next step is to configure heartbeat.

a. Setup keys for authentication between the instances.

Edit /etc/ha.d/authkeys and add:

auth 1
1 sha1 [PASSWORD]

Replace [PASSWORD] with a long random string.
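One possible way to produce the random string is to read bytes from /dev/urandom and hex-encode them, as in the sketch below. The function names gen_key and authkeys_content are made up for this example. Note that both servers need the same key, so generate it once and copy the resulting file to the second server rather than generating a new key there.

```shell
# Generate 32 random bytes and print them as 64 hex characters.
gen_key() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print the authkeys file content for the given key.
authkeys_content() {
    printf 'auth 1\n1 sha1 %s\n' "$1"
}

KEY=$(gen_key)
authkeys_content "$KEY"

# As root, the file could then be written with:
#   authkeys_content "$KEY" > /etc/ha.d/authkeys
```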

b. Set permissions on the authkeys file:

chmod 600 /etc/ha.d/authkeys

c. Next edit /etc/ha.d/ha.cf and add the following:

autojoin none
keepalive 2
deadtime 15
warntime 5
initdead 120
ucast eth1 [INTERNAL IP OF HOST2]
node [HOST1]
node [HOST2]
use_logd yes
crm respawn

Set [HOST1] and [HOST2] to the hostnames of the servers.

Set [INTERNAL IP OF HOST2] to the private IP address of the second server.

Repeat these steps on the second server. When you create the ha.cf file for the second server, you’ll use the internal IP of the first server in the ucast line.
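Because the two ha.cf files differ only in the ucast line, one way to avoid copy mistakes is to generate both from a single template. This is only a sketch: the ha_cf function is hypothetical, and node01, node02 and the 10.0.0.x addresses are placeholder values.

```shell
# Print an ha.cf for one server.
# Arguments: <private IP of the OTHER server> <hostname 1> <hostname 2>
ha_cf() {
    peer_ip=$1; host1=$2; host2=$3
    cat <<EOF
autojoin none
keepalive 2
deadtime 15
warntime 5
initdead 120
ucast eth1 $peer_ip
node $host1
node $host2
use_logd yes
crm respawn
EOF
}

# Placeholder values (assumptions): node01 is 10.0.0.11, node02 is 10.0.0.12.
# On node01 (peer is node02):  ha_cf 10.0.0.12 node01 node02 > /etc/ha.d/ha.cf
# On node02 (peer is node01):  ha_cf 10.0.0.11 node01 node02 > /etc/ha.d/ha.cf
ha_cf 10.0.0.12 node01 node02
```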

d. Setup logd for automatic startup:

/sbin/chkconfig --level 345 logd on

Now repeat this procedure on the second server, making sure you set the internal IP of the first server in the ucast line of its ha.cf file.

6. Finally, start the logd and heartbeat services on both servers:

/sbin/service logd start
/sbin/service heartbeat start

7. The next step is to configure pacemaker.

Run the pacemaker configuration tool. It is called “crm”. You’ll use it to configure “resources”, which in this case means a single shared IP.

crm configure

If you get an error like “cibadmin not available, check your installation” when trying to run crm, then make sure that the “which” package is installed and that /usr/sbin is in your path.
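A quick way to check both conditions is sketched below: it appends /usr/sbin to the PATH of the current shell if it is missing, then reports whether the two tools can be found. It changes only the current session; making the PATH change permanent is left to your shell profile.

```shell
# Append /usr/sbin to PATH only if it is not already listed.
case ":$PATH:" in
    *:/usr/sbin:*) ;;
    *) PATH="$PATH:/usr/sbin"; export PATH ;;
esac

# Report whether the tools crm depends on can now be found.
for tool in which cibadmin; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If “which” is reported missing, installing it with yum install which should resolve that part.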

Now enter the following into the pacemaker shell:

primitive shared_ip_one IPaddr params ip=[SHARED_IP] cidr_netmask="255.255.255.0" nic="eth0"
property stonith-enabled="false"
location share_ip_one_master shared_ip_one 100: [HOST1]
monitor shared_ip_one 20s:10s
commit
exit

Where [SHARED_IP] is the IP address to be shared between the servers and [HOST1] is the hostname of the primary server.
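To avoid editing the placeholders by hand, the commands can be printed with the values already filled in and then pasted at the crm configure prompt. The crm_commands function is a hypothetical helper for this example, and 203.0.113.10 and node01 are placeholder values.

```shell
# Print the crm configure commands with the placeholders substituted.
# Arguments: <shared IP> <hostname of the primary server>
crm_commands() {
    shared_ip=$1; primary=$2
    cat <<EOF
primitive shared_ip_one IPaddr params ip=$shared_ip cidr_netmask="255.255.255.0" nic="eth0"
property stonith-enabled="false"
location share_ip_one_master shared_ip_one 100: $primary
monitor shared_ip_one 20s:10s
commit
exit
EOF
}

# Placeholder values (assumptions): replace with your shared IP and hostname.
crm_commands 203.0.113.10 node01
```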

Once this is done, you should be able to monitor the status of the cluster from either node using the crm_mon command. You’ll get output like this:

============
Last updated: Sun Feb  6 14:00:42 2011
Stack: Heartbeat
Current DC: node01 (cad6f81e-f772-4add-b5e2-c9a78b4ae430) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node02 node01 ]

shared_ip_one   (ocf::heartbeat:IPaddr):        Started node01
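For scripting, the node that currently holds a resource can be pulled out of this output. The sketch below assumes the resource line has the same layout as the sample above (resource name first, then “Started” and the node name last); resource_node is a hypothetical helper name.

```shell
# Read crm_mon output on stdin and print the node on which the
# named resource is reported as Started.
resource_node() {
    awk -v rsc="$1" '$1 == rsc && $(NF-1) == "Started" { print $NF }'
}

# Demonstration using the sample line from the output above.
printf 'shared_ip_one   (ocf::heartbeat:IPaddr):        Started node01\n' |
    resource_node shared_ip_one

# On a cluster node, crm_mon -1 prints the status once and exits:
#   crm_mon -1 | resource_node shared_ip_one
```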

8. The next step is to test failover on the servers.

a. Run crm_mon on the second server.

b. Reboot the first server:

/sbin/reboot

c. Monitor the second server and notice that when the first goes offline, “shared_ip_one” is switched to the second server. After the first server finishes rebooting, you should see it come back online and “shared_ip_one” return to its original location on the first server.

d. Repeat this test but reboot the second server and monitor the first.
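While running these tests, it can also help to watch the shared IP from a third machine to see how long it is unreachable during the switch. The loop below is a minimal sketch that relies on the Linux ping options -c and -W; watch_ip and probe are hypothetical helper names, and 203.0.113.10 stands in for your shared IP.

```shell
# Return success if the address answers a single ping within one second.
probe() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Probe an address repeatedly and print a timestamped up/down line each time.
# Arguments: <ip> <number of probes> [seconds between probes, default 1]
watch_ip() {
    ip=$1; count=$2; interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        if probe "$ip"; then state=up; else state=down; fi
        echo "$(date '+%H:%M:%S') $ip $state"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: probe the placeholder shared IP 60 times, one second apart.
#   watch_ip 203.0.113.10 60
```

The lines marked “down” between two “up” lines give a rough estimate of the failover time.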

And that completes the setup process. You now have an HA Linux cluster on the cloud!
