Our goal is to set up a pair of RackSpace Cloud Servers in a redundant cluster using a shared IP address. We’ll use the “heartbeat” package from Linux-HA (http://www.linux-ha.org) as the cluster messaging layer and the “pacemaker” package from ClusterLabs (http://clusterlabs.org) as the cluster resource manager.
Before starting this procedure you’ll need to:
a. Create the two cloud servers. These instructions assume CentOS as the operating system.
b. Open a ticket with RackSpace Cloud support and request a public IP address to be shared between the servers.
You can adapt these instructions to other situations, but you’ll need to make the appropriate adjustments.
1. Set up hosts file entries. On each server, edit /etc/hosts and add entries for each server’s public and private interfaces. You’ll also find it convenient to set up SSH keys between the servers for easy access.
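For example, the /etc/hosts entries on each server might look like the following. The hostnames and addresses here are hypothetical; substitute your servers’ real public and private IPs:

```
# Example /etc/hosts entries (hypothetical names and addresses)
203.0.113.11   node01-public
10.176.0.1     node01
203.0.113.12   node02-public
10.176.0.2     node02
```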
2. Now use yum to install some prerequisite packages:
    yum install net-snmp perl-libwww-perl libesmtp perl-Net-SSLeay perl-MailTools ipvsadm OpenIPMI libibverbs librdmacm openhpi PyXML
Repeat this step on the second server.
Note: Several of these packages are not available on the standard RHEL yum channels. If you’re working on something other than a RackSpace Cloud server, you might need to install the EPEL channel. Just go to:
http://fedoraproject.org/wiki/EPEL
Then download and install the appropriate package to add EPEL.
3. The version of heartbeat available in the standard yum repositories is outdated, so we’ll install more recent versions of heartbeat, pacemaker, and supporting components from:
http://www.clusterlabs.org/rpm
Start by creating a working folder:
    mkdir /root/archive
    cd /root/archive
Then use wget to download the latest version of each of the following packages:
    cluster-glue-1.0.6-1.6.el5.x86_64.rpm
    cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
    corosync-1.2.7-1.1.el5.x86_64.rpm
    corosynclib-1.2.7-1.1.el5.x86_64.rpm
    heartbeat-3.0.3-2.el5.x86_64.rpm
    heartbeat-libs-3.0.3-2.el5.x86_64.rpm
    openais-1.1.3-1.6.el5.x86_64.rpm
    openaislib-1.1.3-1.6.el5.x86_64.rpm
    pacemaker-1.0.10-1.4.el5.x86_64.rpm
    pacemaker-libs-1.0.10-1.4.el5.x86_64.rpm
    resource-agents-1.0.3-2.el5.x86_64.rpm
Finally install the packages:
    rpm -i *.rpm
Repeat this step on the second server.
4. The next step is to configure heartbeat.
a. Set up keys for authentication between the instances.
Edit /etc/ha.d/authkeys and add:
    auth 1
    1 sha1 [PASSWORD]
Replace [PASSWORD] with a long random string.
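One way to generate a suitable random string is with openssl, which ships with stock CentOS (this is just a sketch; any source of long random data will do):

```shell
# Generate a 64-character random hex string for use as the heartbeat key
KEY=$(openssl rand -hex 32)

# The resulting authkeys content would then be:
printf 'auth 1\n1 sha1 %s\n' "$KEY"
```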
b. Set permissions on the authkeys file:
    chmod 600 /etc/ha.d/authkeys
c. Next edit /etc/ha.d/ha.cf and add the following:
    autojoin none
    keepalive 2
    deadtime 15
    warntime 5
    initdead 120
    ucast eth1 [INTERNAL IP OF HOST2]
    node [HOST1]
    node [HOST2]
    use_logd yes
    crm respawn
Set [HOST1] and [HOST2] to the hostnames of the servers.
Set [INTERNAL IP OF HOST2] to the private IP address of the second server.
Repeat these steps on the second server. When you create the ha.cf file for the second server, you’ll use the internal IP of the first server in the ucast line.
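To make the asymmetry concrete, here is the one line that differs between the two ha.cf files, using hypothetical private addresses:

```
# /etc/ha.d/ha.cf on server 1 (private IP 10.176.0.1):
ucast eth1 10.176.0.2

# /etc/ha.d/ha.cf on server 2 (private IP 10.176.0.2):
ucast eth1 10.176.0.1
```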
d. Set up logd for automatic startup:
    /sbin/chkconfig --level 345 logd on
Repeat this step on the second server as well.
5. Finally, start the logd and heartbeat services on both servers:
    /sbin/service logd start
    /sbin/service heartbeat start
6. The next step is to configure pacemaker.
Run the pacemaker configuration tool, “crm”. You’ll use it to configure “resources”, which in this case means a shared IP address.
    crm configure
If you get an error like “cibadmin not available, check your installation” when trying to run crm, then make sure that the “which” package is installed and that /usr/sbin is in your path.
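If the problem is the PATH, a quick fix is to append the sbin directories for the current session (add the same line to ~/.bash_profile to make it permanent):

```shell
# Make sure the sbin directories are searched for cluster commands
export PATH=$PATH:/usr/sbin:/sbin
echo "$PATH"
```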
Now enter the following into the pacemaker shell:
    primitive shared_ip_one IPaddr params ip=[SHARED_IP] cidr_netmask="255.255.255.0" nic="eth0"
    property stonith-enabled="false"
    location shared_ip_one_master shared_ip_one 100: [HOST1]
    monitor shared_ip_one 20s:10s
    commit
    exit
Where [SHARED_IP] is the IP address to be shared between the servers and [HOST1] is the hostname of the primary server.
Once this is done, you should be able to monitor the status of the cluster from either node using the crm_mon command. You’ll get output like this:
    ============
    Last updated: Sun Feb 6 14:00:42 2011
    Stack: Heartbeat
    Current DC: node01 (cad6f81e-f772-4add-b5e2-c9a78b4ae430) - partition with quorum
    Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
    2 Nodes configured, 2 expected votes
    1 Resources configured.
    ============

    Online: [ node02 node01 ]

    shared_ip_one (ocf::heartbeat:IPaddr): Started node01
7. The next step is to test failover between the servers.
a. Run crm_mon on the second server.
b. Reboot the first server:
    /sbin/reboot
c. Monitor the second server and notice that when the first goes offline, “shared_ip_one” moves to the second server. After the first server finishes rebooting, you should see it come back online and “shared_ip_one” return to its original location on the first server.
d. Repeat this test but reboot the second server and monitor the first.
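If you’d rather script the check than watch crm_mon interactively, something like the following pulls out the node currently hosting the resource. The sample input is hard-coded here so the sketch is self-contained; on a live cluster you would pipe in the output of `crm_mon -1` instead:

```shell
# Feed a captured line of crm_mon output through awk and print the
# node that currently hosts shared_ip_one (the last field on the line).
echo 'shared_ip_one (ocf::heartbeat:IPaddr): Started node01' \
  | awk '/shared_ip_one/ { print $NF }'
# prints: node01
```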
And that completes the setup process. You now have an HA Linux cluster on the cloud!