Building the ACM Cluster, Part 8: Adventures in Routing: Source-Based (Multihomed) Routing

(This post is related to the ACM cluster build.  However, it is really generic systems stuff and not terribly related to the actual cluster build.  It is much more closely related to quirks of JHU networking.)

The Problem

JHU has two distinct networks - firewalled and firewall-free.  (In truth there are more, with gradations between them, but these are the two the JHU ACM has IP allocations on.)  Some services cannot be run from inside the firewalled network.  For these, the ACM has a small firewall-free allocation.  Because the cluster will be hosting VMs inside both networks, it needs to be capable of routing traffic from both.  This means doing something called source-based routing or multihomed routing.  The name refers to the fact that this machine will have two connections to the internet.  This is a fairly rare setup - multihoming is usually done at the ISP or datacenter level, rather than on an individual box.
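To give a flavor of the solution, here is a minimal sketch of source-based routing with iproute2 - the addresses, interface and table name are hypothetical, not the ACM’s actual values:

# echo "100 jhuff" >> /etc/iproute2/rt_tables
# ip route add default via 192.0.2.1 dev eth1 table jhuff
# ip rule add from 192.0.2.10 table jhuff
# ip route flush cache

Traffic sourced from 192.0.2.10 then consults the jhuff table and leaves via eth1, while everything else follows the main routing table as usual.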

The Solution

The solution is to convert Linux to …

Continue reading

Building the ACM Cluster, Part 7: Network redux

So I’ve mentioned that I’ve been fighting networking again in the ACM cluster.  I’ve been reworking the network.  This whole adventure began after a conversation with the very knowledgeable nwf, who pointed out that JHU runs two different networks that the ACM systems need access to - CSNet (the JHU Computer Science department’s network) and JHU firewall-free (which has unfiltered access to the internet).  The goal of this rework was to allow the cluster to be on both.  In a situation with more resources, I would simply have bought another network card for each of the gateway nodes.  However, I don’t have those resources and couldn’t find any spare network cards.  nwf then pointed out that I could use 802.1Q VLANs to make more virtual ports.

So, here’s how this works:  CSNet and JHU firewall-free (JHUFF) each plug into a single port on the main switches.  These ports are assigned to specific VLANs that differ from the other …
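On the Linux end, each tagged VLAN shows up as a virtual subinterface.  A minimal sketch for CentOS 6 - the VLAN ID and addressing are invented for illustration:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0.42
DEVICE=eth0.42
VLAN=yes
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
ONBOOT=yes

The .42 suffix plus VLAN=yes tells the network scripts to bring up an 802.1Q subinterface on eth0 that tags its traffic with VLAN 42.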

Continue reading

Configuring Cisco 2900XL Switches via TFTP (and alongside xCat)

As part of my ACM cluster creation obsession, I wanted to make the Cisco 2900XL switches we have autoconfigure.  I’m not terribly familiar with how these switches operate and, rather than reconfigure all of them at once, I’d much rather be able to reboot (reload, in Cisco parlance) them and have them come up with my new, clean config.

Fortunately for me, Cisco thought of a situation like this when building the software for these switches.  According to this page from their instruction manual, if a switch boots and doesn’t have an in-flash configuration file, it will use DHCP/BOOTP to get a network address, then attempt to fetch one of several files via TFTP - network-confg, cisconet.cfg, router-confg or ciscortr.cfg.
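In other words, dropping a file with one of those names into the TFTP root should be enough.  A minimal sketch - the path, hostname and secret are illustrative, not the actual ACM config:

# cat /tftpboot/network-confg
hostname eth-switch1
enable secret ExamplePassword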

In my case, I’m running this next to xCat, so I already have TFTP and DHCP servers running.  Furthermore, my switch was already set up in xCat to get a static address.  So all I really needed to do was put the configuration file in place.  But how to generate …

Continue reading

Building the ACM Cluster, Part 6: Network Configuration

In part 5 of this series, I covered the generic xCat setup, including telling it how the network was laid out.  We’ve done little bits of network setup before (like setting our switches up for SNMP read access in the public community).  Now we’re going to do quite a bit more.

Telling xCat where Nodes Are

xCat has this wonderful ability to know what a newly plugged-in machine is, based on where it sits on a switch.  In my case, I have the first management node on eth-switch1’s first port and the second management node on the second port.  I then have the fiber switch’s management interface on the third port, and the compute nodes on ports 4-20.  The following set of commands encapsulates this, though you’ll note that I’ve intentionally not made any entries for the management nodes - no point in confusing xCat by doing so.

# nodech myricom-switch switch.switch=eth-switch1 switch.port=3
# nodech compute switch.switch=eth-switch1 switch.port='|\D+(\d+)$|($1+3)|'
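A quick way to sanity-check what xCat recorded is to dump those attributes back out with nodels:

# nodels compute switch.switch switch.port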

Notice that …

Continue reading

Building the ACM VM Cluster, Part 5: xCat configuration

In part 4 of this series, we installed xCat.  Now it’s time to configure it.

But First…

But first, let’s talk a little about how you interact with xCat.  xCat stores its configuration in “tabs” - tables.  By default these are SQLite tables, though some will try to convince you to change over to MySQL.  Unless you’re running thousands of nodes or continually hitting xCat, I see no point in this changeover and therefore will not cover it.

Interaction with xCat happens primarily through three commands: nodech, chtab and tabedit.  However, nodels and tabdump are also useful for seeing what xCat got out of your input.  For details on these commands, read the man pages for each.  Here is a quick summary:

  • nodech: Change configuration details on a node
  • chtab: Change configuration by key in a table
  • tabedit: Manually edit a table
  • nodels: Show all details associated with a particular node
  • tabdump: Show the data stored in a tab
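For instance, dumping the site table and setting a key in it look like this (the domain value here is just an example):

# tabdump site
# chtab key=domain site.value=cluster.example.org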

If you’re looking …

Continue reading

Building the ACM VM Cluster, Part 4: xCat Install

In part 3 of this series, we got the management node set up to route traffic and generally properly configured.  Now it’s time to actually install xCat!

Choices

Many xCat guides recommend downloading the RPMs and manually installing those.  The benefit of this route is that you need no network connection to your management node for this setup.  However, I prefer to have a simple upgrade path.  I also know my cluster will continuously be connected to the internet, so I don’t need an offline configuration.  Therefore I’m going to use the xCat repos to do the install.  For this case, refer to this piece of xCat documentation if you need more detail.

Set up xCat repos

First we need wget so we can grab files from the xCat website:

# yum install wget

Now let’s make sure we put the files in the right place:

# cd /etc/yum.repos.d

List the directory and make sure you don’t already have the xCat repos.  If you do, then skip the next step.  If you don’t already have the xCat …
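Fetching the repo files is a job for wget; note that the URLs below are only illustrative - they move around between xCat versions, so take the real ones from the xCat documentation:

# wget http://xcat.org/files/xcat/repos/yum/latest/xcat-core/xcat-core.repo
# wget http://xcat.org/files/xcat/repos/yum/xcat-dep/rh6/x86_64/xcat-dep.repo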

Continue reading

Building the ACM VM Cluster, Part 3: Management Node Setup - Keepalived

In part 2 of this series, I covered the network design - the last theoretical piece of design we need.  Now let’s do some practical stuff!  In this section I’m going to cover base network setup and keepalived installation on the management node.

Prerequisites

For this I’ve assumed you already have a CentOS machine set up.  In my case, it’s CentOS 6.3 (64-bit), though this may of course vary; if it does, refer to the xCat documentation to see any differences.

Procedure

Before we dive into this, let’s establish some conventions.  Commands, code, or something you’ll see in a system file looks

Like this.

Commands will be prefixed with either $ (for commands you can run as a normal user) or # (for commands to be run as root or with sudo).  I use sane-editor to represent the text editor of your choice (no flame wars here!).

Onwards!

Disable SELinux

First, let’s disable SELinux (it might step on our toes; feel free to re-enable it later if you’d like).

# …
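For reference, a sketch of the usual CentOS 6 approach (the full post’s exact steps may differ):

# setenforce 0
# sane-editor /etc/selinux/config

setenforce 0 switches SELinux to permissive mode for the current boot; setting SELINUX=disabled in /etc/selinux/config makes the change stick across reboots.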
Continue reading

Building the ACM VM Cluster, Part 2: Network design

Welcome to part 2 of my series of posts on building the ACM VM cluster!  Part 1 covered the hardware and software that will be used in the cluster.  Part 2 is going to focus on network design.

Introduction

Typically in clusters there are two networks (usually represented by IP ranges, though sometimes actually physically separate): management and work.  Management is typically for master-node-to-worker-node communication (giving and getting work orders), whereas work is for intercommunication between the nodes themselves.  However, in our case, we are going to need three separate networks: management, work and fiber.  Why?  Well, we have fiber cards in the nodes, which are faster than the ethernet cards in the nodes, and I’d like to take advantage of that.
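To make that concrete, a purely hypothetical addressing plan for the three networks might look like:

10.0.1.0/24   management (ethernet)
10.0.2.0/24   work (ethernet)
10.0.3.0/24   fiber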

Physical Networks

As is probably obvious, we’ll be using two distinct physical networks - the fiber network and the ethernet network.  Both the management and the work network will run over ethernet, while the …

Continue reading

Building the ACM VM Cluster, Part 1: Hardware

So I have a vision for the ACM systems - I’d like to make them as redundant as possible, so that we can update systems without having to take them offline.  The obvious answer is to use VMs, in a cluster.  Fortunately, the ACM recently received a donation of the old DLMS cluster from physics (as well as three more recent Dell servers from JHU Housing and Dining).  This is the perfect opportunity for me!  So off we go, down the rabbit hole (again).

This post is going to focus on cluster design - first, a description of the hardware (since we didn’t get to choose it, I won’t be covering why this hardware was picked), then a description of the software architecture and the whys of each software choice.

Hardware

Head Node (Dell PowerEdge 2950)

  • 4x Intel Xeon 5160 @ 3GHz
  • 4GB RAM
  • 3x 300GB SAS drives in hardware RAID5, approximately 600GB usable

Cluster Nodes

  • 2x AMD Opteron 244 @ 1.8GHz (note: these have no hardware virtualization built in!)
  • RAM varies, at least 2GB
  • 40 GB IDE …

Continue reading

Luna: A Case-Study in Failure

I intended for the first post here to be about the technology behind the site and how to set up a similar one.  I will still write that post, but in the meantime a much more interesting problem has come up and I wish to share it with you.

This post is about the various issues, both human and machine, that led to the failure of Luna, the JHU ACM’s Xen virtual machine server.  I’m going to start with an examination of what led up to the failure, continue with details on the failure mode of this machine, detail my investigation of the failure, and finish with the recovery of the machine and data.

Before I get any further into this post, I wish to make absolutely clear that I am not writing this to assign blame.  This post is meant to be informational, to examine issues that can lead to a catastrophic failure and even to suggest solutions.  However, I believe that all involved were operating as best they knew how and it would therefore be unfair to blame anyone.

Luna, as built …

Continue reading