Building the ACM Cluster, Part 6: Network Configuration

Published on Sun, Sep 9, 2012 in ACM, ACM Cluster

In part 5 of this series, I covered the generic xCat setup and we told it how the network was set up. We’ve done little bits of network setup before (like setting our switches up for SNMP read access in the public community). Now we’re going to do quite a bit more.

Telling xCat where Nodes Are

xCat has this wonderful ability to know what a newly plugged in machine is, based on where it is on a switch. In my case, I have the first management node on eth-switch1’s 1st port, second management on the second port. I then have the fiber switch’s management interface on the 3rd port, and nodes are ports 4-20. The following set of commands encapsulates this, though you’ll note that I’ve intentionally not made any entries for the management nodes - no point in confusing xCat by doing so.

# nodech myricom-switch switch.switch=eth-switch1 switch.port=3
# nodech compute switch.switch=eth-switch1 switch.port='|\D+(\d+)$|($1+3)|'

Notice that I’ve again used a regexp to assign ports.

Peek at How xCat Sees Networks

One of the many “cool” features of xCat is network auto-detection on start. So lets see how xCat sees the network around it:

$ tabdump networks

Heres what my unconfigured output looks like:

#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,ddnsdomain,vlanid,domain,comments,disable
"10_1_10_0-255_255_255_0","10.1.10.0","255.255.255.0","eth0","10.1.10.1",,"10.1.10.114",,,,,,,,,,
"172_16_0_0-255_240_0_0","172.16.0.0","255.240.0.0","eth1","<xcatmaster>",,"172.16.0.2",,,,,,,,,,

(Note: 10.1.10.0/24 is my upstream router while building this cluster).

Those network names are rather ugly, aren’t they? Lets fix those:

# chtab netname=10_1_10_0-255_255_255_0 networks.netname="external"
# chtab netname=<span>172_16_0_0-255_240_0_0</span> networks.netname="internal"

Now, lets configure the internal network:

# chtab netname=internal networks.gateway=172.16.0.1 networks.dhcpserver=172.16.0.1 networks.tftpserver=172.16.0.1 networks.ntpservers="172.16.0.1" networks.logservers="172.16.0.1"

Set up DHCP Ranges

Throughout all of this I’ve been referring to xCat’s ability to autodetect nodes. One of the things it need to do this is a range of IP addresses it can give out to nodes that it doesn’t know yet. In my case, I’ve designated 172.22.0.0/15 as the DHCP range. So lets tell xCat this:

# chtab netname=internal networks.dynamicrange=172.22.0.1-172.23.255.254

Network Cleanup

xCat uses a magic discovery kernel to find new nodes on the network. Whenever we make changes to the network configuration, we have to regenerate it:

# mknb x86_64

Now, lets rebuild our network configuration:

# makedns

And lets restart DNS, then make sure that DNS will start properly on reboot:

# service named restart
# chkconfig --level 345 named on

Tell Nodes where to find Important Details

We need to tell xCat what the installation setup of the nodes is. To do this, run:

# nodegrpch compute noderes.primarynic=eth0 noderes.tftpserver=[management node name] noderes.installnic=eth0 noderes.nfsserver=[management node name] noderes.netboot=pxe

In this command, we’ve told the compute nodes which network interface they should use by default and where to find our management node for both tftp and nfs, which will be needed in the install process.

DHCP

Now lets make a DHCP configuration:

# makedhcp --new

And start DHCP, then make sure it autostarts on boot:

# service dhcpd start
# chkconfig --level 345 dhcpd on

Add Non-autodiscoverable MAC Addresses

Some nodes (in my case, all of the switches) are not auto-discoverable. If this is the case, you’ll need to manually tell xCat what the mac address is. That can be done with a command like this:

# nodech [node] mac.mac=[macaddr, with colons]

Conclusion

In the part we did rather a lot, didn’t we? We configured the network and started two new services - named and dhcpd - which will be critical to autodiscovering nodes. Next up, we’ll be building an operating system image for the nodes.