Tuesday, June 24, 2008

Networking the Data Center

As if we had nothing else to do this year, we're busy planning for our new data center in Bloomington that will come on-line in the spring of 2009. I spent the better part of the afternoon yesterday working through a rough cost analysis of the various options to distribute network connectivity into the server racks, so I thought I'd share some of that with all of you. I'll start with a little history lesson :)

The machine rooms originally had a "home-run" model of networking. All the switches were located in one area and individual Ethernet cables were "home-run" from servers directly to the switches. If you're ever in the Wrubel machine room, just pick up a floor tile in the older section to see why this model doesn't scale well ;-)

When the IT building @ IUPUI was built, we moved to a "zone" model. There's a rack in each area or "zone" of the room dedicated to network equipment. From each zone rack, cables are run into each server rack with patch panels on each end. All the zone switches had GbE uplinks to a distribution switch. We originally planned for a 24-port patch panel in every other server rack - which seemed like enough way back when - but we've definitely outgrown this ! So, when we started upgrading the Wrubel machine room to the zone model, we planned for 24 ports in every server rack. 24 ports of GbE is still sufficient for many racks, but the higher-density racks are starting to have 48 ports and sometimes 60 or more. This is starting to cause some issues !!

But first, why so many ports per rack ? Well, it's not outrageous to consider 30 1RU servers in a 44 RU rack. Most servers come with dual GbE ports built-in and admins want to use one port for their public interface and the second for their private network for backups and such. That's 60 GbE ports in a rack. - OR - In a large VMware environment, each physical server may have 6 or 8 GbE NICs in it: 2 for VMkernel, 2 for console, 2 for public network and maybe 2 more for private network (again backups, front-end to back-end server communications, etc). 8 NICs per physical server, 6 or 8 physical servers per rack and you have 48 to 64 GbE ports per rack.
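The port counts above are just servers times NICs. A quick sketch in Python makes it easy to plug in your own rack (the function name is mine, purely for illustration):

```python
def ports_per_rack(servers: int, nics_per_server: int) -> int:
    """GbE switch ports needed to connect every NIC in one rack."""
    return servers * nics_per_server


# 30 1RU servers with dual on-board GbE ports
print(ports_per_rack(30, 2))   # 60

# VMware hosts with 8 NICs each, 6 or 8 hosts per rack
print(ports_per_rack(6, 8))    # 48
print(ports_per_rack(8, 8))    # 64
```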

So, why doesn't the zone model work ? In a nutshell, it's cable management and too much rack space consumed by patch panels. If you figure 12 server racks per "zone" and 48-ports per rack, you end up with 576 Cat6e cables coming into the zone rack. If you use patch panels, even with 48-port 1.5RU patch panels, you consume 18 RU just with patch panels. An HP5412 switch, which is a pretty dense switch, can support 264 GbE ports in 7 RU (assuming you use 1 of the 12 slots for 10G uplinks). So you'll need 2 HP5412s (14 total RU) PLUS an HP5406 (4 RU) to support all those ports. 18 + 14 + 4 = 36 - that's a pretty full rack - and you still need space to run 576 cables between the patch panels and the switches. If you don't use patch panels, you have 576 individual cables coming into the rack to manage. Neither option is very attractive !
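Here is the same zone-rack arithmetic as a small Python sketch. The patch panel and chassis figures are the ones quoted above. The function and parameter names are mine, and the sketch assumes that whatever doesn't fit in the big chassis fits in a single 4 RU HP5406.

```python
import math


def zone_rack_space(server_racks=12, ports_per_rack=48,
                    panel_ports=48, panel_ru=1.5,
                    big_ports=264, big_ru=7, small_ru=4):
    """Rack units used in one zone rack by patch panels and switches."""
    cables = server_racks * ports_per_rack

    # One patch panel port per cable coming in from the server racks
    panels = math.ceil(cables / panel_ports)
    panel_space = panels * panel_ru

    # Fill as many large chassis (HP5412) as possible, then assume
    # one small chassis (HP5406) picks up the leftover ports
    big_chassis = cables // big_ports
    leftover_ports = cables - big_chassis * big_ports
    switch_space = big_chassis * big_ru + (small_ru if leftover_ports else 0)

    return {
        "cables": cables,
        "panel_ru": panel_space,
        "switch_ru": switch_space,
        "total_ru": panel_space + switch_space,
    }


print(zone_rack_space())
# {'cables': 576, 'panel_ru': 18.0, 'switch_ru': 18, 'total_ru': 36.0}
```

Dropping ports_per_rack to 24 halves the cable count to 288, which is why the zone model holds up fine at lower densities.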

Also, if you manage a large VMware environment, with 6 or 8 ethernet connections into each physical server, 10GbE starts looking like an attractive option (at least until you get the bill ;-). Can you collapse the 8 GbE connections into 2 10GbE connections ? The first thing that pops out when you look at this is that the cost to run 10GbE connections across the data center on fiber between servers and switches is simply prohibitive ! 10GBASE-SR optics are usually a couple grand each (even at edu discounts) and you need one at each end of the link, so the cost of a single 10GbE connection over multimode fiber is upwards of $4,000 *just* for the optics - not including the cost of the switch port or the NIC !

For both these reasons (high-density 1G and 10G) a top-of-rack (TOR) switch model starts looking quite attractive. The result is a 3-layer switching model with TOR switches in each rack uplinked to some number of distribution switches that are uplinked to a pair of core switches.

The first downside that pops out is that you have some amount of oversubscription on the TOR switch uplink. With a 48-port GbE switch in a rack, you may have 1 or 2 10GbE uplinks for either a 4.8:1 or 2.4:1 oversubscription ratio. With a 6-port 10GbE TOR switch with 1 or 2 10GbE uplinks, you have a 6:1 or 3:1 ratio. By comparison, with a "zone" model, you have full line-rate between all the connections on a single zone switch although the oversubscription ratio on the zone switch uplink is likely to be much higher (10:1 or 20:1). Also, the TOR switch uplinks are a large fraction of the cost (especially with 10G uplinks), so there's a natural tendency to want to skimp on uplink capacity. For example, you can save a LOT of money by using 4 bonded 1G uplinks (or 2 groups of 4) instead of 1 or 2 10G uplinks.
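The oversubscription ratio is just the total server-facing bandwidth divided by the total uplink bandwidth. A minimal Python sketch (the helper name is mine) shows what the cheaper uplink options cost you in ratio terms:

```python
def oversubscription(ports, port_gbps, uplinks, uplink_gbps):
    """Server-facing bandwidth divided by uplink bandwidth."""
    return (ports * port_gbps) / (uplinks * uplink_gbps)


cases = [
    ("48 x GbE,   1 x 10GbE uplink", 48, 1, 1, 10),
    ("48 x GbE,   2 x 10GbE uplinks", 48, 1, 2, 10),
    ("6 x 10GbE,  1 x 10GbE uplink", 6, 10, 1, 10),
    ("6 x 10GbE,  2 x 10GbE uplinks", 6, 10, 2, 10),
    ("48 x GbE,   4 x bonded 1G uplinks", 48, 1, 4, 1),
    ("48 x GbE,   8 x bonded 1G uplinks", 48, 1, 8, 1),
]

for label, ports, port_gbps, uplinks, uplink_gbps in cases:
    ratio = oversubscription(ports, port_gbps, uplinks, uplink_gbps)
    print(f"{label}: {ratio:.1f}:1")
```

Note that 48 GbE ports behind a single 10GbE uplink works out to exactly 4.8:1, while the same switch on 4 bonded 1G uplinks jumps to 12:1. That's the price of skimping on uplinks.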

My conclusion so far is that, if you want to connect servers at 10GbE, you *absolutely* want to go with a TOR switch model. If you need to deliver 48 or more GbE ports per rack, you probably want to go with a TOR model - even though it's a little more expensive - because it avoids a cable management nightmare. If you only need 24 ports (or fewer) per rack, the "zone" model still probably makes the most sense.
