Thursday, November 20, 2008

MPLS: The only hammer you'll ever need !

Once you learn how to use the MPLS hammer, you'll suddenly see a million nails you could whack with your shiny new hammer !

We deployed MPLS and MPLS Layer-3 VPNs on the IU campus network this past Monday morning. It was VERY anticlimactic ! We cut and pasted some relatively minimal configs into the routers and it all just worked. What is probably the single largest change in how we do networking on campus since the advent of VLANs happened with no one noticing (except Damon and me, who were up entirely too early for a Monday morning). Of course, under the covers all kinds of fancy stuff is happening and we now have a powerful new tool in our tool chest !

Weeks before we actually configured the first MPLS VPN on the network (btw - we won't be putting a production system into this MPLS VPN until Dec. 2nd), we were already planning to make MPLS VPNs the centerpiece of the network for the new data center in Bloomington ! Your first thought is probably, why the heck would we want MPLS VPNs in the data center network ?

Our current data center network design has "end-of-row" or "zone" switches (layer-2 only) with Cat6 cables running to servers in the adjacent racks. The zone switches each have a 10GbE uplink into a distribution switch (again, layer-2 only). The distribution switch has an 802.1q trunk (2x 10GbE) into the core router. This .1q trunk between the distribution switch and the core router has a Juniper firewall in the middle of it - running in layer-2 mode. Those of you who know this setup in detail will know this is not exactly correct and is over-simplified, but the point is the same.

One problem with this design is that, with over 30 VLANs in the machine room, there is a lot of traffic going in and out of the firewall just to get between 2 VLANs in the same room - perhaps between 2 servers in the same row or same rack or 2 virtual servers inside the same physical server. This causes a couple of problems:

1) It adds significant extra load on the firewall unnecessarily in many cases. Think about DNS queries from all those servers...
2) It makes it very difficult to do vulnerability scanning from behind the firewall because the scanners have to be on 30+ different VLANs

The solution to "fix" this is to place a router behind the firewall - i.e. turn the distribution switch into a layer-3 routing switch. However, if we did this, all 30+ VLANs would be in the same security "zone" - i.e. there would be no firewall between any of the servers in the machine room. This is not good either. For one, we offer a colocation service and virtual server services, so there are many servers that do not belong to UITS. So we don't want those in the same security zone as our critical central systems. It's probably also not a good idea to put servers with sensitive data in the same security zone as, say, our FTP mirror server. One solution then would be to place a router behind the firewall for each security zone. But of course that gets very expensive... if you want 5 security zones you need 10 routers (redundant routers for each zone).

And this is where the MPLS VPN hammer gets pulled out to plunk this nail on the head !! You use MPLS VPNs to put 5 virtual routers on each physical router and put a firewall between each virtual router and the physical router and your problem is solved. And actually, if you can virtualize the firewall, you can create a virtual firewall for each virtual router and you have 5 completely separate security zones with a pair of high-end routers and firewalls supporting all 5 - all for no extra cost *except* for all the added operational complexity. Those are the costs we need to figure out before we go too crazy whacking nails with our shiny new hammer !!
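
For the config-curious: on a Cisco 6500, each of those virtual routers is just a VRF, and a single security zone boils down to a handful of lines. Here's a minimal sketch - the VRF name, route distinguisher/targets, VLAN and addresses are all made up for illustration:

    ! one virtual router (VRF) per security zone - all values hypothetical
    ip vrf ZONE-PCI
     rd 65000:10
     route-target export 65000:10
     route-target import 65000:10
    !
    ! the zone's server VLAN lives inside the virtual router
    interface Vlan100
     description Server gateway for the PCI security zone
     ip vrf forwarding ZONE-PCI
     ip address 10.10.0.1 255.255.255.0

Repeat with a different VRF and route targets for each zone, hang a (virtual) firewall off each one, and you have your 5 completely separate security zones.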

Sunday, November 9, 2008

2 down, 3 more to go !

We had 5 major network maintenances planned in order to complete the core upgrade project and deploy MPLS VPNs. The first 2 are done: The first was disabling the IS-IS routing protocol (OSPF has been deployed alongside IS-IS for some time). This was completed last Thursday. The second was replacing our primary border router (a Juniper M10) with a Cisco 6500. This was completed this morning and was the change that was giving me the most heartburn !

The next change is to swap out the secondary border router with a Cisco 6500 on Tuesday. We'll deploy BGP to all our core routers on Thursday. Currently only the border routers run BGP. BGP is needed on the core routers in order to support MPLS VPNs. The following Monday we will deploy MPLS and our first MPLS VPN.
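
For those wondering what "BGP for MPLS VPNs" actually means: the core routers need iBGP sessions carrying the VPNv4 address family, which is what distributes the VPN routes (with their MPLS labels) between routers. Roughly, on a Cisco router - the AS number and loopback address here are placeholders:

    router bgp 65000
     neighbor 10.255.0.2 remote-as 65000
     neighbor 10.255.0.2 update-source Loopback0
     !
     ! the VPNv4 address family carries the MPLS VPN routes
     address-family vpnv4
      neighbor 10.255.0.2 activate
      neighbor 10.255.0.2 send-community extended

The extended communities are how the route targets (which control which routes land in which VRF) get carried along.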

Friday, November 7, 2008

"Not quite dead yet !"

In case you were worried about my untimely demise, no worries, I'm still alive. I've just been so busy doing that I haven't been writing about what I'm doing :) I'll attempt to catch you up and then will try to get a post out at least once a week from now on.

Wireless

We deployed about 3,000 Access Points over the summer - roughly an average of 200-250 every week. We also rolled out WPA2 Enterprise (aka 802.1x) during the same timeframe. The majority of the Bloomington Residence Halls have wireless coverage, with a few more buildings coming up later this month and around the first of the year. We're now turning our attention to 802.11n to prepare for upgrades next summer. As of yesterday we have 802.11n APs in hand to start testing.

The wireless rollout wasn't without its bumps, but there were very few user-impacting problems. We've been getting a lot of positive feedback from users. When users make a point to call the NOC just to let us know how happy they are with the wireless service, you know it must be going well ! We went out on a limb just a bit by choosing a vendor (HP) that was not a household name in the area of large-scale, controller-based enterprise wireless, but it's worked out extremely well.

Core Upgrade and MPLS VPNs

We also completed the vast majority of the core network upgrade over the summer. The last parts of that upgrade are happening this coming week. We'll be replacing the Juniper M10i Border Routers with Cisco 6500's. That greatly increases the capacity on our Border Routers. As a result, we will be upgrading our primary link to the Internet from 2Gbps to 10Gbps at the same time as the swap-out, which will happen the day after tomorrow. Once this is completed all our core routers will be Cisco 6500's. Since we had this planned, we had been holding off on deploying MPLS so we didn't have to deal with vendor interoperability issues. Not that it wouldn't have worked with both Juniper and Cisco routers in the mix, but this saved us quite a bit of testing. We plan to have our first MPLS VPN live and fully tested before the Thanksgiving holiday. This will be the VPN for PCI-DSS systems.

PCI-DSS Compliance

This is really coming together although there is still a lot of work to be done to meet the internal deadline of December 31st of this year. We should be ready to start transitioning systems into the PCI-DSS MPLS VPN the week following Thanksgiving. The last network requirement we're still struggling with is two-factor remote access. This is just a matter of getting our current Safeword token system working with our Cisco VPN servers. It looks like we may have to wait on an upgrade of the Safeword system, but we're trying to find alternatives because that upgrade is not likely to happen before 12/31.

New Data Center

This project is really coming together as well. We're hoping to nail down the final network design for the new data center in a meeting this afternoon. I'll have a post devoted just to the data center network design issues. I think the industry is on the cusp of a major shift in data center networking. Top-of-rack switches are clearly the future in the data center, but products are only just now starting to become available. Fiber Channel over Ethernet is a promising technology, but its day in the sun is probably still 18-24 months out. Also in the 18-24 month time horizon are 40G and 100G Ethernet.

Wednesday, August 6, 2008

I Love My PCI

PCI as in PCI-DSS as in Payment Card Industry Data Security Standards

We met with a QSA on Monday. Don't ask me what QSA stands for - they're the official PCI auditors. The killer statement from the meeting was that every network device we manage which forwards a packet with payment card data in it - even if that data is encrypted - is within scope for PCI compliance. My understanding is that this means that requirements like regular password rotation, quarterly config reviews, and file integrity monitoring apply to all our network equipment. We run a very secure network, but security != compliance, so we will end up spending a lot of time dotting our I's and crossing our T's. And a lot more time showing auditors that we dotted and crossed !

Tuesday, July 29, 2008

iPhone + Streaming Radio

Okay, this is not really about networking or IU, but I thought it was pretty cool so I figured I'd share it with all of you (which hopefully includes a few more people than I've already told this to in person). *AND* it did involve 1 piece of network equipment owned by IU, so....

Like many people, I'm amazed by many of the 3rd party applications for the iPhone. I was very busy preparing for the Joint Techs workshop last week, so I didn't have much time to "play" with all the new applications for my iPhone. I did, however, download the AOL Radio application a couple of days before leaving for Lincoln. It worked fairly well and I quickly thought it would be quite cool if I could use it in my car while driving ! I'm too cheap to pay for satellite radio, so the idea of being able to listen to radio stations from all over the country in my car caught my eye !

Of course, the first thing I thought was *DOH* - what about that darn GSM interference ? All that buzzing and popping coming through the radio from the streaming audio over the EDGE network wouldn't do. Luckily, I've been testing a Linksys Mobile Broadband router with a Sprint EV-DO card. So I could plug this into the power outlet in my trunk and connect my iPhone to it via Wifi. Note: with iPhone 2.0 release, you can put the iPhone in "airplane mode" - shutting down the cellular radio - and then enable the Wifi radio :) Problem #1 solved ! BTW- I've been told that HSDPA (AT&T's 3G technology) does not have the same interference issues, but alas I don't have one to test with :-(

The next problem was that Sprint doesn't have 3G in Bloomington yet. So how well would this work over the "slow-as-molasses" 1xRTT network ?

Before I left for the airport, I tossed the Linksys into my trunk (not literally) and plugged it into the power outlet. I dropped (again, not literally) my iPhone into the dock in my car and headed out. Shortly after I passed the Bloomington bypass on highway 37, I fired up AOL Radio to see what would happen. The station started, but the audio was in and out, stopping and starting --- unusable :-( I turned it off and went back to listening to a podcast. When I reached Martinsville - safely within Sprint's EV-DO coverage - I tried it again -- tuning into the Jack FM station in Chicago. This time it worked fairly well. Every few minutes there would be a short audio drop as it rebuffered, but all-in-all it worked reasonably well.

While I was in Lincoln, I had some free time to play with my iPhone. I downloaded a bunch of 3rd party apps including Pandora. For those of you who haven't used Pandora, it's a personal radio station application. You pick an artist and it selects songs from that artist and other similar artists. You can give songs a thumbs up or thumbs down and it supposedly adjusts to your tastes.

While in Lincoln, I used Pandora over the EDGE network from my hotel room and walking around town. I was amazed by how well it worked over the EDGE network. Excellent sound quality and almost no rebuffering. I couldn't wait to try it out on the drive home from the airport.

So, last Thursday night while driving home from the airport I tried it out. Amazing ! The quality over both EV-DO and 1xRTT networks was excellent ! Presumably it would be just as good using the cellular radio internal to the iPhone - assuming there wasn't a GSM interference issue. I've been using it for the past several days and have been amazed at how well it works - even down by my house in the southern part of the county where there are definitely some dead spots !

If I ran a satellite radio company, I'd definitely be paying attention to this. It seems to me the major cost for the satellite radio companies is transport - i.e. getting the signal from the head-end to the users. The reason people want satellite radio is the large selection of content that is available anywhere - not just within your local broadcast area. Exchanging satellite transport for IP transport (either over wired or wireless networks) could drastically reduce their costs and increase their availability - i.e. you can get an IP-based connection in places you can't easily get satellite - like in basements !

Wednesday, July 23, 2008

Internet2 Joint Techs

I'm at the Internet2 Joint Techs Workshop in Lincoln, Nebraska this week. The primary reason I'm attending is actually for 2 events that were "tacked-on" to the main workshop: the MPLS Hands-On Workshop on Sunday and the Netguru meeting today and tomorrow.

The MPLS workshop was a 1 day workshop meant to educate campus network engineers about MPLS and its application on campus networks. The morning was spent on presentations and the afternoon on hands-on configuration of MPLS in a lab setting. This was the first MPLS workshop and it went extremely well. There were 22 people in attendance. I was an instructor for the workshop and gave about a 1 hour talk on the control-plane for MPLS VPNs. I plan to reuse the material to provide some MPLS instruction for the networking staff at IU.

The second event I'm attending is the Netguru meeting. Netguru is a small group of network architects from universities around the country. As you might imagine, campus network architects often have lots of challenging problems they're trying to solve and find it very helpful to discuss these with other people who are facing the same challenges. I think it's typical for these folks to have 1 or 2 network architect friends that they discuss issues with on a fairly regular basis. A few years ago I shared a cab ride to an airport with David Richardson and Mark Pepin. David and I got together to discuss networking issues on a fairly regular basis - whenever we were in the same city (David worked at the Univ. of Washington before leaving to work for Amazon). We somehow started talking about how network architects share information, and Mark Pepin brought up the idea of starting a small group (10-15 people) of network architects that would meet in conjunction with the I2 Joint Techs workshop to discuss issues of the day. Thus Netguru was born ! We have a full agenda for this afternoon, dinner tonight and all day tomorrow. I've missed the last 2 meetings, so I'm looking forward to the discussions today and tomorrow.

Thursday, July 17, 2008

Catching up (again)...

Well, it's been 3 weeks since my last post, but I assure you we have not been sitting around twiddling our thumbs ! Here's a summary of what's been going on...

The wireless and core upgrade projects are moving along smoothly. About 1,000 of the 1,200 APs in Bloomington have been replaced. We're also starting to complete some of the dorms in Bloomington - so some of the dorm rooms will have wireless by the start of the fall semester. At IUPUI, we're not quite as far along as in Bloomington, but we will have completed wireless upgrades in all the on-campus buildings by the time the UITS change freeze goes into effect on August 18th.

We're finishing up the preparations for adding the "IU Guest" SSID to all the APs. This will be the SSID that guests with Network Access Accounts use to access the network. This will allow us to shut down our old web portal authentication system. That system has a scaling limitation related to the number of MAC addresses on wireless, and we've been putting band-aids in place for 2 years to get it to scale to the number of wireless users we have. The "IU Guest" SSID will use the web-portal authentication built into the HP WESM modules - these do not have the same scaling limitations.

With these projects moving along smoothly, Jason and I have shifted our attention to the *next* set of projects. Here's a bit about what we've been up to...

We spent a day at IU-Northwest talking with them about the major network upgrade they're planning. During the next 12 months they'll be upgrading all their wiring to Cat6e, consolidating IDFs, improving their outside fiber plant, upgrading all their switches to HP5400's, and deploying over 150 new 802.11n APs.

Jason spent a day at IU-Kokomo helping them set up their new HP wireless gear and discussing their future use of HP's Identity Driven Management product. IU-Kokomo undertook a major upgrade of their network earlier this year, replacing all their switches with HP 5400's, and as part of that they purchased HP's Identity Driven Management system. I could devote a whole post just to this (and probably will eventually), but essentially this is a policy engine that lets you decide when and where users can connect to your network and what type of network service they get - which is done by placing them on different VLANs or applying ACLs to their connection. We've been interested in getting our feet wet with a system like this for some time and Kokomo has agreed to be a guinea pig of sorts :) Thanks Chris !

We had our yearly retreat with the IT Security Office - now called the University Information Security Office. This is something we've been doing for a few years now. A couple people from ITSO and a couple people from Networks get together off-campus and spend several hours thinking strategically about improving security - instead of the tactical thinking we usually do. Tom Zeller hosted the event again - Tom has a large screened-in porch in the woods and we were able to watch some wildlife in addition to discussing security !

We met with the University Place Conference Center staff at IUPUI to discuss their unique wireless and guest access needs. They have web-portal authentication on both their wireless network and their wired network. The new web-portal system on the HP WESMs only works for wireless users, so when we upgrade wireless in the hotel and conference center, we'll have to do a bit of a one-off for them.

I've been very busy preparing for the upcoming MPLS Workshop at the Internet2 Joint Techs workshop in Lincoln, Nebraska. MPLS VPNs are becoming a hot-button topic for campuses as they struggle to meet the divergent networking needs of their different constituents - from the business aspect of the university, to student housing, to researchers. In fact, we're planning to roll out MPLS VPNs this fall, so when I was asked to be an instructor for this workshop, I figured it would be a great opportunity to sharpen my skills on MPLS VPNs *AND* I could reuse the materials I develop to provide training for all the UITS networking staff that will need to learn how to support MPLS VPNs ! As part of this process, I put together a small MPLS test lab with 3 routers and, when I return, will use this to start preparing for our MPLS VPN deployment.
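
If you want to build a similar lab, the label-switched core itself takes surprisingly little configuration. Assuming CEF and your IGP are already running, it's roughly this on each router - the interface and addresses are just examples:

    ! enable CEF and LDP label switching on core-facing links
    ip cef
    mpls label protocol ldp
    !
    interface GigabitEthernet1/1
     description Link to neighboring lab router
     ip address 10.0.12.1 255.255.255.252
     mpls ip

The interesting parts - the VRFs and the MP-BGP control-plane - layer on top of that.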

We've also continued to develop our plans for networking in the new data center. I'll share more about that once I get past the Joint Techs workshop in Lincoln !

Friday, June 27, 2008

Data Center - Take 2

Shortly after my last post - actually it was the next morning in the shower, which is where I do my best thinking - I was hit by the thought, "doesn't CX4 reach 15 meters instead of 10?". So when I got to work that morning, I looked it up and, sure enough, 10GBASE-CX4 has a 15 meter (49 foot) reach. And that, my friends, makes all the difference !

A 15-meter reach makes it possible to use CX4 as the uplink between all the TOR switches in a 26 cabinet row and a distribution switch located in the middle of said row. This reduces the cost of each TOR switch 10GbE uplink by several thousand dollars.

The result is that, for a server rack with 48 GbE ports, it's substantially less expensive to deploy a TOR switch with 2 10GbE (CX4) uplinks to a distribution switch than to deploy two 24-port patch panels with preterminated Cat6e cables running back to the distribution switch. For a server rack with 24 GbE ports, it's a wash in terms of cost - the TOR switch option being a few percent more expensive. This also means that the cost of a 10G server connection is significantly lower than I originally calculated.

The only remaining issue is that, in the new data center, the plan was to distribute power down the middle aisle (13 racks on each side of the aisle) and out to each rack, but to distribute the fiber from the outsides of the rows in. One thing that makes the TOR model less expensive is that you only need 1 distribution switch per 26 racks (13 + 13) whereas with the patch panel model you'd need multiple distribution switches on each side of the aisle (2 or 3 switches per 1/2 row or 13 racks). But having only 1 distribution switch per row means that there would be CX4 cables and fiber crossing over the power cables running down the middle aisle. We have 36" raised floors though, so hopefully there's plenty of vertical space for separating the power cables and network cables.

The other consideration is that it appears to me vendors will be converging on SFP+ as a standard 10G pluggable form-factor - moving away from XENPAK, XFP and X2. If this happens, SFP+ Direct Attach will become the prevalent 10G copper technology, and that, I believe, only has a 10-meter reach. That would lead us back to placing a distribution switch on each side of the aisle (1 per 13 racks instead of 1 per 26 racks) - which will raise the overall cost slightly.

Tuesday, June 24, 2008

Networking the Data Center

As if we had nothing else to do this year, we're busy planning for our new data center in Bloomington that will come on-line in the spring of 2009. I spent the better part of the afternoon yesterday working through a rough cost analysis of the various options to distribute network connectivity into the server racks, so I thought I'd share some of that with all of you. I'll start with a little history lesson :)

The machine rooms originally had a "home-run" model of networking. All the switches were located in one area and individual ethernet cables were "home-run" from servers directly to the switches. If you're ever in the Wrubel machine room, just pick up a floor tile in the older section to see why this model doesn't scale well ;-)

When the IT building @ IUPUI was built, we moved to a "zone" model. There's a rack in each area or "zone" of the room dedicated to network equipment. From each zone rack, cables are run into each server rack with patch panels on each end. All the zone switches had GbE uplinks to a distribution switch. We originally planned for a 24-port patch panel in every other server rack - which seemed like enough way back when - but we've definitely outgrown this ! So, when we started upgrading the Wrubel machine room to the zone model, we planned for 24 ports in every server rack. 24 ports of GbE is still sufficient for many racks, but the higher-density racks are starting to have 48 ports and sometimes 60 or more ports. This is starting to cause some issues !!

But first, why so many ports per rack ? Well, it's not outrageous to consider 30 1RU servers in a 44 RU rack. Most servers come with dual GbE ports built-in and admins want to use one port for their public interface and the second for their private network for backups and such. That's 60 GbE ports in a rack. - OR - In a large VMware environment, each physical server may have 6 or 8 GbE NICs in it: 2 for VMkernel, 2 for console, 2 for public network and maybe 2 more for private network (again backups, front-end to back-end server communications, etc). 8 NICs per physical server, 6 or 8 physical servers per rack and you have 48 to 64 GbE ports per rack.

So, why doesn't the zone model work ? In a nutshell, it's cable management and too much rack space consumed by patch panels. If you figure 12 server racks per "zone" and 48 ports per rack, you end up with 576 Cat6e cables coming into the zone rack. If you use patch panels, even with 48-port 1.5RU patch panels, you consume 18 RU just with patch panels. An HP5412 switch, which is a pretty dense switch, can support 264 GbE ports in 7 RU (assuming you use 1 of the 12 slots for 10G uplinks). So you'll need 2 HP5412s (14 total RU) PLUS an HP5406 (4 RU) to support all those ports. 18 + 14 + 4 = 36 RU - that's a pretty full rack - and you still need space to run 576 cables between the patch panels and the switches. If you don't use patch panels, you have 576 individual cables coming into the rack to manage. Neither option is very attractive !

Also, if you manage a large VMware environment, with 6 or 8 ethernet connections into each physical server, 10GbE starts looking like an attractive option (at least until you get the bill ;-). Can you collapse the 8 GbE connections into 2 10GbE connections ? The first thing that pops out when you look at this is that the cost to run 10GbE connections across the data center on fiber between servers and switches is simply prohibitive ! 10GBASE-SR optics are usually a couple grand (even at edu discounts), so the cost of a single 10GbE connection over multimode fiber is upwards of $4,000 *just* for the optics - not including the cost of the switch port or the NIC !

For both these reasons (high-density 1G and 10G) a top-of-rack (TOR) switch model starts looking quite attractive. The result is a 3-layer switching model with TOR switches in each rack uplinked to some number of distribution switches that are uplinked to a pair of core switches.

The first downside that pops out is that you have some amount of oversubscription on the TOR switch uplink. With a 48-port GbE switch in a rack, you may have 1 or 2 10GbE uplinks for either a 4:1 or 2:1 oversubscription rate. With a 6-port 10GbE TOR switch with 1 or 2 10GbE uplinks, you have a 6:1 or 3:1 ratio. By comparison, with a "zone" model, you have full line-rate between all the connections on a single zone switch, although the oversubscription rate on the zone switch uplink is likely to be much higher (10:1 or 20:1). Also, the TOR switch uplinks are a large fraction of the cost (especially with 10G uplinks), so there's a natural tendency to want to skimp on uplink capacity. For example, you can save a LOT of money by using 4 bonded 1G uplinks (or 2 groups of 4) instead of 1 or 2 10G uplinks.

My conclusion so far is that, if you want to connect servers at 10GbE, you *absolutely* want to go with a TOR switch model. If you need to deliver 48 or more GbE ports per rack, you probably want to go with a TOR model - even though it's a little more expensive - because it avoids a cable management nightmare. If you only need 24 ports (or fewer) per rack, the "zone" model still probably makes the most sense.

Wednesday, June 18, 2008

Distributing Control

One thing I've done quite a bit of since taking on the network architect role last summer is meet with LSPs to discuss their networking needs. Just yesterday we met with the Center for Genomics and Bioinformatics, this morning we're meeting with the Computer Science department, and Friday with University College @ IUPUI. What I've learned is that there are many excellent LSPs and that they know their local environment better than we ever will.

As the network becomes more complex with firewalls, IPSs, MPLS VPNs and such, I think we (UITS) need to find ways to provide LSPs with more direct access to effect changes to their network configurations and with direct access to information about their network. For example, if an LSP knows they need port 443 open in the firewall for their server, what benefit does it add to have them fill out a form, which opens a ticket, which is assigned to an engineer, who changes the firewall config, updates the ticket and emails the LSP to let them know it's completed ?

Okay, it sounds easy enough to just give LSPs access to directly edit their firewall rules (as one example) - why not just do this ?

First, you have to know which LSPs are responsible for which firewall rules. To do that you first need to know who the "official" LSPs are, but then you also need to know which IP addresses they're "officially" responsible for. It turns out this is a pretty challenging endeavor. I've been told we now have a database of "authoritative" LSPs, populated by having an official contact from each department (e.g. the dean) designate who their LSPs are. But then you need to associate LSPs with IP addresses - and doing this by subnet isn't sufficient since there can be multiple departments on a subnet. The DHCP MAC registration database has a field for LSP, but that only works for DHCP addresses and is an optional user-entered field.

Second, you have to have a UI into the firewall configuration that has an authentication/authorization step that utilizes the LSP-to-IP information. None of the commercial firewall management products I've seen address this need, so it would require custom development. The firewall vendors are all addressing this with the "virtual firewall" feature. This would give each department their own "virtual firewall" which they could control. This sounds all fine and good, but there are some caveats... There are limitations to the number of virtual firewalls you can create. If you have a relatively small number of large departments, this is fine, but a very large number of small departments might be an issue. Also, one advantage of a centrally managed solution is the ability to implement minimum across-the-board security standards. None of the virtual firewall solutions I've seen provide the ability for a central administrator to set base rules for security policy that the virtual firewall admins cannot override.

Third, it is possible to screw things up and, in some rare cases, one person's screw up could affect the entire system. High-end, ASIC-based firewalls are complex beasts and you should really know a bit about what you're doing before you go messing around with them. So would you require LSPs to go through training (internal, vendor, SANS, ?) before having access to configure their virtual firewall ? Would they have to pass some kind of a test ?

I don't think any of these hurdles are show-stoppers, but it will take some time to work through the issues and come up with a good solution. And this is just one example (firewalls) of many. Oh, and people have to actually buy-in to the whole idea of distributing control !

Where’s Matt ?

Well, the last two weeks were very hectic on a number of fronts and I didn’t get a chance to post.

The Friday before last was the networking “all-hands” meeting. This was a meeting for *everyone* in UITS that is involved in supporting networks. My back-of-the-envelope head-count was up over 70, but with vacations, 24x7 shift schedules and whatnot, we ended up with about 40. I babbled on for a solid 90 minutes about the draft 10-year network plan, the discussion and work that went into developing it, and how that will translate into changes over the next year or so. After questions, we were supposed to have some great cookies and brownies, but much to my dismay, the caterers showed up an entire hour late - after most people had left.

After the snackless break, we asked everyone who had a laptop to come back for some wireless testing. We had 3 APs set up in the auditorium and during the presentation had done some testing to see how well clients balanced out across the APs (not very well, as we expected) and what throughput/latency/loss looked like with certain numbers of users on an AP. The dedicated testing was all done with 1 AP for all users. Using speed tests and file downloads, we tried to take objective and subjective measurements of performance with different numbers of users associated with that AP (10, 20, & 40). The goal was to set a cap on the number of users per AP that, when reached by a production AP, would trigger a notification so we can proactively track which locations are reaching capacity.

I spent last week out in Roseville California meeting with HP ProCurve. I don’t know about you, but trips to the west coast just KILL me ! Doesn’t matter what I do, I always wake up at 4am (7am Eastern) and, inevitably, my schedule is booked until 9-10pm Pacific. The meeting was excellent and useful - although you’re not going to get any details here because of our non-disclosure agreement.

Okay, now throw on top of this that I’ve had multiple contractors tearing up my house for the last 2 weeks, and it’s been NUTS !

Thursday, May 29, 2008

Status Update: 05/29

We're in the middle of our 2 weeks of "downtime" before the full-scale deployment starts. So far everything is running fairly smoothly. We did a fail-over test of a primary WESM controller with 151 APs associated with it and that went very well. It's summer, of course, so although we have 200+ of the new APs deployed, the total number of simultaneous users is still quite low. We won't truly stress the system until fall classes start.

This Friday Jason and I are doing a Tech Talk for LSPs on the new wireless system. I'll give an overview of the project, and probably the core upgrade project as well, and Jason will cover the details that LSPs will need to know.

Next Friday we're having an "all-hands" meeting of the networking staff. The idea is to present the 10-year network master plan to everyone supporting the network, from the installers to the IP engineers, so they understand how this will impact them and what role they will play. There will be a particular focus on the wireless and core upgrade work this summer, the MPLS VPN deployment this fall and support for IPTV and Voice/IP. We're also going to use this meeting as an opportunity to do some performance testing of the new wireless system. We'll be in a fairly large auditorium and should have 60-70 people. We're going to do things like set up 3 APs and watch how clients balance across the 3 (or not) and put all the users on a single AP to see how many users and how much bandwidth a single AP can support.

The deployment schedule is complete, with approximately 200 APs being installed every week, starting June 9th and wrapping up on August 1st.

Thursday, May 22, 2008

We're on the down slope...

...or at least it feels like we are ! If you couldn't guess from my posts, last week was a tough week for us. But the team put in a lot of hard work and preparation and our first full week of wireless upgrades is going very smoothly ! (He says as he raps heartily on a nearby wooden table)

We upgraded Geology and Geological Survey on Monday, Psychology and Business Grad on Tuesday, Business (and SPEA Library) on Wednesday and they'll start connecting new APs in SPEA and Informatics in another hour or two. The Wells Library will be upgraded tomorrow, then we'll take a couple of weeks off to let the dust settle and resolve any issues before continuing the upgrades.

We did have a couple of small hiccups this week - both of which were resolved quickly. In Business, some of the power injectors that supply power on the ethernet cables running to the APs were older models that only supported "pre-standard" Power over Ethernet. The new APs do not support this pre-standard version, so we had to install some standard 802.3af power injectors quickly on the morning of the change over. Also, we discovered 2 APs in the SPEA Library whose data jacks are connected to an IDF in the Business School. This meant the old APs there went down yesterday morning when we upgraded the Business School instead of this morning when we're upgrading SPEA. We were able to get new APs installed there fairly quickly to get service back up. For those who don't know, the 2 buildings are joined at the hip and apparently some of the jacks in the SPEA building are too far away from the SPEA IDF, but close enough to an IDF in the Business School. Considering we have almost 1,000 IDFs and 60,000 data jacks, these minor oversights will happen every now and then.

We also ran into a minor bug on one of our WESM controllers yesterday (nothing service impacting). We got word last night that HP engineers were able to reproduce the bug and we're waiting to hear when the fix will be available. Thanks goes to our HP TAM (Technical Account Manager) for helping us pin this down quickly !

Friday, May 16, 2008

14 Down - Only 4,786 to go !!



This morning we completed the core upgrade transition and wireless upgrade for essentially 2 buildings - the UITS Comm Services building and WCC. The team put in 3 very long days getting ready, but it was definitely worth the effort as the changes this morning went off without a hitch (well, almost) !!

The 2 things we caught this morning were that the WESMs (aka controllers) had redundancy configured, but it was not enabled. Also, the radio port adoption settings had the right default power levels, but were set for random channel assignment instead of automatic (intelligent) channel assignment. I caught both of these before they started connecting the new APs, so they all came up with the right settings. Charlie pushed these config changes to the other 10 WESMs, so we're set to go.

On another topic, if you ever wondered what 348 wireless access points looked like, that's what you're looking at up above. They come 12 in a carton and there are 29 cartons there. Now just imagine what 4,800 would look like :) This is why we're trying to pace the delivery - otherwise we'd need a LOT of storage space !!

Thursday, May 15, 2008

The Good, the Bad, and the Ugly

Unfortunately yesterday dealt us mostly the latter two !! After 12 years in the networking field, I've found some days everything seems to just go your way and some days...well...some days everything that can go wrong does and you wish you'd just stayed in bed !

For example, what's the chance of a compact flash card - one that worked perfectly fine all day long the day before - being corrupted when you try to boot a switch from it the next morning ??!! What's the chance of an upgrade to one switch causing another switch across campus to crash - repeatedly ??!! Sometimes even when you've taken every preparation possible, things just don't go your way ! The only consolation after a day like that is that the next one HAS to be better !

Looking on the bright side though, when we finally left the office about 8pm last night, 3 of the 4 core switches had been upgraded and 2 of them had their routing configuration about 75% complete. If you remember, we're converting our layer-2 only aggregation switches into layer-3 switches that will be the default gateway routers for the subnets on campus. Once the global routing configuration is completed on these switches and tested, we will transition the routing for each building to these switches. This routing transition for each building needs to happen first thing in the morning on the day the wireless in the building is upgraded. Therefore, the upgrade of these switches and the configuring of routing on them is a prerequisite for starting the wireless upgrade.

We should have the routing configuration completed this morning, along with the routing transition for the VLANs that support the network engineers. We will test throughout the day to make sure everything is working properly. The next step is to transition routing for the UITS buildings in Bloomington tomorrow morning. This will be the first test of the process of converting a building's routing followed by the wireless upgrade of the building, and will give us a chance to work out kinks in the process before we start on the first 8 "pilot" buildings next week. By the time we finish those, we should have the process running like a well-oiled machine !

Friday, May 9, 2008

Those 3 Magic Words

OUT FOR DELIVERY

I don't know about you, but when I'm waiting for that next cool gadget to arrive in the mail, those 3 words always trigger a little burst of excitement...sort of like Christmas morning when I was 6 years old :) But today it's 255 new little gadgets !

Thursday, May 8, 2008

The APs are coming, the APs are coming !!

That's right, we just received confirmation that our first 255 Access Points arrived at Fedex in Columbus, Indiana just a couple of hours ago. We'll take a few of these APs to install in all the UITS buildings later next week and another 190 of them to install in 8 buildings in Bloomington starting on May 19th. We also received 14 WESM modules (aka controllers) earlier in the week, so we have enough hardware for the complete 8-WESM deployment at IUPUI and 1 of the 2 12-WESM deployments at IUB.

In case you're interested in the technology, the WESM modules can be deployed in groups of 12 (or fewer). Wireless clients can "roam" seamlessly across all the APs that are associated with any of the 12 WESMs in the "Mobility Group". At IUPUI, we will initially have 8 WESM modules (some of which are for redundancy purposes only) in a Mobility Group to support the 600 APs we plan to deploy there. At IUB, we will have 24 WESM modules (again, some are just back-ups) in 2 separate Mobility Groups to support the 4,200 APs we plan to deploy there.

Tuesday, May 6, 2008

Oops, forgot to mention the core upgrade

The core upgrade project is moving along smoothly as well. We received the last few bits of the equipment yesterday - a shipment of XENPAK modules - so we have everything we need to move full steam ahead. Thanks to some hard work over the weekend, all of the new 24-port and 48-port gigabit ethernet modules are installed in the core switches in Bloomington.

This Thursday and Friday they'll be swapping out the Supervisor 720a cards for Supervisor 720bxl cards in the core switches. Essentially this will allow us to turn the core switches into full blown routers with MPLS capabilities. Once this is completed, we'll connect the core switches up to the backbone, configure routing and prepare for migrating the routing function for all the campus buildings to these switches. These transitions will happen building by building as we upgrade the wireless equipment in the buildings. Once we transition the routing for all buildings to the core switches (i.e. complete the replacement of all existing wireless APs), the core upgrade will be complete. Sounds easy, right ?

Checking in

I know....it's been a whole week since my last post...I'm a slacker !

Jason and I attended a meeting of the CIC schools in Chicago last week to talk about wireless. For those of you who don't know, the CIC is *roughly* the Big Ten Conference schools. It was great to hear what other schools are doing and to share information and ideas. One thing was definitely clear from the meeting - there is no perfect controller-based wireless product. No matter which controller-based wireless vendor you choose, you'll run into bugs, limitations and shortcomings. Heck, that's true for any network equipment !!

Yesterday we received 16 of the WESM controllers. This brings the total to 22, which is enough for the entire IUPUI installation (8), 1 of the 2 mobility groups at IUB (12), and a couple of WESMs to test with. We started getting these installed in the switches and configured yesterday. Now we're just waiting on the shipment of APs that are due to arrive early next week.

Today our Technical Account Manager (TAM) from HP is in town for our first meeting. He will be our primary support contact and will know our network well, so we don't have to bring a support person up to speed on what we're doing every time we open a case. Given the size (4,800 APs) and the short timeframe (3-4 months) of our deployment, having a dedicated support contact that knows our network will be extremely important !

Monday, April 28, 2008

That Other Big Project !

I've alluded to our other big summer project a couple of times, but now I'm finally going to give you the low-down on it !

When I returned from vacation in early January, I fully anticipated that the core of our network would remain fairly static until the summer of 2009. This is when, according to the 10-year network plan we had been developing (a topic for another post), we would do a "fork-lift" upgrade of the core. Since that was only 16 months away, I set to work developing requirements, scheduling initial discussions with vendors, etc. As part of this process, I also started meeting with various departments and groups to get a little better handle on what they would need - in addition to the UITS projects I already knew the network would have to support such as VoIP and IPTV. What I learned over the course of January and early February led to a slight change of plans :)

One of the important things I learned is that there are several departments looking to take advantage of the high-capacity, high-bandwidth networked storage systems we've deployed over the last 2 years. Our MDSS tape storage system now has 24 10GbE connected "front-end" servers, each capable of moving data in and out of the system at around 3-4Gbps. The Data Capacitor is a disk-based storage system that also has 24 10GbE connected servers, each capable of moving data between the IP network and disk at around 7Gbps. I met or spoke with at least 4 departments during January that were all looking to move large data sets between their buildings and these central storage systems at very high bandwidths.

Our current architecture aggregates all the 1GbE connections from the buildings into layer-2 ethernet switches, applies 802.1Q VLAN tags and trunks all those VLANs over 10GbE links to the routers. This architecture provides a lot of flexibility and works fine for large numbers of 1GbE connected buildings using a few hundred Mbps of bandwidth each, but not so well for a dozen 10GbE buildings bursting up to 3-4Gbps each. In addition, those layer-2 ethernet switches have also run out of empty ports and module slots.

The solutions to this are:

(1) Move the layer-3 routing function onto the aggregation switches that terminate the fiber connections to the buildings. This removes the bottleneck between the layer-2 aggregation switches and the layer-3 routing switches. It also frees up quite a few 4-port 10GbE modules that can be reused to support 10GbE connections to buildings.

(2) Upgrade the 16-port 1GbE modules to 24 or 48 port 1GbE modules. This frees up slots in the aggregation switches (now layer-3 switches) to install the 4-port 10GbE modules to support 10GbE connections to buildings.

The other big thing that I learned about in January was PCI-DSS !! For those of you who haven't heard about PCI-DSS, that stands for Payment Card Industry - Data Security Standards. Think HIPAA on steroids for merchants that accept credit/debit cards :) PCI-DSS has a laundry list of network and process requirements that must be met in order to be compliant.

As I dug deeper into what it would take to support the PCI-DSS requirements, it became clear (to me at least) that MPLS Layer-3 VPNs were the way to go. We had already been discussing MPLS VPNs for a while and several other universities have already deployed MPLS VPNs to solve problems like this. The general problem is that there are many different groups (or groups of systems) that each have unique network requirements and that have users/machines spread across many different buildings on campus. In addition to PCI-DSS compliant systems, you have building control systems (e.g. HVAC, security cameras, door access systems, etc), IP phones, and units like the School of Medicine and Auxiliary Services that support users/systems across many buildings. In a nutshell, MPLS Layer-3 VPNs allow you to "group" these systems into separate virtual routers, each of which can have different network services and policies (firewall, NAT, IPS, etc).

The Cast of Characters

I thought I should introduce some of the people working on the project so, when I say Jason did this or Dwight did that, you'll know who I'm talking about !

There's a core engineer group that is working through the myriad of engineering issues involved in getting the project from the RFP stage to the full-scale deployment phase. This is by no means a complete list of people working on the project !! A deployment of this scale involves *many* people from all parts of the organization !!

Ed Furia, who rose to fame as part of the video group, is the project manager for the wireless project. Ed's also involved in a lot of the engineering work, especially related to WPA2 Enterprise. Ed's done an excellent job of getting up to speed on the project in just a few weeks !

Jason Mueller, who hails from Iowa and the University of Iowa, started his tenure at IU just 5 short (or long) weeks ago. Jason has seriously "mad" wireless skills - (How's that for modern pre-teen lingo!) - and really great experience from deploying Iowa's wireless network.

Dwight Hazen and Charlie Escue round out the group and have loads of experience and great ideas !

Tuesday, April 22, 2008

Equipment is rolling in...

It's always an exciting part of a project when those emails titled "package at the dock" start rolling into my inbox :)

Last Friday a small box with about 80 LX SFPs arrived. These are for the upgrade of the core switches we're working on (I promise I'll put up a post describing what we're doing there soon). Yesterday I got a "package at the dock" email that said there were 16 boxes from Cisco - WOO HOO !!! This got my hopes up that the new Cisco 6500 interface cards we're waiting on had started showing up early - which would be awesome ! Hans was nice enough to run over to the dock for me (remember, I'm hanging out in northern Virginia) only to find it was just the daughter cards that attach to the interface cards :-( So we'll enter them into the inventory DB, put them in the storage room and wait *patiently* for the cards they mate to. The new server hardware to upgrade the RADIUS servers showed up yesterday too !

Keep it coming !

Monday, April 21, 2008

Update from the road

I know you're all dying to hear what's been going on...and a LOT has happened since my last post on Thursday morning. I'm at the Internet2 Member's Meeting in Arlington, VA this week and will try to make use of what free time I have to catch you up.

We met with the Messaging team last week to discuss the impact of the wireless project on DHCP and ADS. The biggest issue perhaps is the need to configure DHCP option 189 on the subnets the APs are on. Option 189 can pass up to 3 IP addresses to the APs, which is how the APs figure out which controllers to associate with. The APs hold no configuration across reboots. Each time they boot, they will learn the IP addresses of the primary and backup controller via DHCP option 189 and will contact the controller to get their configuration.
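
Our DHCP actually lives on central servers run by the Messaging team, but to give you the flavor of option 189, here's roughly what it looks like if you define it on a Cisco IOS DHCP server - the subnet and controller addresses are made up:

    ip dhcp pool AP-MGMT
     network 10.30.20.0 255.255.255.0
     default-router 10.30.20.1
     ! up to 3 controller IPs - the APs try these to find their WESM
     option 189 ip 10.30.1.10 10.30.1.11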

The engineering team met on Friday morning. The primary topic was nailing down a tentative schedule, especially for the early part of the deployment. We plan to allow 1-2 weeks of testing after the equipment arrives. Then we want to deploy around 200 APs and let them "burn in" for a couple of weeks before starting deployment in earnest. In addition to testing, these first 200 APs (about 7 buildings) will give us a chance to document and verify the deployment procedures, so we can move quickly and smoothly with the remaining buildings. We're tentatively scheduling the first 7 buildings during the week of May 12th with full-scale deployment starting the first week of June.

Thursday, April 17, 2008

But what about NAT ?

Okay, I know all the smart kids out there are screaming, "But why aren't you using NAT?" The short answer is *time* - or the lack thereof. The HP WESMs (Wireless Edge Services Modules) do have NAT support built-in. We could also use an external firewall placed in front of the WESMs to perform NAT - or rather PAT, since we really want all the wireless users to share a small pool of public IPs. We also realize that, as the number of simultaneous wireless users grows extremely large (say more than 16,000) and as our overall pool of unused IP blocks dwindles, we will absolutely need to consider NAT on wireless in order to conserve public IPv4 addresses.

HOWEVER ! We also need to deploy a few thousand APs in the next 2-3 months *AND* roll out WPA2 Enterprise *AND* roll out a new guest access portal. Oh, and we have this other little project to completely overhaul the core of the network and deploy MPLS VPNs before August (I'll dive into that project in future posts). SO, since we have an unused /16 block at our disposal, we think that's the best course of action. We won't allow incoming TCP connections (no wirelessly connected servers) and wireless clients are transient by nature, so switching to NAT later on should be fairly painless - well, for users at least :)
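
When that day comes, the PAT config itself is the easy part. On a Cisco router it would look something like this - the addresses, VLAN and ACL number are all hypothetical, and the WESM's built-in NAT would be configured differently:

    ! wireless user subnets are "inside", the campus-facing uplink is "outside"
    interface Vlan50
     description Wireless user subnet
     ip nat inside
    !
    interface TenGigabitEthernet1/1
     description Uplink toward campus core
     ip nat outside
    !
    ! thousands of wireless clients share a handful of public IPs
    ip nat pool WIRELESS-PAT 192.0.2.1 192.0.2.14 netmask 255.255.255.240
    ip nat inside source list 10 pool WIRELESS-PAT overload
    access-list 10 permit 10.20.0.0 0.0.255.255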

Where have all the IPs gone....

...to all the wireless users, of course ! Several weeks ago I went looking for IP subnets to assign to our new wireless SSIDs. What I discovered is that we do NOT have the vast amounts of unused IP space we once thought.

But first things first... the first step was to move our IP allocation documentation from spreadsheets and flat files into a database. Fortunately, we had already developed a nice IP allocation database for our support of networks like Internet2 and NLR, so we had a database ready to go. Now that all our IP allocations for all our campuses are documented in a single place, we can look at overall IP utilization, delegate authorization to allocate addresses from specific IP blocks, and do better planning of our IP allocations. This will become very important as our IPv4 address space becomes more scarce !

Once I started looking at this, I found that, especially in Bloomington, we don't have a whole lot of unused subnets - and especially not contiguous subnets. And it turns out a LOT of these are eaten up by wireless users !

According to our monitoring software, we are seeing about 5,000 simultaneous wireless users in Bloomington these days. However, our DHCP lease timers are in the 90-120 minute range [see note below]. So if someone uses wireless for 10 minutes and then shuts their laptop, their IP address is reserved for another 80-110 minutes. This means we actually have about 10,000 total host IP addresses assigned to our wireless subnets ! That's 1/6th of a whole /16 or legacy Class B block. But it gets worse !! Since users must use VPN to get full access to wireless, most of these users are also consuming an IP in the VPN address pool. So we have several thousand more IPs assigned to those pools for a total of nearly 16,000 host IPs assigned for wireless users. That's 1/4th of an entire /16 or Class B !!!

Note: On DHCP lease timers, we'd love to decrease them, but there's an issue with some VPN clients: when they have a VPN connection, they don't renew their lease properly because they send DHCP packets improperly over the VPN tunnel instead of to their local subnet, so when their DHCP lease expires they lose their network connection until the VPN tunnel drops and they renew their lease over their local subnet. We used to have shorter lease times, but many users complained that their VPN connections kept dropping in the middle of meetings and they would have to reconnect. This won't be an issue on the new WPA2 Enterprise SSID !

Even with shorter lease times on the WPA2 Enterprise network, given the level of growth we're seeing in wireless usage and all the new wireless clients from the expansion into the dorms, we think we need to allocate at least 16,000 host IPs to the new wireless network. Since we can't reclaim the IP space from the current wireless network until users transition to the new one, we need to come up with a new /18 block of IPs (a /18 holds 16,384 addresses). The *ONLY* block we can take this from is 140.182.0.0/16, which is the last unused Class B network we have. Since we've never used this block, we need to give ample warning to all system administrators in case they have host firewalls that need to be updated. And THAT, my friends, is at the top of my to-do list for today !

Wednesday, April 16, 2008

Thick and Thin

Back in the good old days, when you had to carry around a PCMCIA card in order to use Wifi, and even before the term "Wifi" was coined, Wireless Access Points (WAPs or just APs) provided all the functionality of 802.11 wireless in a single device. Each AP minded its own business and did its own thing - communicating with clients over radio frequencies, encrypting and decrypting packets if necessary, and passing those packets onto the wired network. This was all fine and good when Wifi hotspots were - well, just that - "spots" - individual, isolated locations. But as companies and universities started deploying very large areas of contiguous coverage - sometimes with thousands of APs - some issues surfaced with this model.

For example, with individual autonomous APs, someone has to manually tune the power of radio signals on each AP so that, together, the APs cover an entire area. Also, as clients roamed between APs that had encryption enabled, the client needed to establish a new encryption key with each new AP, which would take time and cause a short "outage" during the transition.

But, what if there was a central "controller" that controlled all the APs and knew everything all the APs knew ? Then the controller could tell each AP how strong its radio signal needs to be in order to "fill" an area. And the encryption key could reside on the controller instead of the APs, so that as clients roam between APs they don't need to renegotiate an encryption key.

Thus the terms "thick" or "fat" APs and "thin" APs were born ! With these new "controller-based" systems, the functionality of the traditional AP is split across the AP (or, in HP speak, Radio Ports) and one or more central controllers. This architecture provides a number of advantages in addition to the ones I mentioned, and nearly all the enterprise-class wireless systems on the market today utilize this model.

What's this all about ?

Well, we really have a few main goals associated with upgrading the wireless network.

1) Replace the old wireless hardware that is now 6+ years old with a "modern" system that is much more capable. If you stay tuned, I'll cover what a lot of those new capabilities are...

2) Expand coverage area especially in the Bloomington Halls of Residence, but also in other areas of the IUB and IUPUI campuses. Our current wireless deployment in the IUB Halls only covers common areas such as lounges. We will be expanding to cover nearly every square inch of the Halls of Residence (ok, don't quote me on that every square inch thing) - that means coverage in every student room as well as all common areas. To give you an idea of the scale, we currently have a total of about 1,200 Access Points (APs) for the entire Bloomington campus. We will be adding 2,700 new APs *just* in the Halls of Residence ! That's a LOT of wifi !

3) Deploy WPA2 Enterprise to replace VPN as the mechanism for accessing wireless securely. I'll have one or more posts dedicated to discussing what WPA2 Enterprise is and how it works, but in a nutshell it provides an authenticated and encrypted wireless connection in a way that is MUCH more user-friendly than VPN. I've been using this in our test environment for several months and I can tell you, once you use WPA2 Enterprise, you'll never want to use VPN to connect to wireless again !

Let the fun begin !

After MANY months of working on the RFP for IU's next-generation wireless network, we FINALLY made an award this past Monday !! Of course that means now the REAL work starts !

Starting very soon now (soon = about 3-4 weeks from today) we will start deploying a new, improved and much larger wireless network using HP ProCurve's ZL wireless system. I know, I know - you probably have all sorts of questions like "why are we doing this ?" and "what do I get out of it ?". Hang tight ! I'm well over 30 and new to this newfangled blogging thing, but I'll try to get everyone up to speed over the next week or so. So stay tuned and you'll learn everything you wanted to know about wireless and probably a few things you didn't want to know !

If you want the Reader's Digest version, hang on for a few more days and I'll post a link to the video podcast we just shot about an hour ago over at the Wells Library. I think I covered all the topics I had in my outline, so this should give you a reasonably decent overview.