Wednesday, October 26, 2011

Networking & DevOps: Getting Back to Our Roots!

I spent last week at the Open Networking Summit and Open Networking Foundation Member Workday and came home with a head full of ideas to write about. Here's my first set...

At ONS I heard the term DevOps mentioned many times. In fact, Stacy from GigaOM (who, btw, was an excellent moderator for the panel I participated in) wrote a post about it as well (see "Does networking need a devops movement"). Perhaps it's because I live in the corn fields of Indiana, or because I really live in my own little world most of the time, but I actually had to Wikipedia the term DevOps (can you use Wikipedia as a verb?). In any case, when I read the Wikipedia article, I thought: DUH, OF COURSE!!

You see, this is how we've been developing network management software at the GlobalNOC at Indiana University for the past 12 years. For those of you who don't know us, the GlobalNOC partners with universities and non-profits to help them manage their networks by providing NOC and network engineering support. Very early on, as we grew from working with 1 network to 3 or 4 networks, it became painfully obvious that we would need a very good, highly customized set of software to help us manage these very diverse networks.

This started out with a few network engineers (like myself) who had CS backgrounds hacking together some open-source software with way too much homegrown Perl scripting for our own good. We wrote the software, we supported the software, and we used the software to manage networks! Over time, as the team grew and the software became much more sophisticated, it was necessary to split the developer/sysadmin team from the network engineering team, but they are still very closely linked.

I'm no longer directly involved with the developer/sysadmin team (SYSENG, as we call them), but the parts of the development process described in the DevOps article on Wikipedia, such as reduced release scope with smaller changes more often and automation of deployment, sound very familiar :) I'm told we rolled out something like 40 software releases into production last year. In addition, the "users" of the software, i.e. the network operators and engineers, have a very close link to the developers/sysadmins, making it easy to exchange ideas for new features and to track down bugs.

One of the big reasons I've been excited about SDN is the possibility that it could be used to apply these same DevOps principles to the development of control-plane software for networks. If I'm not mistaken, this is how IP networking started in the first place (or so I'm told)! I'm certainly not old enough to have been around during the early days of the Internet. However, I was very fortunate to start my career with a company called ANS, which operated the NSFnet, and had the privilege of working with many of the people who were.

From what I've heard, the early IP networks were run by computer scientists who both operated the network and wrote the software that powered it. Getting a new feature into the software didn't involve hundreds (or thousands) of network engineers submitting ideas to sales reps, product managers distilling those requests into the features that would actually go into the software, developers building the software, and QA teams testing it - only to have the QA teams at the customer sites run their own extensive tests to verify the product worked properly in their unique environments - before the feature could finally be put into production.

Now, I don't know for sure, but I suspect the process "in the good ole days" led to a few more bumps in the road than would be acceptable in today's production networks. However, I think SDN at least (re)opens the possibility of a tighter "loop" between the people who manage networks and the people who build the software that powers them, and IMO that would be a good thing!

At the Open Networking Summit last week, the Indiana University team demo'd our first example of SDN software developed using this methodology, and it's already deployed on a nationwide, production backbone network called NDDI. We'll be demo'ing our next piece of SDN software at GEC12 next week, which solves a specific issue on campus LANs. Our plan is to have it running in production on the Indiana University network by the end of November. Judging from the internal demo session this morning, it looks like we're very much on track to make that happen.

In the GigaOM article I referenced earlier, the question was posed as to "where in the networking world these developers would come from". Well, Stacy, we're growing them today, right here in the corn fields of Indiana!!


Wednesday, March 16, 2011

GEC10 Demo Night

After multiple cab rides and a scenic tour of several areas of San Juan I wouldn't dare go after dark, I made it to the Sheraton Conference Center (not to be confused with the Sheraton Old Town) in time to set up for the Tuesday night demos.

IU had 3 demos last night: GENI Meta Operations Center (GMOC), NetKarma and Openflow Campus Trials. I was responsible for the Openflow Campus Trials demo and, as luck would have it, nearly all of the monitoring work Chris Small has done is now rolled into the GMOC tools and was on display in the GMOC demo. Also, John Meylor and Camillo Viecco were on hand and fully prepared to answer questions about Measurement Manager, so I was able to spend quite a bit of time checking out the other demos.

Ericsson had a really compelling demo of their MPLS implementation for Openflow that utilizes NOX and Quagga. UQAM unveiled an EZchip network-processor-based 100G switch that fully implements the Openflow 1.1 spec. As far as I know, this is the first complete implementation of Openflow 1.1, and at 100Gbps no less!! It sounds like they still have some more work to do before it's ready for use, but it looks like they could have a really compelling platform.

I had a number of really good discussions about Openflow that continued through dinner and a drink in Old San Juan. Now on to the real work of the conference on Wednesday!

Monday, March 14, 2011

Openflow and Vendor Lock-In

Openflow is really just one piece in a much broader architecture known as Software-Defined Networking (SDN). The concept of SDN is actually very simple and is explained quite clearly in a number of presentations by Nick McKeown from Stanford, amongst others. See http://tinyurl.com/4ekzb9w

The basic idea is that today's networking products mirror the mainframe computer industry of decades past. Vendors deliver packages of proprietary hardware, operating systems and applications bundled together. Unlike in the PC industry of today, you cannot choose to buy network hardware from one vendor, an operating system from another vendor and applications from a third vendor. In fact, in many cases it becomes difficult to distinguish a product's hardware from its operating system from its applications.

So the idea of SDN is to create open interfaces between these layers so that the networking market resembles the PC market, with competition at each layer of the stack. Openflow is the open interface between the hardware and the network operating system - perhaps roughly analogous to x86. As network operating systems mature (several are currently in development), there will hopefully be open APIs that application developers can use to build functionality on top of them. In theory, this should be good for the consumers of networking products: more options and more competition should lead to reduced costs. As a consumer of networking products, I'm all in! But will theory and reality actually match up, or will something get lost in between?
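To make that layering concrete, here's roughly what an "application" on top of a network operating system looks like. This is a minimal, illustrative sketch of a dumb hub written against the Python API of the open-source POX controller - the choice of controller and the module name are mine, and it's not something from any product mentioned here:

    # A toy "application" running on a network OS: a dumb hub that
    # floods every packet.  Sketch only -- written against the Python
    # API of the open-source POX controller; save it as ext/my_hub.py
    # in a POX checkout and run "./pox.py my_hub" (names are mine).
    from pox.core import core
    import pox.openflow.libopenflow_01 as of

    def _handle_PacketIn(event):
        # The switch had no matching flow entry, so it handed the
        # packet up to the network OS; the application decides what
        # the hardware should do with it.
        msg = of.ofp_packet_out()
        msg.data = event.ofp      # re-use the packet the switch sent us
        msg.in_port = event.port
        msg.actions.append(of.ofp_action_output(port=of.OFPP_FLOOD))
        event.connection.send(msg)

    def launch():
        # POX calls launch() when the module loads; register for
        # packet-in events from every connected switch.
        core.openflow.addListenerByName("PacketIn", _handle_PacketIn)

The point of the example is the separation of concerns: the hardware just executes flow-table entries, the network operating system speaks Openflow to the hardware, and the "application" is a few lines of Python that never needs to know whose silicon is underneath.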

I'm a firm believer in learning from history, so let's take a look for a minute at what happened in the world of wireless controllers to see if we can glean any useful knowledge from that experience.

Of course, there was CAPWAP, which, in some ways, is analogous to Openflow in that it was meant to be an open interface between controllers and APs. As I understand it, if CAPWAP had achieved success as a multi-vendor interoperable standard, you could have had a controller from vendor A and APs from vendors A, B, and C, and they would have played nicely together. Of course, this didn't happen, and consumers have to purchase APs and controllers from the same vendor.

However, the lack of controller-to-AP interoperability is not the most troubling interoperability problem associated with controller-based wireless systems. What happens when I have 30 controllers and 4,000+ APs deployed on a large university campus and then decide to switch wireless vendors? Sure, the fact that there's no competition at the AP and controller layers means I'm probably paying more than I otherwise might. But, technically, it's not really a problem deploying the new controllers and having the new APs associate with them. That's easy enough.

The real problem is that the controllers from vendor A and vendor B do not interoperate in any meaningful way. Sure, because they both do basic IP forwarding, a wireless client on one system can communicate with a client on the other system. But none of the features (i.e. applications) that drove me to choose a controller-based system over stand-alone APs in the first place are supported across both systems. Features like RF management and layer-3 roaming do not work across wireless systems. So, if two adjacent buildings are on different systems during the transition between vendors, users can't roam seamlessly between buildings, there can be RF interference issues, and captive portal logins aren't maintained, forcing users to re-authenticate simply because they moved from one building to another. As a result, many organizations have become locked in to their current controller-based wireless vendor. The amount of effort required to switch vendors is high enough that they're willing to stay with their current vendor unless they're REALLY unhappy with them or can save a TON of money by switching. Anecdotally, I hear lots of people complaining about their wireless systems and almost none of them are considering changing vendors!!

So what does this have to do with Software-Defined Networking and Openflow? Well, there are in fact a lot of similarities. Openflow allows a single, software-based network operating system to control potentially hundreds or thousands of hardware devices - not unlike a wireless controller. Novel new applications can be rapidly developed on top of the network operating system to allow new functionality and more efficient management of networks - also not unlike wireless controllers. But if the applications are tightly coupled to the network operating system (as is the case with wireless controllers), and if customers do not sufficiently compel the software vendors to make their products interoperate at an "application" level, consumers could be left in the same vendor lock-in situation they're currently experiencing with controller-based wireless systems.

At this point, those of you who know me are probably thinking, "Man, when did Matt get so down on Openflow?". Certainly it's not all gloom and doom. The SDN product space is developing much differently than the controller-based wireless market did. Open-source projects are flourishing, which will hopefully help drive the market in an open-standards direction. But we, the consumers of networking equipment, need to be vigilant. Don't just assume that creating competition between the hardware and operating-system layers is, by itself, going to be good for the consumer. What happens at the layers above the hardware - the operating system and application layers - is probably much more important in the long run!!

Monday, January 31, 2011

Openflow @ Internet2 Joint Techs in Clemson

Indiana University hosted a BoF session on Openflow this afternoon at the Internet2 Joint Techs Workshop in Clemson, SC. Chris Small did an excellent job organizing the session and pulled off a great demo of VM mobility. I presented the intro slides and did a short demo of an Openflow controller from Big Switch Networks.

We counted 60+ people in the room they scheduled for us. It was packed, and there were people standing in the hall. I asked for a show of hands from people who had never heard of Openflow (0 hands) and from people who knew very little or nothing about it (2-3 hands). Sixty people, primarily campus network engineers, were in the room, and nearly all of them knew something about Openflow. I was completely blown away! They ended up moving us to the auditorium so more people could join; I counted 88 people once we were settled in there!

Overall it was a good session with some excellent side discussions afterwards. Next up is GEC10 in Puerto Rico!

Thursday, August 19, 2010

Proposal Preparation!

That pretty much sums up the last 4 weeks of my life. GENI Solicitation 3 proposals are due by 5pm tomorrow. The IU GlobalNOC is leading or partnering on several different proposals for various parts of the solicitation. I'm personally working on 2 proposals, as PI on one and Co-PI on the other.

I'm looking forward to getting back to "normal" after tomorrow, and there's no shortage of other work to be done. We're evaluating our options for 2011 router refreshes for both IU and I-Light. Our pilot of the Summer of Networking internship program went extremely well, and we're already preparing to continue, improve and hopefully expand the program next year. We're also working on a plan to expand the hands-on network training opportunities for networking staff both at IU and at other universities. GEC9 (the 9th GENI Engineering Conference) will be here before you know it, and I suspect the preparations will kick into high gear after proposals are submitted tomorrow!

Tuesday, July 13, 2010

Openflow @ Internet2 Joint Techs

Internet2 is holding their summer Joint Techs Workshop at Ohio State this week, and Openflow featured prominently on yesterday afternoon's agenda. At 3:00, Srini Seetharaman from Stanford gave an excellent overview of Openflow. I followed that up with a talk at 4:30 focused on the practical aspects of potential Openflow applications in R&E networks and what network engineers can do to get started. That was immediately followed by a presentation from Heidi Picher Dempsey of the GENI Project Office, who talked about GENI and Openflow's application within the GENI infrastructure. GENI and Openflow were also the primary topic among the regional networks at the Gigapop Geeks BoF, with both Heidi and me leading discussions. There were many good questions and excellent discussion about Openflow and GENI.

The slides and archived video from all of the presentations are available on the Internet2 Joint Techs Workshop agenda page:

http://tinyurl.com/24rzjn5

Thursday, June 17, 2010

Openflow Wish List

There are very smart people involved in the development of Openflow. However, I suspect very few of them actively manage networks on a day-to-day basis. Now that the code is in the hands of network engineers, we can see what's needed to actually get it running in production networks.

When it comes to emerging technologies, this space between development and actual production use - between the developers and the network engineers in the trenches - is something I find incredibly interesting. It's great to be involved at the point where you can provide substantive feedback on the actual product or technology.

And that is where we are today with Openflow. We have Openflow deployed on 4 "production" switches and have a wireless SSID in 3 buildings across campus that feeds into an Openflow switch. The cool thing is that it all pretty much works. The problem is that, when it doesn't work, it's a pretty big pain to figure out why. Yesterday I compared it to the early days of the GSRs, when the tables on one of the linecards would get out of sync - but it's a bit worse, because the "linecards" are spread across the whole campus and there are very few debugging tools available.

There are a number of debugging features that would help, but I think the most useful would be a way to see the dataplane and control-plane packets at the same time. One way to do this would be for switch vendors to allow Openflow control-plane packets to be included in a port-mirroring configuration. I could then hook a sniffer up to a mirror port and capture both the traffic to/from a given switch port and the Openflow control messages in a single trace.

Why would this be useful? One problem we're having right now is that some laptops take 1-2 minutes to get a DHCP lease on the Openflow network. Is the switch taking a long time to encapsulate the first DHCP message into an Openflow message and send it to the controller? Is the controller taking a long time to send the packet-out and flow-mod messages to the switch? Are the Openflow messages getting lost along the way? Today I have to run tcpdump on the Openflow controller to capture the control-plane packets and Wireshark on a laptop to capture the dataplane packets, and then try to compare them without synchronized timestamps. This one little feature would have saved us a lot of headaches!
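For the curious, here's roughly what that workaround looks like in practice: a quick sketch (not our actual tooling) that merges the two captures into a single timeline using scapy. The file names are made up, 6633 is the customary controller port, and it only helps if the controller and the laptop are NTP-synced - which is exactly the weak spot:

    #!/usr/bin/env python3
    # Merge a controller-side capture (Openflow control plane) and a
    # laptop-side capture (DHCP dataplane) into one sorted timeline.
    from scapy.all import rdpcap, TCP, UDP, Raw

    # Message type codes from the Openflow 1.0 header (version 0x01)
    OF_TYPES = {10: "PACKET_IN", 13: "PACKET_OUT", 14: "FLOW_MOD"}

    def openflow_events(pcap_file):
        """Yield (timestamp, label) for interesting Openflow messages."""
        for pkt in rdpcap(pcap_file):
            if not (pkt.haslayer(TCP) and pkt.haslayer(Raw)):
                continue
            if 6633 not in (pkt[TCP].sport, pkt[TCP].dport):
                continue
            payload = bytes(pkt[Raw].load)
            # OF 1.0 header: version(1) type(1) length(2) xid(4).  A TCP
            # segment can carry several messages; labeling the first one
            # is good enough for eyeballing the timeline.
            if len(payload) >= 8 and payload[0] == 0x01 and payload[1] in OF_TYPES:
                yield float(pkt.time), "control-plane " + OF_TYPES[payload[1]]

    def dhcp_events(pcap_file):
        """Yield (timestamp, label) for DHCP packets (UDP ports 67/68)."""
        for pkt in rdpcap(pcap_file):
            if pkt.haslayer(UDP) and pkt[UDP].dport in (67, 68):
                yield float(pkt.time), "dataplane     DHCP"

    events = sorted(list(openflow_events("controller.pcap")) +
                    list(dhcp_events("laptop.pcap")))
    for ts, label in events:
        print("%17.6f  %s" % (ts, label))

Even when this works, the timestamps still come from two different clocks, so you end up guessing about exactly the gaps you care about (switch-to-controller vs. controller-to-switch). Having the switch mirror its own control-plane traffic alongside the dataplane would put both views in one capture, on one clock.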