Monday, April 9, 2012

Software Defined Footworking?

I have a bum ankle, arthritic from an old injury, and it frequently causes me a lot of pain. As I limped along on a particularly painful day, I thought about the pain messages my nervous system sends from my ankle to my brain. Whether you believe in grand design or evolution, the fact is that the winning architecture for human biology locates the intelligence that synthesizes messages from our senses remotely, in the brain, one hop away from the nerve sensors that detect pain. This might appear to be unnecessarily complicated, and thus subject to a higher failure rate, compared with an autonomous ankle that processes the pain it senses locally. However, the brain makes much better decisions by virtue of having a global view of all the inputs and the state of all the sensors in the body.

For example, when I walk over sloped terrain my ankle will start to hurt. An autonomous ankle making a local decision would usually produce a reasonable result. In response to the “walking on this surface hurts” input, it would decide to stop sending the “instruct ankle muscles to make walking motion” messages, bringing me to a halt and eliminating the source of the pain. However, in a situation where I am about to be killed by a charging rhinoceros, this otherwise reasonable decision would lead directly to severe injury or death. Our existing design, based on a central controller (the brain) analyzing myriad inputs from disparate sensors, thankfully makes the “I don’t care how much it hurts, run like hell” decision and instructs the necessary muscles to push me into road-runner mode. The state of my visual (and likely my auditory) sensors is factored in, resulting in a far better decision.

IP networks were designed on an autonomous systems model to provide resiliency in the face of nuclear attack during the Cold War, a decision that traded away many of the benefits of global state knowledge for that resiliency. In this sense the biological analog to an IP router might be something like an earthworm. Pull off the head (if you can figure out which end that is) and it will continue to wiggle and crawl, thanks to the autonomous design of its nervous system and system of locomotion. No brain required. This is not the case for a human, for whom disconnecting the central controller from the muscle elements is a fatal single point of failure (SPOF). In the worst case the earthworm has the more resilient design, but you won’t find earthworms launching moon shots or developing cures for cancer. Last time I checked, we humans sat a tad higher in the food chain than our slimy, stupid friends. Cockroaches are also expected to survive a nuclear holocaust, but they give up a lot of intelligence to gain this benefit.

Software defined networking, as characterized by a split control plane (a common controller with full visibility into the state of all network elements), has the potential to improve forwarding decisions and responses to faults in a manner that might be as superior to the behavior of an AS-modeled network as the human nervous system is to the earthworm’s. The point is not that we should run the existing control plane technology remoted across an unnecessary SPOF, but rather that global visibility allows us to design a much simpler control plane (lower complexity generally means lower failure incidence) that can make better use of the available parts of the network. The problems presented by a controller failure, or by a link failure between the controller and the network elements, can be addressed with the techniques we have already developed for what is a solved problem in network systems. Specifically, dual redundant controllers and network links can protect against any “one deep” failure, and a cluster architecture within a given controller can increase the availability of each controller instance. Furthermore, caching local forwarding state and rules in the network elements can provide “non-stop forwarding” until controller connectivity is re-established. Given the virtually unlimited compute resources available in an external server cluster, one could precompute the new forwarding logic for a variety of fault scenarios and survive “two deep” failures, which is a big improvement on non-stop forwarding from a static snapshot of the state at the time of the first failure.
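To make the precomputation idea a little more concrete, here is a minimal Python sketch. It assumes a toy topology of four switches and a controller that knows every link; the function names (next_hops, precompute) and the switch names are hypothetical, invented for illustration, and are not taken from NOX or any other real controller. For the healthy network and for each single-link failure, the controller works out a next-hop table per switch using a simple breadth-first search.

```python
from collections import deque


def next_hops(links, failed_link=None):
    """Return {switch: {destination: next_hop}} over the surviving links.

    Uses breadth-first search (fewest hops); `failed_link` is a frozenset
    of two switch names, or None for the healthy network.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if frozenset((a, b)) != failed_link:
            adjacency[a].add(b)
            adjacency[b].add(a)

    tables = {switch: {} for switch in adjacency}
    for dest in adjacency:
        # Search outward from the destination: the switch we reach a
        # neighbor from is that neighbor's next hop toward `dest`.
        seen = {dest}
        queue = deque([dest])
        while queue:
            current = queue.popleft()
            for neighbor in sorted(adjacency[current]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    tables[neighbor][dest] = current
                    queue.append(neighbor)
    return tables


def precompute(links):
    """Map each scenario (None, or one failed link) to its forwarding tables."""
    scenarios = {None: next_hops(links)}
    for a, b in links:
        failed = frozenset((a, b))
        scenarios[failed] = next_hops(links, failed)
    return scenarios


# Hypothetical topology: a triangle A-B-C with D hanging off C.
links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
plans = precompute(links)

# Healthy network: A reaches D via C.
print(plans[None]["A"]["D"])
# If the A-C link fails, the precomputed plan sends A's traffic via B.
print(plans[frozenset(("A", "C"))]["A"]["D"])
```

A switch holding these tables in its local cache could swap to the matching plan as soon as it detects the failed link, without waiting to hear from the controller. Extending the scenario set to pairs of failed links would cover the “two deep” case, at the cost of many more tables to compute and store.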

Unfortunately, this notion is only an analogy for purposes of discussion, and with our existing state of knowledge the human central nervous system is not nearly as programmable as, say, the NOX controller. If I could, I would filter those pain-from-the-ankle packets and have a lot less suffering when I run, jump and play.