Not all parts of an SP’s network are the same. From both a planning and an operational perspective, the access network is very different from the core, and the core, in turn, is nothing like a data centre network (DCN). At heart, what makes them different is the traffic pattern. In the access network, traffic from subscribers scattered over a wide geographic area needs to be aggregated. The Internet backbone, in contrast, needs high capacity and complete control over the traffic leaving and entering the network.
What about the DCN? Not long ago, the traffic pattern in a data centre was very similar to that of the Internet backbone: most traffic moved in and out of the DC, the so-called north-south flow. A three-tier DCN architecture with high uplink capacity was sufficient to handle such flows, with firewalls at the distribution layer implementing flow control.
As cloud technology evolved, the dominant pattern shifted to the so-called east-west flow, in which most of the traffic goes back and forth between servers within the same DC. DCN architecture went through a major revision, and eventually the cloud and telecom industries adopted the Clos topology, better known as the Leaf-Spine architecture. A Leaf-Spine DCN provides a non-blocking switching fabric for east-west flows. Despite its non-blocking nature, however, a Leaf-Spine DCN, like any IP network, suffers from three significant problems.
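To make “non-blocking” concrete, here is a minimal Python sketch of the capacity arithmetic for a two-tier leaf-spine fabric. The class name, fields, and port counts are hypothetical; the sketch assumes exactly one uplink from every leaf to every spine and treats a fabric as non-blocking when each leaf’s uplink capacity is at least its server-facing capacity.

```python
from dataclasses import dataclass


@dataclass
class LeafSpineFabric:
    """Two-tier Clos (leaf-spine) fabric; all figures are illustrative assumptions."""
    spines: int            # number of spine switches
    leaf_downlinks: int    # server-facing ports per leaf
    downlink_gbps: float   # speed of each server-facing port
    uplink_gbps: float     # speed of each leaf-to-spine link (one link per spine)

    def leaf_to_leaf_paths(self) -> int:
        # Every leaf connects to every spine, so two leaves are two hops
        # apart through each spine: one equal-cost path per spine.
        return self.spines

    def oversubscription(self) -> float:
        # Ratio of a leaf's server-facing capacity to its fabric-facing capacity.
        downlink = self.leaf_downlinks * self.downlink_gbps
        uplink = self.spines * self.uplink_gbps
        return downlink / uplink

    def is_non_blocking(self) -> bool:
        # 1:1 or better: the leaf uplinks alone cannot throttle east-west traffic.
        return self.oversubscription() <= 1.0


fabric = LeafSpineFabric(spines=4, leaf_downlinks=16, downlink_gbps=25, uplink_gbps=100)
print(fabric.leaf_to_leaf_paths(), fabric.oversubscription())  # 4 1.0
```

A 1:1 ratio only guarantees enough aggregate capacity; how individual flows are hashed onto the spine paths is a separate question.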
Elephant flows: all modern routers and switches support either 3-tuple (L3) or 5-tuple (L3 plus L4 ports) hashing to load-balance flows across ECMP (Equal-Cost Multi-Path) paths. Such hash functions treat all flows as equal. In practice, a DCN fabric suffers from imbalance across ECMP because a few long-lived flows, the so-called elephant flows, carry a substantial share of the traffic, and whichever path an elephant flow hashes onto stays loaded for the flow’s whole lifetime. In short, flow-based load-balancing is not well suited to a DCN. Per-packet load-balancing, on the other hand, is likely to deliver packets out of order to the application. The optimal solution is to break each flow into small flowlets, short bursts of a flow separated by idle gaps, and then load-balance the flowlets across ECMP. All packets of a flowlet follow the same path, so they arrive in order; and as long as the idle gap exceeds the delay difference between paths, moving the next flowlet to another path does not reorder packets either. Challenge 1, then, is how to achieve flowlet switching.
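Flowlet switching can be sketched in a few lines of Python. This is a minimal illustration under assumed parameters, not a switch implementation: a flowlet boundary is detected when a flow has been idle for at least a configurable gap (`gap_us`, a hypothetical parameter), and each new flowlet is placed on the least-loaded path. For contrast, `ecmp_hash` shows static per-flow hashing, which pins an elephant flow to one path.

```python
import zlib
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # src IP, dst IP, protocol, src port, dst port


def ecmp_hash(flow: FiveTuple, n_paths: int) -> int:
    """Classic per-flow ECMP: a given 5-tuple always hashes to the same path,
    so an elephant flow pins one path for its whole lifetime."""
    return zlib.crc32("|".join(map(str, flow)).encode()) % n_paths


class FlowletSwitch:
    """Hypothetical flowlet switch. A flow that has been idle for at least
    `gap_us` microseconds starts a new flowlet, which may take another path."""

    def __init__(self, n_paths: int, gap_us: float) -> None:
        self.n_paths = n_paths
        self.gap_us = gap_us  # assumed larger than the max delay difference between paths
        self.last_seen: Dict[FiveTuple, Tuple[float, int]] = {}  # flow -> (time, path)
        self.bytes_on_path: List[int] = [0] * n_paths

    def forward(self, flow: FiveTuple, t_us: float, size: int) -> int:
        entry = self.last_seen.get(flow)
        if entry is not None and t_us - entry[0] < self.gap_us:
            path = entry[1]  # same flowlet: stay on its path so packets stay in order
        else:
            # New flowlet: place it on the least-loaded path (an assumed policy).
            path = min(range(self.n_paths), key=lambda p: self.bytes_on_path[p])
        self.last_seen[flow] = (t_us, path)
        self.bytes_on_path[path] += size
        return path


fs = FlowletSwitch(n_paths=2, gap_us=500)
elephant = ("10.0.0.1", "10.0.1.1", 6, 40000, 443)
# Three back-to-back packets, then a 980-us pause before the fourth.
print([fs.forward(elephant, t, 1500) for t in (0, 10, 20, 1000)])  # [0, 0, 0, 1]
```

The elephant flow’s second burst moves to the idle path, which static hashing would never do.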
Non-oblivious routing: every modern routing/switching platform has a software-programmable RIB (Routing Information Base, the control plane) and a hardware-programmable FIB (Forwarding Information Base, the data plane). Usually, routing protocols discover the topology and provide forwarding information to the RIB, which in turn programs the FIB. No matter which routing protocol is used, the RIB has no clue (i.e. it is oblivious) about the ECMP status of the next-hop node. As a result, a node may keep load-balancing evenly across its local ECMP members while one of its downstream neighbours has lost some of its own ECMP members, creating a hotspot within the DCN fabric. Challenge 2 is therefore how to implement non-oblivious (or omnipotent?) routing in a DCN.
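The hotspot can be quantified with a short Python sketch. The numbers are hypothetical: a node sends 120 Gbps toward a destination through next hops B and C, each originally with four downstream ECMP links, and C has lost two of them. The `aware=True` case models a non-oblivious node that weights its next hops by their remaining downstream links (weighted ECMP); it is just one illustration of non-oblivious behaviour.

```python
from typing import Dict


def per_link_load(total_gbps: float, downstream_links: Dict[str, int],
                  aware: bool) -> Dict[str, float]:
    """Load carried by each remaining downstream link behind every next hop.

    downstream_links maps each next hop to the number of ECMP links it still
    has toward the destination. An oblivious node splits traffic equally
    between next hops; a (hypothetical) non-oblivious node weights each next
    hop by its remaining downstream links.
    """
    weights = downstream_links if aware else {nh: 1 for nh in downstream_links}
    total_weight = sum(weights.values())
    load = {}
    for nh, links in downstream_links.items():
        share = total_gbps * weights[nh] / total_weight  # traffic sent to this next hop
        load[nh] = share / links                          # spread over its remaining links
    return load


# Next hop B still has 4 links toward the destination; C has lost 2 of its 4.
downstream = {"B": 4, "C": 2}
print(per_link_load(120, downstream, aware=False))  # {'B': 15.0, 'C': 30.0}
print(per_link_load(120, downstream, aware=True))   # {'B': 20.0, 'C': 20.0}
```

The oblivious split doubles the per-link load behind C, which is exactly the hotspot described above.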
Micro-loops: transient forwarding loops are an unavoidable aspect of any IP network. They happen because each node in an IP network has an autonomous control plane and therefore updates its FIB independently. After a link or node failure, some nodes update their FIBs faster than others, and packets can loop between an updated and a not-yet-updated node for up to several hundred milliseconds (the loop is short-lived, hence the name micro-loop). Micro-loops can occur anywhere in the network during convergence. FRR (Fast ReRoute) mechanisms such as LFA (Loop-Free Alternate) aim for 50-ms traffic restoration, but because LFA coverage depends on the topology, they cannot mitigate every micro-loop event. Challenge 3, thus, is how to confine micro-loops within the 50-ms restoration threshold.
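The micro-loop mechanism can be reproduced with a small Python simulation. The four-node topology, link costs, and function names are made up for illustration: every node computes its next hop to destination D with Dijkstra, the C-D link then fails, and `trace` follows a packet while only some nodes have installed their new FIB entries.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}; links are symmetric
Fib = Dict[str, Optional[str]]     # node -> next hop toward the destination


def next_hops_to(graph: Graph, dest: str) -> Fib:
    """Dijkstra from the destination; each node's predecessor in the tree
    is its next hop toward dest (what its FIB would hold)."""
    dist: Dict[str, int] = {dest: 0}
    nh: Fib = {dest: None}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v], nh[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return nh


def trace(src: str, dest: str, old: Fib, new: Fib,
          updated: Set[str], live: Graph) -> Tuple[List[str], str]:
    """Follow a packet hop by hop; nodes in `updated` already use the new FIB."""
    path, node = [src], src
    while node != dest:
        nxt = (new if node in updated else old).get(node)
        if nxt is None or nxt not in live[node]:
            return path, "dropped"       # no route, or route still points at the dead link
        if nxt in path:
            return path + [nxt], "loop"  # a real packet would circulate until TTL expiry
        path.append(nxt)
        node = nxt
    return path, "delivered"


# Ring A-B-C-D with a costly A-D link; destination D. Before the failure,
# A reaches D via B and C. Then the C-D link fails.
BEFORE: Graph = {"A": {"B": 1, "D": 5}, "B": {"A": 1, "C": 1},
                 "C": {"B": 1, "D": 1}, "D": {"C": 1, "A": 5}}
AFTER: Graph = {u: {v: c for v, c in nbrs.items() if {u, v} != {"C", "D"}}
                for u, nbrs in BEFORE.items()}
OLD, NEW = next_hops_to(BEFORE, "D"), next_hops_to(AFTER, "D")

# B has converged (now points to A) but A has not (still points to B).
print(trace("A", "D", OLD, NEW, updated={"B"}, live=AFTER))  # (['A', 'B', 'A'], 'loop')
```

Once A also updates, the same packet is delivered over A-D; the loop exists only in the window between the two FIB updates.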
The original idea of SDN is to use OpenFlow to program flow state directly into the FIB from a centralized controller. Because the control plane and the data plane scale differently, SDN separates the RIB from the FIB: the RIB resides on the controller and the FIB on the network node. Instead of a routing protocol populating the RIB, SDN allows applications to program the RIB through a northbound API. One natural outcome of SDN is CUPS (Control and User Plane Separation). But CUPS doesn’t necessarily mean SDN: the 3GPP EPC and 5GC, for example, support CUPS, but these networks are not SDN.
The biggest problem with classical SDN is that OpenFlow makes the network stateful. Modern routers forward statelessly, based on the incoming label (MPLS) or the destination IP address. Such forwarding requires two FIB lookups: the first, on the ingress line card, determines the egress line card; the second resolves the L2 adjacency and performs the header rewrite. In an OpenFlow-based SDN, the FIB lookup matches a 3-tuple or a 5-tuple (or more header fields), which requires programming bidirectional flow state into the FIB across the network. A network node may look simpler once its control plane is removed, but its data plane becomes far more complicated, because it must maintain flow state for every application in the DCN. Moreover, classical SDN does not solve any of the three challenges above.
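The difference in state can be illustrated with a toy comparison in Python. `PrefixFib` and `FlowTable` are hypothetical names, the prefixes and ports are made up, and the flow table is deliberately simplified: real OpenFlow tables also support wildcard matches and priorities.

```python
import ipaddress
from typing import Dict, Optional, Tuple


class PrefixFib:
    """Stateless, destination-based FIB: longest-prefix match on the destination
    address. Its size tracks the number of prefixes, not the number of flows."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self.routes = [(ipaddress.ip_network(p), nh) for p, nh in routes.items()]

    def lookup(self, dst: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, nh) for net, nh in self.routes if addr in net]
        return max(matches, key=lambda m: m[0])[1] if matches else None


FiveTuple = Tuple[str, str, int, int, int]


class FlowTable:
    """Simplified OpenFlow-style exact-match table: one entry per direction
    of every flow, installed by a controller."""

    def __init__(self) -> None:
        self.entries: Dict[FiveTuple, str] = {}

    def install_bidirectional(self, flow: FiveTuple, fwd_port: str, rev_port: str) -> None:
        src, dst, proto, sport, dport = flow
        self.entries[flow] = fwd_port
        self.entries[(dst, src, proto, dport, sport)] = rev_port

    def lookup(self, flow: FiveTuple) -> Optional[str]:
        return self.entries.get(flow)  # a miss would normally be sent to the controller


fib = PrefixFib({"10.0.0.0/8": "spine1", "10.1.0.0/16": "spine2"})
table = FlowTable()
for i in range(1000):  # 1000 TCP connections between the same two hosts
    table.install_bidirectional(("10.0.0.1", "10.1.0.1", 6, 30000 + i, 443), "eth1", "eth2")
print(fib.lookup("10.1.2.3"), len(fib.routes), len(table.entries))  # spine2 2 2000
```

Two prefixes cover all of this traffic in the stateless FIB, while the flow table needs two entries for every connection.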
When it comes to SDN, the network philosophy thus becomes: don’t program the flows (in the network), program the packets. From this philosophy, a genuine alternative to OpenFlow was born: SR (Segment Routing).
SR is more than just an SDN protocol. In the control plane, it uses IGP (IS-IS, OSPF) and BGP extensions to distribute segment identifiers (SIDs), so it eliminates the need for LDP. SR can use either an MPLS or an IPv6 data plane, so it reuses an existing forwarding model in the FIB rather than requiring a new one. SR over the IPv6 data plane is called SRv6.
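In SR-MPLS, a prefix SID is advertised by the IGP as an index, and each node derives the MPLS label by adding that index to the base of its SRGB (Segment Routing Global Block). The sketch below shows only this arithmetic; the class name is hypothetical, and the 16000 to 23999 block used in the example is a commonly seen vendor default, taken here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Srgb:
    """Segment Routing Global Block: a contiguous label range on one node."""
    base: int  # first label in the block
    size: int  # number of labels in the block

    def label_for(self, sid_index: int) -> int:
        # Prefix-SID label = SRGB base + SID index advertised by the IGP.
        if not 0 <= sid_index < self.size:
            raise ValueError(f"SID index {sid_index} outside SRGB of size {self.size}")
        return self.base + sid_index


srgb = Srgb(base=16000, size=8000)  # labels 16000 to 23999
print(srgb.label_for(5))            # 16005
```

If two nodes used different SRGB bases, the same SID index would map to different labels on each; using the same SRGB everywhere keeps the label globally meaningful.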
SR supports Topology-Independent LFA (TI-LFA): the repair path is always the post-convergence shortest path, encoded as a segment list, so protection does not depend on the topology. Because the repair path is the same path the network converges to after a failure, traffic does not have to move again once convergence completes. TI-LFA can restore traffic within 50 ms, and combined with SR micro-loop avoidance it also prevents micro-loops during convergence. SR thus solves challenge 3, which was not possible with OpenFlow.
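The difference between classic LFA and TI-LFA can be sketched in Python on a five-node ring with unit link costs, a topology where classic LFA finds no alternate. `lfa_candidates` applies the basic loop-free condition from RFC 5286, Dist(N, D) < Dist(N, S) + Dist(S, D); `post_convergence_path` computes the path TI-LFA would use for repair. The function names and topology are illustrative, and encoding the repair path as a segment list is not shown.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}; symmetric links


def dijkstra(graph: Graph, src: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Shortest-path distances and predecessors from src."""
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return dist, prev


def lfa_candidates(graph: Graph, s: str, d: str, primary_nh: str) -> List[str]:
    """Neighbours of s that satisfy the RFC 5286 loop-free condition for d."""
    dist_s = dijkstra(graph, s)[0]
    result = []
    for n in graph[s]:
        if n == primary_nh:
            continue
        dist_n = dijkstra(graph, n)[0]
        # N is loop-free if its own shortest path to d does not lead back through s.
        if dist_n[d] < dist_n[s] + dist_s[d]:
            result.append(n)
    return result


def post_convergence_path(graph: Graph, s: str, d: str,
                          failed: Tuple[str, str]) -> Optional[List[str]]:
    """Shortest path from s to d once the failed link is removed."""
    pruned = {a: {b: c for b, c in nbrs.items() if {a, b} != set(failed)}
              for a, nbrs in graph.items()}
    _, prev = dijkstra(pruned, s)
    if d != s and d not in prev:
        return None  # d unreachable after the failure
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


# Five-node ring, all links cost 1; protect traffic from A to B against link A-B failing.
RING: Graph = {"A": {"B": 1, "E": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1},
               "D": {"C": 1, "E": 1}, "E": {"D": 1, "A": 1}}
print(lfa_candidates(RING, "A", "B", primary_nh="B"))            # []
print(post_convergence_path(RING, "A", "B", failed=("A", "B")))  # ['A', 'E', 'D', 'C', 'B']
```

Neighbour E fails the condition because its own shortest path to B runs back through A, whereas the post-convergence path A, E, D, C, B exists as long as B stays reachable.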
In SR, each node is a segment identified by a global label (its prefix or node SID). The source pushes a label stack in the correct order into the packet (label imposition), which is what programming the packet means. The labels in the stack need not be adjacent next-hop nodes; traffic follows the shortest path between consecutive segments, so flows are load-balanced across ECMP within each segment. Multiple nodes can share an anycast SID, which allows load-balancing across the nodes sharing it. In case of a link or node failure, an omnipotent controller can push the right label stack to steer traffic away from the hotspot. In the same way, by combining anycast and prefix SIDs, elephant flows can be handled efficiently.
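A toy model of programming the packet in a leaf-spine fabric follows. The labels assume an SRGB starting at 16000; node names, SIDs, and the anycast SID 16100 are hypothetical, and details such as penultimate-hop popping are ignored. Each segment is followed along its shortest path(s), with a per-flow hash selecting among the ECMP next hops.

```python
import zlib
from collections import deque
from typing import Dict, List, Set

Topology = Dict[str, List[str]]  # node -> neighbours (all links cost 1)


def distances_to(topo: Topology, targets: Set[str]) -> Dict[str, int]:
    """Hop count from every node to the nearest node in `targets` (multi-source BFS)."""
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        for v in topo[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def forward(topo: Topology, sids: Dict[int, Set[str]], src: str,
            label_stack: List[int], flow: str) -> List[str]:
    """Walk a packet through its label stack (top label first). Within each
    segment the packet follows a shortest path, choosing among ECMP next hops
    with a per-flow hash; an anycast SID maps to several nodes."""
    path, node = [src], src
    for label in label_stack:
        targets = sids[label]
        dist = distances_to(topo, targets)
        while node not in targets:
            ecmp = sorted(v for v in topo[node] if dist.get(v) == dist[node] - 1)
            node = ecmp[zlib.crc32(f"{flow}@{node}".encode()) % len(ecmp)]
            path.append(node)
        # Segment endpoint reached: this label is popped and the next becomes active.
    return path


SPINES, LEAVES = ["S1", "S2"], ["L1", "L2", "L3"]
TOPO: Topology = {**{leaf: list(SPINES) for leaf in LEAVES},
                  **{spine: list(LEAVES) for spine in SPINES}}
SIDS = {16001: {"L1"}, 16002: {"L2"}, 16003: {"L3"},
        16011: {"S1"}, 16012: {"S2"},
        16100: {"S1", "S2"}}  # anycast SID shared by both spines

print(forward(TOPO, SIDS, "L1", [16012, 16003], "flow-1"))  # ['L1', 'S2', 'L3']
```

With the single label [16003], flows from L1 to L3 hash across both spines; prepending 16012 pins a flow to S2 (for example, to keep it away from a hotspot on S1), while the anycast label 16100 lets the fabric pick either spine.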
A large-scale SR interconnection can support various SDN use cases spanning different parts of an SP network. For the DCN use case, an SR NNI can attach 300 million endpoints between two leaf domains. SR is also a better alternative to the Seamless MPLS design often promoted between the core and access networks. Importantly, because it uses the same data plane, SR, as a source-routing technology, can coexist with any destination-based routing and label-switching paradigm.
The picture at the top is from the book “Segment Routing” [Part I] by Clarence Filsfils et al. In my opinion, it [together with Part II] is the most authoritative book on SR to date.