Link Aggregation and ECMP load-balancing

By Ostinato Team

Link aggregation is a useful networking method that improves bandwidth, reliability, and performance by merging multiple physical links into one logical link.

The Evolution of Ethernet Bandwidth: From 1G to 800G

Ethernet link bandwidth has increased massively over the last few decades -

10M --> 100M --> 1000M (1G) --> 10G --> 40G/100G --> 200/400G --> 800G
1983    1995     1999           2002    2010         2017         2022

2.5G, 5G, and 25G Ethernet were added in 2016, and 50G in 2018.

The years mentioned are the years when these were standardized. Commercial availability, affordability, and wide deployment typically followed a few years later. 200/400/800G is still not widely deployed.

As you can see, up until about 2010, link bandwidth increased 10x from one generation to the next. Take the transition from 1G to 10G, for example: if a deployed 1G link was overloaded, it rarely made sense to replace it with a 10G link - 10x the capacity was not needed and the cost differential between 1G and 10G was high, so the replacement was difficult to justify.

Bundling multiple physical 1G links (e.g. 2x1G or 4x1G) into one logical link is both easier and cheaper - this is called a Link aggregation group (LAG).

The logical link has its own MAC address (or reuses one of the member links' MAC addresses as its own) and is assigned an IP address. You cannot assign IP addresses (or VLANs for that matter) to the member links within a bundle. When traffic is sent out of a logical bundle, it is load-balanced across the physical member links.

Link Aggregation Group (LAG)

There’s another important advantage to using a LAG over a single physical link - redundancy and resiliency. A failure of a single (or more!) physical link does not bring down the logical link, so traffic can continue to use the remaining links, possibly at degraded capacity if the remaining bandwidth is not enough.

Link aggregation is also referred to by different names - bonding (e.g. the Linux bond interface), aggregated ethernet (Juniper's ae-xxx interfaces), EtherChannel (Cisco), teaming, or even trunking (not to be confused with VLAN trunks).

Homogeneous and Heterogeneous Speeds

Links within a logical bundle could be all of the same speed or of different speeds. Support for heterogeneous speeds in a bundle is typically vendor-dependent.

It’s easier for data-plane ASICs to load balance equally across links of the same speed than to account for differing link speeds when distributing traffic.

Types of LAGs - Static vs LACP

There are several types of Link Aggregation Groups (LAGs). Static LAGs involve manually configuring links at both end points and are suited for simple networks with predictable configurations.

Dynamic LAGs use the Link Aggregation Control Protocol (LACP) to automatically detect and configure links, making them suitable for larger, dynamic networks. Multi-chassis LAGs (MC-LAG) aggregate links across multiple devices to ensure high availability, commonly used in data centers.

Using LACP has some advantages over a static LAG. For example, sometimes a link failure is detected by only one end and not the other - in a static LAG, the latter would continue sending traffic that the former never receives, whereas LACP would detect the failure and remove the link from the bundle. Similarly, a LAG-related misconfiguration of the physical links on one side would be detected by LACP, but not by a static LAG.

What is ECMP (Equal cost multi-path)

While a LAG operates at Layer 2 and is limited to directly connected neighbors, Layer 3 has the analogous concept of Equal cost multi-path (ECMP).

At L3, you may have more than one path to reach a destination. To avoid overloading a single path, ECMP lets you share the load across all the equal-cost paths.

ECMP load balancing

ECMP is, by definition, similar to a LAG with homogeneous-speed links. You can also have unequal-cost load-balancing, e.g. weighted ECMP.
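As a rough illustration (not any vendor's actual implementation), weighted sharing can be sketched as a bucket table in which each path appears in proportion to its weight, with a per-flow hash (covered in the next section) picking a bucket. The path names and weights below are made up.

```python
# Hypothetical sketch of weighted (unequal-cost) load sharing: each path is
# replicated in a bucket table in proportion to its weight, and a per-flow
# hash picks a bucket. Path names and weights are made up for illustration.
import zlib

paths = {"path-A": 3, "path-B": 1}   # path-A should carry ~3x path-B's traffic

# Build the bucket table: a path appears once per unit of weight
buckets = [p for p, weight in paths.items() for _ in range(weight)]

def pick_path(flow_key: bytes) -> str:
    """Map a flow (e.g. a serialized 5-tuple) to a path via hash modulo buckets."""
    return buckets[zlib.crc32(flow_key) % len(buckets)]

print(pick_path(b"10.0.0.1|20.0.0.1|6|40001|443"))
```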

Hash to load balance

Each vendor has its own proprietary algorithm to load balance traffic. But all of them are typically based on the concept of hashing.

A set of packet fields is chosen, e.g. the popular 5-tuple: (source IP, destination IP, IP protocol, TCP/UDP source port, TCP/UDP destination port). For each packet being forwarded, the device extracts the values of these fields, combines them, and computes a hash, which is mapped (typically modulo n) to a value between 0 and n-1, where n is the number of links or paths to load balance across. The resulting value selects the outgoing link or path for the packet.
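A minimal sketch of that idea, using CRC32 purely as a stand-in for the vendor's (typically proprietary, hardware-implemented) hash function:

```python
# Vendor-neutral sketch of hash-based member selection over the 5-tuple.
# CRC32 is only an illustrative stand-in for a real device's hash function.
import zlib
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int       # e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int

def select_link(pkt: FiveTuple, num_links: int) -> int:
    """Combine the 5-tuple fields, hash them, and map to a link index 0..n-1."""
    key = f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.proto}|{pkt.src_port}|{pkt.dst_port}"
    return zlib.crc32(key.encode()) % num_links

# Packets of the same flow always pick the same link; a different source port
# means a different flow, which may (or may not) hash to a different link.
flow1 = FiveTuple("10.0.0.1", "20.0.0.1", 6, 40001, 443)
flow2 = FiveTuple("10.0.0.1", "20.0.0.1", 6, 40002, 443)
print(select_link(flow1, 4), select_link(flow2, 4))
```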

Hash packet fields to get outgoing member link

The choice of packet fields for hashing is important: packets with the same values for those fields constitute a flow, and all packets of a flow should be sent on the same link or path (otherwise they may be reordered). Defining a flow too broadly or too narrowly can lead to problems.

We will discuss load-balancing related problems in a future post.

Verifying load-balancing

Unlike routing protocol configuration, where you can verify the results using CLI commands to view the RIB and FIB, verifying load-balancing requires sending actual data plane traffic and observing it on the LAG member links or ECMP paths to see how the load balancing algorithm behaves.

To observe load-balancing you need to generate multiple flows by varying the packet fields used in the hash. Ostinato’s variable fields feature is very useful for doing just that.

Ostinato variable fields can generate multiple flows
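As a rough stand-alone illustration of the same idea (this is not the Ostinato API), the scapy sketch below creates multiple flows by varying the UDP source port, one of the 5-tuple fields commonly fed to the hash. The addresses and interface name are placeholders.

```python
# Illustrative only: vary the UDP source port to create distinct flows so the
# device's hash spreads them across LAG members / ECMP paths. Addresses and
# the interface name are placeholders - adjust for your topology.
from scapy.all import Ether, IP, UDP, Raw, sendp

NUM_FLOWS = 64
for i in range(NUM_FLOWS):
    pkt = (Ether()
           / IP(src="10.1.1.1", dst="10.2.2.2")
           / UDP(sport=10000 + i, dport=5001)   # varying field => new flow
           / Raw(b"x" * 64))
    sendp(pkt, iface="eth0", count=100, verbose=False)   # 100 packets per flow
```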

The hashes generated for all the flows need to have sufficient entropy for (almost) equal load balancing across all links or paths. This is a function of the hashing algorithm implemented by the vendor. You can increase the number of flows to increase the entropy and the likelihood of better load balancing.
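As a toy model of that effect (a simulation only, not any device's algorithm), the sketch below hashes randomly generated flows onto four links and prints the per-link flow counts; with only a handful of flows the split tends to be lumpy, while thousands of flows approach an equal share.

```python
# Toy simulation (not vendor code): hash N random flows onto 4 links and
# report the distribution. More flows => closer approximation to equal shares.
import random
import zlib
from collections import Counter

NUM_LINKS = 4

def distribution(num_flows: int) -> Counter:
    counts = Counter()
    for _ in range(num_flows):
        # Random 5-tuple stand-in: vary only the source port for simplicity
        key = f"10.0.0.1|20.0.0.1|6|{random.randint(1024, 65535)}|443"
        counts[zlib.crc32(key.encode()) % NUM_LINKS] += 1
    return counts

for n in (8, 64, 4096):
    print(n, "flows:", sorted(distribution(n).items()))
```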