The TCP family has failed to achieve consistent high performance in face of the complex production networks: even special TCP variants are in many cases 10x away from optimal performance. We argue this is due to a fundamental architectural deficiency in TCP: hardwiring packet-level event to control responses without understanding the real performance result of its action.
Performance-oriented Congestion Control (PCC) is a new architecture that achieves consistent high performance even under challenging conditions. PCC senders continuously observe the connection between their actions and empirically experienced performance, enabling them to consistently adopt actions that result in high performance.
To appear in USENIX NSDI 2015
Mo Dong*, Qingxi Li*, Doron Zarchy**, P. Brighten Godfrey*, Michael Schapira**
*University of Illinois at Urbana Champaign
**Hebrew University of Jeruselem
November 6, 2014 at MIT Wireless Seminar
This work is supported by the National Science Foundation and won a 2013 Internet2 Innovative Application Award (under the name BBCC)
Compare PCC and TCP in Action
The PCC architecture requires only sender-side changes to TCP’s rate control algorithm (with no changes to the host-to-host communication protocol), but this prototype is implemented in userspace on top of UDP.
Goto Github Repo
Why TCP's Architecture is Broken
Though using different control algorithms, most TCP's variants share the same hardwired mapping rate control architecture: hardwiring certain predefined packet-level events to certain predefined control responses.
The design rationale behind the hardwired mapping architecture is to make assumptions about the packet-level events. When it sees a packet-level event (e.g. packet loss), TCP assumes the network is in a certain state (e.g. congestion) and tries to optimize performance by triggering a predefined control behavior (e.g. halving window size) as the response to that assumed state. But real networks are complex, with bandwidth varying by 10,000x, virtualization, AQMs, middleboxes and software routers, shallow queues and bufferbloat, congestion loss and random loss. TCP's assumptions fail but it still mechanically carries out the mismatched control response, resulting in severely degraded performance and very unfortunately, without seeing its control action’s actual harm on performance.
How PCC Works
PCC's goal is to understand what rate control actions actually improve performance based on live experimental evidence, avoiding TCP's assumptions about the network.
PCC sends at a rate r for a short period of time, and observes the results (e.g. SACKs indicating delivery, loss, and latency of each packet). Instead of triggering control responses immediately, it aggregates these packet-level events into a utility function that describes an objective like "high throughput and low loss rate". The result is a single numerical performance utility u.
At this point, PCC has run a single "experiment" that showed sending at rate r produced utility u. To make a rate control decision, PCC needs to run multiple "experiments": it tries sending at two different rates, and moves in the direction that empirically results in greater performance.
The above illustrates the basic idea of control decision making in PCC. More generally, PCC runs an online learning algorithm to track the best sending rate as it streams application data. It uses a randomized controlled trial mechanism to help react quickly yet effectively even in a dynamic network.
Thus, by avoiding assumptions about the underlying potentially-complex network, PCC tracks the empirically optimal sending rate and achieves consistent high performance.
High Performance Out of the Box
Big Data Delivery Over the Internet
Based on our large-scale experiments over the global commercial Internet, PCC can beat TCP CUBIC (the Linux kernel default) by more than 10X on 44% of the tested sending-receiving pairs.
We use several representative samples to compare how long it takes to deliver 100GB of data when using PCC, TCP or just taking a flight and carrying the data with you.
Huge Data Delivery over Dedicated Network
Have huge data (10TB/delivery)? Own a fast network? Can provision dedicated network capacity? Don’t waste your capacity! To test this scenario, we provisioned multiple fully dedicated 800Mbps links across the GENI Internet2 backbone. Here is the time to deliver 10TB data in PCC and TCP Illinois.
On an emulated WINDS satellite Internet connection based on real-world measurement, PCC delivers data 17X faster than TCP Hybla.
Rapidly Changing Networks
We tested TCP and PCC on a network path where available bandwidth, loss rate and RTT are all changing every 5 seconds, with bandwidth ranging from 10Mbps to 100Mbps, latency from 10ms to 100ms and loss rate from 0% to 1%. Compared to TCP variants, PCC tracks the optimal rate closely.
More in the paper
Our paper details further results showing PCC delivers high performance across a range of challenging conditions. In addition to those above, we test:
- Unreliable lossy links (10-37X vs TCP Illinois)
- Unequal RTT of competing senders (an architectural cure to RTT unfairness)
- Shallow buffered bottleneck links (up to 45X higher performance, or 13X less buffer to reach 90% throughput)
- TCP Incast in data centers, where PCC performs similarly to the purpose-built ICTCP.
Stable Convergence and Fairness
PCC’s control algorithm is selfish in nature. Surprisingly, it can achieve fairness and much more stable convergence than TCP. The following is a typical dumbbell topology convergence experiment, 100Mbps, 30ms latency bottleneck. Four flows sequentially arrive with 500s interval. Each flow sends for 2000s.
PCC converges to fairness point stably.
TCP shows very unstable behavior.
Flexible Performance Objectives
PCC has a feature outside the scope of the TCP family: PCC can directly express different data delivery objectives by simply plugging in different utility functions. Consider an interactive application that wants to get high throughput and low latency, i.e. with the goal of optimizing throughput/latency, defined in previous literature as "power".
When using TCP to deliver data, users of this interactive application will be very upset by the lagging user experience, because TCP does not know the low latency objective, still aggressively fills the network buffer and causes annoying delay known as bufferbloat. To get better latency, either in-network active queue management mechanisms like Codel or a fork lift change of the protocol stack is needed. When using PCC, one can simply plug in the data delivery objective throughput/latency as utility function. Then PCC's control learning algorithm will control the sending rate to empirically optimize this utility function.
As shown in the figure below, PCC caters to interactive application's objective dramatically better than TCP and even achieves more power than TCP with addiontal in-network AQM.