SP-PIFO is the first programmable packet scheduler which closely approximates the behavior of PIFO queues using strict-priority queues—at line rate, at scale, and on existing devices.
The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and available strict-priority queues, according to the observed network conditions, to minimize the scheduling errors with respect to an ideal PIFO queue.
Today's switches still rely on a fixed set of scheduling algorithms, implemented in hardware, from which operators can select and, at most, configure some parameters. For a new scheduling algorithm to be adopted, operators need to wait until a new fabric is released. With programmable scheduling, instead, operators can directly test and deploy new algorithms on the existing fabrics, reducing time-to-market and fostering innovation.
Programmable scheduling also improves performance. Indeed, recent research has proved that it does not exist a universal packet scheduler capable of outperforming in all possible scenarios. Instead, the best scheduling algorithm to select depends on the desired performance objective. For example, pFabric minimizes flow completion times, FQ enforces fairness and LSTF minimizes tail packet delays. Programmable packet scheduling facilitates the implementation of these schedulers, not supported nowadays.
The main challenge to enable programmable packet scheduling is to find an appropriate abstraction which is flexible enough to express a wide variety of scheduling algorithms and yet can be implemented efficiently in hardware. Push-In First-Out (PIFO) queues have been recently proposed as such an abstraction. PIFO queues allow packets to be pushed into arbitrary positions in the queue, according to their assigned ranks (or priorities), while being drained from the head.
PIFO queues enable packet-scheduling programmability: one can program how packets should be sorted in the PIFO queue in order to achieve a desired performance objective. However, implementing PIFO queues in hardware is difficult due to the need to arbitrarily sort packets at line rate. Up to now, only hardware designs (not implementations) have been proposed, and they can only support about 1000 flows.
SP-PIFO closely approximates PIFO behaviors on top of widely-available Strict-Priority (SP) queues. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and SP queues in order to minimize the amount of scheduling mistakes relative to a hypothetical ideal PIFO implementation.
Mapping: SP-PIFO maps each incoming packet to the available strict-priority queues according to the queue bounds. Queue bounds identify, for each queue, the smallest packet rank that can be enqueued. Whenever a packet is received, SP-PIFO scans the queue bounds bottom-up, starting from the lowest-priority queue, and enqueues the packet in the first queue with a bound smaller or equal to the packet rank.
Adaptation: SP-PIFO dynamically adapts queue bounds such that the resulting scheduling closely approximates an ideal PIFO queue. The adaptation consists of two stages: a push-up stage where future low-rank (i.e., high-priority) packets are pushed to higher-priority queues; and a push-down stage where future high-rank (i.e., low-priority) packets are pushed down to lower queues.
Stage 1: Push-up Whenever SP-PIFO enqueues a packet, it updates the corresponding queue bound to the rank of the enqueued packet. Doing so, SP-PIFO aims at ensuring that future lower-ranked packets will not be enqueued in the same queue, but in a more preferred one. Intuitively, SP-PIFO “pushes up” packets with low ranks to the highest-priority queues, where they will be drained first. Of course, as the number of queues is finite this is not always possible leading to inversions.
Stage 2: Push-down Whenever SP-PIFO detects an inversion in the highest-priority queue (i.e., the packet rank is smaller than the highest-priority queue bound), it decreases the queue bound of all queues. Doing so, SP-PIFO ensures that future higher-rank packets will be enqueued in lower-priority queues. Intuitively, after an inversion, SP-PIFO “pushes down” packets with high ranks to the lower-priority queues in order to prevent them from causing inversions in the highest-priority queue. SP-PIFO decreases the queue bounds according to the magnitude of the inversion (i.e. the difference between the packet rank and the corresponding queue bound): the bigger the inversion, the more ranks are pushed down.
Our hardware-based implementation of SP-PIFO on the Barefoot Tofino Wedge 100BF-32X platform shows that SP-PIFO runs at line rate on existing programmable hardware, efficiently scheduling traffic at potentially Tbps.
SP-PIFO rapidly adapts to varying rank distributions, optimizing the mapping between packet ranks and available strict-priority queues at per-packet level.
This per-packet adaptation makes SP-PIFO more practical than alternative gradient-based techniques, which try to optimize the mapping over periodic time windows.
What do we mean by PIFO? It depends
It is important to note the difference between two concepts: PIFO 'the (theoretical) queue structure', and PIFO 'the abstraction'. The first refers to a data structure capable of perfectly sorting packets by letting them be pushed into arbitrary locations while only dequeuing from the head. The latest refers to the abstraction which lets scheduling algorithms be represented in two steps: (i) a ranking algorithm deciding how to best sort packets in order to achieve a certain performance objective, and (ii) a theoretical PIFO queue implementing such ranking policy. Since implementing theoretical PIFO queue in hardware to be executed at line rate is hard, SP-PIFO aims to approximate its behavior by using strict-priority queues.
How does SP-PIFO enable programmability? By adapting to any given rank distribution without requiring traffic knowledge
SP-PIFO already manages to approximate the behavior of PIFO theoretical queues in hardware (and can be executed at line rate). Therefore, the only remaining challenge is to decide how to best sort packets (i.e., the best ranking policy) in order to achieve a certain performance objective. Scheduling programmability consists on that: defining these ranking policies that SP-PIFO will then execute. As SP-PIFO works for any set of ranks without traffic knowledge required in advance, any strategy that could potentially be implemented on a PIFO queue will be then able to be approximated by SP-PIFO.
Is it possible to implement new scheduling algorithms on top of SP-PIFO? Yes
The only inputs required by SP-PIFO are packet ranks, which need to be computed (or extracted from packet headers) at enqueue. Data-plane programmability (e.g., the P4 programming language) can be leveraged to define new logics on how to compute ranks for incoming packets. For instance, one such strategy could be setting packet ranks according to their flow sizes. Another strategy could be to increase ranks within each flow monotonically, so that they can be served on a round-robin basis. Anyone can design new ranking policies, which SP-PIFO will then approximate.
Does SP-PIFO require any modification at the end host? No
SP-PIFO runs autonomously in network switches, and no modification is explicitly required at the end host. Note that here we do not consider the implementation of particular ranking policies. For instance, one might decide to compute packet ranks at the end-host instead of doing it at the switch. In that particular case, the end host would require the corresponding modifications to support this rank computation as well as the according logic letting the rank be later extracted from the SP-PIFO switch.
Is SP-PIFO the first proposal using strict-priority queues to approximate more complex scheduling algorithms? No
A number of previous work already anticipated that strict-priority queues could be leveraged to approximate more complex scheduling algorithms. For instance, pFabric and PIAS showed the benefits of using strict-priority queues to minimize flow completion times, and FDPA and AFQ showed the benefits of using them to enforce fairness across flows. However, all those proposals only focused on a single performance objective, instead of approximating the PIFO behavior itself. Therefore, they propose fixed mappings of packet ranks to strict-priority queues which are dependent on the ranking policy and require some extent of traffic knowledge in advance.
Can any scheduling algorithm be implemented on top of SP-PIFO? No
As a PIFO approximation, SP-PIFO inherits the limitations of theoretical PIFO queues. First, PIFO queues (and therefore SP-PIFO) cannot rate-limit the egress throughput. Therefore, non-work-conserving scheduling algorithms can not be implemented. Second, individual PIFO queues cannot directly implement hierarchical scheduling algorithms. Yet, multiple PIFO queues can be combined on a tree structure to approximate hierarchical scheduling algorithms. The same strategy could (potentially) be applied to SP-PIFO, although the impact of this approach on performance is still an open question.
Can SP-PIFO adapt to any kind of rank distribution? Yes, under reasonable assumptions
Throughout the paper, we analyze SP-PIFO theoretically and show why it can adapt to arbitrary rank distributions. However, there are two assumptions implicit in this statement. First, we assume that all queues are drained at some point. While reasonable under non-adversarial scenarios, it is true that a malicious host could send a large number of high-priority packets and, as a result, packets in lower-priority queues would never be drained. Such “starvation” attacks are common to any type of priority scheme and are usually addressed by policing high-priority traffic at the edge of the network. Second, we also asume that, for a given rank distribution, the particular order of ranks is random. Also, one could think of adversarial scenarios where attackers coordinate flows to break this randomness. It is interesting to see that, a single non-malicious flow would be enough to thwart such strategies by randomly breaking the adversarial order. Besides that, one could further monitor the network to detect and prevent such type of adversarial attacks.
Does the current SP-PIFO design leave room for improvement? Yes
Some of the open questions that we discuss in the paper let us think that it is indeed possible to further optimize SP-PIFO design. Furthermore, as new generations of programmable hardware are released, we expect to see new hardware primitives and features which could facilitate PIFO implementations in the near future. For instance, future switches could support an increased number of stages, provide dynamicity in memory access between the ingress and egress pipelines, or provide access to queue information at enqueue.
SP-PIFO: Approximating Push-In First-Out Behaviors using Strict-Priority Queues, NSDI'20.
Santa Clara, CA, USA. February, 2020.
by Albert Gran Alcoz, Alexander Dietmüller, Laurent Vanbever