User Space Packet Schedulers: Towards Rapid Prototyping of Queue-Management Algorithms


Electronic Communications of the EASST
Volume 080 (2021)

Conference on Networked Systems 2021
(NetSys 2021)

User Space Packet Schedulers: Towards Rapid Prototyping of
Queue-Management Algorithms

Ralf Kundel, Paul Stiegele, Dat Tran, Julian Zobel, Osama Abboud,
Rhaban Hark, Ralf Steinmetz

4 pages

Guest Editors: Andreas Blenk, Mathias Fischer, Stefan Fischer, Horst Hellbrueck, Oliver
Hohlfeld, Andreas Kassler, Koojana Kuladinithi, Winfried Lamersdorf, Olaf Landsiedel, Andreas
Timm-Giel, Alexey Vinel

ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122

http://www.easst.org/eceasst/


ECEASST

User Space Packet Schedulers: Towards Rapid Prototyping of
Queue-Management Algorithms

Ralf Kundel1, Paul Stiegele1, Dat Tran1, Julian Zobel1, Osama Abboud2,
Rhaban Hark1, Ralf Steinmetz1

ralf.kundel@kom.tu-darmstadt.de
1 Multimedia Communications Lab (KOM)

Technical University of Darmstadt, Germany
2 Huawei Technologies Duesseldorf GmbH, Germany

Abstract: Quality of Service indicators in computer networks reached tremendous
importance over the last years. Especially throughput and latency are directly influ-
enced by the dimension of packet queues. Determining the optimal dimension based
on the inevitable tradeoff between throughput and latency tends to be a hard, almost
infeasible challenge. Several algorithms for Active Queue Management have been
proposed to address this challenge over the last years. However, the deployment
and by that the development of such algorithms is challenging as they are usually
located within the operation systems’ kernel or implemented in fixed hardware. In
this work, we investigate how novel algorithms can be deployed in user space for
rapid prototyping with tolerable effort. We provide core performance characteristics
and highlight the viability and reasonability of this approach.

Keywords: AQM, User Space, Congestion Control, Bufferbloat

Introduction, Background, and Related Work: Internet applications have experienced an
enormous upswing, most notably through intensive video streaming and conferencing applica-
tions realized by congestion-controlled TCP. Besides steadily increasing requirements regarding
throughput and availability, especially latency became more and more important as a Quality of
Service (QoS) criterion. However, there is a tradeoff between high bandwidth utilization and low
latency packet forwarding for congestion-controlled flows when determining the dimension of
packet queues on forwarding routers.

Throughput-intensive applications require sufficiently large packet queues in order to entirely
utilize a link, whereas latency-sensitive applications suffer from too large packet queues, known
as bufferbloat phenomenon [GN12]. The optimal queue size in terms of maximizing throughput
without unnecessary increase in latency is considered to be B = RTT·C√n , where B is the maximum
queue size, C the bottleneck link speed, and RT T the Round Trip Time of n TCP flows [AKM04].
However, this formula and its application on packet queue engineering has several challenges.
First, this “optimal” queue size primarily focuses on full utilization of the subsequent bottleneck
link and only secondary on latency reduction. Second, constantly having n congestion-controlled
TCP flows with a known RT T is not realistic as Internet traffic changes dynamically and numer-
ous different congestion control algorithms are present in parallel [KWM+19]. Third, this rule
of thumb considers only long data transmissions, so called “elephant flows”, although there is
always a mixture of long and short flows in real networks.

1 / 4 Volume 080 (2021)

mailto:ralf.kundel@kom.tu-darmstadt.de


User Space Packet Schedulers

To this end, numerous general but advanced Active Queue Management (AQM) algorithms
have been proposed [Ada13][KBV+18]. They tackle the problem of bufferbloat by intelligently
dropping packets or marking them with a congestion notification bit. However, no algorithm
fitting all use cases has been found to this point and it is unlikely to exist. Certainly, the devel-
opment and experimental evaluation of such algorithms is challenging as they are either located
within the operating system kernel, e.g., the Linux kernel, or within hardware of packet forward-
ing chips (i.e., ASICs).

With this work, we propose the idea of packet queuing and scheduling in the user space for
rapid prototyping and preliminary evaluation of novel AQM algorithms. Note that congestion
control experiments are usually not performed by simulations, due to an easier reproduction
of complexity and heterogeneity of real computer networks in emulation environments, mainly
Mininet. Hence, our work focuses on network emulations.

The most related approach to this work is the Linux kernel extension libnetfilter queue1. This
library enables a hybrid coexistence of queues in the kernel space and a packet scheduler in the
user space. The scheduler decides whether a packet is dropped or not based on packet metadata
information. However, advanced queuing and scheduling structures exceed the API capabilities,
e.g., front drop instead of tail drop or Push-In-Extract-Out queues cannot be investigated.

Prototype and Preliminary Evaluation Results: In order to realize a user space queue
emulator, we first investigate which programming languages should be used with regard to their
packet I/O performance. For that, we create a simple Mininet topology h1 ←→ h2 ←→ h3 con-
sisting of three hosts. Host h1 and h3 perform a bandwidth performance test with the IPerf3
tool. Host h2 is responsible to forward all packets from h1 to h3 and vice versa without queuing.
To evaluate the performance of the packet forwarding, we measure the throughput of the IPerf3
TCP flow and the observed round trip time (RTT). Performance characteristics for six different
packet forwarding implementations are displayed in Figure 1.

We use native Linux kernel and the userspace implementation in C as baseline. Forward-
ing incoming packets based on the Linux kernel’s IPv4 routing table represents a theoretically
achievable upper bound for throughput and a lower bound for RTT. The low-level user space
implementation in C, compiled with -lpcap flags and no further optimizations, is supposed to be
a even better upper bound for throughput. We realized four implementations in different pro-
gramming languages with similar behavior to the baseline designs for the user space queuing
system. Note that the user space implementations are built upon a raw socket kernel module
and thus the kernel is still involved. They are implemented as multi-threaded applications and
use raw sockets to receive and send packets. One thread receives packets on the first interface
and immediately sends them out on the second one. The second thread is responsible for simi-
lar packet forwarding in the opposite direction. More threads per direction are only meaningful
if a hardware network interface card with load balancer is in use. The third and main thread
is responsible to monitor these two workers. This multi-threading approach achieves almost a
twofold increase in forwarding performance compared to a single-threaded approach.

As depicted in Figure 1, within the user space implementations, Go achieves the best RTT. The
C user space baseline-implementation, in contrast, achieves the best throughput, but surprisingly
also experiences a significantly higher RTT. This is caused by a different interrupt handling

1 The netfilter.org libnetfilter queue project. www.netfilter.org/projects/libnetfilter queue/. Accessed: 2020-10-13.

NetSys 2021 2 / 4

www.netfilter.org/projects/libnetfilter_queue/


ECEASST

Linux 
kernel

C Go Rust Python2 Python3
0

1000

2000

3000

4000

5000

6000

7000

TC
P 

Ra
te

 [M
bi

t/s
]

Linux 
kernel

C Go Rust Python2 Python3
0

1

2

3

4

5

6

O
bs

er
ve

d 
RT

T 
[m

s]

Figure 1: User space packet forwarding characteristics with disabled checksum offloading. The
Linux IPv4 forwarding is by default part of the Linux kernel and represents an upper bound. All
tests are compiled and running on a Intel Xeon D-1541 with Ubuntu 16.04 (kernel 4.4.0).

structure within the C raw socket implementation. Note that this comparison is an evaluation of
the raw socket performance within these languages only and not a benchmark of the languages
itself. In total, Go performed best considering both metrics of the four high-level languages.

As prototyping in the queueing system should be as easy as possible, a high-level language
such as Go, Rust or Python is pursued instead of low-level C. As it performed best in our preced-
ing tests, we decided to implement the user space queuing system in Go, including queues with
exchangeable schedulers and AQM algorithms. To showcase the benefits of such a user space
queuing system, the prototype was integrated into the previous Mininet topology, as depicted
in Figure 2a. For that, two experiments with different queue behavior were performed using (1)
a tail drop queue and (2) a front drop queue with a rate limit of 50 Mbit/s . In case of a full
queue, when packet loss is unavoidable on receiving another packet, the former approach drops
the new packet, while the latter drops the first in the queue and then adds the new packet at the
tail. As the congestion control detects the packet loss earlier in the latter case, we assume a
different behavior in both scenarios which shall be further investigated.

Exemplary results for both approaches are compared in Figure 2b. In this concrete case,

h1 Queueing System

fx gx
h3

Mininet Network

raw
socket

raw
socket

kernel space user space

h2

(a)

0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0
0

25

Ta
il 

D
ro

p

0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0
time [s]

0

50

Fr
on

t D
ro

p

(b)

Figure 2: Results of an exemplary user space queuing experiment. (a) Integration of queueing
system into simple Mininet topology. (b) Packet latency for dropping packets in the front/tail of
the queue while overflowing. Red bars mark dropped packets (22 tail drops, 15 front drops).

3 / 4 Volume 080 (2021)


User Space Packet Schedulers

we observe that the front drop queue creates less packet loss—15 compared to 22 lost packets—
because TCP congestion control receives feedback earlier and thus adapts the sending rate sooner.
This quickly realized experiment in combination with easily generated but yet expressive results
highlight the major advantage of our user space queue emulator. In contrast to kernel or hard-
ware implementations, the user space allows to log all dropping events. In essence, it is easy
and convenient to determine why, where, and when a packet is dropped using user space packet
schedulers for rapid prototyping.

Conclusion and Outlook: Active Queue Management is highly important in modern net-
works in order to meet the growing Quality-of-Service requirements. However, their develop-
ment and evaluation is challenging due to their typical implementation in the kernel or in hard-
ware. We therefore propose the idea of user space queuing systems for rapid prototyping of novel
AQM algorithms. We performed experiments comparing four different high level programming
languages for their applicability to implement such a system. The results have shown that raw
sockets and user space packet queues achieve bandwidths above 1 Gbit/s, which is sufficient for
most AQM-experiments. Additionally, two simple queuing algorithms were evaluated using a
prototypical implementation of our user space queuing system. Because AQM prototyping in
user space allows to conveniently test and evaluate novel AQM approaches, future work will
focus on the realization of programmable user space queues in real computer networks based on
high performance packet I/O frameworks together with high-level programming languages.

Acknowledgements: This work has been supported by the Federal Ministry of Education and
Research (BMBF, Germany) within the Software Campus Project ”5G-PCI” and in parts by the
German Research Foundation (DFG) as part of the projects B1 and C2 within the Collaborative
Research Center (CRC) 1053 MAKI and the LOEWE initiative (Hessen, Germany) within the
project Nature 4.0 and the EmergenCity Centre.

Bibliography

[Ada13] R. Adams. Active Queue Management: A Survey. IEEE Communications Surveys
Tutorials 15(3):1425–1476, 2013.

[AKM04] G. Appenzeller, I. Keslassy, N. McKeown. Sizing Router Buffers. In Conference
on Applications, Technologies, Architectures, and Protocols for Computer
Communications. P. 281–292. ACM, 2004.

[GN12] J. Gettys, K. Nichols. Bufferbloat: dark buffers in the internet. Communications of
the ACM 55(1):57–65, 2012.

[KBV+18] R. Kundel, J. Blendin, T. Viernickel, B. Koldehofe, R. Steinmetz. P4-CoDel: Ac-
tive Queue Management in Programmable Data Planes. In Proceedings of the
Conference on Network Function Virtualization and Software Defined Networks
(NFV-SDN). Pp. 1–4. 2018.

[KWM+19] R. Kundel, J. Wallerich, W. Maas, L. Nobach, B. Koldehofe, R. Steinmetz. Queue-
ing at the Telco Service Edge: Requirements, Challenges and Opportunities. In
Workshop on Buffer Sizing. Stanford, US, 2019.

NetSys 2021 4 / 4