Data Race Detection in the Linux Kernel with CPALockator Electronic Communications of the EASST Volume 079 (2020) Interactive Workshop on the Industrial Application of Verification and Testing ETAPS 2020 Workshop (InterAVT 2020) Data Race Detection in the Linux Kernel with CPALockator Pavel Andrianov and Vadim Mutilin 8 pages Guest Editors: Stylianos Basagiannis, Goetz Botterweck, Anila Mjeda ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122 http://www.easst.org/eceasst/ ECEASST Data Race Detection in the Linux Kernel with CPALockator Pavel Andrianov1 and Vadim Mutilin2 1,2Ivannikov Institute for System Programming of the RAS 2Moscow Institute of Physics and Technology Abstract: Most of the state-of-the-art verification tools do not scale well on com- plicated software. Our goal was to develop a tool, which becomes a golden mean between precise and slow software model checkers and fast and imprecise static ana- lyzers. It allows verifying industrial software more efficiently. Our method is based on the Thread-Modular approach elaborating the idea of abstraction from precise thread interaction and considering every thread separately, but in a special environ- ment, which models thread effects on each other. The approach was implemented in the CPAchecker framework and was evaluated on benchmarks based on Linux device drivers for data race detection. It demonstrated that predicate abstraction al- lows keeping a false alarms rate at a reasonable level of 52%. Moreover, it did not miss known real bugs found by analysis of commits in the Linux kernel repository thus confirming the soundness of the approach. Keywords: Data race, Thread-Modular approach, Linux kernel. 1 Introduction Multithreaded software is widely spread nowadays for efficient usage of multiple CPU cores. That is also correct for operating systems, which may contain a lot of parallel activities, for example, system calls, interrupt handlers and so on. In addition to generic bugs, which are common for all kinds of software, multithreaded pro- grams may contain specific ones, related particularly with a parallel execution: deadlocks and races. Data races are an important subclass of race conditions and are defined as a situation when a simultaneous accesses to the same memory take place from different threads, where one of the accesses is a write. In general, a data race may not lead to failure directly, but it is a symptom of an error. There were developed a lot of different approaches for data race detection. Usually, they are divided into static and dynamic ones. Further, we will concentrate mostly on static approaches, which may guarantee the correctness of the code under certain assumptions. Precise approaches to model checking are based on considering all possible thread interleav- ings. Thus, they allow to be very precise and consider different complicated cases, like lock-free synchronization and weak memory models. As a drawback, they have spent a lot of resources (time and memory) per a verification task and thus can not be applied to large benchmarks. An opposite case of static verification is a data flow-based analysis. The corresponding tools are very fast but not so precise as model checkers. Particularly, the simple static analysis can not perform a path sensitive analysis. As a result, there are produced a lot of false alarms and some 1 / 8 Volume 079 (2020) Data Race Detection in the Linux Kernel with CPALockator of the tools apply unsound filters to remove a part of false alarms, and it may lead to a missed bug. Our goal was to develop a tool that unites advantages both fast data flow analysis and precise model checking. One of the ideas to build such an approach is a sequential combination of two stages: the first stage is an imprecise analysis, which produces hints for the second stage, which is a traditional model checking. The other idea is a thread-modular approach, which considers each thread separately, but in a special environment [HJMQ03, GPR11]. We implemented a thread-modular approach extended with projections in the CPAchecker framework 1. CPAchecker contains different kinds of analysis (CPAs), which may be combined. We extended several CPAs for support of the thread-modular approach with projections and implemented two specific CPAs: ThreadCPA for thread analysis and LockCPA for tracking syn- chronization primitives. We successfully applied the CPALockator tool to a set of benchmarks that are based on the Linux device drivers. The final rate of false alarms is quite low – 52%. We performed an analysis of a set of existing fixes in Linux drivers, and the tool can detect 5 of 8 known bugs. The rest of them are missed due to timeout. 2 Basic Idea of the Thread-Modular Approach with Projections Consider a simple program, which has only two active threads. Figure 1 presents a simple model example, which uses an ad-hoc synchronization. The first thread initializes global data (in this case, a global variable g) and then sets a flag, meaning the shared data are ready. The second thread is allowed to use the shared data only after setting the flag. Thus, there is no data race on a global variable g. volatile int g = 0; Thread1 { 1: g = 1; 2: d = 1; 3: ... } volatile int d = 0; Thread2 { 4: if (d == 1) { 5: g = 2; 6: } } Figure 1: An example of a small program A suggested approach is based on a thread-modular approach. The approach considers each thread separately but in a special environment, which is constructed also during the analysis. The environment computation is based on the analysis of all threads, as every thread is a part of the environment for other threads. For each thread, a set of its actions, which may affect other threads, is collected. The actions include modification of shared variables, acquiring synchro- nization primitives and so on. Environment precision strongly affects the precision of the whole analysis. However, there is the main question, how to compute and represent the environment efficiently. In a sequential analysis, there is a successful technique, which allows reducing the number of considered program states – an abstraction. It allows to abstract from minor details of a 1 https://cpachecker.sosy-lab.org/ InterAVT 2020 2 / 8 ECEASST program and considers general (abstract) state. Each abstract state may correspond to a set of real (concrete) program states. This idea allows to significantly increase the efficiency of the analysis. The key idea of the suggested approach is an extension of abstraction not only to the program states but also to the operations of a thread. Adjusting the level of the abstraction, it is possible to choose a balance between speed and precision of the tool. Figure 2 shows a part of the Abstract Reachability Graph (ARG) for the first and the second threads from figure 1. There is no interaction considered, so this is not a final step of the analysis. g → 0 d → 0 ![d == 1] g → 0 d → 0 g = 1 g → 1 d → 0 d = 1 g → 1 d → 1 Thread 1 Thread 2 g → 0 d → 0 #A1: #A2: #B1: Figure 2: Abstract transitions for two threads without any interaction The analysis in the example is based on a simple Value Analysis, which tracks only explicit values of variables. A transition contains an abstract state and an abstract operation. The first abstract state in both threads contains information that both global variables (d and g) are equal to zero (g → 0). After performing an operation g = 1 (transition #A1) the value of g is updated into 1 (g → 1) in the second abstract state (transition #A2). After constructing an ARG for two threads separately, we need to consider the influence of threads to each other, i.e. to construct an environment. For every thread operation, we compute its projection – a representation of operation in a thread for other threads as an environment. For example, modification of local variables can not affect on other threads, so the corresponding projection is empty. Modification of a global variable may affect other threads, so the projection may be equal to the original transition or overapproximate it, for example, by abstraction from a precise assigned value. A projection may contain not only information about action but also a condition for performing this action, so-called guard. Consider a transition #A1. We may represent the corresponding projection in the following way: if the value of g equals zero, it may be changed to one. In other words, the projection consists of two parts: a guard ([g == 0]) and an action (g → 1). The guard corresponds to a predecessor abstract state and an action corresponds to an operation. The figure 3 presents computed projections for the first thread. There are two transitions, which may affect the global variables: #A1 and #A2. We compute the corresponding projections: #P1 and #P2. Then every projection has to affect the second thread, i.e. apply to all possible (according to the guard) transitions of the second thread. We apply the projection #P1 to the transition #B1, as the state is compatible with the guard of the projection. Only after that, we may apply the projection #P2 to the new transition #B2, which requires g to be equal to one. And only then the second thread may go through a new transition #B3, which discovers new paths. Note, the figure presents only projections for the first thread, in complete ARG there should be also projections for the second thread as well. For data race detection we have to find two transitions, which modify the same variable. The 3 / 8 Volume 079 (2020) Data Race Detection in the Linux Kernel with CPALockator g → 0 d → 0 g = 1 g → 0 d → 0 ![d == 1] g → 0 d → 0 g = 1 g → 0 d → 0 g = 1 g → 1 d → 0 d = 1 g → 1 d → 1 Thread 1 Thread 2 g → 0 d → 0 Projection g → 1 d → 0 ![d == 1] g → 1 d → 0 g → 1 d → 0 d = 1 g → 1 d → 0 d = 1 g → 1 d → 1 [d == 1] g → 1 d → 1 g = 2 #A2: #A1: #B1: #B2: #B3: #B4: #P2: #P1: Figure 3: Application of projections to the second thread example has potential candidates: #A1 and #B4. Now, we should check, if the two transitions may be executed in parallel. That means, that the corresponding abstract states must be a part of one global state, i.e. they must be compatible. In this case, the partial states are contradicting each other, as one has g → 0 and the other g → 1. So, the corresponding transitions can not be executed simultaneously. Thus, we conclude there is no data race for g. Note, there is a data race for d (transitions #A2 and #B2), but it may be considered as lock-free synchronization, and a potential race is a part of its implementation. The suggested approach provides a lot of possible options and configurations for targeting to a particular task. A projection may be represented by more or less precise abstraction. Several projections may be joined altogether or considered separately. An example of more abstract transitions is presented in figure 4. g → 0 d → 0 g = 1 d = 1 g → 0 d → 0 ![d == 1] g → 0 d → 0 g = 1 g → * d → 0 g = 1 d = 1 g → 1 d → 0 d = 1 g → 1 d → 1 Thread 1 Thread 2 g → 0 d → 0 Projection g → * d → * [d == 1] g → * d → * g = 2 ⋁ ⋁ #A1: #A2: #B1: #C2: #C3: #P3: Figure 4: Application of projections to the second thread The two initial projections (#P1 and #P2) are joined into a single one (#P3). Usually, it leads to losing some information, for example, here we lost a precise value of variable g. The action of projection also becomes more complicated. The second thread can not identify a precise value of the variables, as both of the variables are now equal to zero or one. The simple kind of analysis operates only with single explicit values of variables and both variables are considered to be equal to any random value. Now, transitions #A1 and #C3 become compatible, which means, InterAVT 2020 4 / 8 ECEASST that the race is reported on variable g. The level of abstraction strongly affects the precision of the analysis and its speed, but the analysis always remains sound. 3 Data-Race Detection Algorithm Data race detection algorithms may be divided into two stages: reachable states analysis and searching of pairs, which can form a race. In our case, the first subtask is related to ARG construction and may be solved with different configurations. As we have already discussed, an ARG from abstract transitions is constructed with the chosen configuration of the CPALockator. Note, the abstract transitions in the graph are reachable with a certain abstraction level, so, they may not be reachable in a real execution. For refinement of the abstraction, we use a CEGAR algorithm (Counterexample Guided Abstraction Refinement). Note, a CEGAR algorithm was reused without significant modifications. However, it allows refining only path in a single thread, i.e. it can not identify contradictions between operations in different threads. Anyway, it is not a fundamental limitation of the approach, and a possible extension of the CEGAR algorithm for the thread-modular approach will allow obtaining more precise results. After computation of the ARG we should find those transitions, which could form a data race. Usually, a race condition is defined as a situation, where simultaneous access to shared memory from different threads takes place. Two main questions, which appear for static data race detection, are the following: how we can determine, that the accesses are performed to the same memory and how to determine if the accesses are simultaneous. Further, we will discuss both of the features of the approach. In real software, there are a lot of different operations with pointers, structures, and more complicated data. The variety of pointers breaks any kind of alias analysis. Moreover, most of the approaches require the pointers to be correctly initialized, which is difficult to reach while analyzing libraries, modules and so on. Thus, we use a BnB memory model [AFM+17], which divides all memory into a disjoint set of regions. Every region corresponds to a specific data type or a structure field (if an address-of expression is not used for it). According to the memory model, we consider access to the same region as access to the same memory. The BnB memory model has certain limitations. First of all, casting and address arithmetic may lead to a missed bug, because the two accesses will not be placed in the same region. Also, the memory model may lead to false alarms in cases, when two pointers of the same type never point to the same memory but are placed in the same region. If we have two accesses to the same region, we have to check, if two accesses are taking place simultaneously. We use the already mentioned notion of compatibility of the corresponding transitions. Two partial states are compatible, if they may be parts of a single global state. If the partial states are not compatible, that means, the corresponding transitions can not be executed in parallel. Thus, this approach is an extension of a Lockset algorithm, which defines a data race, as a pair of accesses with a disjoint set of locks. A compatibility check uses different kinds of analysis, including analysis of synchronization primitives, predicate analysis, thread analysis, hence the approach is more precise than the default Lockset algorithm. 5 / 8 Volume 079 (2020) Data Race Detection in the Linux Kernel with CPALockator 4 Evaluation on the Linux Device Drivers CPALockator 2 tool was evaluated on a benchmark set, which is based on device drivers of the subsystem drivers/net of the Linux kernel v4.2.6 3. For each kernel module, a verification task was prepared with the help of the Klever framework [NZ18]. For 473 modules it prepared 425 tasks for CPALockator. Detailed information about its verdicts is presented in the table 1. The table 2 presents the results of the analysis of warnings. We will discuss them further. Tasks Description 261 Safe (no bugs) 22 Unsafe (races found) 142 Unknow (timeout) Table 1: Overall verdicts Number Percent Description 41 48% True (bugs) 25 29% Imprecise scenario model 8 9% Imprecise memory model 7 8% Specifics of interrupts 4 5% Other imprecisions in the analysis Table 2: Analysis of warnings in Linux kernel modules 41 warnings of 85 (48%) correspond to real bugs. 10 of them are related to so-called “benign” races. It is, for example, a race on statistics counter or during printing debug information. The main part of bugs corresponds to a situation when a new thread is created while an initialization process is still uncompleted. Thus, the new thread may access uninitialized memory or not al- located memory. The situation usually occurs, when registration of driver handlers is performed before the complete initialization of driver data. Note, several warnings may be outputted per one module and these 41 warnings correspond to 15 modules. 44 of 85 warnings (52%) are false alarms. The main part of them (25 warnings) is related to the imprecise communicating scenario model, which defines scenarios of driver activities. The issue is not related to the CPALockator tool and is caused by an internal part of the Klever framework. So, the corresponding warning is correct for the verification tool, but the data race is impossible in a real execution. For example, an imprecise communicating scenario model may consider some driver activities as parallel, but in a real execution, the corresponding scenario is impossible. Other examples – absent models of external functions, which are important for analysis, or incorrect driver input data are also a responsibility of Klever task generator. The main part of false alarms, related to the CPALockator tool, is due to imprecise memory model – 8 warnings. To be conservative CPALockator may consider different memory as the same (BnB model) and produce a warning about potential simultaneous access, although the accesses are not to the same memory. This is mostly happening for structure fields. One more important reason of false alarms is low-level constructions in the Linux kernel. Currently, an interrupt handler is considered to be executed like an ordinary thread, but actually, it has certain specifics. For example, the execution of the interrupt handler may be interrupted only in special cases and only by another interrupt. It means, that the interrupt handler is executed atomically in parallel with an ordinary thread. Moreover, the registration of an interrupt handler may not mean that it becomes active. In some cases interrupts should be activated in the device. 2 CPAchecker-theory-with-races@32609 3 https://gitlab.com/sosy-lab/software/ldv-benchmarks.git , directory linux-4.2.6-races InterAVT 2020 6 / 8 ECEASST Small classes of false alarms are dedicated to the analysis imprecision: shared analysis, prob- lems with complex data structures and model variables. 5 Evaluation on the Set of Known Bugs Analysis of causes of false alarms was performed on a set of benchmarks, which are based on fixes of existing bugs in stable Linux kernel versions in 2014 4. 795 commits are extracted using keywords5 from the whole set of 4047 commits. Then we performed a manual analysis of commits and filtered only those commits, which are related to data races and contain only lock- based synchronization primitives and are in kernel modules, see details in the table 3. We have got 13 modules after that and launched the Klever framework. It failed to prepare verification tasks for 5 modules, hence the resulting set contains 8 verification tasks6. Table 3: Constructing a set of commits with known bugs for evaluation Total amount Description 4047 All commits 795 Extracted using keywords 43 Related only to data races 28 Use lock-based synchronization 13 In kernel modules 8 Klever prepared corresponding tasks The results of launching CPALockator on 8 verification tasks with known bugs are presented in the table 4. The bugs are marked by commit identifiers. We run CPALockator on the tasks prepared by Klever framework for the repository before corresponding commit. For checking that the bug is not found after the fix we also launched CPALockator on a verification task prepared for the repository after corresponding commit. Table 4: Evaluation on known bugs in stable versions of Linux kernel Commit Module Result Comment 0e2400e drivers/char/virtio console.ko + 7357404 fs/hfsplus/hfsplus.ko ± Found with limited CEGAR iterations f1a8a3f drivers/net/bonding/bonding.ko ∓ Found a nontarget bug with limited CEGAR iterations 1a81087 net/ipv4/tcp illinois.ko ± Nontarget bug was found f0c626f drivers/target/iscsi/iscsi target mod.ko ∓ Found a nontarget bug with limited CEGAR iterations aea9dd5 fs/btrfs/btrfs.ko − 10ef175 sound/soc/snd-soc-core.ko − Timeout 4036523 drivers/gpu/drm/i915/i915.ko − • CPALockator found two bugs fixed in commits 0e2400e and 7357404. For the tasks pre- pared after the fix, the corresponding warnings are not found. It means that the bugs are detected correctly. The bug in 7357404 was found only if the CEGAR iterations are lim- ited, in another case, there will be a timeout. 4 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git 5 fix, error, race, bug, failure, crash, etc 6 https://gitlab.com/sosy-lab/software/ldv-benchmarks.git , directory ldv-commit-races 7 / 8 Volume 079 (2020) Data Race Detection in the Linux Kernel with CPALockator • For three more modules, CPALockator outputted warnings for the same memory location mentioned in the bug but for a different access location in the code. We call them nontarget bugs. For f1a8a3f and f0c626f the CEGAR iterations also must be limited to avoid the timeout. • Three modules are too large and too complex and exceed 15 minutes timelimit. Note, the timeout is achieved in the first iteration of CEGAR. Thus, 5 of 8 bugs were found, and for the rest 3 modules, there were unknown verdicts. The results confirm the soundness of the approach. 6 Conclusion One of the main limitations of the approach is a size of verified code and its complexity. We do not discuss it much, as it is well known for any static verifier tool. However, CPALockator is able to solve real-world tasks, which are based on Linux device drivers, which shows practical benefit of the tool. We also mentioned BnB memory model, which also affects on the verification process. This is a trade off between speed of the analysis and its precision. One more limitation of the approach is support only lock-based synchronization primitives. Different atomic constructions, barriers are skipped for now. We do not speak about function pointers, C constructions and so on, as it is not related to the approach itself. CPAchecker framework contain different kinds of analysis, for example, function pointer analysis, and they may be included into CPALockator if it is necessary. The results on Linux device drivers are quite good, as the tool demonstrates a false alarms rate of 52%. Moreover, it does not miss real bugs on our benchmark set, that confirms the soundness of the approach. Thus, we may conclude that the tool is practically valuable for data race detection. Future plans include further improvements in the overall approach: support new synchroniza- tion primitives, investigate different combination with other analyzes. Bibliography [AFM+17] P. Andrianov, K. Friedberger, M. Mandrykin, V. Mutilin, A. Volkov. CPA-BAM- BnB: Block-Abstraction Memoization and Region-Based Memory Models for Pred- icate Abstractions. In Proceedings of TACAS. Pp. 355–359. 2017. [GPR11] A. Gupta, C. Popeea, A. Rybalchenko. Threader: A Constraint-based Verifier for Multi-threaded Programs. In Proceedings of CAV. Pp. 412–417. Springer, 2011. [HJMQ03] T. A. Henzinger, R. Jhala, R. Majumdar, S. Qadeer. Thread-Modular Abstraction Refinement. In Proceedings of CAV. Pp. 262–274. Springer, 2003. [NZ18] E. Novikov, I. Zakharov. Verification of Operating System Monolithic Kernels With- out Extensions. In Leveraging Applications of Formal Methods, Verification and Validation. Industrial Practice. Pp. 230–248. Springer, 2018. InterAVT 2020 8 / 8 Introduction Basic Idea of the Thread-Modular Approach with Projections Data-Race Detection Algorithm Evaluation on the Linux Device Drivers Evaluation on the Set of Known Bugs Conclusion