Scalable symbolic execution of systems code

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Published Date

Publisher

Abstract

Systems code, such as a CPU emulator or OS kernel, is pervasive and its quality is critical for all the software built on top of it. To monitor the code quality of systems code, automated program analysis is desirable as an assistant for human experts considering the size and complexity of the code. Compared to other automated program analysis techniques, symbolic execution can reason about more complicated code, but it is more difficult to scale to large systems. To take advantage of symbolic execution while circumventing the scalability issue, we combine symbolic execution with other techniques. We also explore solutions for the path explosion problem which is the major cause of this issue. Software CPU emulators are difficult to implement correctly and extensive testing is desirable. Since a large number of test cases are required for full coverage, it is important to execute tests efficiently. We explore techniques for combining many instruction tests into one program to amortize overheads such as booting an emulator. To ensure the results of each test are reflected in a final result, we use the outputs of one instruction test as an input to the next, and adopt the ``Feistel network'' construction from cryptography so that each step is invertible. We evaluate this approach by applying it to PokeEMU, a tool that generates emulator tests using symbolic execution. The combined tests run much faster, but still reveal most of the same behavior differences as when run individually. Loop misuse is one of the major sources of real world program bugs. However, symbolic execution of programs with loops can be challenging to scale due to the large number of execution paths introduced by loops. To address this issue, a previous loop summarization approach has been developed on a concolic system that is not publicly available. To further evaluate this approach and combine it with other techniques, we redesign and implement this approach on FuzzBALL, a publicly available symbolic execution system. According to our evaluation, this approach can summarize linear loops, but is limited for non-linear or nested loops and loops manipulating symbolic arrays. Based on the loop summarization implementation, we implement a prototype of function summarization which currently supports strlen and its variations. Compared to existing function summarization approaches, our approach can identify functions more accurately without the help of debug information. A kernel is the most important OS component that supports both the entire OS and user-space applications, while it is challenging to monitor code correctness of a kernel. As the last part of this thesis, we propose a novel approach to confirm candidate bugs of the Linux kernel found by static analysis, which combines fuzzing with symbolic execution. Given code locations of a candidate bug, we start with fuzzing to collect inputs that can get close to those locations. We then perform concolic execution directed by those inputs, and switch to symbolic execution when getting close to the bug. We implement a prototype of this approach, and evaluate it with known bugs. Unfortunately, we are not able to find a working example due to various limitations. By analyzing the results, we identify various reasons for the failed experiment and propose improvements for this approach.

Description

University of Minnesota Ph.D. dissertation. December 2024. Major: Computer Science. Advisor: Stephen McCamant. 1 computer file (PDF); vii, 59 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Yan, Qiuchen. (2024). Scalable symbolic execution of systems code. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/271690.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.