Browsing by Author "Staats, Matt"
Now showing 1 - 16 of 16
Item: A flexible and non-intrusive approach for computing complex structural coverage metrics (ACM, 2015)
Whalen, Michael; Person, Suzette; Rungta, Neha; Staats, Matt; Grijincu, Daniella
Software analysis tools and techniques often leverage structural code coverage information to reason about the dynamic behavior of software. Existing techniques instrument the code with the required structural obligations and then monitor the execution of the compiled code to report coverage. Instrumentation-based approaches often incur considerable runtime overhead for complex structural coverage metrics such as Modified Condition/Decision Coverage (MC/DC). Code instrumentation, in general, has to be approached with great care to ensure it does not modify the behavior of the original code. Furthermore, instrumented code cannot be used in conjunction with other analyses that reason about the structure and semantics of the code under test. In this work, we introduce a non-intrusive preprocessing approach for computing structural coverage information. It uses a static partial evaluation of the decisions in the source code and a source-to-bytecode mapping to generate the information necessary to efficiently track structural coverage metrics during execution. Our technique is flexible; the results of the preprocessing can be used by a variety of coverage-driven software analysis tasks, including automated analyses that are not possible for instrumented code. Experimental results in the context of symbolic execution show the efficiency and flexibility of our non-intrusive approach for computing code coverage information.

Item: Automated Oracle Creation Support, or: How I Learned to Stop Worrying About Fault Propagation and Love Mutation Testing (IEEE Press, 2012)
Staats, Matt; Gay, Gregory; Heimdahl, Mats
In testing, the test oracle is the artifact that determines whether an application under test executes correctly.
The choice of test oracle can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support the selection of test inputs, little work exists for supporting oracle creation. In this work, we propose a method of supporting test oracle creation. This method automatically selects the oracle data - the set of variables monitored during testing - for expected value test oracles. This approach is based on the use of mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data. Experiments over four industrial examples demonstrate that our method may be a cost-effective approach for producing small, effective oracle data, with fault-finding improvements over current industrial best practice of up to 145.8% observed.

Item: Automated Oracle Data Selection Support (IEEE, 2015-06)
Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
The choice of test oracle—the artifact that determines whether an application under test executes correctly—can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support test input selection, little work exists for supporting oracle creation. We propose a method of supporting test oracle creation that automatically selects the oracle data—the set of variables monitored during testing—for expected value test oracles. This approach is based on the use of mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data.
Experimental results obtained by employing our method over six industrial systems (while varying test input types and the number of generated mutants) indicate that our method—when paired with test inputs generated either at random or to satisfy specific structural coverage criteria—may be a cost-effective approach for producing small, effective oracle data sets, with fault-finding improvements over current industrial best practice of up to 1,435% observed (with typical improvements of up to 50%).

Item: Automated Oracle Data Selection Support (2015-11)
Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
The choice of test oracle—the artifact that determines whether an application under test executes correctly—can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support test input selection, little work exists for supporting oracle creation. We propose a method of supporting test oracle creation that automatically selects the oracle data—the set of variables monitored during testing—for expected value test oracles. This approach is based on the use of mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data.
Experimental results obtained by employing our method over six industrial systems (while varying test input types and the number of generated mutants) indicate that our method—when paired with test inputs generated either at random or to satisfy specific structural coverage criteria—may be a cost-effective approach for producing small, effective oracle data sets, with fault-finding improvements over current industrial best practice of up to 1,435% observed (with typical improvements of up to 50%).

Item: Coverage Metrics for Requirements-Based Testing: Evaluation of Effectiveness (NASA, 2010)
Staats, Matt; Whalen, Michael; Rajan, Ajitha; Heimdahl, Mats
In black-box testing, the tester creates a set of tests to exercise a system under test without regard to the internal structure of the system. Generally, no objective metric is used to measure the adequacy of black-box tests. In recent work, we have proposed three requirements coverage metrics, allowing testers to objectively measure the adequacy of a black-box test suite with respect to a set of requirements formalized as Linear Temporal Logic (LTL) properties. In this report, we evaluate the effectiveness of these coverage metrics with respect to fault finding. Specifically, we conduct an empirical study to investigate two questions: (1) do test suites satisfying a requirements coverage metric provide better fault finding than randomly generated test suites of approximately the same size? (2) do test suites satisfying a more rigorous requirements coverage metric provide better fault finding than test suites satisfying a less rigorous requirements coverage metric?
Our results indicate (1) that test suites satisfying more rigorous coverage metrics provide better fault finding than test suites satisfying less rigorous coverage metrics, and (2) that only one proposed coverage metric—Unique First Cause (UFC) coverage—is sufficiently rigorous to ensure that test suites satisfying it outperform randomly generated test suites of similar size.

Item: Moving the Goalposts: Coverage Satisfaction is Not Enough (ACM, 2014)
Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
Structural coverage criteria have been proposed to measure the adequacy of testing efforts. Indeed, in some domains—e.g., critical systems areas—structural coverage criteria must be satisfied to achieve certification. The advent of powerful search-based test generation tools has given us the ability to generate test inputs to satisfy these structural coverage criteria. While tempting, recent empirical evidence indicates these tools should be used with caution, as merely achieving high structural coverage is not necessarily indicative of high fault detection ability. In this report, we review some of these findings and offer recommendations on how the strengths of search-based test generation methods can alleviate these issues.

Item: Observable Modified Condition/Decision Coverage (IEEE, 2013)
Whalen, Michael; Gay, Gregory; You, Dongjiang; Heimdahl, Mats; Staats, Matt
In many critical systems domains, test suite adequacy is currently measured using structural coverage metrics over the source code. Of particular interest is the modified condition/decision coverage (MC/DC) criterion required for, e.g., critical avionics systems. In previous investigations we have found that the efficacy of such test suites is highly dependent on the structure of the program under test and the choice of variables monitored by the oracle. MC/DC-adequate tests would frequently exercise faulty code, but the effects of the faults would not propagate to the monitored oracle variables.
In this report, we combine the MC/DC coverage metric with a notion of observability that helps ensure that the result of a fault encountered when covering a structural obligation propagates to a monitored variable; we term this new coverage criterion Observable MC/DC (OMC/DC). We hypothesize that this path requirement will make structural coverage metrics (1) more effective at revealing faults, (2) more robust to changes in program structure, and (3) more robust to the choice of variables monitored. We assess the efficacy and sensitivity to program structure of OMC/DC as compared to masking MC/DC using four subsystems from the civil avionics domain and the control logic of a microwave. We have found that test suites satisfying OMC/DC are significantly more effective than test suites satisfying MC/DC, revealing up to 88% more faults, and are less sensitive to program structure and the choice of monitored variables.

Item: On MC/DC and Implementation Structure: An Empirical Study (IEEE, 2008)
Whalen, Michael; Heimdahl, Mats; Staats, Matt; Rajan, Ajitha
In civil avionics, obtaining DO-178B certification for highly critical airborne software requires that the adequacy of the code testing effort be measured using a structural coverage criterion known as Modified Condition/Decision Coverage (MC/DC). We hypothesized that the effectiveness of the MC/DC metric is highly sensitive to the structure of the implementation and can therefore be problematic as a test adequacy criterion. We tested this hypothesis by evaluating the fault-finding ability of MC/DC-adequate test suites on five industrial systems (flight guidance and display management). For each system, we created two versions of the implementations—implementations with and without expression folding (i.e., inlining). We found that for all five examples, the effectiveness of the test suites was highly sensitive to the structure of the implementation they were designed to cover.
MC/DC test suites adequate on an inlined implementation have greater fault-finding ability than test suites generated to be MC/DC adequate on the non-inlined version of the same implementation, at the 5% significance level. (The inlined test suites outperformed the non-inlined test suites in the range of 10% to 5940%.) This observation confirms our suspicion that MC/DC used as a test adequacy metric is highly sensitive to structural changes in the implementation, and that test suite adequacy measurement using the MC/DC metric will be better served if done over the inlined implementation.

Item: On the Danger of Coverage Directed Test Case Generation (Springer-Verlag, 2012)
Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
In the avionics domain, the use of structural coverage criteria is legally required in determining test suite adequacy. With the success of automated test generation tools, it is tempting to use these criteria as the basis for test generation. To more firmly establish the effectiveness of such approaches, we have generated and evaluated test suites to satisfy two coverage criteria using counterexample-based test generation and a random generation approach, contrasted against purely random test suites of equal size. Our results yield two key conclusions. First, coverage criteria satisfaction alone is a poor indication of test suite effectiveness. Second, the use of structural coverage as a supplement, not a target, for test generation can have a positive impact. These observations point to the dangers inherent in the increase in test automation in critical systems and the need for more research in how coverage criteria, generation approach, and system structure jointly influence test effectiveness.

Item: Parallel Symbolic Execution for Structural Test Generation (2010)
Staats, Matt; Păsăreanu, Corina
Symbolic execution is a popular technique for automatically generating test cases achieving high structural coverage.
Symbolic execution suffers from scalability issues, since the number of symbolic paths that need to be explored is very large (or even infinite) for most realistic programs. To address this problem, we propose a technique, Simple Static Partitioning, for parallelizing symbolic execution. The technique uses a set of pre-conditions to partition the symbolic execution tree, allowing us to effectively distribute symbolic execution and decrease the time needed to explore the symbolic execution tree. The proposed technique requires little communication between parallel instances and is designed to work with a variety of architectures, ranging from fast multi-core machines to cloud or grid computing environments. We implement our technique in the Java PathFinder verification toolset and evaluate it on six case studies with respect to the performance improvement when exploring a finite symbolic execution tree and performing automatic test generation. We demonstrate speedup in both the analysis time over finite symbolic execution trees and in the time required to generate tests relative to sequential execution, with a maximum analysis time speedup of 90x observed using 128 workers and a maximum test generation speedup of 70x observed using 64 workers.

Item: Partial Translation Verification for Untrusted Code-Generators (2008)
Staats, Matt; Heimdahl, Mats
Within the context of model-based development, the correctness of code generators for modeling notations such as Simulink and Stateflow is of obvious importance. If correctness of code generation can be shown, the extensive and often costly verification and validation activities conducted in the modeling domain could be effectively leveraged in the code domain. Unfortunately, most code generators in use today give no guarantees of correctness.
In this paper, we investigate a method of leveraging existing model checking tools to verify the partial correctness of code generated by code generators that offer no guarantees of correctness. We explore the feasibility of this approach through a prototype tool that allows us to verify that Linear Temporal Logic (LTL) safety properties are preserved by C code generators for Simulink models. We find that the approach scales well, allowing us to verify that 55 LTL properties are maintained when generating 12,000+ lines of C code from a large Simulink model.

Item: ReqsCov: A Tool for Measuring Test-Adequacy over Requirements (2008)
Staats, Matt; Deng, Weijia; Rajan, Ajitha; Heimdahl, Mats; Woodham, Kurt
When creating test cases for software, a common approach is to create tests that exercise requirements. Determining the adequacy of test cases, however, is generally done through inspection or indirectly by measuring structural coverage of an executable artifact (such as source code or a software model). We present ReqsCov, a tool to directly measure requirements coverage provided by test cases. ReqsCov allows users to measure Linear Temporal Logic requirements coverage using three increasingly rigorous requirements coverage metrics: naive coverage, antecedent coverage, and Unique First Cause coverage. By measuring requirements coverage, users are given insight into the quality of test suites beyond what is available when solely using structural coverage metrics over an implementation.

Item: Requirements Coverage as an Adequacy Measure for Conformance Testing (Springer-Verlag, 2008)
Rajan, Ajitha; Whalen, Michael; Staats, Matt; Heimdahl, Mats
Conformance testing in model-based development refers to the testing activity that verifies whether the code generated (manually or automatically) from the model is behaviorally equivalent to the model. Presently, the adequacy of conformance testing is inferred by measuring structural coverage achieved over the model.
We hypothesize that adequacy metrics for conformance testing should consider structural coverage over the requirements, either in place of or in addition to structural coverage over the model. Measuring structural coverage over the requirements gives a notion of how well the conformance tests exercise the required behavior of the system. We conducted an experiment to investigate the hypothesis that structural coverage over formal requirements is more effective than structural coverage over the model as an adequacy measure for conformance testing. We found that the hypothesis was rejected at 5% statistical significance on three of the four case examples in our experiment. Nevertheless, we found that the tests providing requirements coverage found several faults that remained undetected by tests providing model coverage. We thus formed a second hypothesis stating that complementing model coverage with requirements coverage will prove more effective as an adequacy measure than solely using model coverage for conformance testing. In our experiment, we found test suites providing both requirements coverage and model coverage to be more effective at finding faults than test suites providing model coverage alone, at 5% statistical significance. Based on our results, we believe existing adequacy measures for conformance testing that only consider model coverage can be strengthened by combining them with rigorous requirements coverage metrics.

Item: The Influence of Multiple Artifacts on the Effectiveness of Software Testing (2010)
Staats, Matt
The effectiveness of the software testing process is determined by artifacts used in testing, including the program, the set of tests, and the test oracle. However, in evaluating software testing techniques, including automated software testing techniques, the influence of these testing artifacts is often overlooked.
In my upcoming dissertation, I intend to explore the interrelationship between these three testing artifacts, with the goal of establishing a solid scientific foundation for understanding how they interact. I plan to provide two contributions towards this goal. First, I propose a theoretical framework for discussing testing based on previous work in the theory of testing. Second, I intend to perform a rigorous empirical study controlling for program structure, test coverage criteria, and oracle selection in the domain of safety-critical avionics software.

Item: The Risks of Coverage-Directed Test Case Generation (IEEE, 2015)
Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
A number of structural coverage criteria have been proposed to measure the adequacy of testing efforts. In the avionics and other critical systems domains, test suites satisfying structural coverage criteria are mandated by standards. With the advent of powerful automated test generation tools, it is tempting to simply generate test inputs to satisfy these structural coverage criteria. However, while techniques to produce coverage-providing tests are well established, the effectiveness of such approaches in terms of fault detection ability has not been adequately studied. In this work, we evaluate the effectiveness of test suites generated to satisfy four coverage criteria through counterexample-based test generation and a random generation approach—where tests are randomly generated until coverage is achieved—contrasted against purely random test suites of equal size. Our results yield three key conclusions. First, coverage criteria satisfaction alone can be a poor indication of fault-finding effectiveness, with inconsistent results between the seven case examples (and random test suites of equal size often providing similar—or even higher—levels of fault finding).
Second, the use of structural coverage as a supplement—rather than a target—for test generation can have a positive impact, with random test suites reduced to a coverage-providing subset detecting up to 13.5% more faults than test suites generated specifically to achieve coverage. Finally, Observable MC/DC, a criterion designed to account for program structure and the selection of the test oracle, can—in part—address the failings of traditional structural coverage criteria, allowing for the generation of test suites achieving higher levels of fault detection than random test suites of equal size. These observations point to risks inherent in the increase in test automation in critical systems, and the need for more research in how coverage criteria, test generation approaches, the test oracle used, and system structure jointly influence test effectiveness.

Item: Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder (2009)
Staats, Matt
We present work on a prototype tool based on the Java PathFinder (JPF) model checker for automatically generating tests satisfying the MC/DC code coverage criterion. Using the Eclipse IDE, developers and testers can quickly instrument Java source code with JPF annotations covering all MC/DC coverage obligations, and JPF can then be used to automatically generate tests that satisfy these obligations. The prototype extension to JPF enables various tasks useful in automatic test generation to be performed, such as test suite reduction and execution of generated tests.
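Several of the items above revolve around the MC/DC criterion: each condition in a decision must be shown to independently affect the decision's outcome. The sketch below is a hypothetical illustration of that requirement, not code from any of the tools listed; the decision `(a and b) or c`, the four-test suite, and the `independence_pair` helper are all invented here for demonstration.

```python
# Hypothetical MC/DC illustration for the decision (a and b) or c.
# MC/DC asks, for each condition, for a pair of tests that differ only
# in that condition and produce different decision outcomes.

def decision(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or c

def independence_pair(cond_index, tests):
    """Return a pair of tests differing only in condition cond_index
    with different decision outcomes, or None if the suite lacks one."""
    others = [i for i in range(3) if i != cond_index]
    for t1 in tests:
        for t2 in tests:
            if (t1[cond_index] != t2[cond_index]
                    and all(t1[i] == t2[i] for i in others)
                    and decision(*t1) != decision(*t2)):
                return t1, t2
    return None

# A four-test suite that achieves MC/DC over (a and b) or c:
suite = [(True, True, False),   # decision: True
         (False, True, False),  # decision: False (toggles a)
         (True, False, False),  # decision: False (toggles b)
         (True, False, True)]   # decision: True  (toggles c)

for i, name in enumerate("abc"):
    print(name, "independence pair:", independence_pair(i, suite))
```

Note that any three-test subset of this suite loses the independence pair for at least one condition, which is why MC/DC generally needs n+1 tests for n conditions, rather than the two tests that plain decision coverage would accept.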