University of Minnesota Software Engineering Center (UMSEC) Publications

Persistent link for this collection

Search within University of Minnesota Software Engineering Center (UMSEC) Publications

Browse

Recent Submissions

Now showing 1 - 20 of 167
  • Item
    Contract discovery from black-box components
    (2018) Sharma, Vaibhav; Byun, Taejoon; McCamant, Stephen; Rayadurgam, Sanjai; Heimdahl, Mats
    Complex computer-controlled systems are commonly constructed in a middle-out fashion where existing subsystems and available components have a significant influence on system architecture and drive design decisions. During system design, the architect must verify that the components, put together as specified in the architecture, will achieve the desired system behavior. This typically leads to further design modifications or adjustments to requirements triggering another iteration of the design-verify cycle. For software components that are acquired from third-parties, often the only definitive source of information about the component's system-relevant behavior -- its contract -- is its object code. We posit that existing static and dynamic analysis techniques can be used to discover contracts that can help the system designer and specifically discuss how symbolic execution of object code may be particularly well-suited for this purpose.
  • Item
    Validity-Guided Synthesis of Reactive Systems from Assume-Guarantee Contracts
    (2018) Katis, Andreas; Fedyukovich, Grigory; Guo, Huajun; Gacek, Andrew; Backes, John; Gurfinkel, Arie; Whalen, Michael
    Automated synthesis of reactive systems from specifications has been a topic of research for decades. Recently, a variety of approaches have been proposed to extend synthesis of reactive systems from propositional specifications towards specifications over rich theories. We propose a novel, completely automated approach to program synthesis which reduces the problem to deciding the validity of a set of AE-formulas. In spirit of IC3 / PDR, our problem space is recursively refined by blocking out regions of unsafe states, aiming to discover a fixpoint that describes safe reactions. If such a fixpoint is found, we construct a witness that is directly translated into an implementation. We implemented the algorithm on top of the JKind model checker, and exercised it against contracts written using the Lustre specification language. Experimental results show how the new algorithm outperforms JKind’s already existing synthesis procedure based on l-induction and addresses soundness issues in the l-inductive approach with respect to unrealizable results.
  • Item
    Challenges in Testing Next Generation CPS Systems
    (2017) Whalen, Michael
    Testing cyber-physical systems presents a unique set of testing challenges: heterogeneity, timing, and, especially, observability. In particular, some of the mechanisms that are designed to make embedded software robust are the same mechanisms that present challenges for automated testing techniques: e.g., rate limiting, fault masking, and debounce logic, which can lead to long lags between problematic inputs and their manifestation in system outputs. In addition, much of the control behavior of CPS is mathematically intensive more than "branchy". For such systems, traditional coverage metrics, which focus on reaching program statements or mutants (as in weak mutation), rather than observing their affect on system outputs, are likely to lead to poor fault-finding in practice. I will describe some recent work in the area of test generation and code metrics that begins to address these issues. I then argue that this mismatch will become significantly worse for when examining CPS systems that incorporate machine learning: the complexity of the non-linear mathematics used defeats all known symbolic techniques and the lack of a traditional branching structure renders traditional code coverage metrics fairly meaningless when using search-based testing. In fact, a recent paper abstract claims: "We propose a novel control flow obfuscation design based on the incomprehensibility of artificial neural networks to fight against reverse engineering tools including concolic testing." Traditional testing of such systems has focused on checking their behavior against a large corpus of examples; unfortunately, it is known that such systems are often not robust against malicious inputs even after extensive training and testing. I hope to challenge participants to focus research into such mathematically intensive systems and to begin to create rigorous metrics that are applicable to such systems.
  • Item
    Discovering Instructions for Robust Binary-level Coverage Criteria
    (2017) Sharma, Vaibhav; Byun, Taejoon; McCamant, Stephen; Rayadurgam, Sanjai; Heimdahl, Mats
    Object-Branch Coverage (OBC) is often used to measure e ective- ness of test suites, when source code is unavailable. The traditional OBC de nition can be made more resilient to variations in compil- ers and the structure of generated code by creating more robust de nitions. However nding which instructions should be included in each new de nition is laborious, error-prone, and architecture- dependent. We automate the discovery of instructions to be in- cluded for an improved OBC de nition on the X86 and ARM archi- tectures. We discover all possible valid instructions by symbolically executing instruction decoders for X86 and ARM instructions. For each discovered instruction, we translate it to Vine IR, and check if the Vine IR translation satis es the OBC de nition. We verify the correctness of our tool by comparing its output with the X86 and ARM architecture manuals. Our automated instruction clas- si cation facilitates development of more robust OBC de nitions with better bug- nding ability and lesser sensitivity to compiler variations.
  • Item
    Domain Modeling for Development Process Simulation
    (ACM, 2017) De Silva, Ian; Rayadurgam, Sanjai; Heimdahl, Mats
    Simulating agile processes prior to adoption can reduce the risk of enacting an ill-fitting process. Agent-based simulation is well-suited to capture the individual decision-making valued in agile. Yet, agile's lightweight nature creates simulation difficulties as agents must fill-in gaps within the specified process. Deliberative agents can do this given a suitable planning domain model. However, no such model, nor guidance for creating one, currently exists. In this work, we propose a means for constructing an agile planning domain model suitable for agent-based simulation. As such, the domain model must ensure that all activity sequences derived from the model are executable by a software agent. We prescribe iterative elaboration and decomposition of an existing process to generate successive internally-complete and -consistent domain models, thereby ensuring plans derived from the model are valid. We then demonstrate how to generate a domain model and exemplify its use in planning the actions of a single agent.
  • Item
    Towards Synthesis from Assume-Guarantee Contracts involving Infinite Theories: A Preliminary Report
    (ACM, 2016) Katis, Andreas; Whalen, Michael; Gacek, Andrew
    In previous work, we have introduced a contract-based {em realizability checking} algorithm for assume-guarantee contracts involving infinite theories, such as linear integer/real arithmetic and uninterpreted functions over infinite domains. This algorithm can determine whether or not it is possible to construct a realization (i.e. an implementation) of an assume-guarantee contract. The algorithm is similar to k-induction model checking, but involves the use of quantifiers to determine implementability. While our work on realizability is inherently useful for virtual integration in determining whether it is possible for suppliers to build software that meets a contract, it also provides the foundations to solving the more challenging problem of component synthesis. In this paper, we provide an initial synthesis algorithm for assume-guarantee contracts involving infinite theories. To do so, we take advantage of our realizability checking procedure and a skolemization solver for forall/exists-formulas, called AE-VAL. We show that it is possible to immediately adapt our existing algorithm towards synthesis by using this solver, using a demonstration example. We then discuss challenges towards creating a more robust synthesis algorithm.
  • Item
    Requirements and Architectures for Secure Vehicles
    (IEEE, 2016) Whalen, Michael; Cofer, Darren; Gacek, Andrew
    In the High-Assurance Cyber Military Systems project, researchers are investigating how to construct complex networked-vehicle software securely. Experiments demonstrated that careful attention to requirements and system architecture, along with formally verified approaches that remove known security weaknesses, can lead to vehicles that can withstand attacks from even sophisticated attackers with access to vehicle design data.
  • Item
    Design Considerations for Modeling Modes in Cyber-Physical Systems
    (2015) Murugesan, Anitha; Rayadurgam, Sanjai; Whalen, Michael; Heimdahl, Mats
    Safety critical systems such as cruise control in automotive systems and variable rate bolus in medical device infusion pumps introduce complexity and reduce the flexibility of incremental code modifications. This paper proposes a generic pattern to structure the mode logic such that additions, modifications, and removal of behaviors could be done in a quick and localized fashion without losing model integrity. The authors illustrate the proposed pattern using the infusion pump as a case study and describe a design pattern for the mode logic of reactive systems that allows for flexible, understandable, and maintainable models.
  • Item
    The Risks of Coverage-Directed Test Case Generation
    (IEEE, 2015) Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
    A number of structural coverage criteria have been proposed to measure the adequacy of testing efforts. In the avionics and other critical systems domains, test suites satisfying structural coverage criteria are mandated by standards. With the advent of powerful automated test generation tools, it is tempting to simply generate test inputs to satisfy these structural coverage criteria. However, while techniques to produce coverage-providing tests are well established, the effectiveness of such approaches in terms of fault detection ability has not been adequately studied. In this work, we evaluate the effectiveness of test suites generated to satisfy four coverage criteria through counterexample-based test generation and a random generation approach—where tests are randomly generated until coverage is achieved—contrasted against purely random test suites of equal size. Our results yield three key conclusions. First, coverage criteria satisfaction alone can be a poor indication of fault �nding effectiveness, with inconsistent results between the seven case examples (and random test suites of equal size often providing similar—or even higher—levels of fault �nding). Second, the use of structural coverage as a supplement—rather than a target—for test generation can have a positive impact, with random test suites reduced to a coverage-providing subset detecting up to 13.5% more faults than test suites generated speci�cally to achieve coverage. Finally, Observable MC/DC, a criterion designed to account for program structure and the selection of the test oracle, can—in part—address the failings of traditional structural coverage criteria, allowing for the generation of test suites achieving higher levels of fault detection than random test suites of equal size. These observations point to risks inherent in the increase in test automation in critical systems, and the need for more research in how coverage criteria, test generation approaches, the test oracle used, and system structure jointly influence test effectiveness.
  • Item
    Resolute: An Assurance Case Language for Architecture Models
    (ACM, 2016) Gacek, Andrew; Backes, John; Cofer, Darren; Slind, Konrad; Whalen, Michael
    Arguments about the safety, security, and correctness of a complex system are often made in the form of an assurance case. An assurance case is a structured argument, often represented with a graphical interface, that presents and supports claims about a system's behavior. The argument may combine different kinds of evidence to justify its top level claim. While assurance cases deliver some level of guarantee of a system's correctness, they lack the rigor that proofs from formal methods typically provide. Furthermore, changes in the structure of a model during development may result in inconsistencies between a design and its assurance case. Our solution is a framework for automatically generating assurance cases based on 1) a system model specified in an architectural design language, 2) a set of logical rules expressed in a domain specific language that we have developed, and 3) the results of other formal analyses that have been run on the model. We argue that the rigor of these automatically generated assurance cases exceeds those of traditional assurance case arguments because of their more formal logical foundation and direct connection to the architectural model.
  • Item
    Architecture Modeling and Analysis for Safety Engineering
    (Springer, 2017) Stewart, Danielle; Whalen, Michael; Cofer, Darren; Heimdahl, Mats
    Architecture description languages such as AADL allow systems engineers to specify the structure of system architectures and perform several analyses over them, including schedulability, resource analysis, and information flow. In addition, they permit system-level requirements to be specified and analyzed early in the development process of airborne and ground-based systems. These tools can also be used to perform safety analysis based on the system architecture and initial functional decomposition. Using AADL-based system architecture modeling and analysis tools as an exemplar, we extend existing analysis methods to support system safety objectives of ARP4754A and ARP4761. This includes extensions to existing modeling languages to better describe failure conditions, interactions, and mitigations, and improvements to compositional reasoning approaches focused on the specific needs of system safety analysis. We develop example systems based on the Wheel Braking System in SAE AIR6110 to evaluate the effectiveness and practicality of our approach.
  • Item
    A Hierarchical Requirements Reference Model
    (2014) Murugesan, Anitha
    Requirements reference models provide a conceptual framework for discussing and reasoning about system development artifacts such as requirements, assumptions and designs. In practice, complex systems are naturally constructed in hierarchies in which design choices at one level of abstraction influence the requirements that flow down to the subsequent lower levels. Well-known reference models such as Parnas’ Four-variable model, Jackson’s World-Machine model, and the WRSPM model of Gunter et al., can be seen as capturing on view of this decomposition—the one in which the solution (software or machine) emerges as a single component. This view cannot uniformly accommodate the requirements flow-down in the multi-component, multi-level hierarchies of complex systems. This paper introduces a hierarchical requirements reference model that provides a conceptual framework to represent and reason about the artifacts of a hierarchically composed system. This new model provides a simplified and consistent structuring mechanism for representing all layers of the system hierarchy. Formal compositional verification of a hierarchically organized medical device controller is used as case example to illustrate the application of the reference model.
  • Item
    Executing Model-based Tests on Platform-specific Implementations
    (IEEE, 2015) You, Dongjiang; Rayadurgam, Sanjai; Heimdahl, Mats; Komp, John; Kim, Baek-Gyu; Sokolsky, Oleg
    Model-based testing of embedded real-time systems is challenging because platform-specific details are often abstracted away to make the models amenable to various analyses. Testing an implementation to expose non-conformance to such a model requires reconciling differences arising from these abstractions. Due to stateful behavior, naive comparisons of model and system behaviors often fail causing numerous false positives. Previously proposed approaches address this by being reactively permissive: passing criteria are relaxed to reduce false positives, but may increase false negatives, which is particularly bothersome for safety-critical systems. To address this concern, we propose an automated approach that is proactively adaptive: test stimuli and system responses are suitably modified taking into account platform-specific aspects so that the modified test when executed on the platform-specific implementation exercises the intended scenario captured in the original model-based test. We show that the new framework eliminates false negatives while keeping the number of false positives low for a variety of platform-specific configurations.
  • Item
    Steering Model-Based Oracles to Admit Real Program Behaviors
    (2014) Gay, Gregory; Rayadurgam, Sanjai; Heimdahl, Mats
    The oracle - an arbiter of correctness of the system under test (SUT) - is a major component of the testing process. Specifying oracles is challenging for real-time embedded systems, where small changes in time or sensor inputs may cause large differences in behavior. Behavioral models of such systems, often built for analysis and simulation, are appealing for reuse as oracles. However, these models typically provide an idealized view of the system. Even when given the same inputs, the model’s behavior can frequently be at variance with an acceptable behavior of the SUT executing on a real platform. We therefore propose steering the model when used as an oracle, to admit an expanded set of behaviors when judging the SUT’s adherence to its requirements. On detecting a behavioral difference, the model is backtracked and then searched for a new state that satis�es certain constraints and minimizes a dissimilarity metric. The goal is to allow non-deterministic, but bounded, behavior differences while preventing future mismatches, by guiding the oracle - within limits - to match the execution of the SUT. Early results show that steering signi�cantly increases SUT-oracle conformance with minimal masking of real faults and, thus, has signi�cant potential for reducing development costs.
  • Item
    Efficient Observability-based Test Generation by Dynamic Symbolic Execution
    (IEEE, 2015) You, Dongjiang; Rayadurgam, Sanjai; Whalen, Michael; Heimdahl, Mats; Gay, Gregory
    Structural coverage metrics have been widely used to measure test suite adequacy as well as to generate test cases. In previous investigations, we have found that the fault-finding effectiveness of tests satisfying structural coverage criteria is highly dependent on program syntax – even if the faulty code is exercised, its effect may not be observable at the output. To address these problems, observability-based coverage metrics have been defined. Specifically, Observable MC/DC (OMC/DC) is a criterion that appears to be both more effective at detecting faults and more robust to program restructuring than MC/DC. Traditional counterexample-based test generation for OMC/DC, however, can be infeasible on large systems. In this study, we propose an incremental test generation approach that combines the notion of observability with dynamic symbolic execution. We evaluated the efficiency and effectiveness of our approach using seven systems from the avionics and medical device domains. Our results show that the incremental approach requires much lower generation time, while achieving even higher fault finding effectiveness compared with regular OMC/DC generation.
  • Item
    Helping System Engineers Bridge the Peaks
    (ACM, 2014) Rungta, Neha; Person, Suzette; Biatek, Jason; Whalen, Michael; Castle, Joseph; Gundy-Burlet, Karen
    In our experience at NASA, system engineers generally follow the Twin Peaks approach when developing safety-critical systems. However, iterations between the peaks require considerable manual, and in some cases duplicate, effort. A significant part of the manual effort stems from the fact that requirements are written in English natural language rather than a formal notation. In this work, we propose an approach that enables system engineers to leverage formal requirements and automated test generation to streamline iterations, effectively "bridging the peaks". The key to the approach is a formal language notation that a) system engineers are comfortable with, b) is supported by a family of automated V&V tools, and c) is semantically rich enough to describe the requirements of interest. We believe the combination of formalizing requirements and providing tool support to automate the iterations will lead to a more effcient Twin Peaks implementation at NASA.
  • Item
    Improving the Accuracy of Oracle Verdicts Through Automated Model Steering
    (2014) Gay, Gregory; Rayadurgam, Sanjai; Heimdahl, Mats
    The oracle--a judge of the correctness of the system under test (SUT)--is a major component of the testing process. Specifying test oracles is challenging for some domains, such as real-time embedded systems, where small changes in timing or sensory input may cause large behavioral differences. Models of such systems, often built for analysis and simulation, are appealing for reuse as oracles. These models, however, typically represent an idealized system, abstracting away certain issues such as non-deterministic timing behavior and sensor noise. Thus, even with the same inputs, the model's behavior may fail to match an acceptable behavior of the SUT, leading to many false positives reported by the oracle. We propose an automated steering framework that can adjust the behavior of the model to better match the behavior of the SUT to reduce the rate of false positives. This model steering is limited by a set of constraints (defining acceptable differences in behavior) and is based on a search process attempting to minimize a dissimilarity metric. This framework allows non-deterministic, but bounded, behavior differences, while preventing future mismatches, by guiding the oracle--within limits--to match the execution of the SUT. Results show that steering significantly increases SUT-oracle conformance with minimal masking of real faults and, thus, has significant potential for reducing false positives and, consequently, development costs.
  • Item
    Automated Oracle Data Selection Support
    (IEEE, 2015-06) Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
    The choice of test oracle—the artifact that determines whether an application under test executes correctly—can significantly impact the effectiveness of the testing process. However, despite the prevalence of tools that support test input selection, little work exists for supporting oracle creation. We propose a method of supporting test oracle creation that automatically selects the oracle data—the set of variables monitored during testing—for expected value test oracles. This approach is based on the use of mutation analysis to rank variables in terms of fault-finding effectiveness, thus automating the selection of the oracle data. Experimental results obtained by employing our method over six industrial systems (while varying test input types and the number of generated mutants) indicate that our method—when paired with test inputs generated either at random or to satisfy specific structural coverage criteria—may be a cost-effective approach for producing small, effective oracle data sets, with fault finding improvements over current industrial best practice of up to 1,435% observed (with typical improvements of up to 50%).
  • Item
    Moving the Goalposts: Coverage Satisfaction is Not Enough
    (ACM, 2014) Gay, Gregory; Staats, Matt; Whalen, Michael; Heimdahl, Mats
    Structural coverage criteria have been proposed to measure the adequacy of testing efforts. Indeed, in some domains--e.g., critical systems areas--structural coverage criteria must be satisfied to achieve certification. The advent of powerful search-based test generation tools has given us the ability to generate test inputs to satisfy these structural coverage criteria. While tempting, recent empirical evidence indicates these tools should be used with caution, as merely achieving high structural coverage is not necessarily indicative of high fault detection ability. In this report, we review some of these findings, and offer recommendations on how the strengths of search-based test generation methods can alleviate these issues.
  • Item
    Hierarchical Multi-Formalism Proofs of Cyber-Physical Systems
    (University of Minnesota, 2015) Whalen, Michael; Rayadurgam, Sanjai; Ghassabani, Elaheh; Murugesan, Anitha; Sokolsky, Oleg; Heimdahl, Mats; Lee, Insup
    To manage design complexity and provide verification tractability, models of complex cyber-physical systems are typically hierarchically organized into multiple abstraction layers. High-level analysis explores interactions of the system with its physical environment, while embedded software is developed separately based on derived requirements. This separation of lowlevel and high-level analysis also gives hope to scalability, because we are able to use tools that are appropriate for each level. When attempting to perform compositional reasoning in such an environment, care must be taken to ensure that results from one tool can be used in another to avoid errors due to “mismatches� in the semantics of the underlying formalisms. This paper proposes a formal approach for linking high-level continuous time models and lower-level discrete time models. Specifically, we lift a discrete-time controller specified using synchronous observer properties into continuous time for proof using timed automata (UPPAAL). To define semantic compatibility between the models, we propose a direct semantics for a network of timed automata with a discrete-time component called Contract- Extended Network of Timed Automata (CENTA) and examine semantic issues involving timing and events with the combination. We then propose a translation of the discrete-time controller into a timed automata state machine and show the equivalence of the translation with the CENTA formulation. We demonstrate the usefulness of the approach by proving that a complex medical infusion pump controller is safe with respect to a continuous time clinical scenario.