Browsing by Subject "Computer Science"
Now showing 1 - 20 of 88
Item Abstract Syntax Contextualization Framework for Debugging Attribute Grammar Specifications (2024) Feraru, Matthew
In this thesis, we explore an aspect of debugging attribute grammar (AG) specifications. AG frameworks are themselves high-level languages that allow a programmer to specify the syntax rules and semantics of a new programming language. The debugging of AG specifications is often done by interactively traversing abstract syntax trees (ASTs) that represent a parsed program in a metaprogram. The goal of such debugging is to find AG specifications with semantic rules that observe correct inputs but incorrect outputs—the possible bugs of AG specifications we consider. For large programs, ASTs may be difficult for a programmer to understand; graphically rendering ASTs in a debugging interface is challenging and still does not make it straightforward to understand ASTs relative to source code. Consequently, we propose a textual way to use source and source-like syntax to represent the location of a navigated-to AST node relative to its position in the entire AST and to highlight any notable features of the tree, such as the application of rewrite rules. This contextualization framework for abstract syntax tree nodes has been prototyped to work on Silver [1] specifications, but it is applicable to any AG framework since it relies only on the core features of the AG paradigm itself.

Item Algorithms for mining evolving patterns in dynamic relational networks (2014-09) Ahmed, Rezwan
Dynamic networks have recently been recognized as a powerful abstraction to model and represent the temporal changes and dynamic aspects of the data underlying many complex systems. This recognition has resulted in a burst of research activity related to modeling, analyzing, and understanding the properties, characteristics, and evolution of such dynamic networks. The focus of this growing research has mainly been on defining important recurrent structural patterns and developing algorithms for their identification. Most of these tools are not designed to identify time-persistent relational patterns or do not focus on tracking the changes of these relational patterns over time. Analysis of the temporal aspects of the entity relations in these networks can provide significant insight into determining the conserved relational patterns and the evolution of such patterns over time. Computational methods and tools that can efficiently and effectively analyze the changes in dynamic relational networks can substantially expand the types and diversity of dynamic networks that can be analyzed and the information that can be gained from such analysis. This may provide information on the recurrence and stability of their relational patterns, help in the detection of abnormal patterns potentially indicating fraudulent or other malevolent behaviors, and improve the ability to predict the relations and their changes in these networks. In this dissertation we present new data mining methods for analyzing the temporal evolution of relations between entities of relational networks. Different classes of evolving relational patterns are introduced, motivated by two distinct aspects of relational pattern evolution. The first is the notion of state transition and seeks to identify sets of entities whose time-persistent relations change over time and space. The second is the notion of coevolution and seeks to identify recurring sets of entities whose relations change in a consistent way over time and space.
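As a minimal, hypothetical illustration of the data model behind these pattern classes (not the dissertation's actual algorithms or data formats), a dynamic relational network can be viewed as a sequence of labeled-graph snapshots, and a "state transition" for a set of entities is simply a point in time where the relations induced on those entities change:

```python
# Illustrative sketch only (not the dissertation's code): a dynamic relational network
# as a sequence of labeled-graph snapshots, plus a helper that reports where the
# relational state induced on a fixed set of entities changes.

from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]       # (entity_u, entity_v)
Snapshot = Dict[Edge, str]   # edge -> relation label at one time step

def induced_relations(snapshot: Snapshot, entities: FrozenSet[str]):
    """Relations among the chosen entities in a single snapshot."""
    return frozenset((e, lbl) for e, lbl in snapshot.items()
                     if e[0] in entities and e[1] in entities)

def state_transitions(network: List[Snapshot], entities: FrozenSet[str]) -> List[int]:
    """Indices of snapshots where the induced relational state changes."""
    states = [induced_relations(s, entities) for s in network]
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]

# Toy example: the relation between a and b flips from "collaborates" to "competes" at t = 2.
network = [
    {("a", "b"): "collaborates", ("b", "c"): "advises"},
    {("a", "b"): "collaborates", ("b", "c"): "advises"},
    {("a", "b"): "competes",     ("b", "c"): "advises"},
]
print(state_transitions(network, frozenset({"a", "b"})))  # -> [2]
```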
We first present a new class of patterns, referred to as evolving induced relational states (EIRS), which is designed to analyze the time-persistent relations or states between the entities of the dynamic networks. These patterns can help identify the transitions from one conserved state to the next and may provide evidence of the existence of external factors that are responsible for changing the stable relational patterns in these networks. We developed an algorithm to efficiently mine all maximal non-redundant evolution paths of the stable relational states of a dynamic network. Next, we introduce a class of patterns, referred to as coevolving relational motifs (CRM), which is designed to identify recurring sets of entities whose relations change in a consistent way over time. CRMs can provide evidence of the existence of possibly unknown coordination mechanisms by identifying the relational motifs that evolve in a similar and highly conserved fashion. An algorithm is presented to efficiently analyze the frequent relational changes between the entities of the dynamic networks and capture all frequent coevolutions as CRMs. Finally, we define a new class of patterns built upon the concepts of CRMs, referred to as coevolving induced relational motifs (CIRM), which is designed to represent patterns in which all the relations among recurring sets of nodes are captured and some of the relations undergo changes in a consistent way across different snapshots of the network. We also present an algorithm to efficiently mine all frequent coevolving induced relational motifs. A comprehensive evaluation of the performance and scalability of all the algorithms is presented through extensive experiments using multiple dynamic networks derived from real-world datasets from various application domains. The detailed analysis of the experimental results illustrates the efficiency of these algorithms. In addition, we provide a qualitative analysis of the information captured by each class of the discovered patterns. For example, we show that some of the discovered CRMs capture relational changes between nodes that are thematically different (i.e., edge labels transition between two clusters of topics that have very low similarity). Moreover, some of these patterns are able to capture network characteristics that can be used as features for modeling the underlying dynamic network. The new classes of evolving patterns and the algorithms can enable the effective analysis of complex relational networks towards the goal of better understanding their temporal changes. Knowing these patterns provides strong observational evidence of the existence of mechanisms that control, coordinate, and trigger these evolutionary changes, which can be used to focus further studies and analysis.

Item Anomaly detection for symbolic sequences and time series data (2009-09) Chandola, Varun
This thesis deals with the problem of anomaly detection for sequence data. Anomaly detection has been a widely researched problem in several application domains such as system health management, intrusion detection, healthcare, bioinformatics, fraud detection, and mechanical fault detection. Traditional anomaly detection techniques analyze each data instance (as a univariate or multivariate record) independently, and ignore the sequential aspect of the data. Often, anomalies in sequences can be detected only by analyzing data instances together as a sequence, and hence cannot be detected by traditional anomaly detection techniques.
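To make that distinction concrete, here is a small, purely illustrative sketch (not a technique from the thesis): every symbol in the test sequence below is common in the training data, but the windows (k-grams) it forms never occur there, so a window-based, semi-supervised score flags the sequence even though a per-symbol check would not. All event names are invented.

```python
# Illustrative sketch only: score a symbolic sequence by the fraction of its k-grams
# that were never observed in normal (training) sequences.

from collections import Counter

def kgram_model(train_seqs, k=3):
    """Count the k-grams observed in normal training sequences."""
    counts = Counter()
    for seq in train_seqs:
        for i in range(len(seq) - k + 1):
            counts[tuple(seq[i:i + k])] += 1
    return counts

def anomaly_score(seq, counts, k=3):
    """Fraction of the test sequence's k-grams never seen in training."""
    grams = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    if not grams:
        return 0.0
    return sum(1 for g in grams if counts[g] == 0) / len(grams)

normal = [["login", "read", "write", "logout"]] * 50 + [["login", "fail", "login", "read", "logout"]] * 5
test = ["login", "fail", "fail", "fail", "read"]   # familiar symbols, unfamiliar windows
model = kgram_model(normal)
print(anomaly_score(test, model))  # -> 1.0
```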
The problem of anomaly detection for sequence data is a rich area of research for two main reasons. First, sequences can be of different types, e.g., symbolic sequences, time series data, etc., and each type of sequence poses a unique set of problems. Second, anomalies in sequences can be defined in multiple ways, and hence there are different problem formulations. In this thesis we focus on solving one particular problem formulation called semi-supervised anomaly detection. We study the problem separately for symbolic sequences, univariate time series data, and multivariate time series data. The state of the art on anomaly detection for sequences is limited and fragmented across application domains. For symbolic sequences, several techniques have been proposed within specific domains, but it is not well understood how a technique developed for one domain would perform in a completely different domain. For univariate time series data, limited techniques exist and are only evaluated for specific domains, while for multivariate time series data, anomaly detection research is relatively untouched. This thesis has two key goals. The first goal is to develop novel anomaly detection techniques for different types of sequences which perform better than existing techniques across a variety of application domains. The second goal is to identify the best anomaly detection technique for a given application domain. By realizing the first goal we develop a suite of anomaly detection techniques for a domain scientist to choose from, while the second goal will help the scientist choose the technique best suited for the task. To achieve the first goal we develop several novel anomaly detection techniques for univariate symbolic sequences, univariate time series data, and multivariate time series data. We provide extensive experimental evaluation of the proposed techniques on data sets collected across diverse domains and generated from data generators, also developed as part of this thesis. We show how the proposed techniques can be used to detect anomalies which translate to critical events in domains such as aircraft safety, intrusion detection, and patient health management. The techniques proposed in this thesis are shown to outperform existing techniques on many data sets. The technique proposed for multivariate time series data is one of the very first anomaly detection techniques that can detect complex anomalies in such data. To achieve the second goal, we study the relationship between anomaly detection techniques and the nature of the data on which they are applied. A novel analysis framework, Reference Based Analysis (RBA), is proposed that can map a given data set (of any type) into a multivariate continuous space with respect to a reference data set. We apply the RBA framework not only to visualize and understand complex data types, such as multivariate categorical data and symbolic sequence data, but also to extract data-driven features from symbolic sequences, which, when used with traditional anomaly detection techniques, are shown to consistently outperform the state-of-the-art anomaly detection techniques for these complex data types. Two novel techniques for symbolic sequences are proposed using the RBA framework which perform better than the best technique for each different data set.

Item Application, rendering and display of automotive paint (2009-11) Konieczny, Jonathan M.
New computer graphics tools are developed for use in the automotive paint design and paint application industries.
These tools are designed to aid in every aspect of automotive painting, from initial paint design all the way to viewing a final spray paint job. This thesis also provides new computer graphics techniques by leveraging industrial appearance standards and measurement instruments to yield new interaction, rendering, and display algorithms. First, a system is introduced for the simulation of spray painting. Head-mounted display goggles are combined with a tracking system to allow users to paint a virtual surface with a spray gun. Ray tracing is used to simulate droplets landing on the surface of the object, allowing arbitrary shapes and spray gun patterns to be used. This system is combined with previous research on spray gun characteristics to provide a realistic simulation of the spray paint, including the effects of viscosity, air pressure, and paint pressure. Experiments were performed to validate the system for use as a training tool. Next, a virtual airbrush tool is introduced. The basic particle simulation used in the spray paint system is modified to include the finer control needed for airbrushing. Paint mixing between colors applied to the surface is modeled using Kubelka-Munk theory. Computerized stencils, including semi-permeable stencils, can be manually positioned by the artist or projected onto the object's surface. The resulting airbrush system can be used by airbrush artists both to practice their craft and to decorate virtual models. The dissertation then utilizes an industrial measurement instrument to simulate the surface roughness in automotive paint finishes. This simulation is integrated with a design interface to enable industry professionals to devise new paints that have detailed surface effects. The industrial measurement device can also be used to rapidly measure and render real-world materials without the need for elaborate full BRDF acquisition tools. Additionally, the surface model developed in this research can be used to study human detection of small-scale surface roughness. Lastly, new projection systems are introduced to display the paints generated by the previous algorithms. A desired paint is projected onto the surface of an object, making that object appear to be painted with that material. This allows painters to evaluate how the final painted surface will look in a very natural way. A novel projection system is also described in which the user can hold and evaluate a flexible sample of virtual paint. This provides a more compelling display of the paint than a standard 2D monitor can generate. These display methods can also be used to display materials other than paint, potentially benefiting a large range of industries. The combination of these tools is intended to allow spray painters to design paints, paint them onto an automobile or other object, and verify exactly what the result will look like before ever having to manufacture the paint itself. The use of these tools could therefore reduce wasted paint, speed up training times, and allow for precise design of automobile appearance, all before ever manufacturing any real paint. In addition to aiding paint industries, these tools enhance the current state of the art in computer graphics.
The airbrush tool provides a new texture creation system that existing airbrush artists can use with little to no training; the surface roughness simulation provides enhanced rendering of automotive paint that could be used in movies and games; and the projection display system improves the state of the art in augmented reality systems.

Item An approach for oracle data selection criterion (2010-09) Park, Myung-Hwan
Testing activities involve the execution of a program under test using input data and the examination of the test results to determine success or failure. One of the most important tasks in examining test results is to choose the variables to observe, called oracle data, that are used in the examination. Since it is often infeasible to examine all variables of the program under test, we have to choose a subset of the program variables as oracle data. If we can choose variables which can reveal more errors than others, this can significantly increase test effectiveness. A challenge of selecting oracle data is to predict the capability of variables to reveal errors when examined. The work in this dissertation addresses the problem of choosing oracle data to increase test effectiveness. To predict the error-finding capability of variables, we focus on the error propagation behavior of a system. An error in a system can propagate to other variables if those variables are computationally related to the error. Sometimes, however, the error can be masked out during computation. To analyze such error propagation behavior, we propose a novel mechanism of error propagation analysis which estimates error propagation behavior through a static analysis of the code. The error propagation analysis computes the error propagation probability for each variable, and this quantitative measure represents the error-finding capability of the variables. The second contribution of this work is to establish an oracle data selection criterion. In a naive approach, we might simply pick the variables with the highest error-finding capability. This, however, ignores the possibility that several variables may reveal the same error, making the selection suboptimal. Our criterion enables us to choose oracle data to increase test effectiveness while minimizing repeated detections of the same error. The strength of our oracle data selection criterion is its ability to choose oracle data that is effective with a small number of variables in the set of oracle data. To evaluate the effectiveness of our oracle data selection criterion, we developed an error propagation analysis tool that ranks system variables based on their error-finding capabilities. We performed experiments on four sample systems to check the error-finding effectiveness of our oracle data selection criterion. The experiments show promising results in terms of error-finding effectiveness. In addition, we investigated the sensitivity of our oracle data selection criterion to changes in the assumptions underlying the approach, and the results indicate that our technique is not very sensitive to even extremely skewed assumptions.

Item An Assessment of the Writing of Undergraduate Computer Science Students (University of Minnesota, 2002) Nurkkala, Tom; Gini, Maria

Item Automated virtual treatment planning in orthodontics: modeling and algorithms (2012-07) Kumar, Yokesh
Computer-based virtual treatment planning and simulation has become increasingly important in orthodontics due to its potential to lower costs and provide better treatment outcomes.
However, its clinical use is currently limited by the need for significant human intervention in segmenting the dental arch (a 3-dimensional surface mesh) into individual tooth objects and then (re)aligning these tooth objects on the arch to satisfy prescribed treatment criteria. The goals of this thesis are to develop modeling techniques, computational algorithms, and associated software to automate key components of the treatment process, including dental arch segmentation, feature identification, and tooth alignment, thereby making the advantages of virtual treatment planning and simulation available to the vast majority of orthodontic patients. Towards this end, this thesis makes the following contributions. First, an algorithm is designed to segment a dental arch into individual tooth objects (sub-meshes) that can be manipulated downstream in the treatment planning and simulation pipeline. The algorithm is largely automated and requires human intervention only in very difficult or unusual cases. The algorithm avoids the limitations of known approaches to dental segmentation by dividing the segmentation task into two key subtasks—separation of the gums from the teeth and separation of the teeth from one another—and also incorporates a new technique to repair defects in the gumline caused by noise in the data. Second, algorithms are developed to automatically identify suitable features on the surfaces of individual tooth objects for use in the alignment stage. These include intrinsic features such as cusps, incisal edges, grooves, marginal ridges, and the occlusal surface boundary, as well as derived features such as the archform and occlusal plane. A key technique underlying some of these algorithms is the identification and clustering of vertices of high curvature on certain planar cross-sections of the tooth objects and the stitching of these clusters together to extract the features of interest. Third, an algorithm is developed to automatically align the tooth objects on two opposing arches so that the best possible intra-arch and inter-arch occlusion (i.e., contact relationship) of teeth is achieved, while respecting certain natural dental constraints. The alignment process is modeled as a simulation-based optimization of certain configurations of constraints defined with respect to the tooth features identified previously. The simulation is based on a spring-mass model in which the teeth gradually move to their final positions under the influence of forces exerted by (hypothetical) springs attached to the features. Finally, all of the algorithms developed in this thesis have been implemented and incorporated into a comprehensive software tool. This tool has been used by dental practitioners to evaluate and validate the algorithms on clinical data. These experimental studies have demonstrated the efficacy of the algorithms for orthodontic treatment planning.

Item Calibration and component placement in structured light systems for 3D reconstruction tasks (2009-10) Bird, Nathaniel Davis
This thesis examines the amount of detail in 3D scene reconstruction that can be extracted using camera- and projector-based structured light systems. Structured light systems are similar to multi-camera stereoscopic systems, except that a projector is used in place of at least one camera. This aids 3D scene reconstruction by greatly simplifying the correspondence problem, i.e., identifying the same world point in multiple images.
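As a minimal, hypothetical sketch (not from the thesis) of what that simplification buys: once the projected pattern has established which projector pixel corresponds to a camera pixel, the projector can be treated as an inverse camera and the 3D point recovered by standard linear (DLT) triangulation. The projection matrices and point below are made up for illustration.

```python
# Illustrative sketch only: linear (DLT) triangulation of one camera/projector correspondence.

import numpy as np

def triangulate(P_cam, P_proj, x_cam, x_proj):
    """Least-squares 3D point from one resolved pixel correspondence."""
    A = np.vstack([
        x_cam[0] * P_cam[2] - P_cam[0],
        x_cam[1] * P_cam[2] - P_cam[1],
        x_proj[0] * P_proj[2] - P_proj[0],
        x_proj[1] * P_proj[2] - P_proj[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    """Pinhole projection of a 3D point to normalized image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Invented setup: camera at the origin, projector shifted 0.2 m along x, unit focal length.
P_cam = np.hstack([np.eye(3), np.zeros((3, 1))])
P_proj = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])
X_true = np.array([0.1, 0.05, 1.0])
x_cam, x_proj = project(P_cam, X_true), project(P_proj, X_true)
print(triangulate(P_cam, P_proj, x_cam, x_proj))  # ~ [0.1, 0.05, 1.0]
```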
The motivation for this work comes from problems involved with the helical tomotherapy device in use at the University of Minnesota. This device performs conformal radiation therapy, delivering a high radiation dosage to certain patient body areas but a lower dosage elsewhere. The device currently has no feedback as to the patient's body positioning, and vision-based methods are promising. The tolerances for such tracking are very tight, requiring methods that maximize the quality of reconstruction through good element placement and calibration. Optimal placement of cameras and projectors for specific detection tasks is examined, and a mathematical basis for judging the quality of camera and projector placement is derived. Two competing interests are taken into account for these quality measures: the overall visibility for the volume of interest, i.e., how much of a target object is visible; and the scale of visibility for the volume of interest, i.e., how precisely points can be detected. Optimal calibration of camera and projector systems is examined as well. Calibration is important, as poor calibration will ultimately lead to a poor-quality reconstruction. This is a difficult problem because projected patterns do not conform to any set geometric constraints when projected onto general scenes, and such constraints are often necessary for calibration. However, it can be shown that an optimal image-based calibration can be found for camera and projector systems if there are at least two cameras whose views overlap that of the projector. The overall quality of scene reconstruction from structured light systems is a complex problem. The work in this thesis analyzes this problem from multiple directions and provides methods and solutions that can be applied to real-world systems.

Item Categorization of Warning Statements for Over-the-Counter Medications (2016-05) Li, Yutong
The use of dietary supplements (including vitamins, minerals, fiber acids, etc.), a kind of over-the-counter (OTC) drug, by U.S. adults is increasing significantly. However, supplements can cause side effects and interactions with conventional prescriptions. DailyMed provides OTC drug labels containing black box warnings. The information embedded in the narrative text is not directly searchable, which prevents its wide use by clinical decision support systems and consumers. The objective of this research is to develop an automatic program to categorize warning statements from drug labels (e.g., as interactions or side effects). After developing the patterns to extract safety information, we can categorize warning statements with a precision of 0.948 and a recall of 0.918.

Item A computational approach to the study of player and group performance in massively multiplayer online games (2011-12) Shim, Kyong Jin
The market for video games has skyrocketed over the past decade. Massively Multiplayer Online Games (MMOGs) have become increasingly popular, with communities comprising over 47 million subscribers by 2008. With their increasing popularity, researchers are realizing that video games can be a means to fully observe an entire isolated universe. Each action is logged, and the level of granularity and completeness with which information is collected is unmatched by any real-life experimental setup. They serve as unprecedented tools to theorize and empirically model the social and behavioral dynamics of individuals, groups, and networks within large communities.
Virtual world applications usually have a thin-client architecture, so practically all user actions are captured in the click-stream logged at the server. This dataset contains a comprehensive record of every user's in-network activities, accomplishments, interactions, economic status, and so on. A brief record of the user's side information (i.e., profile data) is also stored. It is common for popular social networking and collaborative systems to have hundreds of thousands of users generating copious amounts of data based on the many different activities they are participating in at any given time. The data also has a temporal component, which is often an integral part of the analysis and introduces further relationships that must be accounted for. Thus, while providing an exciting new tool for the social sciences, virtual worlds also present a set of difficult and novel computational challenges. In the gaming community today, there is a growing interest in understanding player behaviors both inside and outside the gaming space. Game companies are interested in finding out how their games are played, whether they are being played as intended, how the different game mechanics are being played out, and how different game-playing patterns lead to a high level of satisfaction and entertainment for customers. Retrospective analyses of existing game features after launch can reveal which features enhance players' gaming experience and to which demographic segments they especially appeal. Features negatively correlated with gaming experience can be considered for removal, while those positively correlated with gaming experience can be further enhanced. For new game features, prospective analyses before the game launch can reveal which features might appeal to certain player population segments with a certain level of confidence, and user-oriented testing can focus on these features for further validation. This thesis work presents the first comprehensive quantitative analysis of an important aspect of MMOG game play, namely player and group performance. While there are many different game genres (e.g., action, shooter, action-adventure, adventure, role-playing, and simulation) and many dimensions comprising players' game-play experience, in certain game genres such as MMOGs a close connection has been reported between player enjoyment and completing challenges and mastering skills. A systematic study of individual game player characteristics, group composition and characteristics, social interactions amongst the group members, and game environments can reveal a great deal about the recipes for success in achieving various objectives in the game. Broadly, this thesis work seeks to develop (1) player performance metrics and prediction models, (2) a player activity prediction model, (3) a player enjoyment prediction model, and (4) group performance metrics and prediction models. Lastly, we contribute a single, generic framework for player and group behavior analysis that is applicable to other MMOGs with minimal configuration changes.

Item Computational trust in Multiplayer Online Games (2012-04) Ahmad, Muhammad Aurangzeb
Trust is a ubiquitous phenomenon in human societies. Computational trust refers to the mediation of trust via a computational infrastructure. It has been studied in a variety of contexts, e.g., peer-to-peer systems, multi-agent systems, and recommendation systems.
While this is an active area of research, the types of questions that have been explored in this field have been limited, mainly because of limitations in the types of datasets available to researchers. In this thesis, questions related to trust in the complex social environments represented by Massively Multiplayer Online Games (MMOGs) are explored. The main emphasis is that trust is a multi-level phenomenon, both in terms of how it operates at multiple levels of network granularity and in how trust relates to other social phenomena such as homophily, expertise, mentoring, and clandestine behaviors. Social contexts and social environments affect not just the qualitative aspects of trust; the phenomenon is also manifested in the network and structural signatures of the trust network. Additionally, trust is explored in the context of predictive tasks: previously described prediction tasks such as link prediction are studied in the context of trust, within the link prediction family of problems (link formation, link breakage, changes in links, etc.). We also define and explore new trust-related prediction problems, i.e., trust propensity prediction, trust prediction across networks (which can be generalized to the inter-network link prediction problem), and success prediction based on using network measures of a person's social capital as a proxy.

Item Context-aware scanning and determinism-preserving grammar composition, in theory and practice (2010-07) Schwerdfeger, August
This thesis documents several new developments in the theory of parsing, and also the practical value of their implementation in the Copper parser generator. The most widely used apparatus for syntactic analysis of programming languages consists of a scanner based on deterministic finite automata, built from a set of regular expressions called the lexical syntax, and a separate parser, operating on the output of this scanner, built from a context-free grammar and utilizing the LALR(1) algorithm. While the LALR(1) algorithm has the advantage of guaranteeing an unambiguous parse, and the approach of keeping the scanner and parser separate makes the compilation process cleaner and more modular, it is also a brittle approach. The class of grammars that can be parsed with an LALR(1) parser is not closed under grammar composition, and small changes to an LALR(1) grammar can remove the grammar from the LALR(1) class. Also, the separation of scanner and parser prevents the use, in any organized way, of parser context to resolve ambiguities in the lexical syntax. One area in which both of these drawbacks pose a particular problem is that of parsing embedded and extensible languages. In particular, it forms one of the last major impediments to the development of an extensible compiler in which language extensions are imported and composed by the end user (programmer) in a manner analogous to the way libraries are presently imported. This is due not only to the problem of the LALR(1) grammar class not being closed under composition, but also to the very real possibility that the lexical syntax of two different extensions will clash, making it impossible to construct a scanner without extensive manual resolution of ambiguities, if at all. This thesis presents three innovations that are a large step towards eliminating parsing as an Achilles' heel in this application.
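Before those innovations are listed, a minimal, invented sketch of the context-aware scanning idea described next may help: the parser tells the scanner which terminals are valid in its current state, and the scanner attempts only those token definitions, so lexical syntax that overlaps across composed extensions never has to be resolved by a single global precedence. The token names and patterns here are purely illustrative, not Copper's.

```python
# Illustrative sketch only (not Copper): the scanner is told, at each call, which
# terminals the parser can currently accept, and tries only those.

import re

# Token definitions; "select" matches both a hypothetical SQL keyword and a host identifier.
TOKEN_DEFS = [
    ("SQL_SELECT", r"select\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("INT",        r"\d+"),
]

def scan(text, pos, valid_terminals):
    """Longest match among only the terminals the parser currently allows;
    ties broken by the order of TOKEN_DEFS (a simple fixed precedence)."""
    best = None
    for name, pattern in TOKEN_DEFS:
        if name not in valid_terminals:
            continue
        m = re.match(pattern, text[pos:])
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

src = "select"
print(scan(src, 0, {"IDENTIFIER", "INT"}))         # ('IDENTIFIER', 'select') in host-language code
print(scan(src, 0, {"SQL_SELECT", "IDENTIFIER"}))  # ('SQL_SELECT', 'select') inside the SQL extension
```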
Firstly, it describes a new scanning algorithm called context-aware scanning, in which the scanner at each scan is made aware of what sorts of tokens are recognized as valid by the parser at that point. By allowing the use of parser context in the scanner to disambiguate, context-aware scanning makes the specification of embedded languages much simpler — instead of specifying a scanner that must explicitly switch "modes" to scan the different embedded languages, one simply compiles a context-aware scanner from a single lexical specification, which has implicit "modes" to scan properly on each embedded language. Similarly, in an extensible language, this puts a degree of separation between the lexical syntax of different language extensions, so that any clashes of this sort will not be an issue. Secondly, the thesis describes an analysis that operates on grammar extensions of a certain form and can recognize their membership in a class of extensions that can be composed with each other and still produce a deterministic parser—enabling end users to compose extensions written by different language developers with this guarantee of determinism. The analysis is made practical by context-aware scanning, which ensures a lack of lexical issues to go along with the lack of syntactic nondeterminism. It is this analysis — the first of its kind — that is the largest step toward realizing the sort of extensible compiler described above, as the writer of each extension can test it independently using the analysis and thus resolve any lexical or syntactic issues with the extension before the end user ever sees it. Thirdly, the thesis describes a corollary of this analysis, which allows extensions that have passed the analysis to be distributed in parse table form and then composed on the fly by the end users, with the same guarantee of determinism. Besides expediting the operation of composition, this also enables the use of the analysis in situations where the writer of a language or language extension does not want to release its grammar to the public. Finally, the thesis discusses how these approaches have been implemented and made practical in Copper, including runtime tests, implementations, and analyses of real-world grammars and extensions.

Item Cooperative Localization: On motion-induced initialization and joint state estimation under communication constraints (2010-08) Trawny, Nikolas
Teams of mobile robots are becoming increasingly popular for measuring and estimating quantities of interest at spatially distributed locations. They have been used in tasks such as surveillance, search and rescue, and underwater or space exploration. For these tasks, accurate localization, i.e., determining the position and orientation (pose) of each robot, is a fundamental requirement. Instead of localizing each robot in a team independently, Cooperative Localization (CL) incorporates robot-to-robot observations and jointly estimates all robots' poses, which improves localization accuracy for all team members. However, such joint estimation also creates significant challenges. In particular, initializing a joint estimation algorithm requires knowledge of all robots' poses with respect to a common frame of reference. This initialization is straightforward using GPS or manual measurements, but is difficult in the absence of external references. The second difficulty of CL is that it requires communicating large amounts of data, e.g., the robots' sensor measurements or state estimates.
However, transmitting all these quantities is not always feasible, due either to bandwidth or to power constraints. This thesis offers novel solutions to the aforementioned problems. In the first part of the thesis, we investigate the problem of CL initialization, using robot-to-robot measurements acquired at different vantage points during robot motion. We focus on the most challenging case of distance-only measurements, and provide algorithms that compute the guaranteed global optimum of a nonlinear weighted least-squares problem formulation. These techniques exploit recent advances in numerical algebraic geometry and optimization. In the second part, we investigate the problem of CL under communication constraints. To reduce communication bandwidth, we propose using adaptively quantized measurements. We extend existing quantized filtering approaches to batch MAP estimators, and apply these techniques to multi-robot localization. We provide results on optimal threshold selection, as well as optimal bit allocation to efficiently utilize time-varying bandwidth. Our results are validated in simulation and experiments. By providing solutions for two important problems in CL (motion-induced estimator initialization and estimation under communication constraints), the research presented in this thesis aims to promote the use of cooperative mobile robots in challenging real-world applications.

Item A Cross-Platform, Adaptable GUI for an Evolving System (2010-04-21) Frenz, Jonathan
The Internet has come a long way from where it started decades ago. Today, the variety of tasks that can be accomplished over a network is awe-inspiring. But with this rise of potential comes a necessary rise in the complexity of network communications. It has become harder and harder for network administrators or even individual Internet users to keep track of what is coming into and going out of their computers. Even as it becomes harder to monitor this network traffic, it is becoming increasingly indispensable to have a tool that can track and analyze this traffic to discover any spyware, viruses, or adware that may have infected the computer. The tool being developed accomplishes this task by enabling both individual users and network administrators to monitor and analyze network traffic. This tool will be distributed to a wide range of platforms for different purposes. Typically, a piece of software deployed on multiple platforms requires the developers to create an individual graphical user interface in each language. This is unacceptable, however, for such a dynamic and evolving tool as this one. Instead, an interpreter is created for each platform to generate a GUI from the same simple XML file. A change to the user interface can then immediately be delivered across all platforms that the tool is running on simply by editing the single XML file.

Item A data collection, storage, and analysis framework for network security (2007-10) Eilertson, Eric
As the number, severity, and sophistication of computer network attacks increase, network administrators have an increasingly difficult time identifying and cleaning up compromised computers. In this thesis, some of the areas where existing tools and techniques are deficient are identified, and possible solutions are proposed and evaluated on synthetic as well as real networks. This thesis has four major contributions. The first is a lightweight semi-stateful network data capture module.
The second contribution is a framework for storing and accessing raw packet information as well as meta-information, such as network sessions. The third contribution is a set of analysis routines for identifying computer network attacks and computers that have been successfully compromised. The fourth contribution is a framework for iteratively building and analyzing the communication patterns of networked computers. This allows security analysts and researchers to identify compromised computers, as well as perform forensic analysis to answer questions such as: What computer compromised this computer? When did the compromise occur? How did the compromise happen? What data was stolen or modified?

Item Data dissemination for distributed computing (2010-02) Kim, Jinoh
Large-scale distributed systems provide an attractive, scalable infrastructure for network applications. However, the loosely coupled nature of this environment can make data access unpredictable and, in the limit, unavailable. This thesis strives to provide predictability in data access for data-intensive computing in large-scale computational infrastructures. A key requirement for achieving predictability in data access is the ability to estimate network performance for data transfer so that computation tasks can take advantage of the estimation in their deployment or data source selection. This thesis develops a framework called OPEN (Overlay Passive Estimation of Network Performance) for scalable network performance estimation. OPEN provides an estimation of end-to-end accessibility for applications by utilizing past measurements without the use of explicit probing. Unlike existing passive approaches, OPEN is not restricted to pairwise or single-network use of historical information; instead, it shares measurements between nodes without any restrictions. As a result, it achieves n² estimations from O(n) measurements. In addition, this thesis considers data dissemination in two specific environments. First, we consider a parallel data access environment in which multiple replicated servers can be utilized to download a single data file in parallel. To improve both performance and fault tolerance, we present a new parallel data retrieval algorithm and explore a broad set of resource selection heuristics. Second, we consider collective data access in applications for which group performance is more important than individual performance. In this work, we employ communication makespan as a group performance metric and propose server selection heuristics to maximize collective performance.

Item Data mining techniques for enhancing protein function prediction (2010-04) Pandey, Gaurav
Proteins are the most essential and versatile macromolecules of life, and the knowledge of their functions is crucial for obtaining a basic understanding of the cellular processes operating in an organism as well as for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology have given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, which provide high-resolution snapshots of complex cellular processes and a novel avenue to understand their underlying mechanisms.
In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed, in which the function(s) of a protein are inferred from those of other proteins that are "associated" with it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of the unutilized and under-utilized information in genomic data sets, as well as their associated knowledge bases. In particular, we have developed pre-processing methods for handling data quality issues with gene expression (microarray) data sets and protein interaction networks that aim to enhance the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies in the Gene Ontology, into classification-based protein function prediction algorithms, which enabled us to improve the quality of predictions made for several functional classes, particularly those with very few member proteins (rare classes). Finally, we have developed a novel association analysis-based biclustering algorithm to address two major challenges with traditional biclustering algorithms, namely an exhaustive search of all valid biclusters satisfying the definition specified by the algorithm, and the ability to search for small biclusters. This algorithm makes it possible to discover smaller biclusters that are more significantly enriched with specific GO terms than those produced by traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters. To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction.

Item Deep QWOP Learning (2017) Wu, Hung-Wei
We apply a deep learning model to the QWOP flash game, which requires control of a ragdoll athlete using only the keys "Q", "W", "O", and "P". The model is a convolutional neural network trained with Q-learning. By training the model with only raw pixel input, we show that our model is capable of successfully learning a control policy associated with playing QWOP. This model was successfully applied to a non-deterministic control environment in the form of a ragdoll physics flash game.

Item Deep Z-Learning (2018-05-12) Bittner, Nathan
In this thesis, I present advancements in the theory of Z-learning. In particular, I explicitly define a complete tabular Z-learning algorithm, I provide a number of pragmatic qualifications on how Z-learning should be applied to different problem domains, and I extend Z-learning to non-tabular discrete domains by introducing deep network function-approximation versions of Z-learning that are similar to deep Q-learning.

Item Detecting Behaviorally Equivalent Functions via Symbolic Execution (2016) Hietala, Kesha
Software bugs are a reality of programming. They can be difficult to identify and resolve, even for the most experienced programmers. Certain bugs may even be impossible to remove because they provide some desired functionality. For this reason, designers of modern security-critical applications must accept the inevitable existence of bugs and find ways to detect and recover from the errors they cause.
One approach to error detection involves running multiple implementations of a single program at the same time, on the same input, and comparing the results. Divergence in the behavior of the different implementations indicates the existence of a bug. The question we consider in this paper is how to construct these diverse implementations of security-critical programs in a cost-effective way. The solution we propose is to first find existing diverse function implementations and then use these function implementations as building blocks for diverse program implementations. To find diverse function implementations, we use a technique we call adaptor synthesis to compare arbitrary functions for behavioral equivalence. To account for differences in input argument structure between arbitrary functions, we allow for adaptor functions, or adaptors, that convert from one argument structure to another. Using adaptors, the problem of determining whether two arbitrary functions are behaviorally equivalent becomes the problem of synthesizing an adaptor between the two functions that makes their output equivalent on all inputs in a specified domain. Along with presenting our adaptor synthesis technique, we describe an implementation for comparing functions for behavioral equivalence at the binary level on the Linux x86-64 platform using a family of adaptors that allows arithmetic combinations of integer values.
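As a minimal, hypothetical sketch of the adaptor idea at the source level (the thesis itself works on x86-64 binaries with symbolic execution; this toy version simply brute-forces a tiny family of arithmetic adaptors and checks equivalence over a finite domain, and all function names are invented):

```python
# Toy illustration only: search for an adaptor a(x) = c0 + c1*x such that
# f(a(x)) == g(x) for every x in a finite test domain.

from itertools import product

def g(x):               # "reference" implementation
    return 2 * x + 3

def f(y):               # candidate implementation with a different argument convention
    return y + 1

def synthesize_adaptor(f, g, domain, coeff_range=range(-5, 6)):
    """Return (c0, c1) with f(c0 + c1*x) == g(x) for all x in domain, or None."""
    for c0, c1 in product(coeff_range, repeat=2):
        if all(f(c0 + c1 * x) == g(x) for x in domain):
            return (c0, c1)
    return None

print(synthesize_adaptor(f, g, domain=range(-100, 101)))  # (2, 2): f(2 + 2x) = 2x + 3 = g(x)
```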