Browsing by Subject "Anomaly Detection"
Anomaly detection for symbolic sequences and time series data (2009-09)
Chandola, Varun

This thesis deals with the problem of anomaly detection for sequence data. Anomaly detection has been widely researched in several application domains, such as system health management, intrusion detection, health care, bioinformatics, fraud detection, and mechanical fault detection. Traditional anomaly detection techniques analyze each data instance (as a univariate or multivariate record) independently and ignore the sequential aspect of the data. Often, anomalies in sequences can be detected only by analyzing data instances together as a sequence, and hence cannot be detected by traditional anomaly detection techniques. Anomaly detection for sequence data is a rich area of research for two main reasons. First, sequences can be of different types, e.g., symbolic sequences and time series data, and each type of sequence poses a unique set of problems. Second, anomalies in sequences can be defined in multiple ways, and hence there are different problem formulations. In this thesis we focus on one particular problem formulation, called semi-supervised anomaly detection, in which the training data are assumed to contain only normal sequences. We study the problem separately for symbolic sequences, univariate time series data, and multivariate time series data. The state of the art in anomaly detection for sequences is limited and fragmented across application domains. For symbolic sequences, several techniques have been proposed within specific domains, but it is not well understood how a technique developed for one domain would perform in a completely different domain. For univariate time series data, few techniques exist, and they have been evaluated only for specific domains, while for multivariate time series data, anomaly detection research is relatively untouched. This thesis has two key goals.
The first goal is to develop novel anomaly detection techniques for different types of sequences that perform better than existing techniques across a variety of application domains. The second goal is to identify the best anomaly detection technique for a given application domain. Realizing the first goal yields a suite of anomaly detection techniques for a domain scientist to choose from, while the second goal helps the scientist choose the technique best suited to the task. To achieve the first goal, we develop several novel anomaly detection techniques for univariate symbolic sequences, univariate time series data, and multivariate time series data. We provide an extensive experimental evaluation of the proposed techniques on data sets collected across diverse domains as well as data sets produced by data generators, also developed as part of this thesis. We show how the proposed techniques can be used to detect anomalies that correspond to critical events in domains such as aircraft safety, intrusion detection, and patient health management. The proposed techniques are shown to outperform existing techniques on many data sets. The technique proposed for multivariate time series data is one of the very first anomaly detection techniques that can detect complex anomalies in such data. To achieve the second goal, we study the relationship between anomaly detection techniques and the nature of the data to which they are applied. A novel analysis framework, Reference Based Analysis (RBA), is proposed that can map a given data set (of any type) into a multivariate continuous space with respect to a reference data set.
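To make the semi-supervised formulation concrete (the model sees only normal sequences during training and scores each test sequence by how unlike them it is), here is a minimal Python sketch of a k-nearest-neighbor scorer for symbolic sequences. The similarity measure (normalized longest common subsequence), the score 1 - similarity, and all function names are illustrative assumptions; this is a generic baseline, not one of the techniques proposed in the thesis.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, O(|a|*|b|))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def nlcs(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalized LCS similarity in [0, 1]: |LCS| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / (len(a) * len(b)) ** 0.5


def knn_anomaly_score(test: Sequence[str],
                      normal_train: List[Sequence[str]],
                      k: int = 1) -> float:
    """Semi-supervised score: 1 minus the similarity to the k-th most similar
    normal training sequence. Higher means more anomalous."""
    if not normal_train:
        raise ValueError("need at least one normal training sequence")
    sims = sorted((nlcs(test, s) for s in normal_train), reverse=True)
    kth_sim = sims[min(k, len(sims)) - 1]
    return 1.0 - kth_sim
```

With k = 1, a test sequence identical to some training sequence scores 0, and one sharing no symbols with any training sequence scores 1. Because each LCS computation is quadratic in sequence length and every test sequence is compared against the whole training set, this baseline suits only modest numbers of short sequences.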
We apply the RBA framework not only to visualize and understand complex data types, such as multivariate categorical data and symbolic sequence data, but also to extract data-driven features from symbolic sequences that, when used with traditional anomaly detection techniques, consistently outperform the state-of-the-art anomaly detection techniques for these complex data types. Two novel techniques for symbolic sequences are proposed using the RBA framework that perform better than the best technique for each data set.

Dynamic Bayesian Networks: Estimation, Inference and Applications (2016-06)
Melnyk, Igor

In recent years, there has been a significant increase in applications dealing with dynamic, high-dimensional, heterogeneous data streams. For example, in domains such as health care, activity recognition, and aviation systems, multiple sensors record many continuous and discrete parameters over long periods of time, and the objective is to monitor the behavior of the objects, discover meaningful patterns, or detect anomalous events. In spite of a vast literature on data mining and machine learning techniques, these problems remain difficult. This is primarily due to the challenge of properly characterizing the interdependencies between multiple data sources that are a mixture of continuous and discrete types. Moreover, for applications that deal with data monitoring or unusual-behavior detection, an additional challenge is designing discovery algorithms that extract patterns, trends, and anomalies in unsupervised settings, where data are commonly noisy and even partially unobservable. In this work, we propose a suite of models and methods for the analysis of such data using a Dynamic Bayesian Network (DBN) representation.
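One of the simplest DBNs is a linear-Gaussian state-space model, in which a hidden continuous state evolves as x_t = A x_{t-1} + w_t and emits observations y_t = C x_t + v_t, with Gaussian noise covariances Q and R. The Python sketch below runs exact inference in such a model (the Kalman filter) and scores each time step by its normalized innovation squared, a standard residual-based anomaly score. It assumes A, C, Q, and R are already known; learning them and handling mixed continuous and discrete variables are among the harder problems the abstract identifies, so this illustrates the DBN representation rather than the proposed methods.

```python
import numpy as np


def kalman_anomaly_scores(y, A, C, Q, R, x0, P0):
    """Filter observations y (T x m) through a linear-Gaussian DBN and return
    the normalized innovation squared (NIS) at every time step."""
    x, P = x0.astype(float), P0.astype(float)
    scores = []
    for obs in y:
        # Predict: propagate the hidden state one time slice forward.
        x = A @ x
        P = A @ P @ A.T + Q
        # Innovation: observation minus its one-step-ahead prediction.
        v = obs - C @ x
        S = C @ P @ C.T + R
        scores.append(float(v @ np.linalg.solve(S, v)))
        # Update: condition the state estimate on the new observation.
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ v
        P = (np.eye(len(x)) - K @ C) @ P
    return np.array(scores)
```

If the model is correctly specified, the score at each step follows a chi-squared distribution with m degrees of freedom (m being the number of observed variables), so a chi-squared quantile is a natural alarm threshold.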
A DBN is a general tool for establishing dependencies between variables evolving in time; it is used to represent complex stochastic processes in order to study their properties or predict their future behavior. The main challenges in using a DBN are to identify the model structure, learn its parameters with estimation guarantees, and perform efficient inference. Our work has made advances in addressing these problems, especially in the context of anomaly detection, by proposing several frameworks for anomaly detection in multivariate time series data and by building efficient algorithms for learning and inference.

Modeling and monitoring the long-term behavior of post-tensioned concrete bridges (2014-06)
Hedegaard, Brock Daniel

The time-dependent and temperature-dependent behavior of post-tensioned concrete bridges was investigated through a case study of the St. Anthony Falls Bridge, consisting of laboratory testing of the time-dependent behavior of concrete (i.e., creep and shrinkage), examination of data from the instrumented in situ bridge, and time-dependent finite element models. Creep and shrinkage were measured in the laboratory for 3.5 years after casting, and the data were best predicted by the 1978 CEB/FIP Model Code provisions. To compare the in situ readings to constant-temperature finite element models, the time-dependent behavior was extracted from the measurements using linear regression. The creep and shrinkage rates of the in situ bridge were found to depend on temperature. An adjusted age based on the Arrhenius equation was used to account for the interactions between temperature and time-dependent behavior in the measured data. Results from the time-dependent finite element models incorporating the full construction sequence revealed that the 1990 CEB/FIP Model Code and ACI-209 models best predicted the in situ behavior.
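The Arrhenius-based adjusted age can be illustrated with the common equivalent-age form, in which each time interval dt_i spent at absolute temperature T_i counts as exp[(E_a/R)(1/T_ref - 1/T_i)] * dt_i at the reference temperature T_ref. The Python sketch below assumes this form, an activation energy E_a of 40 kJ/mol, and a 20 °C reference temperature; these values and the function name are illustrative assumptions, and the thesis's exact formulation may differ.

```python
import math
from typing import Sequence

R_GAS = 8.314  # universal gas constant, J/(mol*K)


def arrhenius_adjusted_age(temps_c: Sequence[float],
                           dt_days: Sequence[float],
                           activation_energy: float = 40_000.0,
                           t_ref_c: float = 20.0) -> float:
    """Adjusted (equivalent) age in days at t_ref_c for a temperature history.

    Each interval dt_i spent at temperature T_i is scaled by the Arrhenius
    factor exp[(E_a / R) * (1/T_ref - 1/T_i)], with temperatures in kelvin.
    activation_energy (J/mol) and t_ref_c are assumed, not measured, values.
    """
    t_ref_k = t_ref_c + 273.15
    age = 0.0
    for temp_c, dt in zip(temps_c, dt_days):
        temp_k = temp_c + 273.15
        factor = math.exp(activation_energy / R_GAS * (1.0 / t_ref_k - 1.0 / temp_k))
        age += factor * dt  # warmer than T_ref -> counts as more than dt
    return age
```

For example, 10 days at a constant 20 °C give an adjusted age of 10 days, while 10 days at 0 °C give roughly 3 days under these assumptions, so creep and shrinkage accumulated in cold weather are mapped onto a shorter effective age.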
Finite element analysis also revealed that problems associated with excessive deflections or the development of tension over the lifetime of the bridge are unlikely. The interactions between temperature and time-dependent behavior were further investigated using a simplified finite element model, which indicated that vertical deflections and stresses can be affected by the cyclic application of thermal gradients. The findings from this study were used to develop an anomaly detection routine for the linear potentiometer data, which successfully identified short-term and long-term perturbations in the data.
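An anomaly detection routine of this kind can be sketched as residual monitoring: remove the expected response to temperature and age with a regression model, then flag readings whose residuals fall far outside the normal scatter. The Python sketch below is hypothetical; the regressors (temperature and log(1 + age)), the robust median/MAD scaling, the threshold of 5, and the name flag_short_term are assumptions, not the routine developed in the thesis. It covers only short-term perturbations.

```python
import numpy as np


def flag_short_term(disp, temp_c, age_days, z_thresh=5.0):
    """Return a boolean mask of readings whose regression residual is an outlier.

    Hypothetical model: disp ~ b0 + b1 * temp + b2 * log(1 + age), fitted by
    ordinary least squares. Residuals are scaled robustly (median / MAD) so
    the outliers being searched for do not inflate the threshold.
    """
    disp = np.asarray(disp, dtype=float)
    temp_c = np.asarray(temp_c, dtype=float)
    age_days = np.asarray(age_days, dtype=float)
    X = np.column_stack([np.ones_like(temp_c), temp_c, np.log1p(age_days)])
    coef, *_ = np.linalg.lstsq(X, disp, rcond=None)
    resid = disp - X @ coef
    center = np.median(resid)
    # 1.4826 * MAD estimates the standard deviation for Gaussian noise.
    scale = 1.4826 * np.median(np.abs(resid - center))
    z = (resid - center) / scale
    return np.abs(z) > z_thresh
```

Long-term perturbations, such as a sustained offset after a sensor is disturbed, would need an additional check on the residuals over longer windows (for example, a moving median), which this sketch omits.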