Browsing by Subject "Deep learning"
Now showing 1 - 8 of 8
Item
DeepFGSS: Anomalous Pattern Detection using Deep Learning (2019-05)
Kulkarni, Akash
Anomaly detection refers to finding observations that do not conform to expected behavior. It is widely applied in many domains such as image processing, fraud detection, intrusion detection, and medical health. However, most anomaly detection techniques focus on detecting a single anomalous instance; such techniques fail when there is only a slight difference between an anomalous instance and a non-anomalous one. Various collective anomaly detection techniques (based on clustering, deep learning, etc.) have been developed that determine whether a group of records forms an anomaly even though each record is only slightly anomalous. However, they do not provide any information about the attributes that make the group anomalous. In other words, they focus only on detecting records that are collectively anomalous and are not able to detect anomalous patterns in general. FGSS is a scalable anomalous pattern detection technique that searches over both records and attributes. However, FGSS has several limitations preventing it from functioning on continuous, unstructured, and high-dimensional data such as images. We propose a general framework called DeepFGSS, which uses an autoencoder, enabling it to operate on any kind of data. We evaluate its performance using four experiments on both structured and unstructured data to determine its accuracy in detecting anomalies and its efficiency in distinguishing datasets that contain anomalies from those that do not.

Item
Empirical Analysis of Optimization and Generalization of Deep Neural Networks (2022-03)
Li, Xinyan
Deep neural networks (DNNs) have gained increasing attention and popularity in the past decade, mainly due to their tremendous success in numerous commercial, scientific, and societal tasks. Despite the success of DNNs in practice, several aspects of their optimization dynamics and generalization are still not well understood. In practice, DNNs are usually heavily over-parameterized, with far more parameters than training samples, making it easy for them to memorize all the training examples without learning; in fact, Zhang et al. have shown that DNNs can indeed fit training data perfectly. Training DNNs also requires first-order optimization methods such as gradient descent (GD) and stochastic gradient descent (SGD) to solve a highly non-convex optimization problem. The fact that such heavily over-parameterized DNNs trained by simple GD/SGD are still able to learn and generalize well deeply puzzles the deep learning community. In this thesis, we explore the optimization dynamics and generalization behavior of over-parameterized DNNs trained by SGD from two distinct directions. First, we study the topology of the loss landscape of these DNNs through analysis of the Hessian of the training loss (with respect to the parameters). We empirically study the second-moment matrix $M_t$ constructed from the outer product of the stochastic gradients (SGs), as well as the Hessian of the loss $H_f(\theta_t)$. With the help of existing tools such as the Lanczos method and the R-operator, we can efficiently compute the eigenvalues and corresponding eigenvectors of both the Hessian matrix and the second-moment matrix. This allows us to reveal the relationship between the Hessian of the loss and the second moment of the SGs.
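As an aside, the matrix-free eigenvalue computation just described can be sketched in a few lines: Hessian-vector products obtained by double backpropagation (the R-operator idea) are fed to an off-the-shelf Lanczos-based eigensolver. This is a minimal illustrative sketch, not the thesis code; the function and argument names, and the use of SciPy's eigsh, are assumptions.

```python
# Illustrative sketch (not the thesis code): estimate the top Hessian eigenvalues
# of a loss using Hessian-vector products and SciPy's Lanczos-based eigsh.
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def top_hessian_eigenvalues(loss_fn, params, k=5):
    """Estimate the k largest eigenvalues of the Hessian of loss_fn w.r.t. params."""
    n = sum(p.numel() for p in params)  # params must have requires_grad=True

    def hvp(v_np):
        # Hessian-vector product via double backprop.
        v = torch.from_numpy(np.asarray(v_np).reshape(-1)).to(params[0].dtype)
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        flat_grad = torch.cat([g.reshape(-1) for g in grads])
        hv = torch.autograd.grad(flat_grad @ v, params)
        return torch.cat([h.reshape(-1) for h in hv]).detach().cpu().numpy().astype(np.float64)

    op = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
    return np.sort(eigsh(op, k=k, which="LA", return_eigenvectors=False))[::-1]

# Usage (illustrative): eigs = top_hessian_eigenvalues(
#     lambda: criterion(model(x_batch), y_batch), list(model.parameters()))
```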
In addition, we discover a "low-rank" structure both in the eigenvalue spectrum of the Hessian and in the stochastic gradients themselves. These observations directly lead to the development of a new PAC-Bayes generalization bound that accounts for the structure of the Hessian at minima obtained from SGD, as well as a novel noisy truncated stochastic gradient descent (NT-SGD) algorithm aimed at tackling the communication bottleneck in the large-scale distributed setting. Next, we dive into the debate on whether the success of DNNs in practice can be sufficiently explained by their behavior in the infinite-width limit. On one hand, there is a rich literature on the infinite-width limit of DNNs. Such analyses approximate the learning dynamics of very wide neural networks by a linear model obtained from a first-order Taylor expansion around the initial parameters; as a result, DNN training occurs in a "lazy" regime. On the other hand, both theoretical and empirical evidence has been presented pointing out the limitations of lazy training. These results suggest that training DNNs with gradient descent actually occurs in a "rich" regime that captures much richer inductive biases, and that the behavior of such models cannot be fully described by their infinite-width kernel equivalents. As an empirical complement to recent work studying the transition from the lazy regime to the rich regime, we study the generalization and optimization behavior of commonly used DNNs, varying the width and, to some extent, the depth, and show what happens in typical DNNs used in practice in the rich regime. We also extensively study the smallest eigenvalues of the Neural Tangent Kernel, a crucial quantity that appears in many recent theoretical analyses of both the training and generalization of DNNs. We hope our empirical study provides fodder for new theoretical advances in understanding generalization and optimization in both the rich and lazy regimes.

Item
Explaining Predictive Artificial Intelligence Models for ECG using Shallow and Generative Models (2020-05)
Attia, Zachi Itzahk
Opening the lid on the "black box" of artificial intelligence (AI) models, including deep neural networks, is important for the adoption of this technology in clinical medicine. Given the high stakes, the potential for novel or unexpected recommendations, the risk of implicit bias, and the potential legal liability, clinicians may be hesitant to act on medical diagnoses or therapies suggested by neural networks without a general understanding of the specific features or characteristics the networks process to derive their recommendations. Furthermore, the ability to explain predictive AI models may also enhance the ability to improve their performance and to predict appropriate use cases for their adoption. Deep learning methods, and convolutional neural networks in particular, have achieved state-of-the-art performance in numerous fields and reached human-like accuracy in image detection and classification. In some areas, deep learning models have surpassed human expert capabilities, for example by detecting asymptomatic left ventricular dysfunction from the ECG, by detecting age, sex, and cardiovascular risk from fundus photography, and by beating the world champion in Go. Convolutional neural networks use convolutional operations together with non-linear transformations to create feature maps based on the specific outcome the network is trained to optimize.
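As a toy illustration of the feature-map construction just described (not the study's model), a single 1-D convolution followed by a non-linearity maps a raw multi-lead ECG segment to a stack of feature maps; the lead count, segment length, and layer sizes below are assumptions for illustration only.

```python
# Toy illustration: a convolution plus a non-linearity produces feature maps
# from an ECG-like signal (synthetic data; all sizes are illustrative assumptions).
import torch
import torch.nn as nn

ecg = torch.randn(1, 12, 5000)        # (batch, leads, samples): a synthetic 12-lead segment
layer = nn.Sequential(
    nn.Conv1d(in_channels=12, out_channels=16, kernel_size=7, padding=3),
    nn.ReLU(),
)
feature_maps = layer(ecg)             # shape (1, 16, 5000): 16 learned feature maps
print(feature_maps.shape)
```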
While the training of a model as a whole is considered supervised, since the network weights are optimized with respect to human-defined labels, the extraction of features from a signal is unsupervised, and the features used by a network and their meaning remain unknown (hence the term "black box"). In traditional computer vision and signal processing, features are engineered based on human knowledge and observation and then hard-coded as a separate step prior to input into a classification model. These human-selected features are meaningful, and in the case of the electrocardiogram (ECG) they are based on known biological mechanisms. In our work, we sought to identify the meaning of convolutional neural network feature maps trained on the ECG signal and to compare network features with the understandable, human-selected features. Using our proposed methods, which are generalizable, we developed tools to explain AI models. To test, validate, and demonstrate the use of these tools, we employ a previously developed AI model that detects a patient's age and sex from the surface ECG. For any domain with meaningful features, we show that the neural network selects features similar to those selected by a human expert, and that the neural network's "black box" features are in fact a linear combination of human-identifiable features. As the network features were created without any human knowledge, this raises the possibility that artificial intelligence models develop a "sense" of the signals they process in a manner similar to a human expert. Thus, artificial intelligence may be truly intelligent, and this work may open the door to explainability in artificial intelligence models.

Item
HyperProtect: A Large-scale Intelligent Backup Storage System (2022-01)
Qin, Yaobin
In the current big data era, huge financial losses result when data becomes unavailable at the original storage site. Protecting data from loss plays a critical role in ensuring business continuity. Businesses generally employ third-party backup services to save their data in remote storage and to retrieve the data within a tolerable time when the original data cannot be accessed. To utilize these backup services, backup users have to handle many kinds of configurations to ensure that their data is backed up effectively. As the scale of backup systems and the volume of backup data continue to grow, traditional backup systems have difficulty satisfying the increasing demands of backup users. The rapid improvement of machine learning and deep learning techniques has made them successful in many areas, such as image recognition, object detection, and natural language processing. Compared with other system environments, the backup environment is more consistent by nature: backup data contents do not change considerably, at least in the short run. Hence, we collected data from real backup systems and analyzed the backup behavior of backup users. Using machine learning techniques, we discovered that certain patterns and features can be generalized from the backup environment, and we used them as a guide in the design of an intelligent agent called HyperProtect, which aims to improve the service level provided by backup systems.
To apply machine learning and deep learning techniques to enhance the service level of backup systems, we first improved the stability and predictability of the backup environment by proposing novel dynamic backup scheduling and high-efficiency deduplication. Backup scheduling and deduplication are important techniques in backup systems: backup scheduling determines which backup starts first and which storage is assigned to that backup so as to improve backup efficiency, while deduplication removes redundancy in the backup data to save storage space. Beyond backup efficiency and storage overhead, we considered maintaining the stability and predictability of the backup environment when performing backup scheduling and deduplication. Once the backup environment became more stable, we applied machine learning to improve the reliability and efficiency of large-scale backup systems. We analyzed data protection system reports written over two years and collected from 3,500 backup systems, and we found that inadequate capacity is among the most frequent causes of backup failure. We highlighted the characteristics of backup data and used this information to design a backup storage capacity forecasting framework for better reliability of backup systems. From our observation of an enterprise backup system, a newly created client has no historical backups, so the prefetching algorithm has no reference on which to base effective fingerprint prefetching. From a study of the backup data, we discovered a backup-content correlation between clients and propose a fingerprint prefetching algorithm that improves the deduplication rate and efficiency; here, machine learning and statistical techniques are applied to discover backup patterns and generalize their features. The above efforts introduced machine learning for backup systems. We also considered the other direction, namely backup systems for machine learning. The advent of the artificial intelligence (AI) era has made it increasingly important to have an efficient backup system to protect training data from loss. Furthermore, maintaining a backup of the training data makes it possible to update or retrain the learned model as more data are collected. However, always making a complete copy of all daily collected training data incurs a huge backup overhead, especially because the data typically contains highly redundant information that does not contribute to model learning. Deduplication is a common technique for reducing data redundancy in modern backup systems, but existing deduplication methods are ineffective for training data. Hence, we propose a novel deduplication strategy for the training data used to learn a deep neural network classifier.

Item
Integrating Hyperspectral Imaging and Artificial Intelligence to Develop Automated Frameworks for High-throughput Phenotyping in Wheat (2019-02)
Moghimi, Ali
The present dissertation was motivated by the need to apply innovative technologies, automation, and artificial intelligence to agriculture in order to promote crop production while protecting our environment. The main objective was to develop sensor-based, automated frameworks for high-throughput phenotyping of wheat, identifying advanced wheat varieties based on three desired traits: yield potential, tolerance to salt stress (an abiotic stress), and resistance to Fusarium head blight disease (a biotic stress).
We leveraged the advantages of hyperspectral imaging, a sophisticated sensing technology, and artificial intelligence, including machine learning and deep learning algorithms. By integrating imaging and high-resolution spectroscopy, hyperspectral imaging provides valuable insight into the internal activity of plants, leaf tissue structure, and the physiological changes of plants in response to their environment. In turn, advanced machine learning and deep learning algorithms are uniquely suited to extracting meaningful features, recognizing latent patterns associated with the desired phenotyping trait, and ultimately making accurate inferences and predictions. In the first study (Chapter 2), we focused on salt stress phenotyping of wheat in a hydroponic system. A novel method was proposed for hyperspectral image analysis to assess the salt tolerance of four wheat varieties in a quantitative, interpretable, and non-invasive manner. The results of this study demonstrated the feasibility of quantitatively ranking salt tolerance in wheat varieties only one day after applying the salt treatment. In the second study (Chapter 3), we developed an ensemble feature selection pipeline, integrating six supervised feature selection techniques, to identify the most informative spectral bands in high-dimensional hyperspectral images captured for plant phenotyping applications. First, the spectral features were ranked based on their ability to discriminate salt-stressed wheat plants from healthy plants at the earliest stages of stress. The proposed method could drastically reduce the dimension of hyperspectral images from 215 to 15 while improving the accuracy of classifying healthy and stressed vegetation pixels by 8.5%. Second, a clustering algorithm was proposed to form six broad spectral bands around the most prominent spectral features to aid in the development of a multispectral camera. In the third study (Chapter 4), we aimed to develop a phenotyping framework for Fusarium head blight (FHB), a devastating disease attacking small-grain crops. The most informative spectral bands were identified to detect FHB-infected spikes. The results of this study revealed that a set of two broad spectral bands (766 nm and 696 nm) yields a classification accuracy of 99% in detecting FHB-infected spikes. In the fourth study (Chapter 5), we developed an autonomous robotic framework for high-throughput yield phenotyping of wheat in the field. The data were collected by a hyperspectral camera mounted on an unmanned aerial vehicle flying over three experimental fields containing hundreds of wheat plots during two consecutive growing seasons. A deep neural network was trained to predict the yield of wheat plots and estimate the yield variation at a sub-plot scale. The coefficients of determination for predicting yield at the sub-plot and plot scales were 0.79 and 0.41, with normalized root-mean-square errors of 0.24 and 0.14, respectively. In the fifth study (Chapter 6), we focused on developing a deep autoencoder network, leveraging a large unlabeled dataset (~8 million pixels), to learn an optimal feature representation of hyperspectral images in a low-dimensional feature space for yield prediction. The results demonstrated that the trained autoencoder could substantially reduce the dimension of hyperspectral images to a 3-, 5-, or 10-dimensional feature space with a mean squared error of less than 7e-5, while retaining the information relevant for yield prediction.
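For context, a pixel-wise autoencoder of the kind just described can be sketched as follows. This is an illustrative sketch, not the dissertation code; the layer widths, bottleneck size, and variable names are assumptions.

```python
# Illustrative sketch (not the dissertation code): an autoencoder that compresses
# 215-band hyperspectral pixel spectra into a low-dimensional code for yield prediction.
import torch
import torch.nn as nn

class SpectraAutoencoder(nn.Module):
    def __init__(self, n_bands=215, code_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_bands))

    def forward(self, x):
        code = self.encoder(x)            # low-dimensional representation of each pixel
        return self.decoder(code), code   # reconstruction and code

model = SpectraAutoencoder()
pixels = torch.rand(1024, 215)                   # a batch of pixel spectra scaled to [0, 1]
recon, code = model(pixels)
loss = nn.functional.mse_loss(recon, pixels)     # reconstruction error to minimize
```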
At a higher level, this dissertation contributes to economic, ecological, and social impact by improving crop production, reducing pesticide use, and properly leveraging salt-affected farmland. From an environmental perspective, a cultivar with high yield potential and a cultivar resistant to FHB both promote sustainability in crop production by reducing the fertilizer and pesticide required to meet farmers' anticipated profit. The intelligent, automated phenotyping frameworks developed in this dissertation can help plant scientists and breeders identify crop varieties with the desired traits, promoting crop production and mitigating food security concerns.

Item
Multi-Exposure Darkfield Digital Inline Holography for Ultrafast Microparticle Tracking (2022-10)
Grazzini Placucci, Rafael
Current imaging solutions are unable to characterize complex, 3D, ultra-high-speed microscopic flows with sufficient sampling rate, spatial resolution, or depth-of-field. Options that overcome the challenge of temporal resolution typically sacrifice depth-of-field and spatial resolution for increased frame rate and employ an array of sensors to extend measurements to three dimensions, further aggravating system cost and complexity. Systems that can overcome these challenges without compromising image resolution or affordability are therefore indispensable to advance fluid dynamics, aerosol, and biological research, among others. In fact, they are essential in challenging flow scenarios such as laminar-turbulent transition in the hypersonic boundary layer, where megahertz frequencies, micron-scale resolution, and a depth-of-field extending to the centimeter range are needed to capture high-frequency, 3D, small-scale instabilities with sufficient fidelity. In this thesis, a unique holographic imaging technique named multi-exposure darkfield digital inline holography was developed to deliver 3D microparticle tracking at megahertz frequencies over an extended depth-of-field with high resolution using a single-camera setup. The proposed system integrates a low-cost nanosecond pulsed laser and a high-resolution digital camera operating at a prolonged exposure time to capture multiple particle exposures per image frame. A high-pass spatial frequency filter is introduced prior to the camera to prevent saturation of the sensor and allow the acquisition of full-frame images at megahertz frequencies. The optical system is accompanied by a deep learning framework that incorporates a physics-based synthetic hologram generation algorithm and a conditional generative adversarial network to create a vast dataset of labeled darkfield holograms, which are subsequently used to train a regression convolutional neural network for particle depth estimation. Finally, the innovation is demonstrated by imaging 300-350 μm tracers in a 12 × 10 × 30 mm³ measurement volume located above a magnetic rod rotating at 3,000 RPM. Particle trajectories were acquired at a frequency of 750 Hz with spatial resolutions of 2.27 μm and 20 μm in the planar and axial directions, respectively, and used to calculate statistics on particle velocities for a total of 124 trajectories.

Item
Scalable Learning and Energy Management for Power Grids (2019-01)
Zhang, Liang
Contemporary power grids are being challenged by unprecedented levels of voltage fluctuations, due to large-scale deployment of electric vehicles (EVs), demand-response programs, and renewable generation.
Nonetheless, with proper coordination, EVs and responsive demands can be controlled to enhance grid efficiency and reliability by leveraging advances in power electronics, metering, and communication modules. In this context, the present thesis pioneers algorithmic innovations targeting timely opportunities emerging with future power systems in terms of learning, load control, and microgrid management. Our vision is twofold: advancing algorithms and their performance analysis, while contributing foundational developments to guarantee situational awareness, efficiency, and scalability of forthcoming smart power grids. The first thrust deals with real-time power grid monitoring, which comprises power system state estimation (PSSE), state forecasting, and topology identification modules. Due to the intrinsic nonconvexity of the PSSE task, optimal PSSE approaches have been either sensitive to initialization or computationally expensive. To bypass these hurdles, this thesis advocates deep neural networks (DNNs) for real-time PSSE. By unrolling an iterative physics-based prox-linear PSSE solver, a novel model-specific DNN with affordable training and minimal tuning effort is developed. To further enable system awareness ahead of the time horizon, as well as to endow the DNN-based estimator with resilience, deep recurrent neural networks (RNNs) are also pursued for state forecasting. Deep RNNs leverage the long-term nonlinear dependencies present in historical voltage time series to enable forecasting, and they are easy to implement. Finally, multi-kernel-learning-based partial correlations, accounting for nonlinear dependencies between given nodal measurements, are leveraged to unveil the connectivity of power grids. The second thrust leverages the obtained state and topology information to design optimal load control and microgrid management schemes. With regard to EV load control, a decentralized protocol relying on the Frank-Wolfe algorithm is put forth to manage heterogeneous charging loads. The novel paradigm has minimal computational requirements and is resilient to lost updates. When higher levels of EV load push voltages beyond prescribed limits, the underlying grid needs to be taken into account. In this context, communication-free local reactive power control and optimal decentralized energy management schemes are developed based on the proximal gradient method and the alternating direction method of multipliers, respectively.

Item
Towards Hardware-Software Co-design for Energy-Efficient Deep Learning (2023-06)
Unnikrishnan, Nanda
Artificial intelligence (AI) has become an increasingly important and prevalent technology in today's world. The past decade has seen tremendous growth in AI, with applications in healthcare, finance, transportation, research, manufacturing, and even entertainment. One of the most significant advancements in AI has been the development of deep neural networks (DNNs), which have revolutionized the field by providing unprecedented human-like performance on many real-world problems. However, the computations involved in DNNs are expensive and time-consuming, especially for large and complex networks. Additionally, a variety of models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and graph neural networks (GNNs), pose significant challenges for hardware design, particularly due to the diverse set of operations they use.
Each operation brings its own set of challenges for energy, performance, and memory, and these challenges do not always align with one another, precluding a one-size-fits-all solution. This thesis addresses the above challenges in three parts. The first part develops a fundamental understanding of the different operations involved in different DNN models: it traces the evolution of brain-inspired computing models from a historical perspective, focusing on DNNs, CNNs, RNNs, and GNNs, among others, which provides the necessary context for optimizing DNN operations for training and inference. The second part proposes hardware-software co-design techniques, inspired by the design of DSP systems, to address energy, computation, and memory challenges during CNN training. It introduces InterGrad, a novel approach that uses gradient interleaving to train convolutional neural networks on systolic architectures: the computations of two gradients are interleaved on the same configurable systolic array, resulting in significant savings in cycles and memory accesses. The proposed method uses 25% fewer cycles and memory accesses and 16% less energy in state-of-the-art CNNs, and up to 2.2× fewer cycles and memory accesses in the fully connected layers. The thesis also presents LayerPipe, a novel optimization approach that explores how to optimally partition and pipeline DNN training workloads on multi-processor systems; LayerPipe better balances workloads while minimizing communication overhead, achieving an average speedup of 25%, and upwards of 80% with 7 to 9 processors, compared to prior approaches such as PipeDream. Lastly, the thesis explores the design of dedicated hardware accelerators for graph neural networks (GNNs). The proposed SCV-GNN method uses a novel sparse compressed vectors (SCV) format optimized for the aggregation operation. It achieves geometric-mean speedups of 7.96× and 7.04× over compressed sparse column (CSC) and compressed sparse row (CSR) aggregation, respectively, and reduces memory traffic by factors of 3.29× and 4.37× over CSC and CSR, respectively.
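For context, the CSR-based neighborhood aggregation that SCV-GNN is benchmarked against can be written as a sparse matrix product. The toy sketch below is not the SCV implementation; the four-node graph and feature sizes are illustrative assumptions.

```python
# Baseline sketch for context (not the SCV format): GNN neighborhood aggregation
# expressed as a sparse-matrix-times-dense-features product with a CSR adjacency matrix.
import numpy as np
from scipy.sparse import csr_matrix

# Toy 4-node graph; row i of the adjacency matrix selects node i's neighbors.
rows = np.array([0, 1, 2, 3])
cols = np.array([1, 2, 0, 2])
adj = csr_matrix((np.ones(4), (rows, cols)), shape=(4, 4))

X = np.random.rand(4, 8)     # node features, 8 dimensions per node
aggregated = adj @ X         # each row is the sum of that node's neighbor features
print(aggregated.shape)      # (4, 8)
```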