Browsing by Subject "Machine Learning"

Now showing 1 - 20 of 90

A Multi-Agent Assistant for Medical Diagnosis: Blood-Test Case Study
(2025-04-28) Sargsyan, James
This project explores the use of Multi-Agent Systems and Federated Learning to automate the diagnosis of blood test results, with a focus on complete blood count analyses. A modular system of autonomous agents was designed to preprocess data, extract features, train federated models, and provide decision support. The model achieved high diagnostic accuracy on both small and large public datasets, demonstrating the potential for privacy-preserving, decentralized medical diagnosis. The project highlights opportunities for future development and real-world integration within healthcare systems.
Adaptive Domain Generalization for Digital Pathology Images
(2022-05) Walker, Andrew
In AI-based histopathology, domain shifts are common and well-studied. However, this research focuses on stain and scanner variations, which do not show the full picture– shifts may be combinations of other shifts, or “invisible” shifts that are not obvious but still damage performance of machine learning models. Furthermore, it is important for models to generalize to these shifts without expensive or scarce annotations, especially in the histopathology space and if wanting to deploy models on a larger scale. Thus, there is a need for “reactive” domain generalization techniques: ones that adapt to domain shifts at test-time rather than requiring predictions of or examples of the shifts at training time. We conduct a literature review and introduce techniques that react to domain shifts rather than requiring a prediction of them in advance. We investigatetest time training, a technique for domain generalization that adapts model parameters at test-time through optimization of a secondary self-supervised task.
Advancing architecture optimizations with Bespoke Analysis and Machine Learning
(2023-01) Sethumurugan, Subhash
With transistor scaling nearing atomic dimensions and leakage power dissipation imposing strict energy limitations, it has become increasingly difficult to improve energy efficiency in modern processors without sacrificing performance and functionality. One way to avoid this tradeoff and reduce energy without reducing performance or functionality is to take a cue from application behavior and eliminate energy in areas that will not impact application performance. This approach is especially relevant in embedded systems, which often have ultra-low power and energy requirements and typically run a single application over and over throughout their operational lifetime. In such processors, application behavior can be effectively characterized and leveraged to identify opportunities for ``free'' energy savings. We find that in addition to instruction-level sequencing, constraints imposed by program-level semantics can be used to automate processor customization and further improve energy efficiency. This dissertation describes automated techniques to identify, form, propagate, and enforce application-based constraints in gate-level simulation to reveal opportunities to optimize a processor at the design level. While this can significantly improve energy efficiency, if the goal is truly to maximize energy efficiency, it is important to consider not only design-level optimizations but also architectural optimizations. That being said, architectural optimization presents several challenges. First, the symbolic simulation tool used to characterize gate-level behavior of an application must be written anew for each new architecture. Given the expansiveness of the architectural parameter space, this is not feasible. To overcome this barrier, we developed a generic symbolic simulation tool that can handle any design, technology, or architecture, making it possible to explore application-specific architectural optimizations. However, exploring each parameter variation still requires synthesizing a new design and performing application-specific optimizations, which again becomes infeasible due to the large architecture parameter space. Given the wide usage of Machine Learning (ML) for effective design space exploration, we sought the aid of ML to efficiently explore the architectural parameter space. We built a tool that takes into account the impacts of architectural optimizations on an application and predicts the architectural parameters that result in near-optimal energy efficiency for an application. This dissertation explores the objective, training, and inference of the ML model in detail. Inspired by the ability of ML-based tools to automate architecture optimization, we also apply ML-guided architecture design and optimization for other challenging problems. Specifically, we target cache replacement, which has historically been a difficult area to improve performance. Furthermore, improvements have historically been ad hoc and highly based on designer skill and creativity. We show that ML can be used to automate the design of a policy that meets or exceeds the performance of the current state-of-art.
Advancing Probabilistic Models for Approximate and Exact Inference
(2021-07) Giaquinto, Robert
Probabilistic models have a rich history in machine learning, offering a theoretical and practical framework for learning from observed data. Probabilistic models describe relationships between observed data and latent variables in terms of probability distributions. Practitioners in many fields of science have long been attracted to probabilistic methods as a way to quantify uncertainty in predictions and models, query models via inference, and estimating latent variables. In this thesis, Advancing Probabilistic Models for Approximate and Exact Inference, we connect foundational ideas in machine learning, such as probabilistic inference and ensemble learning, with deep learning. More specifically, the focus lies on the design of generative models with likelihood-based objective functions, which offer a solution for overcoming many broader challenges in machine learning---namely, explaining all of the data, efficient data usage, and quantifying uncertainty. For over two decades graphical models were the predominant paradigm for composing probabilistic models in machine learning. By composing probability distributions as building blocks for larger models, graphical models offer a comprehensible model-building framework that can be tailored to the structure of the data. As a build up to further work, we introduce a novel probabilistic graphical model for analyzing text datasets of multiple authors writing over time. In the era of big data, however, it is necessary to scale such models to large datasets. To that end, we propose an efficient learning algorithm that allows for training and probabilistic inference on text datasets with billions of words with general-purpose computing hardware. Recently, breakthroughs in deep learning have ushered in an explosion of new successes in probabilistic modeling, with models capable of modeling enormous collections of complex data and generating novel yet plausible data (e.g. new images, text, and speech). One promising direction in likelihood-based probabilistic deep learning is normalizing flows. Normalizing flows use invertible transformations to translate between simple and complex distributions, which allows for exact likelihood calculation and efficient sampling. In order to remain invertible and provide exact likelihood calculations, normalizing flows must be composed of differentiable bijective functions. However, bijections require that the inputs and outputs have the same dimensionality---which can pose signiﬁcant architectural, memory, and computational costs for high-dimensional data. We introduce Compressive Normalizing Flows that are, in the simplest case, equivalent to the probabilistic principal components analysis (PPCA). The PPCA-based compressive flow relaxes the bijective constraints and allows the model to learn a compressed latent representation, while offering parameter updates that are available analytically. Drawing on the connection between PPCA and Variational Autoencoders (VAE)---a powerful deep generative model, we extend our framework to VAE-based compressive flows for greater flexibility and scalability. Up until now the trend in normalizing flow literature has been to devise deeper, more complex transformations to achieve greater flexibility. We propose an alternative: Gradient Boosted Normalizing Flows (GBNF) model a complex density by successively adding new normalizing flow components via gradient boosting so that each new component is fit to the residuals of the previously trained components. Because each flow component is itself a density estimator, the aggregate GBNF model is structured like a mixture model. Moreover, GBNFs offer a wider, as opposed to strictly deeper, approach that improves existing NFs at the cost of additional training---not more complex transformations. Lastly, we extend normalizing flows beyond their original unsupervised formulation, and present an approach for learning high-dimensional distributions conditioned on low-dimensional samples. In the context of image modeling, this is equivalent to image super-resolution---the task of mapping a low-resolution (LR) image to a single high-resolution (HR) image. Super-resolution, however, is an ill-posed problem since there are infinitely many HR samples that are compatible with a given LR sample. Approaching super-resolution with likelihood-based models, like normalizing flows, allows us to learn a distribution over all possible HR samples. We present Probabilistic Super-Resolution (PSR) using Normalizing Flows for learning conditional distributions as well as joint PSR where the high- and low-dimensional distributions are modeled simultaneously. However, our approach is not solely for image modeling. Any dataset can be formulated for super-resolution, and using a PSR architecture alleviates challenges commonly associated with normalizing flows such---as the information bottleneck problem, and the inductive biases towards modeling local correlations.
Advancing Remote Sensing For Soybean Aphid (Hemiptera: Aphididae) Management In Soybean
(2019-07) Marston, Zachary
Soybean aphid, Aphis glycines Matsumura (Hemiptera: Aphididae), is the most economically important insect pest of soybean, Glycine max (L.) Merrill (Fabales: Fabaceae), in the north-central United States. Current management recommendations for soybean aphid include frequent scouting of soybean fields and application of foliar insecticides when soybean aphid populations exceed an economic threshold of 250 aphids per plant. The scouting process for soybean aphid is time consuming, expensive, and also fails to thoroughly assess populations across the entire field. Because of these drawbacks, 84% of soybean farmers want to reduce scouting efforts. In 2015, it was determined that soybean aphid-induced stress had a significant effect on red-edge and near-infrared (NIR) reflectance of soybean canopies, offering the potential to use remote sensing for soybean aphid scouting. Utilizing remote sensing for soybean aphid scouting may decrease human effort, increase spatial coverage, and ultimately increase the adoption of recommended management practices. However, it was unknown whether soybean aphid-induced stress could be detected from aerial platforms, whether these reflectance data of aphid-induced stress could be classified into treatment groups, and how confounding factors might affect classification results. My first chapter determined that soybean aphid-induced stress could be detected from an unmanned aerial vehicle (UAV) equipped with a multispectral sensor. Findings indicated that NIR reflectance decreased as aphid populations increased in both caged and open-field experiments. Chapter 2 evaluated ground-based hyperspectral samples and determined that soybean reflectance samples that were above the economic threshold of 250 aphids per plant could be classified with over 86% accuracy using linear support vector machine classification. Chapter 3 further evaluated ground-based hyperspectral samples in the presence of the confounding disease, soybean sudden death syndrome (SDS) caused by the fungal pathogen Fusarium virguliforme O’Donnell and T. Aoki (Hypocraeles: Nectriaceae). Findings indicated that when using linear support vector machines, it was difficult to differentiate between healthy and diseased samples; however, including the diseased group in the classification model decreased false positives for soybean aphid-induced stress. Overall, these findings advance the use of remote sensing for soybean aphid management and provide the first documentation of spectral classification of soybean aphid into threshold-based groups.
AGGRO: Autonomous Gatherer with Guided Retrieval Operations
(2024-07) Anderson, Chase
This thesis presents AGGRO: The Autonomous Gatherer with Guided Retrieval Operations, an innovative robotic system designed to enhance object manipulation in cluttered environments. Building on the foundations of Deep Q-Learning (DQL) and advanced reinforcement learning techniques, AGGRO integrates machine learning, robotics hardware, and sophisticated algorithms to address the "Grasping the Invisible" problem at scale. The system employs a combination of primitive synergies to achieve efficient and precise manipulation of occluded objects. Through comprehensive real-world testing and simulation, the thesis explores various explortation policies, dynamic clutter generation, and the impact of structured clutter scenarios on system performance. The results demonstrate a three policy approach to efficiently reveal targets, fully uncover them, and finally singulate to grasp.
Algorithmically Recognizing Gait Variance from a Sensor-Based System
(2019-05) Madden, Janna
Detection of Vascular Dementia in early stages of Cognitive Impairment is difficult to do in a clinical setting since the earliest changes are often discrete and physiological in nature. One major aspect of this is gait patterns. This project utilizes force-sensing platforms, motion capture, and EMG sensors to unobtrusively collect biometric data from an individual’s walking gait patterns. Following data collection, a series of algorithms computes statistics off the gait cycles. In addition to previously validated biometric indicators of vascular dementia, including stride length, time in stride and swing phases of gait, time in dual leg vs single leg support, this system also examines metrics surrounding balance, lateral movement, and fine-grained gait analysis during critical transition periods of gait, when weight is transferred from one leg to the other. Secondly, by quantifying and analyzing machine learning algorithms, specifically deep learning time-series based models, onset patterns of vascular dementia are explored with an overarching goal of creating a system that will assist in understanding and diagnosing cases of vascular dementia. The proposed system provides a tool for which gait can be analyzed and compared over a long period of time and opens opportunity to increased personalization in health monitoring and disease diagnosis and provides an avenue to increase patient-centricity of medical care.
Algorithms for Semisupervised learning on graphs
(2018-12) Flores, Mauricio
Laplacian regularization has been extensively used in a wide variety of semi-supervised learning tasks over the past fifteen years. In recent years, limitations of the Laplacian regularization have been exposed, leading to the development of a general class of Lp-based Laplacian regularization models. We propose novel algorithms to solve the resulting optimization problem, as the amount of unlabeled data increases to infinity, while the amount of labeled data remains fixed and is very small. We explore a practical application to recommender systems.
Algorithms, Machine Learning, and Speech: The Future of the First Amendment in a Digital World
(2017-06) Wiley, Sarah
We increasingly depend on algorithms to mediate information and thanks to the advance of computation power and big data, they do so more autonomously than ever before. At the same time, courts have been deferential to First Amendment defenses made in light of new technology. Computer code, algorithmic outputs, and arguably, the dissemination of data have all been determined as constituting “speech” entitled to constitutional protection. However, continuing to use the First Amendment as a barrier to regulation may have extreme consequences as our information ecosystem evolves. This paper focuses on developing a new approach to determining what should be considered “speech” if the First Amendment is to continue to protect the marketplace of ideas, individual autonomy, and democracy.
Analog Design Automation in the Era of Machine Learning
(2022-12) Kunal, Kishor
Analog and mixed-signal circuits are everywhere -- in phones, smart watches, self-driving cars, humanoid robots, and IoT devices. However, the problem of automating analog design has perplexed several generations of researchers in electronic design automation (EDA). At its core, the difficulty of the problem is related to the fact that machine-generated designs have been unable to match the quality of the human designer, who recognizes blocks from a netlist and draws upon her/his experience to translate these blocks into a silicon layout. The ability to annotate blocks in a schematic or netlist-level description of a circuit is key to this entire process, but it is a process fraught with complexity. A major reason for this is a large number of variants of each circuit type, which an experienced designer can easily comprehend, but are difficult to encode into an EDA tool. The recent advent of machine learning (ML) provides pathways to breakthrough solutions to automated analog design. Such a capability can enable more widespread use of AMS circuits, which are widely known to have the potential to provide energy-efficient implementations for real-world applications. In fact, for a number of emerging applications, such as the design of ML hardware, AMS implementations can provide superior performance as compared to conventional digital designs. The first part of the thesis showcases applications of graph neural networks (GNNs) for analog layout automation within the ALIGN open-source EDA framework. The automatic identification of hierarchical functional blocks in analog designs can facilitate a variety of design automation tasks. For example, in circuit layout optimization, the optimal layout is dictated by constraints at each level, such as symmetry requirements, that depend on the topology of the hierarchical block. At higher levels of the design hierarchy, where numerous design variants are possible, recent advances in GNNs are leveraged, using a variety of GNN strategies, to identify circuit functional blocks, thus replicating the role of the human expert. At lower levels of hierarchy, where the degrees of freedom in circuit topology are limited, structures are identified using graph-based algorithms. The proposed hierarchical recognition scheme enables the identification of layout constraints such as symmetry and matching, which enable high-quality layout synthesis. This method is demonstrated to be scalable and applicable across a wide range of analog designs. The method shows a high degree of accuracy over a range of designs, identifying functional blocks such as low-noise amplifiers, operational transconductance amplifiers, mixers, oscillators, and band-pass filters within larger circuits. Another challenge in analog layout automation is the need to identify matching and symmetry between elements in the circuit netlist. However, the set of symmetries is circuit-specific and a versatile algorithm, applicable to a broad variety of circuits, has been elusive. The next part of this thesis presents a general methodology for the automated generation of symmetry constraints, and applies these constraints to guide automated layout synthesis. The proposed method operates hierarchically and uses graph-based algorithms to extract multiple axes of symmetry within a circuit. An important ingredient of the algorithm is its ability to identify arrays of repeated structures. In some circuits, these "repeated'' structures are not perfect replicas but show a high degree of similarity, and can only be identified through approximate graph matching. A fast graph neural network-based methodology is developed for this purpose, based on evaluating the graph edit distance between candidate structures. The algorithm is demonstrated on operational amplifiers, data converters, equalizers, and low-noise amplifiers. The final part of the thesis focuses on the application of analog circuits for energy-efficient ML inference. Due to the inherent error tolerance of ML algorithms, many parts of the inference computation can be performed with adequate accuracy and low power under relatively low precision. Early approaches have used digital approximate computing methods to explore this space. An alternative is to use analog circuits, which can deliver lower-power solutions, but are well known to be more susceptible to noise, which degrades precision. Even so, several recent efforts have shown the benefit of using purely analog-based operations to achieve power-efficient computation at moderate precision. This work combines the best of both worlds by proposing a mixed-signal design approach, MiSOML, that optimally blends analog and digital computation for ML inference hardware, incorporating the cost of analog-digital/digital-analog converters where needed. Based on models for speed, accuracy, and power, an integer linear programming formulation is developed to optimize design metrics over the space of analog/digital implementations. On multiple ML architectures, MiSOML demonstrates 5x--8x energy improvement over 8-bit quantized digital implementations.
Analysis and extensions of Universum learning
(2014-01) Dhar, Sauptik
Many applications of machine learning involve sparse high-dimensional data, where the number of input features is larger than (or comparable to) the number of data samples. Predictive modeling of such data sets is very ill-posed and prone to overfitting. Standard inductive learning methods may not be sufficient for sparse high-dimensional data, and this provides motivation for non-standard learning settings. This thesis investigates such a new learning methodology called Learning through Contradictions or Universum Learning proposed by Vapnik (1998, 2006) for binary classification. This method incorporates a priori knowledge about application data, in the form of additional Universum samples, into the learning process. However, such a new methodology is still not well-understood and represents a challenge to end users. An overall goal of this thesis is to improve understanding of this new Universum learning methodology and to improve its usability for general users. Specific objectives of this thesis include:Development of practical conditions for the effectiveness of Universum Learning for binary classification.Extension of Universum Learning to real life classification settings with different misclassification costs and unbalanced data.Extension of Universum Learning to single-class learning problems.Extension of Universum Learning to regression problems.The outcome of this research will result in better understanding and adoption of the Universum Learning methods for classification, single class learning and regression problems, common in many real life applications.
Atrial Fibrillation Readmissions: Temporal Trends, Risk Factors and Data Driven Modeling
(2021-12) Salsabili, Mahsa
This dissertation provides a background with overview of the clinical perspective, policy perspective and application of data driven modeling for atrial fibrillation (AF) and hospital readmissions. Additionally, three aims focused on temporal trends in AF hospitalization and readmission, predictors of AF readmission, and application of machine learning models in AF readmission are presented. The overall purpose of this dissertation is to develop stronger understanding of the temporal trends in AF hospitalizations and readmission and identify factors that increase the likelihood of readmission among the AF population. The value of application of machine learning algorithms to predict readmissions were assessed and compared to traditional methods. Atrial fibrillation is the most common clinically significant cardiac arrhythmia in the United States. Poorly controlled atrial fibrillation patients are likely to be hospitalized and potentially readmitted to the hospital within 30 days. The Nationwide Readmission Database (NRD) was analyzed using the International Classification of Diseases, Ninth Revision (ICD‐9) and tenth revision (ICD-10) codes to identify adult patients with a primary diagnosis of atrial fibrillation at discharge. Among those admitted with atrial fibrillation on average 57,883 individuals were readmitted per year for all-cause readmission within 30 days from 2010 to 2017. The AF index hospitalization rate increased from 10.4 per 1000 adults in 2010 to 11.1 in 2013 and dropped back to 10.4 in 2014 and increased to 10.9 in 2017. This nationally representative study of primary atrial fibrillation admissions and readmissions found that over the 2010 to 2017 time frame, crude atrial fibrillation index admissions increased, except for 2014 wherethere was a decline. Thirty- day all-cause readmission rates remained relatively stable for atrial fibrillation index patients across the study years. There is limited data regarding 30‐day readmission rates and predictors after discharge for atrial fibrillation. The 2017 NRD was assessed using ICD‐10 codes to identify the AF population. Predictors of readmissions, and performance of the predictive model were analyzed. A hierarchical mixed linear model was used on the best performing model to identify the predictors of readmission based on index admission. Presence of comorbidities such as metastatic cancer, lymphoma and severe renal failure present in index atrial fibrillation during index hospitalization predicted higher likelihood 30‐day readmissions. About 1 in 6 patients had an all-cause 30-day readmission. The patient comorbidities contributed significantly to readmission with oncology comorbidities being the top predictor. There is a lack of studies attempting to predict readmissions among AF population using various machine learning techniques. Using the 2017 NRD, we explored the performance of four common and widely used classification approaches (random forest, decision tree, gradient boosting and Naïve Bayes) in 30-day all-cause readmission for AF patients. To have a less biased and more generalizable model 10-fold cross validation was performed to train and test the data, with five variations of feature presentation. We compared and reported common key performance indicators for binary classification techniques (e.g. Area-Under Curve (AUC), accuracy, sensitivity, specificity, and F1 score) among the various classifiers. Our results reveal that Gradient Boosting has the greatest performance with an AUC of 0.667, which was followed by Naïve Bayes and Random Forest with AUCs of 0.641 and 0.640 respectively. The feature variations with comorbidities present have better performance for these three classifiers. Using Gradient Boosting, Random Forest, and Naïve Bayes we get acceptable performance when assessing AF all cause 30-day readmission. Overall, the results of the dissertation show that the prevalence of AF hospitalizations and readmission is increasing over time. Presence of comorbidities among patients increased the likelihood of readmissions. The performance of linear based model and majority of the machine learning based models improved with the presence of variables representing comorbidities. The overall performance of the best performing machine learning models was similar to the linear model in predicting readmissions among the AF population.
Automatic Detection of RWIS Sensor Malfunctions (Phase I)
(University of Minnesota Center for Transportation Studies, 2009-03) Crouch, Carolyn; Crouch, Donald; Maclin, Richard; Polumetla, Aditya
The overall goal of this project was to develop computerized procedures that detect Road Weather Information System (RWIS) sensor malfunctions. In this phase of the research we applied two classes of machine learning techniques to data generated by RWIS sensors in order to predict sensor malfunctions and thereby improve accuracy in forecasting temperature, precipitation, and other weather-related data. We built models using machine learning methods that employ data from nearby sensors in order to predict likely values of those sensors that are being monitored. A sensor that deviates noticeably from values inferred from nearby sensors indicates that the sensor has begun to fail. We used both classification and regression algorithms in Phase I. In particular, we used three classification algorithms (namely, J48 decision trees, naïve Bayes, and Bayesian networks) and six regression algorithms (that is, linear regression, least median squares, M5P, multilayer perceptron, radial basis function network, and the conjunctive rule algorithm). We performed a series of experiments to determine which of these models can be used to detect malfunctions in RWIS sensors. We compared the values predicted by the various machine learning methods to the actual values observed at an RWIS sensor to detect sensor malfunctions. This report provides an overview of the nine models used and a classification of the applicability of each model to the detection of RWIS sensor malfunctions.
Automatic Semantic Segmentation Of Kidney Tumors In Computed Tomography Images
(2023-07) Heller, Nicholas
Semantic segmentation has emerged as a powerful tool for the computational analysis of medical imaging data, but its enormous need for manual effort has limited its adoption in routine clinical practice. Deep learning methods have begun to achieve impressive automatic semantic segmentation performance for a variety of structures in cross-sectional images, but unlike for large well-defined regions like major organs and bones, the performance on small, poorly-circumscribed structures in unpredictable locations, such as lesions, remains relatively poor. This dissertation presents a series of contributions throughout the machine learning pipeline that allow for unprecedented performance on kidney tumor segmentation. Important among these contributions is (1) the demonstration that deep neural networks for cross-sectional image segmentation are highly sensitive to training set label errors around region boundaries, (2) the development of a novel labeling pipeline which avoids such errors while making efficient use of domain expertise, and (3) the extensive benchmarking of a wide variety of deep learning methods applied to a large scale dataset collected using this pipeline. Taken together, these innovations enable the first fully-automatic semantic segmentation of kidney tumors in computed tomography images with performance that is comparable to human experts. The clinical utility of this capability is demonstrated through two studies presenting segmentation-dependent radiomic analyses of kidney tumors, which help us to uncover the relationship between tumor morphology and patient outcomes: First, through the automation of the R.E.N.A.L. score, and second, through the unprecedented segmentation-based analysis of longitudinal kidney tumor scans. Arising from this work are two highly-regarding machine learning competitions (or "challenges") called KiTS19 and KiTS21 which attracted submissions from hundreds of research teams from across the world. These remain some of the most widely-used benchmarks for medical image segmentation today. While experimental results are primarily presented for the kidney tumor segmentation task, the methods developed and findings presented in this dissertation are broadly applicable to any segmentation task where the target structure is small, poorly-circumscribed, found unpredictable locations, and for which the accurate region identification requires domain expertise which is scarce and expensive.
Bridging Visual Perception and Reasoning: A Visual Attention Perspective
(2023-06) Chen, Shi
One of the fundamental goals of Artificial Intelligence (AI) is to develop visual systems that can reason with the complexity of the world. Advances in machine learning have revolutionized many fields in computer vision, achieving human-level performance among several benchmark tasks and industrial applications. While the performance gap between machines and humans seems to be closing, the recent debates on the discrepancies between machine and human intelligence have also received a considerable amount of attention. Studies argue that existing vision models tend to use tactics different from human perception, and are vulnerable to even a tiny shift in visual domains. Evidence also suggests they commonly exploit statistical priors, instead of genuinely reasoning on the visual observations, and have yet to develop the capability to overcome issues resulting from spurious data biases. These contradictory observations strike the very heart of AI research, and bring attention to the question: How can AI systems understand the comprehensive range of visual concepts and reason with them to accomplish various real-life tasks, as we do on a daily basis? Humans learn much from little. With just a few relevant experiences, we are able to adapt to different situations. We also take advantage of inductive biases that can easily generalize, and avoid distraction from all kinds of statistical biases. The innate generalizability is a result of not only our profound understanding of the world but also the ways we perceive and reason with visual information. For instance, unlike machines that develop holistic understanding by scanning through the whole visual scene, humans prioritize their attention with a sequence of eye fixations. Guided by visual stimuli and the structured reasoning process, we progressively locate the regions of interest, and understand their semantic relationships as well as connections to the overall task. Despite the lack of comprehensive understanding of human vision, research on humans' visual behavior can provide abundant insights into the developments of vision models, and have the potential of contributing to AI systems that are practical for real-world scenarios. With an overarching goal of building visual systems with human-like reasoning capability, we focus on understanding and enhancing the integration between visual perception and reasoning. We leverage visual attention as an interface for studying how humans and machines prioritize their focus when reasoning with diverse visual scenes. We tackle the challenges by making progress from three distinct perspectives: From the visual perception perspective, we study the relationship between the accuracy of attention and the performance related to visual understanding; From the reasoning perspective, we pay attention to the connections between reasoning and visual perception, and study the roles of attention throughout the continuous decision-making process; Humans not only capture and reason on important information with high accuracy, but can also justify their rationales with supporting evidence. From the perspective of explainability, we explore the use of multi-modal explanations for justifying the rationales behind models' decisions. Our efforts provide an extensive collection of observations for demystifying the integration between perception and reasoning, and more importantly, they offer insights into the development of trustworthy AI systems with the help of human vision.
Choosing a “Source of Truth”: The Implications of using Self versus Interviewer Ratings of Interviewee Personality as Training Data for Language-Based Personality Assessments
(2022-12) Auer, Elena
Advancement in research and practice in the application of machine learning (ML) and natural language processing (NLP) in psychological measurement has primarily focused on the implementation of new NLP techniques, new data sources (e.g., social media), or cutting-edge ML models. However, research attention, particularly in psychology, has lacked a major focus on the importance of criterion choice when training ML and NLP models. Core to almost all models designed to predict psychological constructs or attributes is the choice of a “source of truth.” Models are typically optimally trained to predict something, meaning the choice of scores the models are attempting to predict (e.g., self-reported personality) is critical to understanding the constructs reflected by the ML or NLP-based measures. The goal of this study was to begin to understand the nuances of selecting a “source of truth” by identifying and exploring the impact of the methodological effects attributable to choosing a “source of truth” when generating language-based personality scores. There were four primary findings that emerged. First, in the context of scoring interview transcripts, there was a clear performance difference between language-based models predicting self-reported scores and interviewer ratings such that language-based models could predict interviewer ratings much better than self-reported ratings of conscientiousness. Second, this is some of the first explicit empirical evidence of the method effects that can occur in the context of language-based scores. Third, there are clear differences between the psychometric properties of language-based self-report and language-based interviewer rating scores and these patterns seemed to be the result of a proxy effect, where the psychometric properties of the language-based ratings mimicked the psychometric properties of the human ratings they were derived from. Fourth, while there was evidence of a proxy effect, language-based scores had slightly different psychometric properties compared to the scores they were trained on, suggesting that it would not be appropriate to fully assume the psychometric properties of language-based assessments based on the ratings the models were trained on. Ultimately, this study is one of the first attempts towards better isolating and understanding the modular effects of language-based assessment methods and future research should continue the application of psychometric theory and research to advances in language-based psychological assessment tools.
Clustering Methods for Correlated Data
(2022-08) Becker, Andrew
Hierarchical Clustering is one of the most popular unsupervised clustering methods.Using a simple agglomerative algorithm, it iteratively combines similar clusters together forming cohesive groups of observations. This work focuses on Hierarchical Clustering and how it may be adapted to accommodate correlated observations. Chapter 2 investigates how to develop a statistical framework for Hierarchical Clustering so we may derive statistical properties from the clustering method. In Chapter 3, a new method, Hierarchical Cohesion Clustering is proposed. This method is a modification of the traditional methods which aims to accommodate correlated observations. This approach explores how repeated measurements may be preprocessed into intermediate clusters to improve clustering outcomes. The method is applied to a sequence-based time use dataset about how people spend their time throughout the day. In Chapter 4, we focus on how to incorporate spatial adjacency data when clustering. We continue to investigate Hierarchical Clustering methods, with a special focus on Hierarchical Cohesion Clustering. Applying the collection of methods to COVID-19 case rate data within counties, a comparison of the methods is performed with summaries of their respective strengths and weaknesses. Spatial simulations are included to better determine each approach’s efficacy and when certain approaches are preferable.
Comparing neural network methods to predict severe hypoglycemic events in Type II diabetic patients
(2024-02) Kirsh, Thomas
Continuous glucose monitoring devices allow people with diabetes to regularly track their glucose levels. Models have been developed using these glucose measurements and used to predict future measurements that are above or below healthy glucose levels. Glucose levels below a certain threshold carry risk for fainting and worse health risks. In this thesis, multiple neural network models are developed, in addition to more traditional machine learning models, and compared across their predictive capabilities, computational resources, and time to train and run. The results show that for some cases, neural networks may be better to use, but in most cases a simpler machine learning model is best, less computationally expensive, and fastest to train.
Computational Analysis of Transcript Interactions and Variants in Cancer
(2015-11) Zhang, Wei
New sequencing and array technologies for transcriptome-wide profiling of RNAs have greatly promoted the interest in gene and isoform-based functional characterizations of a cellular system. Many statistical and machine learning methods have been developed to quantify the isoform/gene expression and identify the transcript variants for cancer outcome prediction. Since building reliable learning models for cancer transcriptome analysis relies on accurate modeling of prior knowledge and interactions between the cellular components, it is still a computational challenge. This thesis proposes several robust and reliable learning models to integrate both large-scale array and sequencing data with biological prior knowledge for cancer transcriptome analysis. First, we explore two signed network propagation algorithms and general optimization frameworks for detecting differential gene expressions and DNA copy number variations (CNV). Second, we present a network-based Cox regression model called Net-Cox and applied Net-Cox for a large-scale survival analysis across multiple ovarian cancer datasets to identify highly consistent signature genes and improve the accuracy of survival prediction. Third, we introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate protein domain-domain interaction network with short read alignments for transcript abundance estimation. Finally, we perform computational analysis of mRNA 3'-UTR shortening on mouse embryonic fibroblast (MEF) cell lines to understand changes of molecular features on dysregulated activation of mammalian target of rapamycin (mTOR). We evaluate our models and findings with simulations and real genomic datasets. The results suggest that our models explore the global topological information in the networks, improve the transcript quantification for better sample classification, identified consistent biomarkers to improve cancer prognosis and survival prediction. The analysis of 3'-UTR with RNA-Seq data find an unexpected link between mTOR and ubiquitin-mediated proteolysis pathway through 3'-UTR shortening.
Computational and Statistical Aspects of High-Dimensional Structured Estimation
(2018-05) Chen, Sheng
Modern statistical learning often faces high-dimensional data, for which the number of features that should be considered is very large. In consideration of various constraints encountered in data collection, such as cost and time, however, the available samples for applications in certain domains are of small size compared with the feature sets. In this scenario, statistical estimation becomes much more challenging than in the large-sample regime. Since the information revealed by small samples is inadequate for finding the optimal model parameters, the estimator may end up with incorrect models that appear to fit the observed data but fail to generalize to unseen ones. Owning to the prior knowledge about the underlying parameters, additional structures can be imposed to effectively reduce the parameter space, in which it is easier to identify the true one with limited data. This simple idea has inspired the study of high-dimensional statistics since its inception. Over the last two decades, sparsity has been one of the most popular structures to exploit when we estimate a high-dimensional parameter, which assumes that the number of nonzero elements in parameter vector/matrix is much smaller than its ambient dimension. For simple scenarios such as linear models, L1-norm based convex estimators like Lasso and Dantzig selector, have been widely used to find the true parameter with reasonable amount of computation and provably small error. Recent years have also seen a variety of structures proposed beyond sparsity, e.g., group sparsity and low-rankness of matrix, which are demonstrated to be useful in many applications. On the other hand, the aforementioned estimators can be extended to leverage new types of structures by finding appropriate convex surrogates like the L1 norm for sparsity. Despite their success on individual structures, current developments towards a unified understanding of various structures are still incomplete in both computational and statistical aspects. Moreover, due to the nature of the model or the parameter structure, the associated estimator can be inherently non-convex, which may need additional care when we consider such unification of different structures. In this thesis, we aim to make progress towards a unified framework for the estimation with general structures, by studying the high-dimensional structured linear model and other semi-parametric and non-convex extensions. In particular, we introduce the generalized Dantzig selector (GDS), which extends the original Dantzig selector for sparse linear models. For the computational aspect, we develop an efficient optimization algorithm to compute the GDS. On statistical side, we establish the recovery guarantees of GDS using certain geometric measures. Then we demonstrate that those geometric measures can be bounded by utilizing simple information of the structures. These results on GDS have been extended to the matrix setting as well. Apart from the linear model, we also investigate one of its semi-parametric extension -- the single-index model (SIM). To estimate the true parameter, we incorporate its structure into two types of simple estimators, whose estimation error can be established using similar geometric measures. Besides we also design a new semi-parametric model called sparse linear isotonic model (SLIM), for which we provide an efficient estimation algorithm along with its statistical guarantees. Lastly, we consider the non-convex estimation for structured multi-response linear models. We propose an alternating estimation procedure to estimate the parameters. In spite of dealing with non-convexity, we show that the statistical guarantees for general structures can be also summarized by the geometric measures.

University Digital Conservancy

Browse by Subject

Browsing by Subject "Machine Learning"