Browsing by Subject "Computer Vision"
Now showing 1 - 20 of 21
Item: Artificial Intelligence to Accelerate COVID-19 Identification from Chest X-rays (2021-05) - Adila, Dyah

Importance: Clinical signs and symptoms for COVID-19 remain the mainstay of early diagnosis and initial management in the emergency department (ED) and inpatient setting at many hospitals, due to delays in obtaining results of PCR testing and limitations in access to rapid antigen testing. The majority of patients with COVID-19 present with respiratory symptoms necessitating a chest x-ray (CXR) as a routine part of screening. An AI-based model to predict COVID-19 likelihood from CXR findings can serve as an important and immediate adjunct to accelerate clinical decision making.
Objective: To develop a robust AI-based diagnostic model to identify CXRs with COVID-19 compared with all non-COVID-19 CXRs.
Setting: Labeled frontal CXR images (samples of COVID-19 and non-COVID-19) from M Health Fairview (Minnesota, USA), the Valencian Region Medical ImageBank (Spain), MIMIC-CXR, the Open-i 2013 Chest X-ray Collection, and the GitHub COVID-19 Image Data Collection (International).
Main Outcome and Measure: Model performance assessed via Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC).
Results: Patients with COVID-19 had significantly higher COVID-19 Diagnostic Scores than patients without COVID-19 on both real-time electronic health record and external (non-publicly available) validation. The model performed well across all four validation methods, with AUROCs ranging from 0.7 to 0.96 and high PPV and specificity. The model had improved discrimination for patients with “severe” as compared to “moderate” COVID-19 disease. The model had unrealistically high performance on publicly available databases, reflecting the inherent limitations of many previously developed models that rely on publicly available data for training and validation.
Conclusions and Relevance: AI-based diagnostic tools may serve as an adjunct, but not a replacement, to support COVID-19 diagnosis, which largely hinges on exposure history, signs, and symptoms. Future research should focus on optimizing discrimination of “mild” COVID-19 from non-COVID-19 image findings.

Item: Automated Wheat Stem Rust Detection using Computer Vision (2023-05) - Mahesh, Rahul Moorthy

Wheat is one of the most important cereal crops, contributing significantly to the global economy and food supply. Direct consumption currently accounts for about 41% of wheat use, and in 2019 alone the global trade value of wheat was about $39.6 billion. Protecting the yield of such crops from disease is therefore of immense importance. Stem rust is a fungal disease that attacks cereal crops. It is especially common in wheat, where it can destroy 50 to 70% of the yield if left unchecked; such losses would in turn affect the economy and food supply. There is thus a need to detect an outbreak early so that fungicide treatment can be applied to the field. The traditional approach to detection involves experts inspecting fields visually and grading them for stem rust, a time-consuming process for a large field that is also subject to human error. An automated grading process would help solve these problems.
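As a brief aside on the chest x-ray item above: AUROC and AUPRC are its headline metrics. Below is a minimal sketch of computing both with scikit-learn; the labels and scores are made-up stand-ins, not the study's data.

```python
# Toy AUROC/AUPRC computation; labels and scores are synthetic stand-ins.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)           # 1 = COVID-19, 0 = non-COVID-19
y_score = 0.4 * y_true + 0.6 * rng.random(200)  # fake "COVID-19 Diagnostic Scores"

print("AUROC:", roc_auc_score(y_true, y_score))            # area under the ROC curve
print("AUPRC:", average_precision_score(y_true, y_score))  # area under the PR curve
```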
The availability of an automated grading process would allow mobile robots, already popular for activities such as irrigation, seed sowing, and precision agriculture, to rapidly perform grading and alert experts when stem rust is detected. Such alerts would in turn lead to timely application of fungicide, efficiently preventing the spread of stem rust. This thesis formulates wheat rust grading as a multi-class classification problem and demonstrates the effectiveness of a visual attention approach for solving it. The thesis also presents the first RGB field dataset with expert labels for the development of automated stem rust grading approaches. The proposed approach was developed and evaluated on this dataset and distinguishes between different intensities of stem rust with 86% accuracy. The reliability of the network is also validated qualitatively through attention maps, where the visual attention approach shows interpretable focus areas compared to traditional detection approaches, which fail to identify even the general area where stem rust is present.

Item: Automatic Semantic Segmentation of Kidney Tumors in Computed Tomography Images (2023-07) - Heller, Nicholas

Semantic segmentation has emerged as a powerful tool for the computational analysis of medical imaging data, but its enormous need for manual effort has limited its adoption in routine clinical practice. Deep learning methods have begun to achieve impressive automatic semantic segmentation performance for a variety of structures in cross-sectional images, but unlike for large, well-defined regions such as major organs and bones, performance on small, poorly-circumscribed structures in unpredictable locations, such as lesions, remains relatively poor. This dissertation presents a series of contributions throughout the machine learning pipeline that allow for unprecedented performance on kidney tumor segmentation. Important among these contributions are (1) the demonstration that deep neural networks for cross-sectional image segmentation are highly sensitive to training set label errors around region boundaries, (2) the development of a novel labeling pipeline which avoids such errors while making efficient use of domain expertise, and (3) the extensive benchmarking of a wide variety of deep learning methods applied to a large-scale dataset collected using this pipeline. Taken together, these innovations enable the first fully-automatic semantic segmentation of kidney tumors in computed tomography images with performance comparable to human experts. The clinical utility of this capability is demonstrated through two studies presenting segmentation-dependent radiomic analyses of kidney tumors, which help to uncover the relationship between tumor morphology and patient outcomes: first, through the automation of the R.E.N.A.L. score, and second, through an unprecedented segmentation-based analysis of longitudinal kidney tumor scans. Arising from this work are two highly-regarded machine learning competitions (or "challenges"), KiTS19 and KiTS21, which attracted submissions from hundreds of research teams from across the world. These remain some of the most widely-used benchmarks for medical image segmentation today.
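Segmentation benchmarks of this kind are typically scored by volumetric overlap between the prediction and the expert annotation; the Sørensen-Dice coefficient is the usual choice (an assumption here, since the abstract does not name its metrics). A minimal sketch:

```python
# Minimal Dice overlap for binary masks: 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * (pred & gt).sum() / denom

# Toy 3D volumes standing in for a CT ground-truth label map and a prediction.
gt = np.zeros((8, 64, 64), dtype=bool)
pred = np.zeros_like(gt)
gt[2:5, 20:40, 20:40] = True
pred[2:5, 22:42, 20:40] = True
print(f"Dice = {dice(pred, gt):.3f}")  # 0.900 for this toy overlap
```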
While experimental results are primarily presented for the kidney tumor segmentation task, the methods developed and findings presented in this dissertation are broadly applicable to any segmentation task where the target structure is small, poorly-circumscribed, and found in unpredictable locations, and where accurate region identification requires domain expertise that is scarce and expensive.

Item: Behavior Monitoring Using Visual Data and Immersive Environments (2017-08) - Fasching, Joshua

Mental health disorders are the leading cause of disability in the United States and Canada, accounting for 25 percent of all years of life lost to disability and premature mortality (Disability-Adjusted Life Years, or DALYs). Furthermore, in the United States alone, spending on care related to mental disorders amounted to approximately $201 billion in 2013. Given these costs, significant effort has been spent on researching ways to mitigate the detrimental effects of mental illness. Observational studies are commonly employed in research on mental disorders. However, observers must watch activities, either live or recorded, and then code the behavior, a process that is long and requires significant effort. Automating such labor-intensive processes can allow these studies to be performed more effectively. This thesis presents efforts to use computer vision and modern interactive technologies to aid in the study of mental disorders. Motor stereotypies are a class of behavior known to co-occur in some patients diagnosed with autism spectrum disorders; results are presented for activity classification of these behaviors. Behaviors in the context of environment, setup, and task were also explored in relation to obsessive-compulsive disorder (OCD). Cleaning compulsions are a known symptom of some persons with OCD, and techniques were created to automate the coding of handwashing behavior as part of an OCD study aimed at understanding differences between subjects with different diagnoses. Instrumenting the experiment and coding the videos was a limiting factor in this study. Varied and repeatable environments can be enabled through the use of virtual reality, and an end-to-end platform was created to investigate this approach. This system allows the creation of immersive environments capable of eliciting symptoms. By controlling the stimulus presented and observing the reaction in a simulated system, new ways of assessment are developed. An evaluation was performed to measure the ability to monitor subject behavior, and a protocol was established for the system's future use.

Item: Bridging MRI Reconstruction Across Eras: From Novel Optimization of Traditional Methods to Efficient Deep Learning Strategies (2024-03) - Gu, Hongyi

Magnetic Resonance Imaging (MRI) has been extensively used as a non-invasive modality for imaging the human body. Despite substantial advances over the past decades, scan duration remains a principal issue for MRI, requiring novel techniques to accelerate data acquisition. Such techniques are poised to improve clinical patient throughput, reduce motion artifacts, enhance subject comfort, and allow higher-resolution imaging in many applications. Several methods have been proposed to accelerate MRI scans. In parallel imaging (PI), k-space data is acquired at a sub-Nyquist rate with multiple receiver coils, and the redundancy among these coils is used for image reconstruction.
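To make those terms concrete, here is an illustrative sketch of the raw ingredients: simulated multi-coil k-space, retrospective sub-Nyquist sampling, and a zero-filled root-sum-of-squares reconstruction. A real PI method (e.g., SENSE or GRAPPA) would additionally exploit the coil redundancy to remove the resulting aliasing; that step is omitted here, and all sizes are toy values.

```python
# Ingredients of parallel imaging: multi-coil k-space, sub-Nyquist sampling,
# and a zero-filled root-sum-of-squares (RSS) combination. Not a full PI solver.
import numpy as np

img = np.random.rand(128, 128)                       # stand-in anatomy
coils = np.stack([img * np.exp(-((np.arange(128)[:, None] - c) ** 2) / 5000.0)
                  for c in (20, 60, 100)])           # 3 toy coil sensitivities

kspace = np.fft.fft2(coils, axes=(-2, -1))           # per-coil k-space
mask = np.zeros(128, dtype=bool)
mask[::4] = True                                     # keep every 4th line (R = 4)
mask[60:68] = True                                   # fully sampled center region
kspace_us = kspace * mask[None, :, None]             # undersample phase-encode lines

zero_filled = np.fft.ifft2(kspace_us, axes=(-2, -1))
recon = np.sqrt((np.abs(zero_filled) ** 2).sum(axis=0))  # RSS coil combine
print(recon.shape)  # (128, 128); aliased at R = 4 without a PI/CS/DL solver
```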
Following the clinical impact and success of PI methods, compressed sensing (CS) techniques were developed to reconstruct images by exploiting the compressibility of images in a pre-specified linear transform domain. Transform learning (TL) was another line of work that learned the linear transforms from data, while enforcing sparsity as in CS. Recently, deep learning (DL) has shown great promise for MRI reconstruction, especially at high acceleration rates where other traditional methods would fail. Specifically, physics-guided DL (PG-DL) unrolls a traditional optimization algorithm for solving regularized least squares for a fixed number of iterations, and uses neural networks to implicitly perform regularization. These unrolled networks are trained end-to-end on large databases, using well-designed loss functions and advanced optimizers, usually with a reference fully-sampled image for supervised learning. Several approaches have noted the difficulty or impossibility of acquiring fully-sampled data in various MRI applications. Among these, self-supervised learning with data undersampling (SSDU) was developed to allow training without fully-sampled data, and multi-mask SSDU was subsequently proposed for better reconstruction quality at high acceleration rates. Although PG-DL generally achieves excellent reconstruction performance, there are concerns about generalizability, interpretability, and stability. In this thesis, we aimed to bridge the gap between traditional and DL methods, while also extending the utility of DL methods to non-Cartesian imaging. We first revisited l1-wavelet CS reconstruction for accelerated MRI, using modern data science tools similar to those used in DL for optimized performance. We showed that our proposed optimization approach improved traditional CS, with a further performance boost from incorporating wavelet subband processing and reweighted l1 minimization. The final version reached performance similar to state-of-the-art PG-DL, while preserving better interpretability by solving a convex optimization problem at inference time. Second, we combined ideas from CS, TL, and DL to enable the learning of deep linear convolutional transforms in a format similar to PG-DL. Our proposed method performed better than CS and TL, and gave performance similar to state-of-the-art PG-DL. It used a linear representation of the image as regularization at inference time, enabling convex sparse image reconstruction that may have better robustness, stability, and generalizability properties. Third, we adapted a self-supervised PG-DL technique to non-Cartesian trajectories and showed its potential for reconstructing 10-fold accelerated spiral fMRI multi-echo acquisitions. Our proposed approach gave substantial improvements in reconstructed image quality over conventional methods. Furthermore, blood oxygenation level dependent (BOLD) signal analysis showed that our proposed method provided meaningful sensitivities, with activation patterns and extent similar to the expected baselines.

Item: Bridging Visual Perception and Reasoning: A Visual Attention Perspective (2023-06) - Chen, Shi

One of the fundamental goals of Artificial Intelligence (AI) is to develop visual systems that can reason with the complexity of the world. Advances in machine learning have revolutionized many fields in computer vision, achieving human-level performance on several benchmark tasks and in industrial applications.
While the performance gap between machines and humans seems to be closing, recent debates on the discrepancies between machine and human intelligence have also received considerable attention. Studies argue that existing vision models tend to use tactics different from human perception and are vulnerable to even a tiny shift in visual domains. Evidence also suggests that they commonly exploit statistical priors instead of genuinely reasoning over the visual observations, and have yet to develop the capability to overcome issues resulting from spurious data biases. These contradictory observations strike at the very heart of AI research, and bring attention to the question: how can AI systems understand the comprehensive range of visual concepts and reason with them to accomplish various real-life tasks, as we do on a daily basis? Humans learn much from little. With just a few relevant experiences, we are able to adapt to different situations. We also take advantage of inductive biases that generalize easily, and we avoid distraction from all kinds of statistical biases. This innate generalizability is a result of not only our profound understanding of the world but also the ways we perceive and reason with visual information. For instance, unlike machines that develop holistic understanding by scanning through the whole visual scene, humans prioritize their attention with a sequence of eye fixations. Guided by visual stimuli and a structured reasoning process, we progressively locate the regions of interest and understand their semantic relationships as well as their connections to the overall task. Despite the lack of a comprehensive understanding of human vision, research on humans' visual behavior can provide abundant insights for the development of vision models, with the potential to contribute to AI systems that are practical for real-world scenarios. With the overarching goal of building visual systems with human-like reasoning capability, we focus on understanding and enhancing the integration between visual perception and reasoning. We leverage visual attention as an interface for studying how humans and machines prioritize their focus when reasoning over diverse visual scenes. We tackle the challenges by making progress from three distinct perspectives. From the visual perception perspective, we study the relationship between the accuracy of attention and performance on visual understanding. From the reasoning perspective, we attend to the connections between reasoning and visual perception, and study the roles of attention throughout the continuous decision-making process. Finally, since humans not only capture and reason on important information with high accuracy but can also justify their rationales with supporting evidence, from the perspective of explainability we explore the use of multi-modal explanations for justifying the rationales behind models' decisions. Our efforts provide an extensive collection of observations for demystifying the integration between perception and reasoning, and, more importantly, they offer insights into the development of trustworthy AI systems with the help of human vision.

Item: Consistency Analysis and Improvement for Vision-aided Inertial Navigation (2016-03) - Hesch, Joel

Navigation systems capable of estimating the six-degrees-of-freedom (d.o.f.) position and orientation (pose) of an object while in motion have been actively developed within the research community for several decades.
Numerous potential applications include human navigation aids for the visually impaired, first responders, and firefighters, as well as localization systems for autonomous vehicles such as submarines, ground robots, unmanned aerial vehicles, and spacecraft. The mobile industry has also recently become interested in six-d.o.f. localization for enabling new applications on smartphones and tablets, such as games that are aware of motion in 3D space. The Global Positioning System (GPS) satellite network has been relied on extensively in pose-estimation applications; however, both humans and vehicles often need to operate in a wide variety of environments that preclude the use of GPS (e.g., underwater, inside buildings, in the urban canyon, and on other planets). In order to estimate the 3D motion of a person or robot in GPS-denied areas, sensors must be employed to determine the platform's displacement over time. To this end, inertial measurement units (IMUs) that sense three-d.o.f. rotational velocity as well as three-d.o.f. linear acceleration have been used extensively. IMU measurements, however, are corrupted by both sensor noise and bias, causing the resulting pose estimates to quickly become unreliable for navigation purposes. Although high-accuracy IMUs exist, they remain prohibitively expensive for widespread use. For this reason, it is common to aid an inertial navigation system (INS) with an alternative sensor such as a laser scanner, sonar, radar, or camera, whose measurements can be employed to determine the platform's pose (or motion) with respect to the surrounding environment. Of these possible aiding sources, cameras have received significant attention due to their small size and weight and the rich information they supply. State-of-the-art vision-aided inertial navigation systems (VINS) are able to provide highly accurate pose estimates over short periods of time; however, they continue to exhibit limitations that prevent them from being used in critical applications for long-term deployment. Most notably, current approaches produce inconsistent state estimates, i.e., the errors are biased and the corresponding uncertainty in the estimate is unduly small. In this thesis, we examine two key sources of estimator inconsistency for VINS, and propose solutions to mitigate these issues.

Item: Extrinsic and Intrinsic Sensor Calibration (2013-12) - Mirzaei, Faraz M.

Sensor calibration is the process of determining the intrinsic (e.g., focal length) and extrinsic (i.e., position and orientation (pose) with respect to the world, or to another sensor) parameters of a sensor. This task is an essential prerequisite for many applications in robotics, computer vision, and augmented reality. For example, in the field of robotics, in order to fuse measurements from different sensors (e.g., camera, LIDAR, gyroscope, accelerometer, odometer, etc., for the purpose of Simultaneous Localization and Mapping, or SLAM), all the sensors' measurements must be expressed with respect to a common frame of reference, which requires knowing the relative pose of the sensors. In augmented reality, the pose of a sensor (a camera in this case) with respect to the surrounding world, along with its internal parameters (focal length, principal point, and distortion coefficients), must be known in order to superimpose an object into the scene.
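A minimal sketch of where these parameters enter: projecting a 3D world point into pixel coordinates with a pinhole model, with illustrative intrinsics K and extrinsics (R, t); lens distortion is omitted and all numbers are made up.

```python
# Pinhole projection: world point -> camera frame (extrinsics) -> pixels (intrinsics).
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # fx, skew, cx (intrinsic parameters)
              [  0.0, 500.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # extrinsic rotation ...
t = np.array([0.1, 0.0, 0.0])          # ... and translation (camera pose)

p_world = np.array([0.5, -0.2, 4.0])   # a 3D point in the world frame
p_cam = R @ p_world + t                # world -> camera frame
u, v, w = K @ p_cam                    # camera frame -> homogeneous pixels
print(u / w, v / w)                    # pixel coordinates
```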
When designing calibration procedures, and before selecting a particular estimation algorithm, there are two main issues one needs to consider: whether the system is observable, meaning that the sensor's measurements contain sufficient information for estimating all degrees of freedom (d.o.f.) of the unknown calibration parameters; and, given an observable system, whether it is possible to find the globally optimal solution. Addressing these issues is particularly challenging due to the nonlinearity of the sensors' measurement models. Specifically, classical methods for analyzing the observability of linear systems (e.g., the observability Gramian) are not directly applicable to nonlinear systems. Therefore, more advanced tools, such as Lie derivatives, must be employed to investigate these systems' observability. Furthermore, providing a guarantee of optimality for estimators applied to nonlinear systems is very difficult, if not impossible. This is due to the fact that commonly used (iterative) linearized estimators require initialization and may only converge to a local optimum. Even with accurate initialization, no guarantee can be made regarding the optimality of the solution computed by linearized estimators. In this dissertation, we address some of these challenges for several common sensors, including cameras, 3D LIDARs, gyroscopes, Inertial Measurement Units (IMUs), and odometers. Specifically, in the first part of this dissertation we employ Lie-algebra techniques to study the observability of gyroscope-odometer and IMU-camera calibration systems. In addition, we prove the observability of the 3D LIDAR-camera calibration system by demonstrating that only a finite number of values for the calibration parameters produce a given set of measurements. Moreover, we provide the conditions on the control inputs and measurements under which these systems become observable. In the second part of this dissertation, we present a novel method for mitigating the initialization requirements of iterative estimators for the 3D LIDAR-camera and monocular camera calibration systems. Specifically, for each problem we formulate a nonlinear Least-Squares (LS) cost function whose optimality conditions comprise a system of polynomial equations. We subsequently exploit recent advances in algebraic geometry to analytically solve these multivariate polynomial systems and compute the LS critical points. Finally, the guaranteed LS-optimal solutions are found directly by evaluating the cost function at the critical points, without requiring any initialization or iteration. Together, our observability analysis and analytical LS methods provide a framework for accurate and reliable calibration of common sensors in robotics and computer vision.

Item: FaceKeeper - Privacy-Aware Distributed Computation of Family Photo Collections (2018-04) - Vachher, Prateek

Family photos are personal, private information, and there is a need for alternative storage models for secure, private (offline) media archiving and curation. Server-client architectures are shifting from centralized networks toward distributed and decentralized ones; the centralized model hands all control over users' privacy to the host of the network. The research objective of this project was to design a novel system for family photo collection analysis that leverages local home networking, reducing dependence on cloud services, prioritizing privacy, and maximizing sustainability through device reuse (IoT).
The outcomes and deliverables of the research included a robust face clustering algorithm written in Python, with documentation available for open-source use on GitHub. A reverse image search feature over the photo collection was built to complement the research. The delivered software supports device reuse by running on single-board computers such as the Raspberry Pi.

Item: Image Classification with Minimal Supervision (2011-06) - Joshi, Ajay Jayant

With growing collections of images and video, it is imperative to have automated techniques for extracting information from visual data. A primary task that lies at the heart of information extraction is image classification, which refers to classifying images or parts of them as belonging to certain categories. Accurate and reliable image classification has diverse applications: web image and video search, content-based image retrieval, medical image analysis, autonomous robotics, gesture-based human-computer interfaces, etc. However, considering the large image variability and typically high-dimensional representations, training predictive models requires substantial amounts of annotated data, often provided through human supervision; supplying such data is expensive and tedious. This training bottleneck is the motivation for developing robust algorithms that can build powerful predictive models with little training or supervision. In this thesis, we propose new algorithms for learning with data, focusing particularly on active learning. Instead of passively accepting training data, the basic idea in active learning is to select the most informative data samples for the human to annotate. This can lead to extremely efficient allocation of resources, and results in predictive models that require far fewer training samples than in the passive setting. We first propose an active sample selection criterion for training large multi-class classifiers with hundreds of categories. The criterion is easy to compute, and extends traditional two-class active learning to the multi-class setting. We then generalize the approach to handle only binary (yes/no) feedback while still performing classification in the multi-class domain. The proposed modality provides substantial interactive simplicity, and makes it easy to distribute the training process across many users. Active learning has been studied from two different perspectives: selective sampling from a pool, and query synthesis; each perspective offers different tradeoffs. We propose a formulation that combines both approaches while leveraging their individual strengths, resulting in a scalable and efficient multi-class active learning scheme. Experimental results show efficient training of classification systems with a pool of a few million images on a single computer. Active learning is intimately related to a large body of previous work on experiment design and optimal sensing; we discuss the similarities and key differences between the two. A new greedy batch-mode sample selection algorithm is proposed that shows substantial benefits over random batch selection when iterative querying cannot be applied. We finally discuss two applications of active selection: (i) active learning of compact hash codes for fast image search and classification, and (ii) incremental learning of a classifier in a resource-constrained environment to handle changing scene conditions.
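The generic pool-based loop underlying these ideas can be sketched in a few lines. Margin-based uncertainty sampling is used below as a stand-in criterion (the thesis proposes its own multi-class criteria), and all data is synthetic.

```python
# Pool-based active selection: query the unlabeled samples on which the
# current model is least certain (smallest top-2 class-probability margin).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.normal(size=(1000, 16))        # unlabeled pool (toy features)
X_seed = rng.normal(size=(20, 16))
y_seed = rng.integers(0, 3, size=20)        # small labeled seed set, 3 classes

clf = LogisticRegression(max_iter=500).fit(X_seed, y_seed)
proba = clf.predict_proba(X_pool)           # per-sample class posteriors
top2 = np.sort(proba, axis=1)[:, -2:]       # two largest probabilities
margin = top2[:, 1] - top2[:, 0]            # small margin = uncertain
query_idx = np.argsort(margin)[:10]         # 10 samples to send for annotation
print(query_idx)
```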
Throughout the thesis, we focus on thorough experimental validation on a variety of image datasets to analyze the strengths and weaknesses of the proposed methods.

Item: Investigation into Feasibility of Color and Texture Features for Automated Detection of Lymph Node Metastases in Histopathological Images (2017-05) - Bergeron, Sydney

The purpose of this thesis is to investigate the feasibility of commonly used features in histopathological image analysis for locating various carcinoma metastases within lymph nodes, regardless of the tissue from which they originated. Previous work on cancer detection within lymph nodes has been limited to tissue-specific classification, such as lymphomas or breast cancer metastases. A general carcinoma classifier, one that can discriminate between healthy lymph tissue and many different carcinomas, would enhance pathologists' diagnostic confidence and speed. To investigate a general carcinoma classifier, 24 Hematoxylin & Eosin stained lymph node images containing 9 different carcinoma types were used to gather 989,531 training examples for training support vector machines. A hue histogram, a texture feature using the local range of pixels, and a combination of hue and texture were tested using 5-fold cross-validation on a 250,000-sample subset of the data and subsequently tested on the remaining data. The combined hue-texture feature performed best, with a cross-validation accuracy of 96.26% and a classification accuracy of 96.90%, within 1.5% of the best performing breast cancer metastasis detector. For further investigation into feasibility, as well as into the types of tissue that generate false positives, a probability heatmap was generated for 5 new Hematoxylin & Eosin stained lymph node images using the hue-texture classifier and used to highlight suspicious areas within a slide. The heatmap received a pathologist rating of 4.8/5 for success in locating metastases, and 4.2/5 for helpfulness in saving pathologists' time and boosting their confidence. In this thesis, the use of color and texture features together proved feasible for discriminating between healthy lymph tissue and carcinoma tissue.

Item: A Learning Approach to Detecting Lung Nodules in CT Images (2009-12) - Aschenbeck, Michael G.

Lung cancer is one of the most common types of cancer and has the highest mortality rate. Unfortunately, detecting the presence of this disease is a long and difficult process for the physician, who must search through three-dimensional medical images for possibly cancerous small structures that are roughly spherical. These structures are called pulmonary nodules. Due to the difficult and time-consuming detection task faced by the physician, computer-aided detection (CAD) has been the focus of many research efforts. Most of these works involve segmenting the image into structures, extracting features from the structures, and classifying the resulting feature vectors. Unfortunately, the first of these tasks, segmentation, is a difficult problem and often the origin of missed detections. This work attempts to eliminate the structure segmentation step. Instead, features are extracted from fixed-size subwindows and sent to a classifier. Bypassing the segmentation step allows every location to be classified. Feature extraction is accomplished by learning a complete basis for the subwindow on the training set and using the inner product of the subwindow with each basis element.
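A sketch of this basis-learning idea, using the SVD (one of the bases the thesis considers): learn an orthonormal basis from vectorized training subwindows and take inner products with its elements as features. The subwindow size and sample counts below are illustrative, not the thesis's settings.

```python
# Learn an orthonormal subwindow basis via the SVD; features = inner products.
import numpy as np

rng = np.random.default_rng(2)
train = rng.normal(size=(5000, 11 * 11 * 7))   # vectorized 3D CT subwindows (toy)
train -= train.mean(axis=0)                    # center before the SVD

# Rows of Vt form an orthonormal basis for subwindow space.
_, _, Vt = np.linalg.svd(train, full_matrices=False)

def features(subwindow: np.ndarray, k: int = 64) -> np.ndarray:
    """Inner products of the subwindow with the first k basis elements."""
    return Vt[:k] @ subwindow.ravel()

x = rng.normal(size=(11, 11, 7))               # one candidate subwindow
print(features(x).shape)                       # (64,) feature vector
```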
This approach is preferred over choosing features based on human interpretation, as the latter approach will most likely overlook valuable information. The bases used are derived from the singular value decomposition (SVD), a modification of the SVD, tensor decompositions, vectors reminiscent of the Haar wavelets, and the Fourier basis. The features are sent to a number of different classifiers for comparison. The classifiers include parametric methods such as likelihood classifiers and probabilistic clustering, as well as non-parametric classifiers such as kernel support vector machines (SVMs), classification trees, and AdaBoost. While different features and classifiers produce a wide range of results, the non-parametric classifiers unsurprisingly yield much better detection and false-positive rates. The best combination on the test set yields 100% detection of the nodule subwindows while classifying only 1% of the non-nodule windows as nodules. This compares favorably with previous CAD approaches discussed in this thesis, which achieve no better than 85% detection rates.

Item: Learning from Pixels: Image-Centric State Representation Reinforcement Learning for Goal Conditioned Surgical Task Automation (2023-11) - Gowdru Lingaraju, Srujan

Over the past few years, significant exploration has occurred in the field of automating surgical tasks through off-policy Reinforcement Learning (RL) methods. These methods have seen notable advancements in enhancing sample efficiency (such as with the use of Hindsight Experience Replay, or HER) and in addressing the challenge of exploration (as in imitation learning approaches). While these advancements have boosted RL model performance, they all share a common reliance on accurate ground-truth state observations. This reliance poses a substantial hurdle, particularly in real-world scenarios where capturing an accurate state representation is notably challenging. This study addresses that challenge by exploiting an Asymmetric Actor-Critic framework, while addressing sample efficiency and the exploration burden with HER and behavior cloning. Within this framework, the critic is trained on the complete state information, whereas the actor is trained on partial state observations, thus diminishing the need for pre-trained state representation models. The proposed methodology is evaluated within SurRoL, a surgical task simulation platform. The experimental results show that the RL model, operating in this configuration, achieves task performance akin to models trained with complete ground-truth state representations. Additionally, we examine the necessity of sim-to-real transfer methods, elucidate some of the formidable challenges inherent in this process, and present a comprehensive pipeline that addresses the intricacies of domain adaptation. This research thus presents a promising avenue for mitigating the reliance on pre-trained state representation models in the pursuit of effective surgical task automation.

Item: Machine Learning Methods with Emphasis on Cancerous Tissue Recognition (2018-08) - Stanitsas, Panagiotis

Today, vast and unwieldy data collections are regularly being generated and analyzed in hopes of supporting an ever-expanding range of challenging sensing applications. Modern inference schemes usually involve millions of parameters to learn complex real-world tasks, which creates the need for large annotated training datasets.
For several visual learning applications, collecting large amounts of annotated data is either challenging or very expensive; one such domain is medical image analysis. In this thesis, machine learning methods were devised with emphasis on Cancerous Tissue Recognition (CTR) applications. First, a lightweight active constrained clustering scheme was developed for processing image data, capitalizing on actively acquired pairwise constraints. The proposed methodology introduces the use of Silhouette values, conventionally used for measuring clustering performance, to rank the information content of the various samples. Second, an active selection framework that operates in tandem with Convolutional Neural Networks (CNNs) was constructed for CTR. In the presence of limited annotations, alternative (or sometimes complementary) venues were explored in an effort to restrain the high expenditure of collecting the image annotations required by CNN-based schemes. Third, a Symmetric Positive Definite (SPD) image representation was derived for CTR, termed the Covariance Kernel Descriptor (CKD), which consistently outperformed a large collection of popular image descriptors. Even though the CKD successfully describes the tissue architecture of small image regions, its performance decays when applied to larger slide regions or whole tissue slides, due to the larger variability that tissue exhibits at that level, since different types of tissue (healthy, benign disease, malignant disease) can be present as the regions grow. Fourth, to extend the recognition capability of the CKD to larger slide regions, the Weakly Annotated Image Descriptor (WAID) was devised from the parameters of classifier decision boundaries in a multiple instance learning framework. Fifth, an Information Divergence and Dictionary Learning (IDDL) scheme for SPD matrices was developed to identify appropriate geometries and similarities for SPD matrices, and was successfully tested on a diverse set of recognition problems including activity, object, and texture recognition as well as CTR. Finally, IDDL was transitioned to an unsupervised setup, dubbed alpha-beta-KMeans, to address the problem of learning information divergences while clustering SPD matrices in the absence of labeled data.

Item: Masked Faces in Context (MASON) for Masked Face Detection and Classification (2023-01) - Shield, Helena

As the SARS-CoV-2 virus mutated and spread around the world, scientists and public health officials were faced with the responsibility of making health recommendations while studying the novel disease in real time. One such recommendation was the use of face masks of varying types to reduce disease spread in public spaces. Evaluating the effectiveness of such measures requires accurate data collection on proper face mask usage. Computer vision models that detect and classify face mask usage can aid the collection process by monitoring usage in public spaces. However, training these models requires accurate and representative datasets. Pre-COVID-19 datasets and synthetic datasets have limitations that affect the accuracy of models in real-world settings, such as inaccurate representations of occlusion and a limited variety of subjects, settings, and masks.
In this work we present a new dataset, Masked Faces in Context (MASON), of annotated real-world images focusing on the period from 2020 to the present, along with baseline detection and classification models that outperform the current state of the art. This dataset better captures mask wearing during the COVID-19 pandemic, with greater representation of different age groups, mask types, common occluding items such as face shields, and face positions. Our experiments demonstrate increased accuracy in face mask detection and classification.

Item: PermNet: Permuted Convolutional Neural Network (2021-05) - Mehta, Rishabh

Convolution filters in CNNs extract patterns from the input by aggregating information across the height, width, and channel dimensions. Information aggregation across the height and width dimensions, performed using depthwise convolution, helps identify neighborhood patterns and hence is very intuitive. However, the way channel-dimension information is aggregated, by channel summation, seems mathematically simplistic and born of convenience. In this project we attempt to improve the channel-dimension aggregation operations. The first approach introduces weighted-summation channel aggregation in convolutions. The second approach introduces permuted convolutions, which attempt to perform pseudo-width scaling by generating new constrained filters from existing filters. Implementing permuted convolutions comes with many challenges, such as permutation explosion, stochasticity, and higher memory and computation requirements. To resolve these issues, we develop multiple variants of permuted convolutions and present their advantages and disadvantages. Lastly, we provide empirical results showcasing the performance of weighted-channel-summation networks and permuted convolution networks, and present our findings and recommendations for future work.

Item: Robustness in Deep Learning: Single Image Denoising using Untrained Networks (2021-05) - Singh, Esha

Deep learning has become one of the cornerstones of today's AI advancement and research. Deep learning models are used to achieve state-of-the-art results on a wide variety of tasks, including image restoration problems, specifically image denoising. Despite recent advances in applications of deep neural networks and a substantial amount of existing research in the domain of image denoising, this task remains an open challenge. In this thesis we summarize the study of image denoising research and its trends over the years, its fallacies and its brilliance. We first visit the fundamental concepts of image restoration problems, their definitions, and some common misconceptions. We then trace back where the study of image denoising began, categorize the work done so far into three main families, with the main focus on the neural network family of methods, and discuss some popular ideas. We also trace the related concepts of over-parameterization, regularization, and low-rank minimization, and discuss the recent untrained-networks approach to single image denoising, which is fundamental to understanding why current state-of-the-art methods are still unable to provide a generalized approach for stable image recovery under multiple perturbations.

Item: Topological Methods for 3D Point Cloud Processing (2018-08) - Beksi, William

3D point cloud datasets are becoming more common due to the availability of low-cost sensors.
Light detection and ranging (LIDAR), stereo, structured light, and time-of-flight (ToF) sensors are examples of sensors that capture a 3D representation of the environment. These sensors are increasingly found in mobile devices and machines such as smartphones, tablets, robots, and autonomous vehicles. As hardware technology advances, algorithms and data structures are needed to process the data generated by these sensors in innovative and meaningful ways. This dissertation develops and applies algebraic topological methods for processing 3D point cloud datasets. The area of topological data analysis (TDA) has matured in recent years, allowing researchers to analyze point cloud datasets using techniques that take into account the 'shape' of the data. This includes topological features such as connected components, holes, voids, and higher-dimensional analogs. These ideas have been successfully applied to datasets that are naturally embedded in a metric space (such as Euclidean space), where distances between points can be used to form a parameterized sequence of spaces. By studying the changing topology of this sequence, we gain information about the underlying data. In the first part of the thesis, we present a fast approach to building a 3D Vietoris-Rips complex, which allows us to approximate the topology of a point cloud. The construction of the complex is done in three parallelized phases: nearest neighbor search, edge list generation, and triangle list generation. The edge and triangle lists can then be used for persistent homology computations. In the second part of the thesis, we present approaches to segmenting 3D point cloud data using ideas from persistent homology theory. The proposed algorithms first generate a simplicial complex representation of the point cloud dataset. Then, the zeroth homology group of the complex, which corresponds to the number of connected components, is computed. Finally, we extract the clusters of each connected component in the dataset. We show that these methods provide a stable segmentation of point cloud data in the presence of noise and poor sampling conditions, providing advantages over contemporary segmentation procedures. In the third part of the thesis, we address an open problem in computational topology by introducing a nearly linear time algorithm for incrementally computing topologically persistent 1-cycles. Further, we develop a second algorithm that utilizes the output of the first to generate a spanning tree upon which non-bounding minimal 1-cycles can be computed. These non-bounding minimal 1-cycles are then used to locate and fill holes in a dataset. Experimental results show the efficacy of our algorithms for reconstructing the surface of 3D point clouds produced by noisy sensor data. In the fourth part of the thesis, we develop a global feature descriptor, termed Signature of Topologically Persistent Points (STPP), that encodes topological invariants (the zeroth and first homology groups) of 3D point cloud data. STPP is competitive with state-of-the-art 3D point cloud descriptors and is resilient to noisy sensor data. We demonstrate experimentally that STPP can be used as a distinctive signature, enabling 3D point cloud processing tasks such as object detection and classification. This dissertation makes progress towards effective, efficient, and scalable topological methods for 3D point cloud processing along two directions. We present algorithms with an analysis of their theoretical performance and proofs of correctness.
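As a small aside, the zeroth-homology segmentation idea from the second part can be pictured at a single fixed scale: the clusters are the connected components of a radius graph, computed here with union-find. This is a sketch only; the thesis's algorithms operate on full simplicial complexes with persistence.

```python
# Connected components of a radius graph = zeroth homology at one fixed scale.
import numpy as np

def cluster(points: np.ndarray, eps: float) -> np.ndarray:
    n = len(points)
    parent = list(range(n))
    def find(i: int) -> int:            # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):                  # O(n^2) edge check; fine for a sketch
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= eps:
                parent[find(i)] = find(j)   # union: i and j share a component
    return np.array([find(i) for i in range(n)])

pts = np.vstack([np.random.rand(50, 3), np.random.rand(50, 3) + 5.0])
labels = cluster(pts, eps=1.0)
print(len(set(labels)))                 # expect 2 well-separated components
```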
We also demonstrate the feasibility and applicability of our results through experiments using publicly available datasets.

Item: Towards a Generic Object Detection Algorithm (2022) - Pidaparti, Ashvin S

In the field of autonomous robotics, object detection is a key element of a robot's interaction with the world: a robot must be aware of its surroundings in order to take the appropriate action. Through extensive research, algorithms have been developed to detect objects and determine their location in images; however, these algorithms require a large number of high-quality images of the objects, and these images must be annotated, requiring a human to explicitly state where an object is in each image. For several objects, such images are simply not available, and there are too many objects in our world to effectively detect all of them this way. We put forward a method of detecting objects without a significant number of images, dubbed Zero Shot Detection. This method works by creating binary feature extractors and using the outputs of several feature extractors to create a vector representation of an object. This vector representation is then matched against a list of objects that were not in the original dataset, together with their expected outputs from those feature extractors, to find the closest match. This match is the classification of the object.

Item: Universal Robot for Automated Microinjection with Applications in Transgenesis and Cryopreservation (2023-01) - Joshi, Amey

Microinjection is the process of injecting a small amount of solution into biological organisms at a microscopic level using a glass micropipette. It is a widely utilized technique with a broad range of applications in both fundamental research and clinical settings. However, microinjection is an extremely laborious, manual procedure, which makes it a critical bottleneck in the field and thus ripe for automation. In this thesis, we introduce a simple computer-vision-guided robot that uses off-the-shelf components to fully automate the microinjection procedure in different model organisms. The robot uses machine learning models that have been trained to detect individual embryos on agar plates, and it serially performs microinjection at a particular site in each detected embryo with no human interaction. We deployed three such robots, operated by expert and novice users, to perform automated microinjections in zebrafish (Danio rerio) and Drosophila melanogaster. We conducted survivability studies to better understand the impact of microinjection on zebrafish embryos and the fundamental mechanisms by which microinjection affects them. The robot allowed us to examine the speed of the micropipette, the volume of the microinjectant, the micropipette geometry, and the rate at which the volume is delivered. These results helped us determine the optimal settings for automated microinjection into zebrafish embryos. We then used transgenesis studies to compare the efficiency of automated microinjection at these optimal settings with that of manual microinjection. Further, we demonstrated that robotic microinjection of cryoprotective agents into zebrafish embryos significantly improves vitrification rates and the post-thaw survivability of cryopreserved embryos compared to manual microinjection, opening the door to large-scale cryo-banking of aquatic species on an industrial scale. We anticipate that this robotic microinjection approach can be readily adapted to other organisms and applications.
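Finally, a purely illustrative sketch of the detect-then-inject loop the microinjection item describes. The detector and the robot interface below are hypothetical placeholders, not the thesis's actual software.

```python
# Hypothetical control loop: detect embryos, pick an injection site per
# detection, and visit the sites serially. All names are placeholders.
import numpy as np

def detect_embryos(image: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Stand-in for the trained detector; returns (x, y, w, h) boxes."""
    return [(40, 50, 30, 30), (120, 80, 28, 32)]  # hard-coded for the sketch

def injection_site(box: tuple[int, int, int, int]) -> tuple[int, int]:
    x, y, w, h = box
    return (x + w // 2, y + h // 2)               # e.g., the embryo centroid

image = np.zeros((256, 256), dtype=np.uint8)      # placeholder plate image
for box in sorted(detect_embryos(image)):         # serial visiting order
    sx, sy = injection_site(box)
    print(f"move micropipette to ({sx}, {sy}) and inject")
```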