Browsing by Subject "classification"
Now showing 1 - 5 of 5
Item An Efficient, Scalable, Parallel Classifier for Data Mining (1997) Srivastava, Anurag; Singh, Vineet; Han, Eui-Hong; Kumar, Vipin
Classification is an important data mining problem. Recently, there has been significant interest in classification using training datasets that are large enough that they do not fit in main memory and need to be disk-resident. Although training data can be reduced by sampling, it has been shown that it can be advantageous to use the entire training dataset, since doing so can increase accuracy. Most current algorithms are unsuitable for large disk-resident datasets because their space and time complexities (including I/O) are prohibitive. A recent algorithm called SPRINT promises to alleviate some of the data size restrictions. We present a new algorithm called SPEC that provides similar accuracy, reduces I/O, reduces memory requirements, and improves scalability (time and space) on both sequential and parallel computers. We provide some theoretical results as well as experimental results on the IBM SP2.
Item Comparing Two Classification Methods of Third Molar Development (2018-06) Kats, Olga
BACKGROUND: Radiographic evaluation of third molar development is often used in estimating chronological age. A widely used system of such evaluation, developed by Demirjian, uses eight growth stages (Demirjian et al. 1973). These stages are defined by changes of shape and can be subjective (Sisman et al. 2007). A newer staging system uses numeric values (millimeters) to separate the stages (Hammer 2015). OBJECTIVE: The purpose of this study is to determine whether Hammer’s staging of third molar development is more reliable than Demirjian’s staging. MATERIALS AND METHODS: Existing panoramic radiographs from the University of Minnesota Orthodontic Department were scored twice by three calibrated readers using the Hammer and Demirjian staging classifications. Kappa statistics were calculated to assess intra- and inter-rater agreement.
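The kappa statistic mentioned above can be sketched as follows. This is a minimal illustration of Cohen's kappa for two raters, not the study's actual analysis; the stage scores below are invented for the example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of subjects scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical Demirjian stage scores (A-H) for ten radiographs.
reader1 = list("ABBCDDEFGH")
reader2 = list("ABBCDEEFGH")
print(round(cohens_kappa(reader1, reader2), 3))  # → 0.885
```

Values near 1 indicate near-perfect agreement; the same function applied to a reader's two scoring passes measures intra-rater reliability.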
RESULTS: The results showed that Hammer’s method had higher intra- and inter-rater reliability, but it was not significantly different from Demirjian’s method. CONCLUSION: Hammer’s classification of third molar eruption pattern may be used to stage third molar formation. Future studies may aim to correlate Hammer’s classification with population-specific chronological age data.
Item Hyperdimensional Computing based Classification and Clustering: Applications to Neuropsychiatric Disorders (2023-12) Ge, Lulu
Since its introduction in 1988, hyperdimensional computing (HDC), also referred to as vector symbolic architecture (VSA), has attracted significant attention. Using hypervectors as unique data points, this brain-inspired computational paradigm represents, transforms, and interprets data effectively. So far, the potential of HDC has been demonstrated through comparable performance to traditional machine learning techniques, high noise immunity, massive parallelism, high energy efficiency, fast learning/inference speed, one/few-shot learning ability, and more. In spite of HDC’s wide range of potential applications, relatively few studies have demonstrated its applicability. To this end, this dissertation focuses on the application of HDC to neuropsychiatric disorders: (a) seizure detection and prediction, (b) brain graph classification, and (c) transcranial magnetic stimulation (TMS) treatment analysis. We also develop novel clustering algorithms using HDC that are more robust than the existing HDCluster algorithm. To detect and predict seizures, intracranial electroencephalography (iEEG) data are analyzed using HDC-based local binary pattern (LBP) and power spectral density (PSD) encoding. Our study examines the effectiveness of using all features as well as a small number of selected features. Our results indicate that HDC can be used for seizure detection, where PSD encoding is superior to LBP encoding.
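The hypervector operations underlying HDC classification can be illustrated with a minimal sketch. This shows only the general paradigm (bipolar hypervectors, bundling by majority vote, nearest-prototype classification), not the dissertation's encoders; the dimensionality, noise levels, and class names are arbitrary.

```python
import random

DIM = 10000  # HDC typically uses very high-dimensional vectors

def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def noisy_copy(hv, flips, rng):
    """Copy of hv with `flips` randomly chosen components negated."""
    out = hv[:]
    for i in rng.sample(range(DIM), flips):
        out[i] = -out[i]
    return out

def bundle(hvs):
    """Bundling: element-wise majority vote (sign of component-wise sum)."""
    return [1 if sum(comps) >= 0 else -1 for comps in zip(*hvs)]

def similarity(a, b):
    """Normalized dot product; ~0 for unrelated bipolar hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / DIM

rng = random.Random(0)
class_a, class_b = random_hv(rng), random_hv(rng)

# Class prototypes bundled from a few noisy "training" encodings each.
proto_a = bundle([noisy_copy(class_a, 1000, rng) for _ in range(5)])
proto_b = bundle([noisy_copy(class_b, 1000, rng) for _ in range(5)])

# A heavily corrupted query (20% of components flipped) is still far
# closer to its own class prototype, illustrating HDC's noise immunity.
query = noisy_copy(class_a, 2000, rng)
print(similarity(query, proto_a) > similarity(query, proto_b))  # → True
```

In high dimensions, random hypervectors are nearly orthogonal with overwhelming probability, which is what makes this nearest-prototype scheme robust to noise.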
We observe that as few as three features are sufficient to detect seizures with PSD encoding. However, to pave the way for seizure prediction using HDC, more efficient features must be explored. For the classification of brain graphs, data from functional magnetic resonance imaging (fMRI) are analyzed. Brain graphs describe the functional brain connectome under varying brain states and are generated from fMRI data collected at rest and during tasks. The brain graph structure is assumed to vary from task to task and from task to no task. Participants are asked to execute emotional and gambling tasks, but no tasks are assigned during resting periods. GrapHD, an HDC-based graph representation initially developed for object detection, is herein extended to brain graph classification. Experimental results demonstrate that GrapHD encoding can classify brain graphs for three binary classification problems: emotion vs. gambling, emotion vs. no-task, and gambling vs. no-task. Furthermore, GrapHD requires fewer memory resources than extant HDC-based encoding approaches. In terms of clustering, HDCluster, an HDC-based clustering algorithm, was proposed in 2019. Originally designed to mimic traditional k-means, HDCluster exhibits higher clustering performance across versatile datasets. However, we have identified that HDCluster's performance may be significantly influenced by the random seed used to generate the seed hypervectors. To mitigate the impact of this random seed, we propose more robust HDC-based clustering algorithms designed to outperform HDCluster. Experimental results substantiate that our HDC-based algorithms are more robust and achieve higher clustering performance than HDCluster. In the analysis of TMS treatment, we conduct two specific tasks.
One is to identify clinical trajectory patterns for patients who suffer from major depressive disorder (MDD) (clustering). The other is to predict MDD severity using 34 measured cognitive variables (classification). For clustering, we propose a novel HDC-based algorithm that adapts HDCluster to determine the number of clusters in a system of clinical trajectories. For classification, we utilize two HDC-based encoding algorithms and examine the impact of using either all features or selected features. Experimental results indicate that our HDC algorithm mirrors the clustering pattern of the classical algorithm. Additionally, the HDC-based classifier effectively predicts clinical response.
Item Imputation of ecological detail using associated forest inventory, plant community and physiographic data (2016-11) Wilson, David
The desire to consider additional ecological information in management planning has become a pressing concern in the field of forest ecology and management. While intensively managed forest stands provide ecological benefits, these can differ from the services and values supported by native ecosystems and plant communities. To better understand the implications of management for biological diversity, ecosystem services, timber production, and other interests, an ecological classification methodology matched with existing forest inventory and management operations is proposed and developed. This methodology makes use of nearly 17,000 native plant community (NPC) observations provided by the Minnesota Department of Natural Resources (MNDNR) and others. These observations cover the period from 1964 to 2015 and coincide with stands monitored by the MNDNR Division of Forestry. The proposed imputation model (Chapter 1) represents an improvement over randomForest-based methods in terms of accuracy, coverage, and the ability to handle complex categorical variables with essentially unlimited levels of detail.
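Imputation of a categorical class from associated inventory attributes can be sketched generically as nearest-neighbor majority voting. This is an illustration of the general idea, not Wilson's model; the stand attributes, their values, and the NPC class labels below are invented for the example.

```python
import math

# Hypothetical classified stands: (basal area, site index, conifer fraction)
# mapped to an NPC class code. Values are illustrative only, not MNDNR data.
training = [
    ((32.0, 18.0, 0.90), "FDn43"),
    ((28.0, 16.5, 0.80), "FDn43"),
    ((12.0, 22.0, 0.10), "MHn35"),
    ((15.0, 21.0, 0.20), "MHn35"),
    ((8.0, 14.0, 0.05), "WFn55"),
]

def impute_npc(stand, k=3):
    """Impute an NPC class for an unclassified stand by majority vote
    among its k nearest classified neighbors in attribute space."""
    dists = sorted((math.dist(stand, attrs), npc) for attrs, npc in training)
    nearest = [npc for _, npc in dists[:k]]
    return max(set(nearest), key=nearest.count)

print(impute_npc((13.0, 21.5, 0.15)))  # → MHn35
```

In practice the attributes would need scaling to comparable ranges before distances are meaningful; the point here is only the structure of donor-based categorical imputation.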
Extension of the methodology to include United States Department of Agriculture (USDA) Forest Inventory and Analysis (FIA) plot observations and additional predictive characteristics further improves classification results (Chapter 2). The net predictive capability is sufficient to produce estimates of the areal extent of major forested NPCs occurring in Minnesota. These estimates are derived from a process utilizing the spatial overlay of FIA plots with MNDNR stands having NPC observations. These “observed” FIA plots serve as training data to classify the full set of FIA plots observed in Minnesota. Finally, FIA data augmented with imputed NPC classifications are used to assess relationships between NPC classifications and the growth and yield characteristics of the forests in each community (Chapter 3). Results indicate that NPC classification often corresponds to meaningful distinctions between different growth patterns and eventual yields of forested stands. Imputation can provide timely and accurate knowledge of NPC distribution, abundance, successional state, and demographic and economic relationships. This enhanced understanding of landscape-scale ecological conditions can, in turn, lead to better-informed management decisions based on the extrapolation of observed ecological conditions and growth parameters to very similar, nearby management units.
Item Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks (2018-01) Beck, Melanie
The large flood of data flowing from observatories presents significant challenges to astronomy and cosmology – challenges that will only be magnified by projects currently under development. Growth in both the volume and velocity of astrophysics data is accelerating: whereas the Sloan Digital Sky Survey (SDSS) has produced 60 terabytes of data in the last decade, the upcoming Large Synoptic Survey Telescope (LSST) plans to register 30 terabytes per night starting in the year 2020.
Additionally, the Euclid Mission will acquire imaging for ∼5 × 10^7 resolvable galaxies. The field of galaxy evolution faces a particularly challenging future, as complete understanding often cannot be reached without analysis of detailed morphological galaxy features. Historically, morphological analysis has relied on visual classification by astronomers, drawing on the human brain's capacity for advanced pattern recognition. However, this accurate but inefficient method falters when confronted with many thousands (or millions) of images. In the SDSS era, efforts to automate morphological classifications of galaxies (e.g., Conselice et al., 2000; Lotz et al., 2004) have been reasonably successful and can distinguish between elliptical and disk-dominated galaxies with accuracies of ∼80%. While this is statistically very useful, a key problem with these methods is that they often cannot say which 80% of their samples are accurate. Furthermore, when confronted with the more complex task of identifying key substructure within galaxies, automated classification algorithms begin to fail. The Galaxy Zoo project uses a highly innovative approach to solving the scalability problem of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify images by eye. Within the first year, hundreds of thousands of members of the general public had classified each of the ∼1 million SDSS galaxies an average of 40 times. Galaxy Zoo thus solved the time-efficiency problem of visual classification and improved accuracy by producing a distribution of independent classifications for each galaxy. While crowd-sourced galaxy classifications have proven their worth, challenges remain before this method can be established as a critical and standard component of the data processing pipelines for the next generation of surveys.
In particular, though innovative, crowd-sourcing techniques do not have the capacity to handle the data volume and rates expected in the next generation of surveys. Automated algorithms will instead need to handle the majority of the classification tasks, freeing citizen scientists to contribute their efforts to subtler and more complex assignments. This thesis presents a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme, we increase the classification rate nearly 5-fold, classifying 226,124 galaxies in 92 days of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7% accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides a factor of 11.4 increase in the classification rate, classifying 210,803 galaxies in just 32 days of GZ2 project time with 93.1% accuracy. As the Random Forest algorithm requires minimal computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.
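The core idea of SWAP-style Bayesian aggregation of volunteer votes can be sketched as a posterior update weighted by each volunteer's estimated skill. This is a generic illustration of that idea, not the thesis's SWAP implementation; the class names, skill values, and votes below are invented.

```python
def swap_update(prior, votes, skills):
    """Update P(subject is 'featured') from a sequence of binary votes.

    Each volunteer's skill is a pair: the probability they correctly label
    a featured subject and the probability they correctly label a smooth one
    (a 2x2 confusion matrix collapsed to its diagonal).
    """
    p = prior
    for vote, (p_correct_featured, p_correct_smooth) in zip(votes, skills):
        if vote == "featured":
            like_featured = p_correct_featured
            like_smooth = 1 - p_correct_smooth
        else:
            like_featured = 1 - p_correct_featured
            like_smooth = p_correct_smooth
        # Bayes' rule: reweight the current belief by the vote's likelihood.
        num = like_featured * p
        p = num / (num + like_smooth * (1 - p))
    return p

# Three hypothetical volunteers and their votes on one galaxy image.
skills = [(0.9, 0.8), (0.7, 0.7), (0.8, 0.9)]
votes = ["featured", "featured", "smooth"]
print(round(swap_update(0.5, votes, skills), 3))  # → 0.7
```

Once a subject's posterior crosses a retirement threshold in either direction, it can be removed from the queue, which is what drives the classification-rate gains described above.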