Browsing by Subject "Principal Component Analysis"
Now showing 1-3 of 3
Item
Clustering in a High-Dimensional Space Using Hypergraph Models (1997)
Han, Eui-Hong; Karypis, George; Kumar, Vipin; Mobasher, Bamshad

Clustering of data in a high-dimensional space is of great interest in many data mining applications. Most traditional algorithms, such as K-means or AutoClass, fail to produce meaningful clusters in such data sets even when they are used with well-known dimensionality reduction techniques such as Principal Component Analysis and Latent Semantic Indexing. In this paper, we propose a method for clustering data in a high-dimensional space based on a hypergraph model. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents a relationship (affinity) among a subset of the data items, and the weight of the hyperedge reflects the strength of this affinity. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. We present results of experiments on three different data sets: S&P 500 stock data for the period 1994-1996, protein coding data, and Web document data. Wherever applicable, we compared our results with those of the AutoClass and K-means clustering algorithms on the original data as well as on the reduced-dimensionality data obtained via Principal Component Analysis or Latent Semantic Indexing. These experiments demonstrate that our approach is applicable and effective in a wide range of domains. More specifically, our approach performed much better than traditional schemes for high-dimensional data sets in terms of cluster quality and runtime. Our approach was also able to filter noise data out of the clusters very effectively without compromising cluster quality.
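A note on the partitioning step above: the abstract names only "a hypergraph partitioning algorithm", so the sketch below is an illustrative stand-in rather than the authors' method. It approximates hypergraph partitioning by clique expansion followed by spectral bisection, using only numpy; the six vertices, three hyperedges, and their weights are invented toy data.

    import numpy as np

    def clique_expand(n, hyperedges):
        # Approximate the hypergraph by an ordinary weighted graph: each
        # hyperedge (vertex set, weight) adds its weight to every vertex
        # pair it contains ("clique expansion").
        A = np.zeros((n, n))
        for verts, w in hyperedges:
            verts = sorted(verts)
            for i in range(len(verts)):
                for j in range(i + 1, len(verts)):
                    A[verts[i], verts[j]] += w
                    A[verts[j], verts[i]] += w
        return A

    def spectral_bisect(A):
        # Two-way split from the sign pattern of the Fiedler vector (the
        # eigenvector of the graph Laplacian with the second-smallest
        # eigenvalue), a standard spectral relaxation of minimum cut.
        L = np.diag(A.sum(axis=1)) - A
        _, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
        return vecs[:, 1] >= 0

    # Toy data: six items forming two tight affinity groups joined by one
    # weak hyperedge. All weights here are invented for the example.
    hyperedges = [({0, 1, 2}, 3.0), ({3, 4, 5}, 3.0), ({2, 3}, 0.5)]
    print(spectral_bisect(clique_expand(6, hyperedges)))

Applying the bisection recursively (or using a dedicated multilevel hypergraph partitioner) would produce the k-way partitionings whose cut weight the paper seeks to minimize.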
Item
Development and Testing of Decision Support Tools in Gait Analysis (2016-04)
Rozumalski, Adam

Objectives: Clinical gait analysis, as commonly prescribed for children with Cerebral Palsy (CP), is a complex set of procedures that includes examining data from several sources. The tools developed in this project use those data to provide robust, repeatable, evidence-based guidance highlighting the most effective treatments for children with CP. These tools also supply objective measures that can be included in outcome analysis.

Methods: Several mathematical techniques are used to find patterns within the gait data, including: singular value decomposition of kinematic and kinetic data to measure gait pathology; k-means cluster analysis of those results to find recurring patterns; principal components analysis of physical exam findings to relate the gait patterns to physical function; and non-negative matrix factorization of electromyography data to measure motor control.

Results: The decomposition and scaling of the kinematic and kinetic data yield a set of indexes that quantify gait pathology. The k-means cluster analysis reveals repeatable patterns within the gait pathology. These patterns are related to clinical findings as calculated from principal components analysis. Clinical interpretations of motor control can be quantified as muscle synergies using non-negative matrix factorization.

Interpretation: These tools have been shown to provide important quantitative information on treatment outcomes. When implemented in routine clinical gait analysis, they can provide evidence-based guidance in treatment decisions.

Item
An Inferential Perspective on Data Depth (2017-05)
Majumdar, Subhabrata

Data depth provides a plausible extension of robust univariate quantities such as ranks, order statistics, and quantiles to the multivariate setting. Although depth has gained visibility and has seen many applications in recent years, especially in classification problems for multivariate and functional data, its generalizability and utility in achieving traditional parametric inferential goals are largely unexplored. In this thesis we develop several approaches to address this. First, we define an evaluation map function that is more general than data depth, and establish several results in a parametric modelling context using a broad definition of a statistical model; a fast algorithm for covariate selection using data depths as evaluation functions arises as a special case. We demonstrate applications of this framework on data from diverse fields: climate science, medical imaging, and behavioral genetics. Second, we propose a multivariate rank transformation based on data depth and use it for robust inference in location and scale problems for elliptical distributions. Third, we lay out a depth-based regularization framework for multi-response regression and derive a new method of nonconvex penalized sparse regression in the multitask situation. Throughout the thesis, simulation studies and real data examples demonstrate the effectiveness of the methods developed here.
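To make the second contribution above concrete: the sketch below computes depth-based center-outward ranks using Mahalanobis depth, one common depth function. It is a toy illustration under that assumption, not the thesis's exact rank transformation, and the simulated elliptical sample is invented.

    import numpy as np

    def mahalanobis_depth(X, data):
        # Mahalanobis depth of each row of X relative to the sample `data`:
        # D(x) = 1 / (1 + (x - mu)' S^{-1} (x - mu)); deeper points score
        # closer to 1, outlying points closer to 0.
        mu = data.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(data, rowvar=False))
        diff = X - mu
        md2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)
        return 1.0 / (1.0 + md2)

    def depth_ranks(data):
        # Center-outward ranks: the deepest observation gets rank 1, the
        # most outlying gets rank n.
        depths = mahalanobis_depth(data, data)
        order = np.argsort(-depths)
        ranks = np.empty(len(data), dtype=int)
        ranks[order] = np.arange(1, len(data) + 1)
        return ranks

    rng = np.random.default_rng(0)
    sample = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], size=200)
    print(depth_ranks(sample)[:10])

Here the deepest point receives rank 1 and outlying points the largest ranks; depth-induced rank transformations of this kind are what the thesis applies to robust location and scale inference in elliptical distributions.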