Bera, Sabyasachi
2024-01-05
2024-01-05
2023-08
https://hdl.handle.net/11299/259681
University of Minnesota Ph.D. dissertation. August 2023. Major: Statistics. Advisor: Snigdhansu Bhusan Chatterjee. 1 computer file (PDF); xii, 225 pages.
Clustering is the task of grouping a dataset so that data in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups. While different notions of clustering exist in the literature, it is commonly understood that data that are "close" to each other (geometric proximity) should be in the same cluster, and that clusters should capture the concentration pattern (high-density regions) in the data. In many applications, especially when the data come from a topological manifold, we must capture both geometry and density information from the data simultaneously in order to cluster them in a meaningful way. We introduce g-distance, a data-driven, density-sensitive distance, and explore its theoretical properties, geometry, and usefulness in clustering applications under several data generating models. We derive the convergence limit of the longest leg path distance (LLPD), a purely density-based limiting form of g-distance. We compare several distances, including Euclidean distance, g-distance, and LLPD, in clustering and manifold learning applications under several data generating models. Finally, as an application of high-dimensional learning and manifold learning, we develop a technique for record linkage on high-dimensional data using sparse principal components.
en
Applied Probability; Clustering; Density sensitive distance; Manifold Hypothesis; Network based clustering; Topological data analysis
Inference using Geometry and Density Information in Manifold Data
Thesis or Dissertation