Browsing by Subject "Density sensitive distance"
Item: Inference using Geometry and Density Information in Manifold Data (2023-08)
Bera, Sabyasachi
Clustering is the task of grouping a dataset so that data in the same group (called a cluster) are more similar in some sense to each other than to those in other groups. While different notions of clustering exist in the literature, it is commonly understood that data that are "close" to each other (geometric proximity) should be in the same cluster, and that clusters should capture the concentration pattern (high-density regions) in the data. In many applications, especially when the data come from a topological manifold, we must capture both geometry and density information from the data simultaneously in order to cluster the data in a meaningful way. We introduce g-distance, a data-driven, density-sensitive distance, and explore its theoretical properties, geometry, and usefulness in clustering applications under several data generating models. We derive the convergence limit of the longest leg path distance (LLPD), a purely density-based limiting form of g-distance. We compare several distances, including Euclidean distance, g-distance, and LLPD, in clustering and manifold learning applications under several data generating models. Finally, as an application of high-dimensional learning and manifold learning, we develop a technique for record linkage on high-dimensional data using sparse principal components.
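The abstract names LLPD without defining it. As a minimal sketch, assuming the usual definition of longest leg path distance (the smallest achievable "longest hop" over all paths through the sample points connecting two points), the following computes all pairwise LLPD values with a min-max variant of the Floyd-Warshall recursion. The function name `llpd_matrix` and the sample points are hypothetical and chosen only for illustration; g-distance itself is not defined in this record, so it is not implemented here.

```python
import math


def llpd_matrix(points):
    """Pairwise longest-leg path distances over a finite point set.

    Assumed definition: LLPD(x, y) is the minimum, over all paths from x to y
    through the sample points, of the longest Euclidean leg on the path.
    """
    n = len(points)
    # Start from Euclidean distances, i.e. the direct one-leg path.
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Routing through k costs the larger of the two sub-path values.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Three closely spaced points and one far-away point (illustrative data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
llpd = llpd_matrix(points)
```

In this toy set, points joined by a chain of short hops end up close under LLPD even when their Euclidean distance is larger, while crossing the empty gap to the isolated point is costly. The cubic-time loop is meant only to make the definition concrete; it is not a claim about how the thesis computes the distance.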