Browsing by Author "Dhar, Sauptik"
Now showing 1 - 2 of 2
Item: Analysis and extensions of Universum learning (2014-01) Dhar, Sauptik

Many applications of machine learning involve sparse high-dimensional data, where the number of input features is larger than (or comparable to) the number of data samples. Predictive modeling of such data sets is ill-posed and prone to overfitting. Standard inductive learning methods may not be sufficient for sparse high-dimensional data, which motivates non-standard learning settings. This thesis investigates one such learning methodology, called Learning through Contradictions or Universum Learning, proposed by Vapnik (1998, 2006) for binary classification. This method incorporates a priori knowledge about application data, in the form of additional Universum samples, into the learning process. However, this new methodology is still not well understood and represents a challenge to end users. The overall goal of this thesis is to improve understanding of the Universum learning methodology and to improve its usability for general users. Specific objectives of this thesis include:
- Development of practical conditions for the effectiveness of Universum Learning for binary classification.
- Extension of Universum Learning to real-life classification settings with different misclassification costs and unbalanced data.
- Extension of Universum Learning to single-class learning problems.
- Extension of Universum Learning to regression problems.

The outcome of this research is expected to be better understanding and adoption of Universum Learning methods for classification, single-class learning, and regression problems, which are common in many real-life applications.

Item: Statistical Analysis of the Soil Chemical Survey Data (Minnesota Department of Transportation Research Services Section, 2010-06) Dhar, Sauptik; Cherkassky, Vladimir

This report describes data-analytic modeling of the Minnesota soil chemical data produced by the 2001 metro soil survey and by the 2003 state-wide survey.
The chemical composition of the soil is characterized by the concentrations of many metal and non-metal constituents, resulting in high-dimensional data. This high dimensionality, together with possibly unknown (nonlinear) correlations, makes the data difficult to analyze and interpret using standard statistical techniques. This project applies a machine learning technique called the Self-Organizing Map (SOM) to present the high-dimensional soil data in a 2D format suitable for human understanding and interpretation. The SOM representation enables analysis of soil chemical concentration trends within the metro area and across the state of Minnesota. These trends are important for various Minnesota regulatory agencies concerned with the concentration of polluting chemical elements due to both (a) human activities, i.e., different industrial land usage, and (b) natural geological factors, such as the geomorphic codes and provenance of glacial sediments.