Machine learning and data mining methods for recommender systems and chemical informatics.

Ning, Xia
2012-09-25
2012-07
https://hdl.handle.net/11299/134509
University of Minnesota Ph.D. dissertation. Major: Computer science. Advisor: Dr. George Karypis. 1 computer file (PDF); xiv, 194 pages, appendices A-C.

This thesis focuses on machine learning and data mining methods for problems arising primarily in recommender systems and chemical informatics. Although these two areas represent dramatically different application domains, many of the underlying problems have common characteristics, which allows the transfer of ideas and methods between them.

The first part of this thesis focuses on recommender systems. Recommender systems represent a set of computational methods that produce recommendations of interesting entities (e.g., products) from a large collection of such entities by retrieving/filtering/learning information from their own properties (e.g., product attributes) and/or the interactions with other parties (e.g., user-product ratings). We have addressed the two core tasks for recommender systems, that is, top-N recommendation and rating prediction. We have developed (1) a novel sparse linear method for top-N recommendation, which utilizes regularized linear regression with sparsity constraints to model user-item purchase patterns; (2) a set of novel sparse linear methods with side information for top-N recommendation, which use side information to regularize sparse linear models or use side information to model user-item purchase behaviors; and (3) a multi-task learning method for rating prediction, which uses multi-task learning methodologies to model user communities and predict personalized ratings.

The second part of this thesis is dedicated to chemical informatics, which is an interdisciplinary research area where computational and information technologies are developed to aid the investigation of chemical problems.
We have developed computational methods to build two important models in chemical informatics, that is, the Structure-Activity-Relationship (SAR) model and the Structure-Selectivity-Relationship (SSR) model. We have developed (1) a multi-assay-based SAR model, which leverages information from different protein families; and (2) a set of computational methods for better SSR models, which use various learning methodologies including multi-class classification and multi-task learning. The studies on recommender systems and chemical informatics show that these two areas are closely analogous in terms of the data, the problem formulations and the underlying principles, and that advances in one area could contribute to the other.

en-US
Computer Science
Thesis or Dissertation