Toward Automating and Systematizing the Use of Domain Knowledge in Feature Selection
2015-08
Published Date
2015-08
Type
Thesis or Dissertation
Abstract
Constructing prediction models for real-world domains often involves practical complexities that must be addressed to achieve good prediction results. Often, there are too many sources of data (features). Limiting the set of features in the prediction model is essential for good performance, but prediction accuracy may be degraded by the inadvertent removal of relevant features. The problem is even more acute when the number of training instances is limited, and both limited sample size and domain complexity are common attributes of real-world problems. This thesis explores the practical challenges of building regression models in large multivariate time-series domains with known relationships between variables. Further, we examine the conventional wisdom about preparing datasets for model calibration in machine learning, and discuss best practices for learning time-varying concepts from data. The core contribution of this work is a novel wrapper-based feature selection framework called Developer-Guided Feature Selection (DGFS). It systematically incorporates domain knowledge in domains characterized by a large number of observable features. The observable features may be related to each other by logical, temporal, or spatial relationships, some of which are known to the model developer a priori. The approach relies on limited domain-specific knowledge, yet for many applications it can replace or improve upon both more elaborate domain-specific models and fully automated feature selection. As a wrapper-based approach, DGFS can augment existing multivariate techniques used in high-dimensional domains to produce improved modeling results, particularly in situations where the volume of training data is limited. We demonstrate the viability of our method in several complex domains (natural and synthetic) that have significant temporal aspects and many observable features.
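The abstract does not describe the DGFS algorithm itself, so the following is not the thesis's method. It is only a minimal sketch of the general idea of wrapper-based selection guided by developer knowledge, under several assumptions: the developer's knowledge is expressed as named groups of related columns (the `groups` mapping and all group names are hypothetical), the wrapped model is ordinary least squares, and candidate groups are scored by mean squared error on a time-ordered holdout split.

```python
import numpy as np


def holdout_mse(X_train, y_train, X_val, y_val):
    """Fit ordinary least squares on the training split; return validation MSE."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_val = np.column_stack([np.ones(len(X_val)), X_val])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = A_val @ coef - y_val
    return float(np.mean(residuals ** 2))


def greedy_group_selection(X, y, groups, n_train):
    """Greedy forward selection over developer-defined feature groups.

    `groups` maps a (hypothetical) group name to a list of column indices.
    Rows are assumed to be time-ordered: the first `n_train` rows are used
    for fitting and the remaining rows for validation.
    """
    X_tr, X_va = X[:n_train], X[n_train:]
    y_tr, y_va = y[:n_train], y[n_train:]

    selected, columns = [], []
    # Baseline error of an intercept-only model (predict the training mean).
    best = float(np.mean((y_va - y_tr.mean()) ** 2))
    remaining = dict(groups)

    while remaining:
        # Wrapper step: refit the model once per candidate group.
        scores = {
            name: holdout_mse(
                X_tr[:, columns + cols], y_tr, X_va[:, columns + cols], y_va
            )
            for name, cols in remaining.items()
        }
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no remaining group improves the validation error
        best = scores[name]
        selected.append(name)
        columns = columns + remaining.pop(name)

    return selected, best


# Synthetic demonstration: only the first group carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
groups = {"sensor_a": [0, 1], "sensor_b": [2, 3], "sensor_c": [4, 5]}

selected, mse = greedy_group_selection(X, y, groups, n_train=150)
print(selected, mse)
```

Searching over a few developer-defined groups instead of individual columns keeps the number of model refits small, which is one plausible way prior knowledge could help when training data is limited. Whether DGFS works this way is not stated in this record.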
Description
University of Minnesota Ph.D. dissertation. August 2015. Major: Computer Science. Advisor: Maria Gini. 1 computer file (PDF); xi, 185 pages.
Suggested citation
Groves, William. (2015). Toward Automating and Systematizing the Use of Domain Knowledge in Feature Selection. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/175444.