Parallel Algorithms in Data Mining

Recent times have seen an explosive growth in the availability of various kinds of data. This growth has created an unprecedented opportunity to develop automated, data-driven techniques for extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data. To date, the primary driving force behind research in data mining has been the development of algorithms for data-sets arising in various business, information retrieval, and financial applications. Due to the latest technological advances, very large data-sets are becoming available in many scientific disciplines as well. The rate of production of such data-sets far outstrips the ability to analyze them manually. Data mining techniques hold great promise for developing new sets of tools that can automatically analyze the massive data-sets resulting from scientific simulations, and thus help engineers and scientists unravel the causal relationships in the underlying mechanisms of dynamic physical processes. The huge size of the available data-sets and their high dimensionality make large-scale data mining applications computationally very demanding, to the extent that high-performance parallel computing is fast becoming an essential component of the solution. Moreover, the quality of the data mining results often depends directly on the amount of computing resources available. In fact, data mining applications are poised to become the dominant consumers of supercomputing in the near future. There is therefore a need to develop effective parallel algorithms for various data mining techniques. However, designing such algorithms is challenging. In this paper, we describe the parallel formulations of two important data mining algorithms: discovery of association rules, and induction of decision trees for classification.

Collections

Computer Science & Engineering (CS&E) Technical Reports

Series/Report Number

Technical Report; 01-001

Suggested citation

Joshi, Mahesh; Han, Euihong; Karypis, George; Kumar, Vipin. (2001). Parallel Algorithms in Data Mining. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215466.

University of Minnesota Twin Cities
