An Efficient, Scalable, Parallel Classifier for Data Mining
Title
An Efficient, Scalable, Parallel Classifier for Data Mining
Published Date
1997
Type
Report
Abstract
Classification is an important data mining problem. Recently, there has been significant
interest in classification using training datasets that are large enough that they
do not fit in main memory and need to be disk-resident. Although training data can
be reduced by sampling, it has been shown that it can be advantageous to use the
entire training dataset since that can increase accuracy. Most current algorithms are
unsuitable for large disk-resident datasets because their space and time complexities
(including I/O) are prohibitive. A recent algorithm called SPRINT promises to alleviate
some of the data size restrictions. We present a new algorithm called SPEC that
provides similar accuracy, reduces I/O, reduces memory requirements, and improves
scalability (time and space) on both sequential and parallel computers. We provide
some theoretical results as well as experimental results on the IBM SP2.
Funding information
A significant part of this work was done while Anurag Srivastava and Vineet Singh were at IBM TJ
Watson Research Center. This work was supported by NSF grant ASC-9634719, Army Research Office
contract DA/DAAH04-95-1-0538, Cray Research Inc. Fellowship, and IBM partnership award, the content
of which does not necessarily reflect the policy of the government, and no official endorsement should be
inferred. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute,
Cray Research Inc., and NSF grant CDA-9414015.
Suggested citation
Srivastava, Anurag; Singh, Vineet; Han, Eui-Hong; Kumar, Vipin. (1997). An Efficient, Scalable, Parallel Classifier for Data Mining. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215295.