Browsing by Subject "parallel processing"
Item
An Efficient, Scalable, Parallel Classifier for Data Mining (1997)
Srivastava, Anurag; Singh, Vineet; Han, Eui-Hong; Kumar, Vipin
Classification is an important data mining problem. Recently, there has been significant interest in classification using training datasets that are too large to fit in main memory and must be disk-resident. Although training data can be reduced by sampling, it has been shown that using the entire training dataset can be advantageous because it can increase accuracy. Most current algorithms are unsuitable for large disk-resident datasets because their space and time complexities (including I/O) are prohibitive. A recent algorithm called SPRINT promises to alleviate some of the data size restrictions. We present a new algorithm called SPEC that provides similar accuracy, reduces I/O, reduces memory requirements, and improves scalability (time and space) on both sequential and parallel computers. We provide some theoretical results as well as experimental results on the IBM SP2.

Item
Efficiency of Shared-Memory Multiprocessors for a Genetic Sequence Similarity Search Algorithm (1997)
Chi, Ed Huai-hsin; Shoop, Elizabeth; Carlis, John; Retzel, Ernest; Riedl, John
Molecular biologists who conduct large-scale genetic sequencing projects are producing an ever-increasing amount of sequence data. GenBank, the primary repository for DNA sequence data, is doubling in size every 1.3 years. Keeping pace with the analysis of these data is a difficult task. One of the most successful techniques for analyzing genetic data is sequence similarity analysis: the comparison of unknown sequences against known sequences kept in databases. As biologists gather more sequence data, sequence similarity algorithms become more and more useful, but take longer and longer to run. BLAST is one of the most popular sequence similarity algorithms in use today, but its running time is approximately proportional to the size of the database.
Sequence similarity analysis using BLAST is becoming a bottleneck in genetic sequence analysis. This paper analyzes the performance of BLAST on SMPs, to improve our theoretical and practical understanding of the scalability of the algorithm. Since the database sizes are growing faster than the improvements in processor speed we expect from Moore's law, multiprocessor architectures appear to be the only way to meet the need for performance.
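
The growth argument in this abstract can be illustrated with a rough calculation. The sketch below takes two figures from the abstract (the database doubles every 1.3 years, and BLAST running time is approximately proportional to database size) and adds one assumption that is not in the source: that single-processor speed doubles every 1.5 years, a commonly quoted reading of Moore's law. The function name `relative_runtime` and its parameters are hypothetical, chosen only for illustration.

```python
def relative_runtime(years, db_doubling_years=1.3, cpu_doubling_years=1.5):
    """Estimate single-processor search time after `years`, relative to today.

    db_doubling_years: 1.3, the GenBank doubling period stated in the abstract.
    cpu_doubling_years: 1.5, an ASSUMED Moore's-law doubling period
    (not given in the source).
    Running time is taken to be proportional to database size and
    inversely proportional to processor speed.
    """
    db_growth = 2.0 ** (years / db_doubling_years)
    cpu_speedup = 2.0 ** (years / cpu_doubling_years)
    return db_growth / cpu_speedup


if __name__ == "__main__":
    for years in (0, 5, 10):
        # A ratio above 1.0 means the search is slower than it is today.
        print(years, round(relative_runtime(years), 2))
```

Under these assumptions the ratio rises above 1 and keeps growing, which is the gap the abstract proposes to close with multiprocessors. A different assumed processor doubling period changes the size of the gap, but any period longer than 1.3 years leaves the same trend.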