Computer Science & Engineering (CS&E) Technical Reports
Persistent link for this collectionhttps://hdl.handle.net/11299/214969
Located in this collection are publications by the department, faculty, students, and visiting members.
Want to submit a Technical Report? See the Technical Report Submission Instructions.
Searching for a technical report
The search bar below supports general keyword search, but can also be used to locate a report by its title or report number. Use quotation marks around a phrase or title to return reports that use that exact phrase. To search for a specific report number, use the following search string: "Technical Report; [Report Number]" (e.g. "Technical Report; 19-001"). You must include the quotations when searching for a report number.
Browse
Browsing Computer Science & Engineering (CS&E) Technical Reports by Issue Date
Now showing 1 - 20 of 749
- Results Per Page
- Sort Options
Item Statement Re-ordering for DOACROSS Loops(1997) Chen, Ding-Kai; Yew, Pen-ChungIn this paper, we propose a new statement re-ordering algorithm for DOACROSS loops that overcomes some of the problems in the previous schemes. The new algorithm uses a hierarchical approach to locate strongly dependent statement groups and to order these groups considering critical dependences. A new optimization problem, dependence covering maximization, which was not discussed before is also introduced. It is shown that this optimization problem is NP-complete, and a heuristic algorithm is incorporated in our algorithm. Run-time complexity analysis is given for both algorithms. This new statement re-ordering scheme, combined with the dependence covering maximization, can be an important compiler optimization to parallelize loop structures for large scale coarse and fine grain parallelism.Item Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes(1997) Han, Eui-Hong; Karypis, George; Kumar, VipinItem Grouping Web Page References Into Transactions for Mining World Wide Web Browsing Patterns(1997) Cooley, Robert; Mobasher, Bamshad; Srivastava, JaideepItem The Design of a Parallel Method for the Solution of a General, Narrow Banded Linear Systems(1997) Oettli, Michael; Sameh, AhmedA row partitioning approach for general, nonsingular narrow-banded linear systems is described which yields a stable and naturally parallel algorithm suitable for parallel computers. The system matrix is partitioned into a series of rectangular smaller matrices with only a small set of global unknowns which interface consecutive submatrices. The solution of a system for this small set of unknowns yields the solution of the original gystem with almost perfect parallelism.Item Superthreading: Integrating Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading(1997) Tsai, Jenn-Yuan; Jiang, Zhenzhen; Li, Zhiyuan; Lilja, David; Wang, Xin; Yew, Pen-Chung; Zheng, Bixia; Glamm, RobertAs the number of transistors that can be integrated on a single chip continues to grow, it is important for computer architects to think beyond the traditional approaches of deeper pipelines and wider instruction issue units for improving performance. This single-threaded execution model limits these approaches to exploiting only the relatively small amount of instruction-level parallelism available in application programs. While integrating an entire multiprocessor onto a single chip is feasible, this architecture is limited to exploiting only relatively coarse-grained heavy-weight parallelism. We propose the superthreaded architecture as an excellent alternative for utilizing the large number of transistors that will become available on a single high-density chip. As a hybrid of a wideissue superscalar processor and a multiprocessor-on-a-chip, this new concurrent multithreading architecture can leverage the best of existing and future parallel hardware and software technologies. By incorporating speculation for control dependences and run-time checking of data dependences, the superthreaded architecture can exploit the multiple granularities of parallelism available in general-purpose application programs to reduce the execution time of a single program.Item Efficiency of Shared-Memory Multiprocessors for a Genetic Sequence Similarity Search Algorithm(1997) Chi, Ed Huai-hsin; Shoop, Elizabeth; Carlis, John; Retzel, Ernest; Riedl, JohnMolecular biologists who conduct large-scale genetic sequencing projects are producing an ever-increasing amount of sequence data. GenBank, the primary repository for DNA sequence data, is doubling in size every 1.3 years. Keeping pace with the analysis of these data is a difficult task. One of the most successful technique, for analyzing genetic data is sequence similarity analysis-the comparison of unknown sequences against known sequences kept in databases. As biologists gather more sequence data, sequence similarity algorithms are more and more useful, but take longer and longer to run. BLAST is one of the most popular sequence similarity algorithms in me today, but its running time is approximately proportional to the size of the database. Sequence similarity analysis using BLAST is becoming a bottleneck in genetic sequence analysis. This paper analyzes the performance of BLAST on SMPs, to improve our theoretical and practical understanding of the scalability of the algorithm. Since the database sizes are growing faster than the improvements in processor speed we expect from Moore's law, multiprocessor architectures appear to be the only way to meet the need for performance.Item System Support for Mobile Agents(1997) Karnik, Neeran M.; Tripathi, Anand R.Mobile objects can be used as a basis for supporting the mobile agents paradigm, which provides an interesting new approach for network-centric programming. We examine the language-level features needed for such applications, and describe a Java object-based mobile agent architecture that can provide system-level support for those features. Security is usually a major concern with mobile agent systems. We discuss security related issues such as authentication of mobile agents, protection of resources and privacy of communications, in the context of our architecture.Item Role of Message-Passing in Performance Oriented Parallel Programming(1997) Kumar, Vipin; Karypis, George; Grama, AnanthThe message-passing programming paradigm is often called the assembly language of parallel computers. The paradigm requires the user to partition the data among processors and specify interaction among processors explicitly. In contrast, the shared memory programming paradigm offers the user a global address space, making it unnecessary to specify data partitioning and communication. Indeed, many parallel programs written using the shared-memory paradigm are much shorter than their counterparts in the message-passing paradigm. Until recently, cache-coherent shared address space architectures were largely based on a global shared bus, making them inherently unsealable beyond a few dozen processors. More recently, scalable directory-based cache-coherent shared address space architecture are being offered by many vendors. This seems to make the message-passing programming style obsolete. In this article, we argue that the style of message-passing programming makes it much easier to develop efficient parallel programs even on cache-coherent shared address space architectures.Item Enhancing Multiple-Path Speculative Execution with Predicate Window Shifting(1997) Tsai, Jenn-Yuan; Yew, Pen-ChungSpeculative execution has long been used as an approach to exploit instruction level parallelism across basic block boundaries. Most existing speculative execution techniques only support speculating along single control path, and heavily rely on branch prediction to choose the right control path. In this paper, we propose an extended predicated execution mechanism, called predicate shifting, to support speculating along multiple control paths. The predicate shifting mechanism maintains a condition/predicate window for each basic block. With the condition/predicate window, instructions ca.n be guarded by predicates related to current or future branch conditions. The predicate shifting mechanism can reduce the number of required tag bits by shifting conditions/predicates out of the condition/predicate window whenever they are no longer in use. To incorporate the predicate shifting mechanism into a VLIW processor, a new result-buffering structure, call future buffer, is used to buffer uncommitted results and to evaluate predicates. The FIFO structure of the future buffer not only simplifies exception handling but also allows multiple uncommitted writes to the same register. Experimental results show that the predicate shifting mechanism can use predicate tag effectively and achieve 24% performance improvement over the previous predicating mechanism [2] using a small predicate tag.Item Robust Preconditioning for Spare Linear Systems(1997) Chow, EdmondPreconditioned iterative methods have become standard linear solvers in many applications, but their limited robustness in some cases has hindered the ability to efficiently solve very large problems in some areas. This thesis proposes several new preconditioning techniques that attempt to extend the range of iterative methods, particularly to solving nonsymmetric and indefinite problems such as those arising from incompressible computational fluid dynamics. First, we present an iterative technique to compute sparse approximate inverse preconditioners. This new technique produces approximate inverses comparable in quality with others in the literature, but at a lower computational cost, and with a simpler method to determine good sparsity patterns for the approximate inverse. This class of preconditioners is very attractive for parallel computing and avoids the stability problems that may occur with incomplete LU (ILU) preconditioners on nonsymmetric or indefinite matrices. Next, we demonstrate more effective uses of approximate inverses than using them to precondition the entire matrix. We show how to use approximate inverse techniques to construct block preconditioners, particularly for the discrete fully-coupled incompressible Navier-Stokes equations and matrices that are partitioned by nonoverlapping domain decomposition. Approximate inverse techniques are used to generate sparse approximate solutions whenever these are needed in forming the preconditioner. The storage requirements for these preconditioners are much less than for ILU preconditioners for many tough, large-scale problems. To try to improve ILU preconditioners, we experimentally analyze their causes of failure, particularly on indefinite linear systems. We categorize the failure modes and design statistics that can be routinely computed to monitor the factorization or determine the cause of failure. Through this better understanding, we show how these causes of failure can sometimes be circumvented through pivoting, reordering, scaling, perturbing diagonal elements, and preserving symmetric structure. Finally, we present an object-oriented framework that implements some of the preconditioning algorithms above. The framework allows different data structures for block matrices to be used with the same preconditioners. Various methods are provided for inverting diagonal or pivot blocks, possibly with an approximate inverse technique.Item On Some Geometric Optimization Problems in Layered Manufacturing(1997) Majhi, J.; Janardan, R.; Smid, M.; Gupta, P.Efficient geometric algorithms are given for optimization problems arising in layered manufacturing, where a 3D object is built by slicing its CAD model into layers and manufacturing the layers successively. The problems considered include minimi7,ing the degree of stair-stepping on the surfaces of the manufactured object, minimizing the volume of the so-called support structures used, and minimizing the contact area between the supports and the manufactured object- all of which are factors that affect the speed and accuracy of the process. The stair-step minimization algorithm is valid for any polyhedron, while the support minimization algorithms are applicable t.o convex polyhedra only. Algorithms arc a1so given for optimizing supports for non-convex, simple polygons. The techniques used to obtain these results include construction and searching of certain arrangements on the sphere, 3D convex hulls, balfplanc range searching, ray-shooting, visibility, and constrained optimization.Item Creating a Virtual Network Laboratory(1997) Lee, Yen-Jen; Ma, Wei-hsiu; Du, David H.C.; Schnepf, James A.Networking technologies have entered an unprecedented era after the explosive growth of the Internet and the roll-out of high speed networks. This paper addresses the concept of using existing multimedia and computer networking technologies to create a remotely accessible, virtual network laboratory that can expand student access and eliminate many of the time, geographical, and cost constraints that currently exist. The authors propose a framework for constructing lab modules for a virtual network laboratory. A prototype has been developed for a series of Java-based modules that allow students to access and interact with the virtual laboratory databases and physical networking devices in a user-friendly manner. It provides a demonstration of networking concepts by using the developed materials in new courses at each of the participating universities.Item Principal Direction Divisive Partitioning(1997) Boley, DanielWe propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e. in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The splits are not based on any distance or similarity measure. The documents are assembled in to a matrix which is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.Item Parallel Multilevel Diffusion Algorithms for Repartitioning of Adaptive Meshes(1997) Schloegel, Kirk; Karypis, George; Kumar, VipinGraph partitioning has been shown to be an effective way to divide a large computation over an arbitrary number of processors. A good partitioning can ensure load balance and minimize the communication overhead of the computation by partitioning an irregular mesh into p equal parts while minimizing the number of edges cut by the partition. For a large class of irregular mesh applications, the structure of the graph changes from one phase of the computation to the next. Eventually, as the graph evolves, the adapted mesh has to be repartitioned to ensure good load balance. Failure to do so will lead to higher parallel run time. This repartitioning needs to maintain a low edge-cut in order to minimize communication overhead in the follow-on computation. It also needs to minimize the time for physically migrating data from one processor to another since this time can dominate overall run time. Finally, it must be fast and scalable since it may be necessary to repartition frequently. Partitioning the adapted mesh again from scratch with an existing graph partitioner can be done quickly and will result in a low edge-cut. However, it will lead to an excessive migration of data among processors. In this paper, we present new parallel algorithms for robustly computing repartitionings of adaptively refined meshes. These algorithms perform diffusion of vertices in a multilevel framework and minimize data movement without compromising the edge-cut. Furthermore, our parallel repartitioners include parameterized heuristics to specifically optimize edge--cut, total data migration, or the maximum amount of data migrated into and out of any one processor. Our results on a variety of synthetic meshes show that our parallel multilevel diffusion algorithms are highly robust schemes for repartitioning adaptive meshes. The resulting edge-cuts are close to those resulting from partitioning from scratch with a state-of-the-art graph partitioner, while data migration is substantially reduced. Furthermore, repartitioning can be done very fast. Our experiments show that meshes with around eight million vertices can be repartitioned on a 256-processor Cray T3D in only a couple of seconds.Item Spatial Databases: Accomplishments and Research Needs(1997) Shekhar, S.; Ravada, S.; Fetterer, A.; Liu, X.; Lu, C.T.Spatial databases have been an active area of research for over two decades, addressing the growing data management and analysis needs of spatial applications such as Geographic Information Systems. This research has produced a taxonomy of models for space, spatial data types and operators, spatial query languages and processing strategies, as well as spatial indexes and clustering techniques. However, more research is needed to improve support for network and field data, as well as query processing ( e.g. cost models, bulk load). Another important need is to apply the spatial data management accomplishments to newer applications such as data warehouses and multimedia information systems. The objective of this paper is to identify recent accomplishments and the research needs in the near term.Item Performance Study of Superthreaded Architecture: A Thesis(1997) Jiang, ZhenzhenThis paper presents the simulation studies done on the Superthreaded Architecture. Two trace-driven, cycle-by-cycle Superthreaded processor simulators are implemented for the study. One issues single instruction per thread in each clock cycle, (called SIPT Superthreaded Simulator in this paper), the other has Superscalar features incorporated into it and issues multiple instructions per thread in each clock cycle (called MIPT Superthreaded Simulator in this paper). Both allow run-time datadependence checking and instruction scheduling. The simulation results show that the Superthreaded architecture, which adopts a thread-pipelining execution model and allows threads with data dependencies and control dependencies to be executed in parallel, can handle loops with run-time control speculation very well, and can achieve good speedups for most of the SPEC benchmark programs. What's more, some design features of the existing single-threaded multiple-issue processor, such as Superscalar, can be added onto the Superthreaded architecture to further exploit instruction-level parallelism.Item High-Speed Network Support for High-Performance Network Computing and Multimedia Communications(1997) Hsieh, JenweiThe prevalence of computer networks has shifted the computing paradigm from mainframe or host-centric computing to network-centric computing. In networkcentric computing, applications are executed distributedly on a collection of computers interconnected via local and wide area networks. The performance of networkcentric applications can be dramatically improved with switch-based high-speed networks, such as HIPPI, ATM, and Fibre Channel. In this study, we focus on the high-speed network support for two important applications in network-centric computing: high-performance network computing and multimedia communication. One important class of network computing is cluster computing, which enables a collection of locally interconnected computers to be used as a general-purpose parallel computing system. Large problems can be solved cost effectively by using the aggregate processing power and memory space of a cluster. However, communication between processors has long been the bottleneck of cluster computing. We have especially concentrated on maximizing the achievable throughput and minimizing the communication delay for cluster computing in homogeneous environments. We have enhanced a popular cluster computing environment, Parallel Virtual Machine (PVM) with clusters of workstations on either local ATM or HIPPI networks. One possible extension of cluster computing is to incorporate clusters of computers via wide area networks. This is called meta-computing. For example, a group of diverse high-performance computers from several geographically distributed supercomputer centers can be employed to solve large problems. ATM is the de facto standard for wide area networks. However, most of the supercomputer centers use HIPPI in their computing facilities. The internetworking of HIP PI networks and wide area ATM networks becomes an important issue for met a-computing. Two feasible solutions for the problem, HIPP! Tunneling and IP Routing, have been studied in this thesis. Multimedia communication imposes another challenge for high speed networks. The delivery of continuous media requires high communication bandwidth and realtime constraint. Ve have studied two new CBR transmission schemes, called PCRassist CBR (PCBR) and PCR-assist Dual-Rate CBR (PDCBR), which employ the Program Clock References (PCR) embedded in the MPEG-2 Transport Streams to regulate their transmission. The two schemes provide flexible trade-off between buffer requirement and transmission rates.Item Reengineering Safety Critical Systems in an Industrial Environment(1997) Mojdehbakhsh, R.; Persen, K.; Poonawala, M.; Subramanian, S.This paper discusses the development of a framework for safety-critical medical devices in an industrial environment. The authors have worked on the development and testing of a cardiac rhythm management system at Guidant Corporation, which is involved in the development of a family of related medical devices such as Pacemakers and Defibrillators. The development and testing process for these systems is expensive because of the stringent safety and reliability requirements of these devices. To leverage the cost involved in this process we take advantage of the overlap in functionality across a family of products. In this paper, we present a domain-specific framework for developing and maintaining these safety-critical software systems. The approach allows easy generation and maintenance of lifecycle artifacts, like code, test requirements, while maximizing reusability. We have demonstrated our technique in the testing of a cardiac pacemaker and have achieved significant improvements in productivity.Item An Efficient, Scalable, Parallel Classifer for Data Mining(1997) Srivastava, Anurag; Singh, Vineet; Han, Eui-Hong; Kumar, VipinClassification is an important data mining problem. Recently, there has been significant interest in classification using training datasets that are large enough that they do not fit in main memory and need to be disk-resident. Although training data can be reduced by sampling, it has been shown that it can be advantageous to use the entire training dataset since that can increase accuracy. Most current algorithms are unsui:table for large disk-resident datasets because their space and time complexities (including I/0) are prohibitive. A recent algorithm called SPRINT promises to alleviate some of the data size restrictions. We present a new algorithm called SPEC that provides similar accuracy, reduces I/0, reduces memory requirements, and improves scalability (time and space) on both sequential and parallel computers. We provide some theoretical results as well as experimental results on the IBM SP2.Item A Programming Environment for Multimedia Applications(1997) Wadhwa, SunilThis document is a report of the Program Development Tool (PDT), a tool designed and developed as part of the multimedia programming system by the Multimedia Group at the University of Minnesota, Twin Cities. This report describes the overall project environment for which the toolkit is built, specific requirements of the tool, design and implementation issues. It also proposes avenues for further research. This report is accompanied by two appendices - the User Manual and the Design Document. The user manual is intended to serve as a guide for the novice user. It is also a handy reference of the various procedures involved in working with the tool. The design document describes in detail the structure and modules in the source code. It may be used to update the program code if necessary. The remainder of this section gives an overview of the multimedia project and focuses on the aspects specific to the Program Development Tool. Section 2 discusses the need for such a toolkit. Section 3 describes the design issues that have been tackled by the group. The implementation issues are discussed in Section 4. There are many areas in which future research is possible. These are described in Section 5 for those who would like to continue work on the toolkit. Finally, Section 6 concludes the report by highlighting the key aspects of the project and the lessons learned.