Department of Computer Science and Engineering
Browsing Department of Computer Science and Engineering by Type "Report"
Now showing 1 - 20 of 915
Item 3D Reconstruction of Periodic Motion from a Single View (2009-09-24)
Ribnick, Evan
Periodicity has been recognized as an important cue for tasks like activity recognition and gait analysis. However, most existing techniques analyze periodic motions only in image coordinates, making them very dependent on the viewing angle. In this paper we show that it is possible to reconstruct a periodic trajectory in 3D given only its appearance in image coordinates from a single camera view. We draw a strong analogy between this problem and that of reconstructing an object from multiple views, which allows us to rely on well-known theoretical results from the multi-view geometry domain and to obtain significant guarantees regarding the solvability of the estimation problem. We present two different formulations of the problem, along with techniques for performing the reconstruction in both cases, and an algorithm for estimating the period of motion from its image-coordinate trajectory. Experimental results demonstrate the feasibility of the proposed techniques.

Item A Balanced Term-Weighting Scheme for Effective Document Matching (2001-02-07)
Jung, Yunjae; Park, Haesun; Du, Ding-Zhu
A new weighting scheme for the vector space model is presented to improve retrieval performance for an information retrieval system. In addition, a dimension compression method is introduced to reduce the computational cost of the weighting approach. The main idea of this approach is to consider not only occurring terms but also absent terms in finding similarity patterns among document and query vectors. With a basic information retrieval development system which we are now developing, we evaluate the effect of the balanced weighting scheme and compare it with various combinations of weighting schemes in terms of retrieval performance.
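As a rough illustration of the vector-space matching that the scheme above builds on, the following sketch computes cosine similarity over a fixed vocabulary, so that absent terms (given weight 0 here) still participate in the representation. The weights and vocabulary are hypothetical; the paper's actual balanced weighting scheme is not reproduced.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def to_vector(doc_terms, vocabulary):
    # Occurring terms get weight 1; absent terms get weight 0 in this sketch.
    return [1.0 if t in doc_terms else 0.0 for t in vocabulary]

vocab = ["retrieval", "weighting", "vector", "query"]
doc = to_vector({"retrieval", "vector"}, vocab)
query = to_vector({"retrieval", "query"}, vocab)
print(cosine(doc, query))  # 0.5
```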
The experimental results show that the proposed scheme produces recall-precision results similar to the cosine measure but, more importantly, enhances retrieval effectiveness. Since the scheme is based on the cosine measure, it inherits the cosine measure's insensitivity to weight variance. The results convincingly illustrate that the new approach is effective and applicable.

Item A Baseline Method For Search-Based Software Engineering (2010)
Gay, Gregory
Background: Search-based Software Engineering (SBSE) uses a variety of techniques such as evolutionary algorithms or meta-heuristic searches but lacks a standard baseline method. Aims: The KEYS2 algorithm meets the criteria of a baseline. It is fast, stable, easy to understand, and presents results that are competitive with standard techniques. Method: KEYS2 operates on the theory that a small subset of variables controls the majority of the search space. It uses a greedy search and a Bayesian ranking heuristic to fix the values of these variables, which rapidly forces the search towards stable high-scoring areas. Results: KEYS2 is faster than standard techniques, presents competitive results (assessed with a rank-sum test), and offers stable solutions. Conclusions: KEYS2 is a valid candidate to serve as a baseline technique for SBSE research.

Item A Case for Requirements Validation (2004)
Heimdahl, Mats
In software engineering we make a distinction between the validation and the verification of a software system under development. Verification is concerned with demonstrating that the software implements the functional and non-functional requirements. Verification answers the question "is this implementation correct with respect to its requirements?" Validation, on the other hand, is concerned with determining whether the functional and non-functional requirements are the right requirements. Validation answers the question "will this system, if built correctly, be safe and effective for its intended use?"
There is ample evidence that most safety problems can be traced to erroneous or inadequate requirements. Therefore, to improve the safety of software-intensive systems it is critical that the requirements are properly validated. Unfortunately, current certification standards, for example DO-178B, focus almost exclusively on various verification activities; consequently, most industry practices are geared towards verification activities such as extensive testing and code inspections. Thus, one of the most critical problems with current certification standards is the lack of robust and reliable ways of assessing whether the requirements have been adequately validated.

Item A Cautionary Note on Decadal Sea Level Pressure Predictions from GCMs (2018-02-06)
Liess, Stefan; Snyder, Peter K.; Kumar, Arjun; Kumar, Vipin
Decadal prediction of sea level pressure (SLP) plays an important role in regional climate prediction, because it shows changes in atmospheric behavior on time scales that are relevant for policy makers. These changes consist of a combination of externally forced and internally driven climate system characteristics. A comparison of SLP trends in a subset of seven Coupled Model Intercomparison Project (CMIP) phase 5 general circulation models (GCMs) during the satellite era to their CMIP3 counterparts reveals an unrealistically strong forecast skill in CMIP3 models for trend predictions for 2001-2011 when using the 1979-2000 period to train the forecast. Boreal-winter SLP trends over five high-, mid-, and low-latitude zones were calculated over a two-decade initialization period for each ensemble member and then ranked based on their performance relative to observations in all five zones over the same time period. The same method is used to rank the ensemble members during the following decade. In CMIP3, 17 out of 38 ensemble members retain their rank in the 2001-2011 hindcast period and 3 retain the neighboring rank.
However, these numbers are much lower in more recent CMIP5 decadal predictions over a similar period with the same number of ensembles. The conclusion that the forecast skill in CMIP3 predictions during 2001-2011 is unrealistic is corroborated by comparisons to earlier periods, from the 1960s to the 1980s, in both CMIP3 and CMIP5 simulations. Thus, although the 2001-2011 CMIP3 predictions show statistically significant forecast skill, this skill should be treated as a spurious result that is unlikely to be reproduced by newer, more accurate GCMs.

Item A Code Transformation Framework for Scientific Applications on Structured Grids (2011-09-29)
Lin, Pei-Hung; Jayaraj, Jagan; Woodward, Paul; Yew, Pen-Chung
The combination of expert-tuned code expression and aggressive compiler optimizations is known to deliver the best achievable performance for modern multicore processors. The development and maintenance of these optimized code expressions is never trivial. Tedious and error-prone processes greatly decrease the code developer's willingness to adopt manually-tuned optimizations. In this paper, we describe a pre-compilation framework that takes a code expression with much higher programmability and transforms it into an optimized expression that would be much more difficult to produce manually. The user-directed, source-to-source transformations we implement rely heavily on the knowledge of the domain expert. The transformed output, in its optimized format, together with the optimizations provided by an available compiler, is intended to deliver exceptionally high performance. Three computational fluid dynamics (CFD) applications are chosen to exemplify our strategy. The performance results show an average of 7.3x speedup over straightforward compilation of the input code expression to our pre-compilation framework. Performance of 7.1 Gflops/s/core on the Intel Nehalem processor, 30% of the peak performance, is achieved using our strategy.
The framework is seen to be successful for the CFD domain, and we expect that this approach can be extended to cover more scientific computation domains.

Item A Comparative Evaluation of Anomaly Detection Techniques for Sequence Data (2008-07-07)
Chandola, Varun; Mithal, Varun; Kumar, Vipin
Anomaly detection has traditionally dealt with record or transaction type data sets. But in many real domains, data naturally occurs as sequences, hence the interest in studying anomaly detection techniques for sequential data sets. The problem of detecting anomalies in sequence data sets is related to but distinct from the traditional anomaly detection problem, because the nature of the data and of the anomalies differs from that found in record data sets. While there are many surveys and comparative evaluations for traditional anomaly detection, similar studies have not been done for sequence anomaly detection. We investigate a broad spectrum of anomaly detection techniques for symbolic sequences, proposed in diverse application domains. Our hypothesis is that symbolic sequences from different domains have distinct characteristics in terms of the nature of the sequences as well as the nature of the anomalies, which makes it important to investigate how different techniques behave for different types of sequence data. Such a study is critical to understanding the relative strengths and weaknesses of the different techniques. Our paper is one such attempt: we comparatively evaluate 7 anomaly detection techniques on 10 public data sets collected from three diverse application domains. To gain further understanding of the performance of the techniques, we present a novel way to generate sequence data with desired characteristics.
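One simple family of techniques for symbolic sequences, window-based scoring, can be sketched as follows: score a test sequence by the fraction of its length-k windows never seen in the normal training data. The window length and the data are illustrative assumptions, not the paper's experimental setup.

```python
def kgram_anomaly_score(train, test, k=3):
    # Collect every length-k window that occurs in the normal sequences.
    normal = {s[i:i + k] for s in train for i in range(len(s) - k + 1)}
    # Score = fraction of the test sequence's windows that are unseen.
    windows = [test[i:i + k] for i in range(len(test) - k + 1)]
    if not windows:
        return 0.0
    return sum(w not in normal for w in windows) / len(windows)

print(kgram_anomaly_score(["abcabc"], "abcabc"))  # 0.0
print(kgram_anomaly_score(["abcabc"], "abcxyz"))  # 0.75
```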
The results on the artificially generated data sets help us experimentally verify our hypothesis regarding the different techniques.

Item A Comparative Study on Web Prefetching (2001-05-31)
Bhushan Pandey, Ajay; Vatsavai, Ranga R.; Ma, Xiaobin; Srivastava, Jaideep; Shekhar, Shashi
The growth of the World Wide Web has emphasized the need for reduced user latency. Increasing use of dynamic pages, frequent changes in site structure, and varied user access patterns on the internet have limited the efficacy of caching techniques and emphasized the need for prefetching. Since prefetching increases bandwidth usage, it is important that the prediction model be highly accurate and computationally feasible. It has been observed that in a web environment certain sets of pages exhibit stronger correlations than others, a fact which can be used to predict future requests. Previous studies on predictive models are mainly based on pairwise interactions of pages and TOP-N approaches. In this paper we study a model based on page interactions of higher order, where we exploit set relationships among the pages of a web site. We also compare the performance of this approach with models based on pairwise interaction and the TOP-N approach. We have conducted a comparative study of these models on a real server log and five synthetic logs with varying page frequency distributions to simulate different real-life web sites, and we have identified dominance zones for each of these models. We find that the model based on higher-order page interaction is more robust and gives competitive performance in a variety of situations.

Item A Comparison of Document Clustering Techniques (2000-05-23)
Steinbach, Michael; Karypis, George; Kumar, Vipin
This paper presents the results of an experimental study of some common document clustering techniques. In particular, we compare the two main approaches to document clustering, agglomerative hierarchical clustering and K-means.
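The pairwise and TOP-N prediction baselines that the prefetching study above compares against can be caricatured with simple successor counts over logged sessions; the session format and page names here are hypothetical.

```python
from collections import defaultdict

def transition_counts(sessions):
    # Count page-to-page transitions observed in server-log sessions.
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, page, top_n=1):
    # TOP-N style prediction: the most frequent successors of the current page.
    ranked = sorted(counts[page].items(), key=lambda kv: -kv[1])
    return [p for p, _ in ranked[:top_n]]

counts = transition_counts([["a", "b"], ["a", "b"], ["a", "c"]])
print(predict_next(counts, "a"))  # ['b']
```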
(For K-means we used a "standard" K-means algorithm and a variant of K-means, "bisecting" K-means.) Hierarchical clustering is often portrayed as the better-quality clustering approach, but it is limited by its quadratic time complexity. In contrast, K-means and its variants have a time complexity that is linear in the number of documents, but they are thought to produce inferior clusters. Sometimes K-means and agglomerative hierarchical approaches are combined so as to "get the best of both worlds." However, our results indicate that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches we tested, for a variety of cluster evaluation metrics. We propose an explanation for these results based on an analysis of the specifics of the clustering algorithms and the nature of document data.

Item A Comparison of Generalized LDA Algorithms for Undersampled Problems (2003-12-11)
Park, Cheonghee; Park, Haesun
Linear Discriminant Analysis (LDA) is a dimension reduction method which finds an optimal linear transformation that maximizes the between-class scatter and minimizes the within-class scatter. In undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply LDA due to the singularity of the scatter matrices caused by high dimensionality. In order to make LDA applicable to undersampled problems, several generalizations of LDA have recently been proposed. In this paper, we present the theoretical and algorithmic relationships among several generalized LDA algorithms and compare their computational complexities and performance in text classification and face recognition.
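The bisecting K-means variant discussed in the clustering study above can be sketched as follows: start with one cluster and repeatedly split the largest cluster with plain 2-means. For brevity this sketch uses 1-D points and Euclidean distance; real document clustering would use sparse term vectors and a cosine-based distance.

```python
import random

def two_means(points, iters=20):
    # Split one cluster into two with standard 2-means (Lloyd's algorithm).
    c0, c1 = random.sample(points, 2)
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        if a:
            c0 = sum(a) / len(a)
        if b:
            c1 = sum(b) / len(b)
    return a, b

def bisecting_kmeans(points, k):
    # Start with one cluster; repeatedly bisect the largest until k remain.
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        a, b = two_means(clusters.pop(0))
        clusters += [a, b]
    return clusters

random.seed(1)
print(bisecting_kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], 2))
```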
Towards a practical dimension reduction method for high-dimensional data, an efficient algorithm is also proposed.

Item A Comparative Study of Linear and Nonlinear Feature Extraction Methods (2004-11-17)
Park, Cheonghee; Park, Haesun; Pardalos, Panos
Linear Discriminant Analysis (LDA) is a dimension reduction method which finds an optimal linear transformation that maximizes the between-class scatter and minimizes the within-class scatter. However, in undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply LDA due to the singularity of the scatter matrices caused by high dimensionality. In order to make LDA applicable, several generalizations of LDA have been proposed. This paper presents theoretical and algorithmic relationships among several generalized LDA algorithms. Utilizing these relationships, computationally efficient approaches to the algorithms are proposed. We also present nonlinear extensions of these LDA algorithms: the original data space is mapped to a feature space by an implicit nonlinear mapping through kernel methods, a generalized eigenvalue problem is formulated in the transformed feature space, and the generalized LDA algorithms are applied to solve it. The performance and computational complexity of these linear and nonlinear discriminant analysis algorithms are compared theoretically and experimentally.

Item A Compiler Framework for Recovery Code Generation in General Speculative Optimizations (2003-12-28)
Lin, Jin; Hsu, Wei-Chung; Yew, Pen-Chung; Dz-ching Ju, Roy; Ngai, Tin-fook
Data speculation is an effective way to overcome imprecise alias analysis for many compiler optimizations. A general framework that integrates both control and data speculation using alias profiling and/or compiler heuristic rules has been shown to improve SPEC2000 performance on Itanium systems [16].
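The scatter-matrix formulation that the LDA items above build on can be illustrated with the classical two-class case, where the discriminant direction solves Sw w = m1 - m0. The small ridge term here is a stand-in for the generalized/undersampled fixes those papers survey (it keeps Sw invertible); it is not any specific algorithm from them, and the data is made up.

```python
import numpy as np

def fisher_direction(X0, X1, reg=1e-6):
    # Two-class Fisher LDA: w = Sw^{-1} (m1 - m0), normalized to unit length.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: per-class covariance (biased) scaled back to scatter.
    Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
    # Ridge regularization: a simple surrogate for the undersampled-case fixes.
    Sw += reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X1 = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
print(fisher_direction(X0, X1))  # direction close to (0.707, 0.707)
```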
However, speculative optimizations require check instructions and recovery code to ensure correct execution when speculation fails. How to generate check instructions and their associated recovery code efficiently and effectively is an issue yet to be well studied. It is also very important to avoid inhibiting later optimizations once the recovery code has been generated in the earlier phases. This paper proposes a framework that uses an if-block structure to facilitate the generation of check instructions and recovery code for general speculative optimizations. It allows speculative and recovery code generated early in the compiler phases to be integrated effectively with the subsequent compiler optimization phases. It also allows multi-level speculation for multi-level pointers and multi-level expression trees to be handled with no additional complexity. The proposed recovery code generation framework has been implemented in the Open Research Compiler (ORC).

Item A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints (2010-06-21)
Fang, Gang; Haznadar, Majda; Wang, Wen; Steinbach, Michael; Van Ness, Brian; Kumar, Vipin
This paper has not yet been submitted.

Item A Data Mining Framework for Forest Fire Mapping (2012-03-29)
Karpatne, Anuj; Chen, Xi; Chamber, Yashu; Mithal, Varun; Lau, Michael; Steinhaeuser, Karsten; Boriah, Shyam; Steinbach, Michael; Kumar, Vipin
Forests are an important natural resource, supporting economic activity and playing a significant role in regulating the climate and the carbon cycle, yet forest ecosystems are increasingly threatened by fires caused by a range of natural and anthropogenic factors. Mapping these fires, which can range in size from less than an acre to hundreds of thousands of acres, is an important task for supporting climate and carbon cycle studies as well as informing forest management.
There are two primary approaches to fire mapping: field- and aerial-based surveys, which are costly and limited in their extent; and remote sensing-based approaches, which are more cost-effective but pose several interesting methodological and algorithmic challenges. In this paper, we introduce a new framework for mapping forest fires based on satellite observations. Specifically, we develop spatio-temporal data mining methods for Moderate Resolution Imaging Spectroradiometer (MODIS) data to generate a history of forest fires. A systematic comparison with alternate approaches across diverse geographic regions demonstrates that our algorithmic paradigm is able to overcome some of the limitations, in both data and methods, of prior efforts.

Item A Decomposition-Based Approach to Layered Manufacturing (2000-10-10)
Ilinkin, Ivaylo; Janardan, Ravi; Majhi, Jayanth; Schwerdt, Jörg; Smid, Michiel; Sriram, Ram
This paper introduces a new approach for improving the performance and versatility of Layered Manufacturing (LM), an emerging technology that makes it possible to build physical prototypes of 3D parts directly from their CAD models using a relatively small and inexpensive "3D printer" attached to a personal computer. LM provides the designer with an additional level of physical verification that makes it possible to detect and correct design flaws that might otherwise have gone unnoticed in the virtual model. Current LM processes view the CAD model as a single, monolithic unit. By contrast, the approach proposed here decomposes the model into a small number of pieces by intersecting it with a suitably chosen plane, builds each piece separately using LM, and then glues the pieces together to obtain the physical prototype.
This approach allows large models to be built quickly in parallel and also lends itself naturally to applications where the model needs to be built as several pieces, such as the manufacture of mold halves for injection molding. Furthermore, this approach is very efficient in its use of the so-called support structures generated by the LM process. This paper presents the first provably correct and efficient geometric algorithms to decompose polyhedral models so that the support requirements (support volume and area of contact) are minimized. Algorithms based on the plane-sweep paradigm are first given for convex polyhedra. These algorithms run in O(n log n) time for n-vertex convex polyhedra and work by generating expressions for the support volume and contact area as functions of the height of the sweep plane, and optimizing them during the sweep. Experimental results are given for randomly generated convex polyhedra with up to 200,000 vertices. These algorithms are then generalized to non-convex polyhedra, which are considerably more difficult due to the complex structure of the supports. It is shown that, surprisingly, non-convex polyhedra can be handled by first identifying certain critical facets using a technique called cylindrical decomposition, and then applying the algorithm for convex polyhedra to these critical facets. The resulting algorithms run in O(n^2 log n) time.

Item A Distributed Approximation for Minimum Connected Dominating Sets in Wireless Networks (2003-05-02)
Min, Manki; Lanka, Raghuram; Huang, Xiao; Du, Ding-Zhu
A wireless ad-hoc network is a wireless, multihop network without an infrastructure. Due to the limited resources of hosts in the network, resource utilization becomes more critical, and as a result the construction of virtual backbones in wireless ad-hoc networks has become more attractive.
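A dominating set, the core of such a virtual backbone, can be illustrated with a centralized greedy heuristic that repeatedly picks the node covering the most undominated nodes. This is only an illustration of the concept; it is not the distributed 7-approximation for connected dominating sets described above, and the graph is hypothetical.

```python
def greedy_dominating_set(adj):
    # adj: node -> list of neighbors. Greedily pick the node that dominates
    # (itself plus its neighbors) the most still-undominated nodes.
    undominated = set(adj)
    ds = set()
    while undominated:
        best = max(adj, key=lambda u: len((set(adj[u]) | {u}) & undominated))
        ds.add(best)
        undominated -= set(adj[best]) | {best}
    return ds

# Star graph: the hub alone dominates every node.
adj = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
print(greedy_dominating_set(adj))  # {'c'}
```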
In this paper, we present a distributed approximation scheme for minimum connected dominating sets in unit-disk graphs, which can serve as a virtual backbone in the network. Our algorithm has a performance ratio of 7 and a message complexity of O(n log n), which are the best among the schemes in the literature.

Item A DSL for cross-domain security (High Integrity Language Technology, ACM SIGAda's Annual International Conference (HILT 2012), 2012)
Hardin, David; Slind, Konrad; Whalen, Michael; Pham, Hung T.

Item A Fast Algorithm to Compute Heap Memory Bounds of Java Card Applets (2008)
Pham, Hung T.; Truong, Anh-Hoang; Truong, Ninh-Thuan; Chin, Wei-Ngan
In this paper, we present an approach to finding upper bounds on the heap space of Java Card applets. Our method first transforms an input bytecode stream into a control flow graph (CFG), and then collapses cycles of the CFG to produce a directed acyclic graph (DAG). Based on the DAG, we propose a linear-time algorithm for the problem of finding the single-source largest path in it. We have also implemented a prototype tool, tested it on several sample applications, and compared the bounds found by our tool with the actual heap bounds of the programs. The experiments show that our tool returns good estimates of heap bounds, runs fast, and has a small memory footprint.

Item A Fast Dimension Reduction Algorithm with Applications on Face Recognition and Text Classification (2003-12-19)
Park, Cheonghee; Park, Haesun
In undersampled problems, where the number of samples is smaller than the dimension of the data space, it is difficult to apply Linear Discriminant Analysis (LDA) due to the singularity of the scatter matrices caused by high dimensionality. We propose a fast dimension reduction method based on a simple modification of Principal Component Analysis (PCA) and the orthogonal decomposition. The proposed algorithm is an efficient way to perform LDA for undersampled problems.
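The single-source largest-path computation that the heap-bound method above reduces to is solvable in linear time on a DAG: topologically order the nodes, then relax each edge once. The per-node heap-cost model below is a hypothetical stand-in for the applet's allocation sizes.

```python
from collections import defaultdict, deque

def longest_path_from(source, edges, weight):
    # edges: node -> successor list for a DAG (every node must appear as a key).
    # weight: per-node heap cost (hypothetical allocation model).
    # Kahn topological sort followed by one relaxation pass: O(V + E).
    indeg = defaultdict(int)
    for u in edges:
        for v in edges[u]:
            indeg[v] += 1
    order, queue = [], deque(n for n in edges if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in edges[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    best = {n: float("-inf") for n in edges}
    best[source] = weight[source]
    for u in order:
        for v in edges[u]:
            if best[u] + weight[v] > best[v]:
                best[v] = best[u] + weight[v]
    return max(best.values())

edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
weight = {"a": 1, "b": 2, "c": 5, "d": 1}
print(longest_path_from("a", edges, weight))  # 7 (path a -> c -> d)
```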
Our experimental results in face recognition and text classification demonstrate the effectiveness of the proposed method.

Item A flexible and non-intrusive approach for computing complex structural coverage metrics (ACM, 2015)
Whalen, Michael; Person, Suzette; Rungta, Neha; Staats, Matt; Grijincu, Daniella
Software analysis tools and techniques often leverage structural code coverage information to reason about the dynamic behavior of software. Existing techniques instrument the code with the required structural obligations and then monitor the execution of the compiled code to report coverage. Instrumentation-based approaches often incur considerable runtime overhead for complex structural coverage metrics such as Modified Condition/Decision Coverage (MC/DC). Code instrumentation, in general, has to be approached with great care to ensure it does not modify the behavior of the original code. Furthermore, instrumented code cannot be used in conjunction with other analyses that reason about the structure and semantics of the code under test. In this work, we introduce a non-intrusive preprocessing approach for computing structural coverage information. It uses a static partial evaluation of the decisions in the source code and a source-to-bytecode mapping to generate the information necessary to efficiently track structural coverage metrics during execution. Our technique is flexible; the results of the preprocessing can be used by a variety of coverage-driven software analysis tasks, including automated analyses that are not possible for instrumented code. Experimental results in the context of symbolic execution show the efficiency and flexibility of our non-intrusive approach for computing code coverage information.
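The independence requirement behind MC/DC (each condition must be shown to independently affect the decision's outcome) can be illustrated by brute-force truth-table search for such condition-flipping input pairs. This is only an illustration of the metric itself, not the partial-evaluation technique of the paper, and it is exponential in the number of conditions.

```python
from itertools import product

def mcdc_pairs(decision, n):
    # For each condition i, find input pairs that differ only in condition i
    # and flip the decision outcome: the MC/DC independence requirement.
    pairs = {i: [] for i in range(n)}
    for bits in product([False, True], repeat=n):
        for i in range(n):
            flipped = list(bits)
            flipped[i] = not flipped[i]
            if decision(*bits) != decision(*flipped):
                pairs[i].append((bits, tuple(flipped)))
    return pairs

# For "a and b", condition a only matters when b is True, and vice versa.
pairs = mcdc_pairs(lambda a, b: a and b, 2)
print(pairs[0])  # the (a-varying, b=True) pairs
```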