Browsing by Subject "Big data"
Now showing 1 - 6 of 6
Item An interactive design framework based on data-intensive simulations: implementation and application to device-tissue interaction design problems (2015-02) Lin, Chi-Lun

This dissertation investigates a new medical device design approach based on extensive simulations. A simulation-based design framework is developed to create a design workflow that integrates engineering software tools with an interactive user interface, called Design by Dragging (DBD) \cite{Coffey:2013ko}, to generate a large-scale design space and enable creative design exploration. Several design problems that illustrate this design workflow are investigated via the featured forward and inverse design manipulation strategies provided by DBD. A device-tissue interaction problem, part of a vacuum-assisted breast biopsy (VAB) cutting process, is particularly highlighted. A tissue-cutting model is created for this problem to simulate the device-tissue contact, excessive tissue deformation, and progressive tissue damage during the cutting process. This model is then applied to the design framework to generate extensive simulations that sample a large design space for interactive design exploration. This example represents an important milestone toward simulation-based engineering for medical device prototyping.

The simulation-based design framework is implemented to integrate a computer-aided design (CAD) software tool, a finite element analysis (FEA) software tool (SolidWorks and Abaqus are selected in this dissertation), and a high-performance computing (HPC) cluster into a semi-automatic design workflow via customized communication interfaces. The design framework automates the process from generating and simulating design configurations to outputting the simulation results, and the HPC cluster enables multiple simulation jobs to run in parallel to reduce the computational cost. The design framework is first tested using a simple bending-needle example, which generates 460 simulations to sample a design space in DBD and demonstrates the functionality of the creative inverse and forward design manipulation strategies. A tissue-cutting model of a VAB device is then developed as an advanced benchmark example for the design framework. The model simulates the breast lesion tissue being positioned in a needle cannula chamber and being cut by a hollow cutting tube with simultaneous rotation and translation; different cutting conditions, including cutting speeds and tissue properties, are investigated. This VAB device design problem is then applied to the design framework. Critical design variables and performance attributes across three main components of the VAB device (the needle system, motor system, and device handpiece) are identified, and 900 design configurations are generated and simulated to sparsely populate a large design space of $10^6$ possible solutions. The design space is explored via the creative design manipulation strategies and several use cases are established.

The bending-needle example demonstrates the first success of the proposed simulation-based design framework: the 460 simulations are completed with minimal manual intervention, and the functionality of DBD is demonstrated. The inverse and forward design strategies allow interacting with the design space by dragging on a radar chart widget or directly on the visualization of the simulation. Through these interactions the user is guided to the desired solutions. The VAB tissue-cutting example provides a realistic medical device application of the design framework. The 900 simulations are completed in parallel on the HPC cluster, significantly reducing the computation time, and the simulation output data are converted to the high-efficiency NetCDF format so that the post-computation needed to sample this large design space becomes feasible. Several use cases are demonstrated. By interacting with the radar chart widget, the user gradually gains understanding and new insights about the effect of design variable modifications. Next, the direct manipulation strategies via visualization of the simulations are used to solve three issues: a 'dry tap', moving the leading edge of the tissue sample, and narrowing a stress concentration area. These use cases demonstrate the capability and usability of the design framework.

This dissertation makes two major contributions. The first is the investigation of a new design approach that enables creative design exploration based on extensive simulation data, a step toward simulation-based medical device engineering with big data. The second is the FEA simulation model for the VAB tissue-cutting process, which utilizes realistic breast tissue properties to predict cutting forces during the VAB sampling process, something not previously reported in the literature. The studies conducted using this model extend the current understanding of the VAB tissue-cutting process under different cutting conditions. All of these achievements illustrate the potential for a future medical device virtual prototyping environment.
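As a rough, hypothetical illustration of the kind of semi-automatic sweep such a framework automates (enumerate design configurations, hand each to a simulation job, collect the outputs), the Python sketch below generates a toy design space and writes one configuration file per run. Every name in it (the design variables, `generate_configs`, `submit`) is an invented placeholder; the dissertation's actual interfaces to SolidWorks, Abaqus, and the HPC cluster are not reproduced here.

```python
"""Minimal sketch of a design-sweep driver: enumerate configurations and stage
them for simulation. All names and variables are hypothetical placeholders."""
import itertools
import json
from pathlib import Path

# Hypothetical design variables for a needle-cutting problem.
DESIGN_SPACE = {
    "cutter_speed_rpm": [600, 900, 1200],
    "cutter_translation_mm_s": [5, 10, 20],
    "tissue_stiffness_kpa": [10, 25, 50],
}

def generate_configs(space):
    """Enumerate the Cartesian product of the design-variable levels."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def submit(config, out_dir):
    """Write one configuration to disk; a real driver would instead drive the
    CAD/FEA tools and queue the job on an HPC cluster."""
    out_dir.mkdir(parents=True, exist_ok=True)
    name = "_".join(f"{k}-{v}" for k, v in config.items())
    (out_dir / f"{name}.json").write_text(json.dumps(config, indent=2))
    return name

if __name__ == "__main__":
    jobs = [submit(c, Path("design_space")) for c in generate_configs(DESIGN_SPACE)]
    print(f"Prepared {len(jobs)} design configurations for simulation.")
```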
Item Leveraging Sparsity and Low Rank for Large-Scale Networks and Data Science (2015-05) Mardani, Morteza

We live in an era of "data deluge," with pervasive sensors collecting massive amounts of information on every bit of our lives, churning out enormous streams of raw data in a wide variety of formats. While big data may bring "big blessings," there are formidable challenges in dealing with large-scale datasets. The sheer volume of data often makes it impossible to run analytics using central processors and storage units. Network data are also often geographically spread, and collecting them may be infeasible due to communication costs or privacy concerns. The disparate origins of data also leave datasets incomplete, with a sizable portion of entries missing. Moreover, large-scale data are prone to corrupted measurements and communication errors, and may even suffer from anomalies due to cyberattacks. In addition, since many sources continuously generate data in real time, analytics must often be performed online and without an opportunity to revisit past data. Last but not least, due to their variety, data are typically indexed by multiple dimensions.

Toward our vision to facilitate learning, this thesis copes with these challenges by leveraging the low intrinsic dimensionality of data by means of sparsity and low rank. To build a versatile model capturing various data irregularities, the present thesis focuses first on a low-rank plus compressed-sparse matrix model, which proves successful in unveiling traffic anomalies in backbone networks. Leveraging the nuclear and $\ell_1$ norms, exact reconstruction guarantees are established for a convex estimator of the unknowns. Inspired by the crucial task of network traffic monitoring, the scope of this model and recovery task is broadened to the tomographic task of jointly mapping out nominal and anomalous traffic from undersampled linear measurements. Despite the success of nuclear-norm minimization in capturing the low dimensionality of data, it scales very poorly with the data size, mainly due to its tangled nature, which hinders decentralized and streaming analytics. To mitigate this computational challenge, this thesis puts forth a framework that leverages a bilinear characterization of the nuclear norm to bring separability at the expense of nonconvexity. Notwithstanding, it is proven that under certain conditions stationary points of the nonconvex program coincide with the optimum of the convex counterpart. Using this idea along with the theory of alternating minimization, we develop lightweight algorithms with low communication overhead for in-network processing, as well as provably convergent online algorithms suitable for streaming analytics. All in all, the major innovative claim is that even on a budget of distributed computation and sequential acquisition, one can hope to achieve the accurate reconstruction guarantees offered by batch nuclear-norm minimization. Finally, inspired by the k-space data interpolation task arising in dynamic magnetic resonance imaging, a novel tensor subspace learning framework is introduced to handle streaming multidimensional data. It capitalizes on the PARAFAC decomposition and effects low tensor rank by means of Tikhonov regularization, which enjoys separability and offers real-time MRI reconstruction tailored to, e.g., image-guided radiation therapy applications.
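For orientation, low-rank plus sparse models of this kind are commonly estimated with a convex program of the following generic form; this is a sketch in standard notation, not necessarily the thesis's exact estimator, which for the tomographic setting also involves routing/compression operators.

```latex
% Generic low-rank plus sparse decomposition of a data matrix Y \approx X + A,
% with X low rank (nominal traffic) and A sparse (anomalies). A sketch only;
% the thesis's estimators and measurement operators may differ.
\begin{equation*}
  (\hat{X},\hat{A}) \;=\; \arg\min_{X,\,A}\;
  \tfrac{1}{2}\,\lVert Y - X - A \rVert_F^2
  \;+\; \lambda_{*}\,\lVert X \rVert_{*}
  \;+\; \lambda_{1}\,\lVert A \rVert_{1},
\end{equation*}
% where \lVert X \rVert_* is the nuclear norm (sum of singular values), promoting
% low rank, and \lVert A \rVert_1 is the entrywise \ell_1-norm, promoting sparsity.
```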
Item Mapping oak wilt disease from space using land surface phenology (Remote Sensing of Environment, 2023-12-01) Guzmán, Jose A; Pinto-Ledezma, Jesús N; Frantz, David; Townsend, Philip A; Juzwik, Jennifer; Cavender-Bares, Jeannine

Protecting the future of forests relies on our ability to observe changes in forest health. Thus, developing tools for sensing diseases in a timely fashion is critical for managing threats at broad scales. Oak wilt, a disease caused by a pathogenic fungus (Bretziella fagacearum), is threatening oaks, killing thousands yearly and negatively impacting the ecosystem services they provide. Here we propose a novel workflow for mapping oak wilt by targeting temporal disease progression through symptoms using land surface phenology (LSP) from spaceborne observations. In doing so, we hypothesize that phenological changes in pigments and photosynthetic activity of trees affected by oak wilt can be tracked using LSP metrics derived from the Chlorophyll/Carotenoid Index (CCI). We used dense time-series observations from Sentinel-2 to create Analysis Ready Data across Minnesota and Wisconsin and to derive three LSP metrics: the value of CCI at the start and at the end of the growing season, and the coefficient of variation of the CCI during the growing season. We integrated high-resolution airborne imagery at multiple locations to select pixels (n = 3872) representing the most common oak tree health conditions: healthy, symptomatic for oak wilt, and dead. These pixels were used to train an iterative Partial Least Square Discriminant (PLSD) model and to derive the probability of an oak tree (i.e., pixel) being in one of these conditions, along with the associated uncertainty. We assessed these models spatially and temporally on testing datasets, revealing that it is feasible to discriminate among the three health conditions with overall accuracy between 80 and 82%. Within conditions, our models suggest that spatial variation among the three CCI-derived LSP metrics can identify healthy (Area Under the Curve (AUC) = 0.98), symptomatic (AUC = 0.89), and dead (AUC = 0.94) oak trees with low false positive rates. Model performance was also robust across different years. The predictive maps were used to guide local stakeholders to disease hotspots for ground verification and subsequent treatment decisions. Our results highlight the capabilities of LSP metrics from dense spaceborne observations to map diseases and to monitor large-scale changes in biodiversity.
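The three LSP metrics named above (CCI at the start and end of the growing season, and its within-season coefficient of variation) can be illustrated on a toy time series as in the Python sketch below; the fixed season window and the synthetic CCI trajectory are assumptions, and the sketch does not reproduce the paper's Sentinel-2 Analysis Ready Data processing.

```python
"""Toy illustration of three CCI-based land-surface-phenology metrics:
start-of-season CCI, end-of-season CCI, and the within-season coefficient of
variation. Season window and synthetic series are assumptions."""
import numpy as np

def lsp_metrics(doy, cci, season_start=120, season_end=290):
    """Return (start-of-season CCI, end-of-season CCI, within-season CV)."""
    doy, cci = np.asarray(doy), np.asarray(cci)
    season_cci = cci[(doy >= season_start) & (doy <= season_end)]
    sos_value = season_cci[0]    # CCI at the start of the growing season
    eos_value = season_cci[-1]   # CCI at the end of the growing season
    cv = season_cci.std(ddof=1) / abs(season_cci.mean())  # coefficient of variation
    return sos_value, eos_value, cv

if __name__ == "__main__":
    doy = np.arange(1, 366, 5)
    # Synthetic CCI trajectory: green-up, summer plateau, senescence, plus noise.
    cci = 0.1 + 0.25 * np.exp(-((doy - 200) / 70.0) ** 2)
    cci += np.random.default_rng(0).normal(0.0, 0.01, doy.size)
    sos, eos, cv = lsp_metrics(doy, cci)
    print(f"SOS CCI = {sos:.3f}, EOS CCI = {eos:.3f}, within-season CV = {cv:.3f}")
```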
Item Modern Classification with Big Data (2018-07) Wang, Boxiang

Rapid advances in information technologies have ushered in the era of "big data" and revolutionized scientific research across many disciplines, including economics, genomics, neuroscience, and modern commerce. Big data creates golden opportunities but has also raised unprecedented challenges due to the massive size and complex structure of the data. Among the many tasks in statistics and machine learning, classification has diverse applications, ranging from improving daily life to reaching the new frontiers of science and engineering. This thesis discusses broader approaches to modern classification methodology, as well as the computational considerations needed to cope with big data challenges. Chapter 2 presents a modern classification method named data-driven generalized distance weighted discrimination (DWD), together with a fast algorithm that emphasizes computational efficiency for big data. The method is formulated in a reproducing kernel Hilbert space, and learning theory establishing Bayes risk consistency is developed. We use extensive benchmark data applications to demonstrate that the prediction accuracy of the method is highly competitive with state-of-the-art classifiers, including the support vector machine, random forests, gradient boosting, and deep neural networks. Chapter 3 introduces sparse penalized DWD for high-dimensional classification, a setting common in the era of big data; we develop a very efficient algorithm to compute the solution path of the sparse DWD over a fine grid of regularization parameters. Chapter 4 proposes multicategory kernel DWD for multi-class classification. The proposal is defined as a margin-vector optimization problem in a reproducing kernel Hilbert space, and this formulation is shown to enjoy Fisher consistency; we develop an accelerated projected gradient descent algorithm to fit the model. Chapter 5 develops a magic formula for cross-validation (CV) in the context of large-margin classification and designs a novel algorithm to fit and tune the support vector machine.
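For readers unfamiliar with distance weighted discrimination, the generalized DWD loss and a kernel-based formulation commonly used in this literature are sketched below in standard notation; the thesis's data-driven variant (which tunes the index $q$), its sparse penalized version, and the multicategory extension differ in details not shown here.

```latex
% Generalized DWD loss with index q > 0 (standard form from the DWD literature;
% a sketch, not the thesis's exact data-driven formulation):
\begin{equation*}
  V_q(u) \;=\;
  \begin{cases}
    1 - u, & u \le \dfrac{q}{q+1},\\[8pt]
    \dfrac{1}{u^{q}}\,\dfrac{q^{q}}{(q+1)^{q+1}}, & u > \dfrac{q}{q+1}.
  \end{cases}
\end{equation*}
% A kernel DWD classifier then solves a regularized empirical-risk problem in a
% reproducing kernel Hilbert space \mathcal{H}_K:
\begin{equation*}
  \min_{f \in \mathcal{H}_K}\;
  \frac{1}{n}\sum_{i=1}^{n} V_q\!\bigl(y_i\, f(x_i)\bigr)
  \;+\; \lambda\,\lVert f \rVert_{\mathcal{H}_K}^{2}.
\end{equation*}
```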
Item Sparsity control for robustness and social data analysis. (2012-05) Mateos Buckstein, Gonzalo

The information explosion propelled by the advent of personal computers, the Internet, and global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. The ability to mine valuable information from unprecedented volumes of data will facilitate preventing or limiting the spread of epidemics and diseases, identifying trends in global financial markets, protecting critical infrastructure including the smart grid, and understanding the social and behavioral dynamics of emergent social-computational systems. Along with data that adhere to postulated models, large data volumes also contain data that do not: the so-termed outliers. This thesis contributes to several issues pertaining to resilience against outliers, a fundamental aspect of statistical inference tasks such as estimation, model selection, prediction, classification, tracking, and dimensionality reduction, to name a few.

The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally or after projecting them onto a proper basis. The present thesis introduces a neat link between sparsity and robustness against outliers, even when the signals involved are not sparse. It is argued that controlling the sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models (a schematic form of such an estimator is sketched after the final item in this listing). Even though the focus is placed first on robustifying linear regression, the universality of the developed framework is highlighted through diverse generalizations that pertain to: i) the information used for selecting the sparsity-controlling parameters; ii) the nominal data model; and iii) the criterion adopted to fit the chosen model. Explored application domains include preference measurement for consumer utility function estimation in marketing, and load curve cleansing, a critical task in power systems engineering and management. Finally, robust principal component analysis (PCA) algorithms are developed to extract the most informative low-dimensional structure from (grossly corrupted) high-dimensional data. Beyond its ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms that: i) track the low-rank signal subspace as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, as well as to unveil communities in social networks and intruders in video surveillance data.

Item Statistical Methods for Large Complex Datasets (2016-05) Datta, Abhirup

Modern technological advancements have enabled massive-scale collection, processing, and storage of information, triggering the onset of the "big data" era, in which every two days we now create as much data as we did in the entire twentieth century. This thesis aims at developing novel statistical methods that can efficiently analyze a variety of large complex datasets. Under the umbrella theme of big data modeling, we present statistical methods for two different classes of large complex datasets. The first half of the thesis focuses on the "large n" problem for large spatial or spatio-temporal datasets, where observations exhibit strong dependencies across space and time. The second half presents methods for high-dimensional regression in the "large p, small n" setting for datasets that contain measurement errors or change points.
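Referring back to the sparsity-control item above (Mateos Buckstein, 2012-05), a minimal schematic of an estimator that controls the sparsity of model residuals is given below; it is a textbook-style sketch in standard notation, not the thesis's exact formulation or its parameter-selection procedure.

```latex
% Schematic sparsity-controlling robust linear regression: an outlier vector o
% augments the nominal model y = X\beta + o + noise, and an \ell_1 penalty
% controls how many observations are declared outliers. A sketch only.
\begin{equation*}
  (\hat{\beta},\hat{o}) \;=\; \arg\min_{\beta,\,o}\;
  \tfrac{1}{2}\,\lVert y - X\beta - o \rVert_2^{2} \;+\; \lambda\,\lVert o \rVert_{1}.
\end{equation*}
% Entries with \hat{o}_i \neq 0 flag outlying observations; a larger \lambda
% declares fewer outliers, and minimizing over o is known to yield a Huber-type
% M-estimation criterion in \beta.
```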