Models for Limited Labeled Time Series Data with Applications in Sleep Science

Published Date

2023-04

Type

Thesis or Dissertation

Abstract

Time series arise in virtually every natural and man-made process, and time-series analysis has applications in critical domains such as healthcare, meteorology, and finance. Recently, the nature of collected time-series data has shifted substantially with the popularity of cheaper consumer-grade sensors such as smartwatches, which provide large volumes of lower-quality data. Modeling time-varying data is challenging owing to its high dimensionality and complex patterns, and these challenges are compounded by issues such as missing data, which harm downstream tasks like classification. Feature engineering, using features such as seasonality or frequency transforms, has long been an important part of time-series analysis, but the complexity of time-series data makes it difficult; deep learning is therefore a promising alternative. However, recent deep learning architectures for time series require labeled examples, and labeling is expensive, especially in domains that require specialized knowledge, such as healthcare.

In this thesis, we focus on using limited labeled data efficiently. We propose solutions that leverage 1) unlabeled data and 2) data with missing time-series observations, and that 3) make effective use of scarce labels. We primarily showcase these techniques in sleep science applications, where data from consumer-grade devices such as smartwatches is becoming available.

First, we present a method for unsupervised representation learning of human activity and sleep data. We exploit both the context and the content of the time series, and we reduce subject-specific noise using adversarial training. Unlike traditional time-series models, these representations can be used to boost the performance of supervised learning models in low-labeled-data settings. Empirical evaluation shows that our method outperforms many strong baselines and that adversarial learning improves the generalizability of the representations.

Second, we combine conditional random fields (CRFs) with deep neural networks to capture longer-term dependencies in the dynamics of the output labels in time-series segmentation tasks. Capturing this longer-term context during segmentation labeling makes more efficient use of limited labels, and our method shows significant improvement over the baseline methods. We apply it to detecting sleep stages from Continuous Positive Airway Pressure (CPAP) signals; CPAP is an at-home therapy for sleep apnea. Ours is the first work to detect a patient's sleep stages from CPAP-collected data with reasonable accuracy.

Third, we present a novel semi-supervised method for time-series imputation. Missing observations are common in time series because of data drops or sensor malfunctions. Imputation methods fill in these values, and the quality of the imputation significantly affects downstream tasks such as classification. Our semi-supervised approach uses unlabeled data as well as the labeled data of the downstream task. Our results indicate that it outperforms existing supervised and unsupervised time-series imputation methods, both in imputation quality and in the performance of downstream tasks that ingest the imputed time series.

Last, we adapt MixUp, a simple data augmentation technique, to time-series data. We show that this simple modification of the training process improves the performance of time-series classification methods. We perform the augmentation both on raw time series and in the latent space of time-series classification models, and the improvement is consistent in low as well as higher labeled-data regimes.
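The MixUp augmentation mentioned in the last contribution can be illustrated with a minimal NumPy sketch. It follows the standard MixUp recipe: a mixing weight λ is drawn from Beta(α, α), and each example and its one-hot label are linearly interpolated with those of a randomly paired example from the same batch. The function name `mixup_batch`, the default α = 0.2, and the (batch, time, channels) layout are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Hypothetical sketch of MixUp for a batch of time series.

    x:        array of shape (batch, ...), e.g. (batch, time, channels) for raw
              series, or (batch, features) for latent vectors from a classifier
    y_onehot: array of shape (batch, num_classes) with one-hot labels
    alpha:    Beta(alpha, alpha) concentration; 0.2 is an assumed default
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # one mixing weight for the whole batch
    perm = rng.permutation(x.shape[0])      # pair each example with a random partner
    # Interpolate inputs element-wise (at every time step and channel).
    x_mix = lam * x + (1.0 - lam) * x[perm]
    # Interpolate the labels with the same weight, giving soft targets.
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


# Usage on synthetic data: 8 series, 128 time steps, 3 channels, 4 classes.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 128, 3))
y = np.eye(4)[rng.integers(0, 4, size=8)]
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because the function only relies on broadcasting a scalar λ over the leading batch axis, the same call works on hidden-layer features (for example, a (batch, features) array), which is one plausible way to realize the latent-space variant; the mixed soft labels are then used with a cross-entropy loss in place of the original hard labels.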

Description

University of Minnesota Ph.D. dissertation. April 2023. Major: Computer Science. Advisor: Jaideep Srivastava. 1 computer file (PDF); xii, 135 pages.

Suggested citation

Aggarwal, Karan. (2023). Models for Limited Labeled Time Series Data with Applications in Sleep Science. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/257035.