Learning with Weak Supervision for Land Cover Mapping Problems

Nayak, Guruprasad
2020-05-04
2020-01
https://hdl.handle.net/11299/213091
University of Minnesota Ph.D. dissertation. January 2020. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); ix, 92 pages.

Land cover mapping is the task of generating maps of land use globally across time. Recent decades have seen an increasing availability of public satellite data sets with observations of the Earth at regular intervals of space and time. This, coupled with advances in machine learning and high performance computing, provides an opportunity to automate the land cover mapping problem at scale. However, the availability of labeled data to train predictive models in this application is very limited, especially in the developing regions of the world, where accurate land cover maps are necessary for effective management of natural resources to sustain the rapid population growth in these regions. The need for labeled samples is further increased by: (1) heterogeneity of land cover classes across space and time; (2) increasing complexity of state-of-the-art predictive models; and (3) lack of sufficient samples at the required spatial and temporal resolutions. Since the paucity of labeled data is a major problem in this domain, traditional machine learning algorithms that rely only on exact labeled data (strong supervision) have limited performance. This thesis investigates the use of weak supervision to mitigate the problem of not having sufficient samples with exact labels. In a weakly supervised learning scenario, only a few training samples have exact labels corresponding to the target variable. However, plenty of weakly labeled instances are available, i.e., instances for which only an imperfect version of the target variable is known. The idea is that, by modeling the imperfection in the weak labels, we can mitigate the lack of (strongly labeled) training data.
We study three commonly occurring sources of weak supervision for the land cover mapping problem: (1) ordinal labels as weak supervision for regression (WORD); (2) group-level labels as weak supervision for binary classification (WeaSL); and (3) group-level labels with group-level features as weak supervision for binary classification (MultiRes). In each of these cases, we show that modeling the inexact nature of the weak supervision enables us to mitigate the lack of strong supervision. Through extensive experiments on multiple data sets, we show that the use of weak supervision (1) improves generalization over models trained with only strong supervision and (2) enables the use of more complex predictive models. In addition, since weak supervision is plentiful, it provides a better representation of the class imbalance, when present in the population. WORD and WeaSL demonstrably optimize the performance of the model for rare classes using weak supervision. Finally, although the data sets used in this thesis mainly come from the land cover problems of burned area mapping and urban mapping, the methods developed in this thesis are applicable to other domains where similar forms of weak labels are available, as demonstrated by experiments on data sets from other domains such as natural language processing.

en
Class imbalance; Group Labels; Land cover mapping; Ordinal Labels; Semi-supervised Learning; Weak Supervision
Thesis or Dissertation