Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks

Title

Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks

Published Date

2018-01

Type

Thesis or Dissertation

Abstract

The flood of data flowing from observatories presents significant challenges to astronomy and cosmology – challenges that will only be magnified by projects currently under development. Growth in both the volume and velocity of astrophysics data is accelerating: whereas the Sloan Digital Sky Survey (SDSS) has produced 60 terabytes of data in the last decade, the upcoming Large Synoptic Survey Telescope (LSST) plans to register 30 terabytes per night starting in 2020. Additionally, the Euclid Mission will acquire imaging for ∼5 × 10^7 resolvable galaxies. The field of galaxy evolution faces a particularly challenging future, as complete understanding often cannot be reached without analysis of detailed morphological galaxy features. Historically, morphological analysis has relied on visual classification by astronomers, drawing on the human brain's capacity for advanced pattern recognition. However, this accurate but inefficient method falters when confronted with many thousands (or millions) of images. In the SDSS era, efforts to automate morphological classifications of galaxies (e.g., Conselice et al., 2000; Lotz et al., 2004) have been reasonably successful and can distinguish between elliptical and disk-dominated galaxies with accuracies of ∼80%. While this is statistically very useful, a key problem with these methods is that they often cannot say which 80% of their samples are accurate. Furthermore, when confronted with the more complex task of identifying key substructure within galaxies, automated classification algorithms begin to fail. The Galaxy Zoo project uses a highly innovative approach to solving the scalability problem of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify images by eye. Within the first year, hundreds of thousands of members of the general public had classified each of the ∼1 million SDSS galaxies an average of 40 times. 
Galaxy Zoo thus both solved the time-efficiency problem of visual classification and improved accuracy by producing a distribution of independent classifications for each galaxy. While crowd-sourced galaxy classifications have proven their worth, challenges remain before this method can be established as a critical and standard component of the data processing pipelines for the next generation of surveys. In particular, though innovative, crowd-sourcing techniques do not have the capacity to handle the data volume and rates expected in the next generation of surveys. Automated algorithms will instead be delegated the majority of the classification tasks, freeing citizen scientists to contribute their efforts to subtler and more complex assignments. This thesis presents a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme we increase the classification rate nearly 5-fold, classifying 226,124 galaxies in 92 days of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7% accuracy. We next combine this with a Random Forest machine learning algorithm that trains on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides a factor of 11.4 increase in the classification rate, classifying 210,803 galaxies in just 32 days of GZ2 project time with 93.1% accuracy. 
As the Random Forest algorithm incurs minimal computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.
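
To make the idea of Bayesian classification aggregation under a binary scheme concrete, the following is a minimal sketch and not the thesis's SWAP implementation. It assumes each volunteer is described by two hypothetical skill values (the probability of voting "positive" for a truly positive galaxy, and of voting "negative" for a truly negative one), and that a galaxy is retired once its posterior crosses an acceptance or rejection threshold. The function names, labels, prior, and thresholds are all illustrative assumptions.

```python
def update_posterior(prior, vote, p_true_pos, p_true_neg):
    """Apply Bayes' rule for one volunteer vote.

    prior      : current P(galaxy is positive)
    vote       : True if the volunteer voted "positive"
    p_true_pos : assumed P(vote positive | galaxy positive)
    p_true_neg : assumed P(vote negative | galaxy negative)
    """
    if vote:
        like_pos = p_true_pos          # correct positive vote
        like_neg = 1.0 - p_true_neg    # false positive vote
    else:
        like_pos = 1.0 - p_true_pos    # false negative vote
        like_neg = p_true_neg          # correct negative vote
    numerator = like_pos * prior
    return numerator / (numerator + like_neg * (1.0 - prior))


def classify(votes, prior=0.5, accept=0.99, reject=0.01):
    """Process votes in order and stop as soon as a threshold is crossed.

    votes : iterable of (vote, p_true_pos, p_true_neg) tuples
    Returns (label, posterior, number_of_votes_used).
    The prior and thresholds are illustrative, not values from the thesis.
    """
    posterior = prior
    used = 0
    for vote, p_true_pos, p_true_neg in votes:
        posterior = update_posterior(posterior, vote, p_true_pos, p_true_neg)
        used += 1
        if posterior >= accept:
            return "positive", posterior, used
        if posterior <= reject:
            return "negative", posterior, used
    return "undecided", posterior, used


if __name__ == "__main__":
    # Five agreeing votes from volunteers who are each right 80% of the time.
    print(classify([(True, 0.8, 0.8)] * 5))
```

Under these assumptions, a galaxy that receives consistent votes from reliable volunteers is retired after only a few classifications instead of a fixed number per galaxy, which illustrates how an aggregation scheme of this kind could raise the classification rate.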

Description

University of Minnesota Ph.D. dissertation. January 2018. Major: Astrophysics. Advisor: Claudia Scarlata. 1 computer file (PDF); xiii, 158 pages.

Suggested citation

Beck, Melanie. (2018). Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/194632.
