Author: Martin, Anna
Date accessioned: 2022-08-29
Date available: 2022-08-29
Date issued: 2022-05
URI: https://hdl.handle.net/11299/241255
Description: University of Minnesota M.S. thesis. May 2022. Major: Computer Science. Advisor: Ted Pedersen. 1 computer file (PDF); x, 113 pages.

Abstract: The rapid growth rate of scientific literature makes it increasingly difficult for researchers to keep up with developments in their field. This problem can be addressed by structuring academic papers according to information units that go deeper than keywords. The need to structure scholarly documents efficiently so that they are machine-operable necessitates the creation of machine readers to extract and classify fine-grained pieces of scientific information. This process requires gold-standard corpora of annotated scholarly work. For this thesis we developed a gold-standard corpus of task-description phrase annotations from Shared Task Overview papers and trained a text classifier on the resulting dataset. The annotation project consisted of: developing a set of annotation guidelines; reading and annotating the task descriptions of 254 Shared Task Overview papers published in the ACL Anthology; validating our guidelines by measuring inter-annotator agreement; and digitizing the resulting corpus so that it could be used as a resource in machine learning projects. The resulting dataset comprises 254 full-text papers containing 41,752 sentences and 259 task descriptions. In our second and final validation we achieved a strict score of 0.44 and a relaxed score of 0.95, measured using Cohen's kappa coefficient. We then used this resource to train and develop a classifier that automatically identifies shared task descriptions. For preprocessing, we improved the balance between negative and positive samples by eliminating every paper section that did not contain a task description. During our machine learning experiments we trained and validated 18 different sentence classification models using a variety of text encodings and hyperparameter settings. The best-performing model was SciBERT, which achieved an F1 score of 0.75 when applied to the reduced test set.

Language: en
Subjects: scholarly document processing; scientific information extraction
Title: Annotating and Automatically Extracting Task Descriptions from Shared Task Overview Papers in Natural Language Processing Domains
Type: Thesis or Dissertation
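
The abstract reports inter-annotator agreement as Cohen's kappa (strict 0.44, relaxed 0.95). As a minimal illustration, not the thesis's code, of how such a score is computed over aligned binary sentence labels; the toy labels are invented, and the strict vs. relaxed span-matching criteria from the thesis are not reproduced here:

```python
# Illustrative sketch: Cohen's kappa over two annotators' aligned
# per-sentence labels (1 = sentence is part of a task description).
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 0, 1, 0, 1, 0, 0]
annotator_b = [1, 0, 1, 1, 0, 1, 0, 0]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```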
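
The preprocessing step rebalances positive and negative samples by dropping every paper section without a task description. A hypothetical sketch of that filter, where the Sentence and Section types are illustrative assumptions rather than the thesis's actual data model:

```python
# Hypothetical sketch of the section-filtering preprocessing step:
# keep only sections containing at least one task-description sentence.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sentence:
    text: str
    is_task_description: bool = False

@dataclass
class Section:
    title: str
    sentences: List[Sentence] = field(default_factory=list)

def keep_positive_sections(sections: List[Section]) -> List[Section]:
    """Drop every section with no task-description sentence."""
    return [s for s in sections
            if any(x.is_task_description for x in s.sentences)]
```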
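
The best-performing classifier was SciBERT (F1 = 0.75 on the reduced test set). A minimal sketch of loading SciBERT as a binary sentence classifier through the Hugging Face transformers API; the checkpoint name is the public allenai/scibert_scivocab_uncased release, while the label semantics, fine-tuning loop, and hyperparameters are assumptions omitted here:

```python
# Sketch: SciBERT as a two-class sentence classifier. A freshly loaded
# classification head is untrained; fine-tuning on the annotated corpus
# would be required before the predictions are meaningful.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer(
    "Participants were asked to detect offensive language in tweets.",
    return_tensors="pt", truncation=True,
)
pred = model(**inputs).logits.argmax(dim=-1).item()  # assumed: 1 = task description
```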