Is Neural Machine Translation viable for low-resource languages? an experimental study of the Irish language
Abstract
Transformer-based Neural Machine Translation (NMT) models are Large Language Models (LLMs) designed to translate between two or more languages. They are typically most successful for high-resource languages, that is, languages with plentiful online text corpora, such as English, Spanish, or French. In contrast, languages with limited corpora, such as Basque, Pashto, or Ojibwe, are known as low-resource languages and tend to be overlooked or underrepresented. One of these low-resource languages is Irish (Gaeilge), which has approximately 1.9 million total speakers as of 2022 and an extremely limited pool of publicly available datasets and machine translation systems. In response to this shortage, we created three bilingual English-Irish datasets and three transformer models for translating from English to Irish. We then evaluated our models with four automatic evaluation metrics (BLEU, TER, chrF, and METEOR), and they demonstrated promising results across all our datasets.
Description
University of Minnesota M.S. thesis. July 2025. Major: Computer Science. Advisor: Ted Pedersen. 1 computer file (PDF); viii, 66 pages.
Suggested Citation
Quigley, Jack. (2025). Is Neural Machine Translation viable for low-resource languages? an experimental study of the Irish language. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/277324.
