Investigating AAVE in Question Answering Systems
Georgia Institute of Technology Undergraduate Thesis (GT Thesis), 2023
Abstract
Advances in technology and social inclusion have encouraged the growth of written text in dialects. Unfortunately, due to the lack of dialectal text corpora, most state-of-the-art NLP models are trained on Standard American English (SAE) only. It is important to build NLP technology that is both effective and inclusive. Hence, we investigate the performance of state-of-the-art question answering (QA) systems on African American Vernacular English (AAVE) text. We examine performance by converting SQuAD and CoQA to AAVE. Our experiments show that the performance of QA systems degrades significantly when they are tested on AAVE data in a zero-shot setting. While the performance can be partially recovered by incorporating AAVE data into the training set, substantial room for improvement remains.