Quantitative and Qualitative Analysis in Work with West African Languages

This paper reflects work that we had planned to present at the WALC 2017 conference in Winneba, Ghana, and have since developed further. In its present form it reports on a larger research project that seeks to combine publicly accessible resources for African languages in common repositories and platforms. These resources include the TypeCraft Interlinear Glossed Text Repository (TC) (Beermann & Mikailov 2014), the African-language corpora and search environment of the Leipzig Corpora Collection (LCC) (Goldhahn 2012), and resources from the multilingual verb valence project (Hellan et al. 2014). Handling these resources within a common digital infrastructure facilitates various types of linguistic processing and corpus methodology for lesser-resourced languages. In the present paper we focus on the semi-automatic acquisition of linguistic resources at the part-of-speech, morphology and valency levels. By making data from African languages available at different levels of analysis that can inform linguistic research, we aim to widen access to these data and to encourage linguists and language experts to use digital services for data analysis.

Keywords: corpus methodologies, qualitative analysis, quantitative analysis, African languages, IGT, valency

