Quantitative and Qualitative Analysis in the work with West African Languages

The work we report on is part of a larger research project which seeks to combine generally accessible resources for African languages into common repositories and platforms, where property extraction from these resources can lead to new views of, and new insights into, the phenomena addressed. Among these resources are the TypeCraft Interlinear Glossed Text Repository (TC) (Beermann & Mikailov 2014), the African language corpora and search environment of the Leipzig Corpora Collection (LCC) (Goldhahn 2012), and resources from the multilingual verb valence project (Hellan et al. 2014). Handling these resources in a common digital infrastructure facilitates various types of linguistic processing and corpus methodologies for lesser-resourced languages. In the present paper we focus on the semi-automatic acquisition of linguistic resources at the part-of-speech, morphology and valency levels. In this way we support our project's aim of increasing access to data from African languages by making data available at different levels of analysis that can inform linguistic research, and thus give new impetus to linguists and language experts to employ digital services for data analysis.

Keywords: corpus methodologies, quantitative analysis, qualitative analysis, African languages, IGT, valency


DOWNLOAD: Paper Beermann and Hellan