Typecraft v2.5

TypeCraft Akan Data Collection Release 1.0

Release 1.0

1. License and Legal Issues

This corpus is distributed solely for non-commercial, non-profit educational and research use. Release 1.0 is a derivative compilation of multiple annotated texts created by linguistics graduate students as part of their coursework. The original texts were created between 2007 and 2013 at the Department of Linguistics, NTNU, Trondheim, Norway.


2. Download Release 1.0

Release 1.0.zip

The zip file contains the Akan data as XML together with the release notes. To extract the files, save the archive to your computer and use "Extract To". Then enter a destination path and press "OK".
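As an alternative to the graphical "Extract To" dialog, the archive can also be unpacked programmatically. The sketch below uses Python's standard `zipfile` module. The helper name `extract_release` and the destination folder are illustrative assumptions, and because the names of the files inside the archive are not documented here, the function simply extracts every member and returns the list of what it found.

```python
import zipfile
from pathlib import Path


def extract_release(zip_path: str, dest_dir: str) -> list[str]:
    """Unpack a release archive into dest_dir and return its member names.

    Hypothetical helper: no particular archive layout is assumed, so every
    member is extracted and the names are reported back to the caller.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()  # list contents before extracting
        archive.extractall(dest)
    return names
```

For example, `extract_release("Release 1.0.zip", "akan_release_1_0")` would unpack the download into a folder named `akan_release_1_0` (the folder name is chosen only for illustration) and return the names of the XML file(s) and release notes it contains.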

The TypeCraft Importer, which you can find in the navigation bar on the left of this browser window, lets you import the corpus into the TypeCraft Editor for further annotation or for export to other formats.


3. Description of the TypeCraft Akan Corpus

Release 1.0 of the TC Akan Corpus consists of 41 short texts, mostly linguistic sentence collections, totalling 669 sentences. Two of the released texts are transcribed recordings of students narrating a video. The students who did the original work were native speakers of Akan. The material was curated over a period of 1 ½ years, starting in 2016. During curation, expert linguists worked together with student annotators, both native and non-native speakers, to improve the consistency of the original data. We say more about the type and depth of annotation below. On a four-point scale of green (high quality), yellow, orange and red (should not be used for research), we characterise Release 1.0 as a yellow corpus, by which we mean that it can be used for research with some care.

4. Data structure

All texts carry morpheme-based annotations as well as POS tags. The original texts have been translated into English. The whole corpus has been privacy-masked, and metadata is provided. When loaded into the TypeCraft Editor, the data looks as shown in the example below:



To annotate the TC Akan corpus we used an annotation set of 60 POS tags and 123 gloss tags. Chart 1 shows the most frequently assigned POS tags, and Chart 2 shows the same for gloss tags. A link to the complete TypeCraft POS and gloss annotation sets can be found in the navigation bar (under TypeCraft help, on the left of your browser window).

Chart 1: The most frequent POS tags in the TypeCraft Akan Corpus


Chart 2: The most frequent gloss tags in the TypeCraft Akan Corpus
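Tag frequencies of the kind summarised in Charts 1 and 2 can in principle be recomputed from the released XML. The sketch below counts tag values with Python's standard `xml.etree.ElementTree` and `collections.Counter`. The element names it looks for (`pos` by default, or e.g. `gloss`) are hypothetical placeholders chosen for illustration, not a documented description of the TypeCraft XML format; inspect the files in the release and pass the actual element names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source, tag_element: str = "pos") -> Counter:
    """Count the text values of every <tag_element> element in an XML file.

    `source` may be a file path or an open file object. The default element
    name "pos" is a hypothetical placeholder; the real export may store POS
    and gloss tags under different element names.
    """
    root = ET.parse(source).getroot()
    counts = Counter()
    for element in root.iter(tag_element):  # walks the whole tree, not just children
        value = (element.text or "").strip()
        if value:  # skip empty tag elements
            counts[value] += 1
    return counts
```

Calling `count_tags(path).most_common(10)` would then give the ten most frequent values, which is the kind of ranking the charts display. If the files declare an XML namespace, the element name has to be given in ElementTree's `{namespace}name` form for `iter` to match it.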


5. Authors, Citation and Contact Information

The TC Akan corpus was created by Dorothee Beermann. Joana Awua Ahadofo assisted with the manual annotations. Special thanks for their support go to Associate Professor James Essegbey, University of Florida, Gainesville, and the Ghanaian Student Association at the Norwegian University of Science and Technology. The corpus should be cited as follows:

Dorothee Beermann (2018). TypeCraft Project – The TypeCraft Akan corpus, Release 1.0. TypeCraft – The Interlinear Text Repository.


--Dorothee (talk) 13:48, 14 February 2020 (UTC)