
Classroom:LING2208 - Annotating Norwegian Bokmål/Agreement statistics

Revision as of 16:35, 25 February 2014 by Nicklas Nilsen (Talk | contribs)

Tagging of gender in Norwegian Bokmål

The conventions for glossing adjectives vary across the Norwegian Bokmål corpora in TypeCraft. Adjectives are sometimes glossed with grammatical gender tags and sometimes not.

A possible explanation for why some adjectives are tagged with gender is the neuter form. In Norwegian, the neuter form of an adjective is, more often than not, distinctly different from the masculine and feminine forms. The masculine and feminine forms, in fact, are indistinguishable from each other and from the base form. One could therefore expect NEUT to be an overrepresented tag among the adjectives tagged for gender.

Using TypeCraft's Phrase Search (for Norwegian Bokmål) to perform three searches, [("POS:ADJ", "gloss:FEM"), ("POS:ADJ", "gloss:MASC"), ("POS:ADJ", "gloss:NEUT")], should yield three values: the number of adjectives tagged with each gender. Summing them gives the total number of gender-tagged adjectives.

In comparison, performing three searches for the gloss tags alone, without restricting the POS, [("gloss:FEM"), ("gloss:MASC"), ("gloss:NEUT")], should yield three new values: the total number of words in TypeCraft tagged with each gender (for Bokmål).

Gender    Adjectives    Total for all tags in TypeCraft
FEM       0 (0%)        33 (6.33%)
MASC      13 (21%)      302 (58%)
NEUT      49 (79%)      186 (35.7%)
Total:    62 (100%)     521 (100%)

The results of these searches support the aforementioned hypothesis: NEUT is overrepresented as a gloss tag for adjectives compared with NEUT as a tag across all parts of speech.
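The percentages in the table above can be recomputed from the raw counts. The following is a minimal sketch, assuming the counts are copied by hand from the Phrase Search results; it does not call any TypeCraft API, and the names `adjective_counts`, `all_pos_counts` and `shares` are purely illustrative.

```python
# Counts copied by hand from the table above (assumption: no TypeCraft API is used).
adjective_counts = {"FEM": 0, "MASC": 13, "NEUT": 49}    # POS:ADJ combined with a gender gloss
all_pos_counts = {"FEM": 33, "MASC": 302, "NEUT": 186}   # gender gloss on any POS


def shares(counts):
    """Return each tag's percentage of the summed counts."""
    total = sum(counts.values())
    return {tag: 100 * n / total for tag, n in counts.items()}


adj_shares = shares(adjective_counts)
all_shares = shares(all_pos_counts)

for tag in adjective_counts:
    print(
        f"{tag}: {adj_shares[tag]:.1f}% of gender-tagged adjectives, "
        f"{all_shares[tag]:.1f}% of all gender-tagged words"
    )
```

Comparing the two shares for NEUT side by side makes the overrepresentation among adjectives directly visible.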


The following table describes the distribution of gender as glossed on adjectives and the total distribution of gender tags for Norwegian Bokmål in TypeCraft. This is compared to the distribution of genders among nouns in the NoWaC corpus. The percentages in the first three columns give each tag's share of the total for that count (e.g. 56% of all nouns in NoWaC are tagged as masculine). The final column contains the ratio of each gender's share among entries tagged ADJ in TypeCraft to its share among entries tagged as nouns in NoWaC. This gives an indication of whether some genders are glossed on adjectives more frequently than they naturally occur.

Gender    Adjectives    Total for all tags in TypeCraft    Total for nouns in NoWaC    Ratio for ADJ to NoWaC
FEM       0 (0%)        33 (6.33%)                         20358360 (16.47%)           0%
MASC      13 (21%)      302 (58%)                          69209955 (56%)              37.5%
NEUT      49 (79%)      186 (35.7%)                        34026414 (27.53%)           286.96%
Total:    62 (100%)     521 (100%)                         123594729 (100%)            N/A
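The final column can be sketched as a simple division of two shares. The example below is an illustration only: it works from the raw counts in the table, and the function names are hypothetical. Because it uses unrounded shares, its results differ slightly from the table, whose figures appear to be derived from the rounded percentages (for instance 21 / 56 = 37.5%).

```python
# Counts copied from the table above; names are illustrative, not part of any tool.
adjective_counts = {"FEM": 0, "MASC": 13, "NEUT": 49}
nowac_noun_counts = {"FEM": 20358360, "MASC": 69209955, "NEUT": 34026414}


def share(counts, tag):
    """Fraction of the summed counts that belongs to one tag."""
    return counts[tag] / sum(counts.values())


def representation_ratio(tag):
    """Share among gender-tagged adjectives divided by share among NoWaC nouns, in percent.

    100% would mean the gender is glossed on adjectives exactly as often as it
    occurs among nouns; values above 100% indicate overrepresentation.
    """
    return 100 * share(adjective_counts, tag) / share(nowac_noun_counts, tag)


for tag in adjective_counts:
    print(f"{tag}: {representation_ratio(tag):.2f}%")
```

A value well above 100% for NEUT and values below 100% for FEM and MASC correspond to the pattern reported in the table.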

From this data we can see that the neuter gender is overrepresented for adjectives. This is because the feminine and masculine genders (both of which appear to be underrepresented) are not marked morphologically on adjectives but are expressed by the uninflected base form, whereas neuter adjectives are inflected with a morpheme. This reflects a tagging convention that is morphologically oriented.