Typecraft v2.5

Difference between revisions of "Norwegian Valency Corpus"

Latest revision as of 14:55, 23 September 2021

Version 1.0 - trial version

--Typecraft (talk) 10:03, 18 July 2017 (CEST)

The corpus consists of 22,000 sentences imported from the Leipzig Corpus Collection, all with the standard TypeCraft IGT annotation and with valency information for each verb occurrence, given in the form exemplified here for a ditransitive verb:

SAS: NP+NP+NP
FCT: ditransitive
SIT: ternaryRel
ConstructionLabel: v-ditr 

Here 'SAS' stands for 'syntactic argument structure', 'FCT' for 'functional characterization', 'SIT' for 'situation structure', and 'ConstructionLabel' for a code described at Verbconstructions cross-linguistically - Introduction. The valency information is stated relative to the ACTIVE form of the verb, even if the example provided is in the passive. When searching, you can use any of these types of labels. The array of options within each type is explained and exemplified as follows:

SAS at Valency label 'SAS'
FCT at Valency label 'FCT'
SIT at [[]] 
ConstructionLabel at Valence Profile Norwegian (for illustrations using English, see Valence Profile English).

Joint illustrations of them all are given in Valency code illustrations.
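For readers who want to process exported annotations programmatically, the four-line valency block shown above for the ditransitive frame can be read into a dictionary. The following is only a minimal sketch: the function name `parse_valency_annotation` is hypothetical, the 'KEY: value' line layout is taken from the single example on this page, and splitting the SAS value on '+' to obtain the individual arguments is an assumption, not a documented TypeCraft convention.

```python
def parse_valency_annotation(text):
    """Parse lines of the form 'KEY: value' into a dict (hypothetical helper)."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'KEY: value' line: {line!r}")
        labels[key.strip()] = value.strip()
    return labels


# The ditransitive example from this page.
example = """\
SAS: NP+NP+NP
FCT: ditransitive
SIT: ternaryRel
ConstructionLabel: v-ditr
"""

annotation = parse_valency_annotation(example)

# Assumption: '+' separates the arguments in the SAS label.
sas_arguments = annotation["SAS"].split("+")
```

Any one of the four values in `annotation` is the kind of string that can be entered in the 'Phrase description' field of the search interface.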

You can search by valency type in general, or specifically for a given verb, where the verb can be given either in its citation form or in the form in which it actually occurs. The search interface is the standard one for TypeCraft:

TypeCraft Tools (in upper left corner) -> TypeCraft Search -> Phrase search.

On this page, choose 'Norwegian Bokmål' from the Language menu; at 'Phrase level', write (or paste) the valency label into the slot 'Phrase description'. If you also want to restrict the search to a particular verb, enter the exact form of the verb under 'Word level - Exact form'. (The slot for its citation form is 'Morpheme level - Exact base form'; however, this search option is temporarily disabled. The same holds for any other search on morphological properties when done in conjunction with 'Phrase description'.)

A verb lexicon with valence types given in the ConstructionLabel format is available at Valence lexicon. Each entry corresponds to one specific frame and is given in the form exemplified below:

yppe_tr-refl_vlxm := v-tr-obRefl 

where the part

v-tr-obRefl 

is the valence type. You can paste this into the 'Phrase description' field in the Search interface and get all sentences realizing this frame. By additionally specifying "yppe" in 'Word level - Exact form', you get all sentences in which "yppe" is used with this frame.
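A lexicon entry of this shape can be split into its parts with a few lines of code, for example to collect the valence types to paste into the search form. This is a sketch under stated assumptions: the function name `parse_lexicon_entry` is hypothetical, and reading the text before the first underscore as the verb lemma is inferred from the single entry shown here, not from a documented specification of the lexicon format.

```python
def parse_lexicon_entry(entry):
    """Split 'name := valence_type' into its parts (hypothetical helper)."""
    name, sep, valence_type = entry.partition(":=")
    if not sep:
        raise ValueError(f"no ':=' found in entry: {entry!r}")
    name = name.strip()
    # Assumption: the lemma is everything before the first underscore.
    lemma, _, _ = name.partition("_")
    return {
        "entry_name": name,
        "lemma": lemma,
        "valence_type": valence_type.strip(),
    }


# The entry shown on this page.
parsed = parse_lexicon_entry("yppe_tr-refl_vlxm := v-tr-obRefl ")
```

Here `parsed["valence_type"]` holds the string for the 'Phrase description' field, and `parsed["lemma"]` is the citation form; since the search under 'Word level - Exact form' matches the occurring form, the lemma will only find sentences where the verb appears in exactly that form.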

See NorVal resources for the general valence resource reflected in the annotations.

The present version is a trial of a methodology described in a publication still listed as "to appear", which potentially allows for a rapid increase in corpus size. The present version contains clear errors, which are to be corrected at the next stage.