Multilingual Verb Valence Lexicon

 
Latest revision as of 20:13, 14 November 2023

--Lars Hellan 20:28, 20 October 2014 (UTC)


November 2023:

PLEASE NOTICE THAT THIS SITE NOW HAS A NEW URL:

https://regdili.hf.ntnu.no/multivalence/parse


This web demo offers verb valence information in a uniform format for four languages (listed in the order of their inclusion), which can be activated in any combination:

Norwegian 
Ga 
Spanish
Bulgarian


The web demo:

https://regdili.hf.ntnu.no/multivalence/parse


The demo offers 5 drop-down search menus and one write-in field; any combination of them can be activated in a search:

 Write-in field:   Search menus:
 Verb lexeme       Syntactic Arguments   Function   Situation   Aspect   Type

The field ‘Syntactic Arguments’ (henceforth ‘SAS’), illustrated in the list snippet (1),

(1)

       NP+INF                                                                
       NP+INF:equiSBJ                                                        
       NP+INF:raisingSBJ                                                     
       NP+NP                                                                 
       NP+NP+APpred

is based on so-called ‘formal’ syntactic categories, reflecting common analytic assumptions. For Norwegian, there are currently 158 possible SAS specifications, which is close to exhaustive at this level of specification. The symbol ‘+’ in (1) stands for linear order.
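Because ‘+’ encodes linear order, an SAS string can be read as an ordered sequence of categories. The following Python sketch splits such strings; the helper name parse_sas is hypothetical, and treating a suffix after ‘:’ (as in NP+INF:equiSBJ) as an annotation on the whole frame is an assumption made for illustration, not a documented convention of the demo.

```python
from typing import NamedTuple, Optional, Tuple


class SAS(NamedTuple):
    """An SAS string split into categories (in linear order) and an optional annotation."""
    categories: Tuple[str, ...]
    annotation: Optional[str]  # assumed to qualify the frame as a whole, e.g. 'equiSBJ'


def parse_sas(spec: str) -> SAS:
    """Hypothetical helper: 'NP+INF:equiSBJ' -> SAS(('NP', 'INF'), 'equiSBJ')."""
    frame, sep, annotation = spec.strip().partition(":")
    # '+' encodes linear order, so the split preserves left-to-right position.
    categories = tuple(part for part in frame.split("+") if part)
    return SAS(categories, annotation if sep else None)


if __name__ == "__main__":
    for s in ["NP+INF", "NP+INF:equiSBJ", "NP+INF:raisingSBJ", "NP+NP", "NP+NP+APpred"]:
        print(f"{s:20} {parse_sas(s)}")
```

Since categories is an ordered tuple, NP+INF and a hypothetical INF+NP come out as distinct frames, mirroring the fact that ‘+’ encodes order.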

The field ‘Function’ (henceforth ‘FCT’) uses more traditional descriptive terms, such as ‘intransitive’, ‘transitive’, ‘transitive with oblique’, etc. These terms differentiate less finely than the SAS specifications; for Norwegian, there are currently only 88 FCT terms. Unlike the SAS specifications, the FCT terms say nothing about linear order.

The fields ‘Situation’ and ‘Aspect’ contain the situation type and the aspectual properties of the situations expressed, and thus both represent semantic information. Currently these fields

In ‘Verb lexeme’ one types a verb of the selected language(s), or an initial substring of a verb, combined with as many parameter specifications as desired. Whichever fields are specified, a search returns the verbs from the activated language(s) that satisfy the given conditions. For each verb in this list, one can request a fuller specification of its properties: these include the conditions specified, but also all other properties associated with the verb in the resource stored in the database. We illustrate with the typed string hoppe ('jump'); 'SHOW' is the prompt button:

(2)

 Verb lexeme   Syntactic Arguments   Function   Situation   Aspect   Type
 hoppe
 Search Result
 no   SHOW   hoppe_intrdir
 no   SHOW   hoppe_secpred
 no   SHOW   hoppe_secpred-refl

Here ‘no’ means ‘Norwegian’, and ‘hoppe_intrdir’ is the identifier of the lexical entry in the Norwegian lexical resource. Clicking SHOW displays the full specification stored in the database for that verb entry, e.g., for hoppe_intrdir:

(3) Lexicon Instance

 Language          no
 Verb Id           hoppe_intrdir
 SAS               NP
 FCT               intransitive
 SIT               directedMotion
 Aspect
 Verb Type         v-intr-suDir
 Example of type   gutten løper
 Orthography       < "hoppe" >
 Phon
 Engl-gloss
 Example
 Gloss
 Free-transl
 TypeCraft URLs    Example from TypeCraft

Clicking the link in the last line leads to an annotated token with hoppe in the TypeCraft (TC) editor, with information including what is shown in the figure below:

 gutten hopper
 “the boy jumps”

 gutten                 hopper
 gutten                 hopper
 boy.SBJ.DEF.SG.COMM    jump.PRES
 CN                     Vitr


So far, relatively few of the 12 000 Norwegian verb entries have an annotated example in TypeCraft, but more will be added. For Spanish and Bulgarian there are currently none, while for Ga the corresponding information, imported from a Toolbox project, is given for each entry under the lines Phon, Engl-gloss, Example, Gloss and Free-transl.
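The search procedure described for the Verb lexeme field and the drop-down menus (a verb or initial substring combined with any subset of the menu fields) can be sketched as a simple filter. This is an illustration under assumptions, not the demo's implementation: the record layout and the function search are hypothetical. Only the values for hoppe_intrdir come from the Lexicon Instance in (3); the field values of the other two hoppe entries are not given here and are left unset.

```python
from typing import Optional

# Toy records. hoppe_intrdir's values come from the Lexicon Instance (3);
# the other two entries' field values are unknown here and left as None.
ENTRIES = [
    {"lang": "no", "id": "hoppe_intrdir", "lexeme": "hoppe", "SAS": "NP",
     "FCT": "intransitive", "SIT": "directedMotion", "Aspect": None,
     "Type": "v-intr-suDir"},
    {"lang": "no", "id": "hoppe_secpred", "lexeme": "hoppe", "SAS": None,
     "FCT": None, "SIT": None, "Aspect": None, "Type": None},
    {"lang": "no", "id": "hoppe_secpred-refl", "lexeme": "hoppe", "SAS": None,
     "FCT": None, "SIT": None, "Aspect": None, "Type": None},
]


def search(entries, languages, prefix: str = "", **conditions: Optional[str]):
    """Return entries in the active languages whose lexeme starts with `prefix`
    and which match every menu field that was actually set (hypothetical API)."""
    active = {k: v for k, v in conditions.items() if v is not None}
    return [
        e for e in entries
        if e["lang"] in languages
        and e["lexeme"].startswith(prefix)
        and all(e.get(field) == value for field, value in active.items())
    ]


if __name__ == "__main__":
    for e in search(ENTRIES, {"no"}, prefix="hoppe"):
        print(e["lang"], "SHOW", e["id"])
    print([e["id"] for e in search(ENTRIES, {"no"}, prefix="hop", SAS="NP")])
```

Searching for the prefix hop with SAS set to NP narrows the result to hoppe_intrdir, since unset menu fields impose no condition.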


General Design

For each language, the information offered is based on a computational grammar of the language that includes a large lexicon; in each case, the grammatical framework is HPSG and the implementation platform is the Linguistic Knowledge Builder (LKB; cf. Copestake 2002). The basic design and implementation were developed by members of the Research Group in Digital Linguistics at NTNU, Trondheim (see http://www.ntnu.edu/digital-linguistics). The main contributors to the present version are, in alphabetical order: Dorothee Beermann, Tore Bruland, Mary Esther Kropp Dakubu (from the Ga grammar), Lars Hellan (from the Norwegian grammar), Montserrat Marimon (from the Spanish grammar) and Petya Osenova (from the Bulgarian grammar); see: Hellan, L., D. Beermann, T. Bruland, M.E.K. Dakubu, and M. Marimon (2014) MultiVal: Towards a multilingual valence lexicon. LREC 2014.

In principle, a multilingual valence database should consist of the following:

 The Languages:
    A selection of languages L1 ... Ln;
 The Parameters:
    A set of specification parameters defined across all the languages
    (i.e., common parameters, in the sense of being independent of any
    particular language, although not in the sense of necessarily being
    relevant for all of the languages);
 The Valence-profiles:
    For each language, an inventory of its valence types characterized in
    terms of the parameters available, called its valence-profile;
 The Valence-type suites:
    For each language, a list of sentences instantiating each of its valence
    types, indexed according to the types;
 The Valence Lexicons:
    For each language, a verb lexicon where each verb entry is classified
    according to its valence type (in addition to other lexical information);
 The Valence Corpora:
    For each language, a sentence corpus instantiating each verb in each of
    the valence frames it can support.


The notion valence represents the perspective of the verb, and thus of the Lexicon, whereas from the viewpoint of the sentence and the Corpora the closest corresponding term is argument structure, as in ‘the argument structure of a sentence’; since both perspectives are represented here, we use both terms. The sentential perspective is needed when the argument structure of a sentence is not determined by a single verb alone, for instance when it is determined by a verb plus a secondary predicate, or is distributed over a series of verbs; in such cases the argument structure results from the interplay between the valence of the constituent items and constructional factors. To extend the database so that it fully recognizes constructional factors in such cases, we may think of it as a database of argument structure constructions, and use Construction-profile as an alternative to Valence-profile, and Construction-type suites as an alternative to Valence-type suites.

In general, the parameters selected for inclusion in the database must, first, be amenable to formalization in a relational database and, second, be accessible in an understandable form to those who enter and search for data. The latter point is connected to the need for a flexible inventory of terms, which on the one hand accommodates the terminologies of various frameworks and on the other hand is based on consistent systems of conversion between terminologies.
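As an illustration of what formalization for a relational database might look like, the sketch below stores one row per verb entry, with each shared parameter as a column, in an in-memory SQLite table. The table name, column names and helper functions are assumptions, not the schema behind the web demo; the single row holds the hoppe_intrdir values shown in (3).

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per lexical entry, one column per shared parameter.
COLUMNS = ("lang", "entry_id", "lexeme", "sas", "fct", "sit", "aspect", "verb_type")

SCHEMA = """
CREATE TABLE entry (
    lang      TEXT NOT NULL,   -- language code, e.g. 'no'
    entry_id  TEXT NOT NULL,   -- identifier in the source lexicon
    lexeme    TEXT NOT NULL,
    sas       TEXT,            -- Syntactic Arguments
    fct       TEXT,            -- Function
    sit       TEXT,            -- Situation type
    aspect    TEXT,
    verb_type TEXT,            -- lexical type in the source grammar
    PRIMARY KEY (lang, entry_id)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hoppe_intrdir row from (3)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO entry VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("no", "hoppe_intrdir", "hoppe", "NP", "intransitive",
         "directedMotion", None, "v-intr-suDir"),
    )
    return conn


def select_ids(conn: sqlite3.Connection, **conditions: str) -> List[str]:
    """Return entry ids matching every given column = value condition."""
    unknown = set(conditions) - set(COLUMNS)
    if unknown:  # column names are interpolated, so accept only known ones
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in conditions) or "1"
    sql = f"SELECT entry_id FROM entry WHERE {where} ORDER BY entry_id"
    return [row[0] for row in conn.execute(sql, tuple(conditions.values()))]
```

Keeping the parameters as separate columns lets any combination of them be queried, which mirrors the way the demo's menus can be combined freely.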

From the viewpoint of standard linguistic adequacy, the following factors may be expected in the representation of argument structure:

(4)

 a. syntactic argument structure, i.e., whether there is a subject, an object, a     
 second/indirect object, etc., referred to as grammatical functions, and the formal 
 categories carrying them; 
 b. semantic argument structure, that is, how many participants are present in the 
 situation depicted, and which roles they play (such as ‘agent’, ‘patient’, etc.);
 c. linkage between syntactic and semantic argument structure, i.e., which grammatical 
 functions express which roles, and possible roles not expressed; here also belong 
 identity relations, part-whole relations, etc., between arguments;
 d. aspect and Aktionsart, that is, properties of a situation expressed by a sentence 
 with the valence in question in terms of whether it is dynamic/stative, 
 continuous/instantaneous, completed/ongoing, etc.; 
 e. type of the situation expressed, in terms of some classificatory system.
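The factors in (4) can be made concrete as a record with one field per factor. The class and field names below are hypothetical, and the example frame, an agentless passive with an unexpressed agent role, is invented for illustration rather than taken from the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ArgumentStructure:
    """Hypothetical record with one field per factor in (4)."""
    syntax: Dict[str, str]          # (4a) grammatical function -> formal category
    roles: List[str]                # (4b) roles of the participants
    linking: Dict[str, str]         # (4c) grammatical function -> role it expresses
    aspect: Set[str] = field(default_factory=set)  # (4d) e.g. {'dynamic'}
    situation_type: Optional[str] = None           # (4e) label from some classification

    def check(self) -> None:
        """Reject links to a function or role that the record does not declare."""
        for function, role in self.linking.items():
            if function not in self.syntax:
                raise ValueError(f"linked function {function!r} is not in syntax")
            if role not in self.roles:
                raise ValueError(f"linked role {role!r} is not in roles")

    def unexpressed_roles(self) -> List[str]:
        """Roles not carried by any grammatical function (cf. 4c)."""
        linked = set(self.linking.values())
        return [role for role in self.roles if role not in linked]


if __name__ == "__main__":
    # Hypothetical agentless passive: two participants, only the patient expressed.
    frame = ArgumentStructure(
        syntax={"subject": "NP"},
        roles=["agent", "patient"],
        linking={"subject": "patient"},
        aspect={"dynamic"},
    )
    frame.check()
    print(frame.unexpressed_roles())  # ['agent']
```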

Some, but not all, of these factors are represented in the present database. The database is derived from independently existing resources, so for each language the information stored depends on the resources already available for that language.

The resources

The current content of the database is constructed from implemented HPSG grammars using the DELPH-IN grammar engineering resources, including the computational platform LKB (Copestake 2002). The construction draws on the lexicon files of these grammars and on a conversion script with rewrite rules of the form in (5) below. The leftmost item in such a rule is a lexical type, which, as is standard in a typed feature structure system, reflects both grammatical and lexical properties. The rule in (5) rewrites the type symbol ‘v-ditr’, which essentially means ‘ditransitive headed by a verb’, into the SAS counterpart ‘NP+NP+NP’, the FCT label ‘ditrans’ and the semantic specification of a three-place relation (‘ternaryRel’).

(5)

 v-ditr =>  SAS: “NP+NP+NP”
            FCT: ditrans	     
            SIT: ternaryRel
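The conversion step can be pictured as a lookup from lexical type to shared parameter values, followed by language-specific additions. In this Python sketch only the v-ditr row reproduces rule (5); the supplement table, the record keys and the function convert are assumptions, and the actual conversion script may work differently.

```python
from typing import Dict, Mapping, Tuple

# Shared part of the conversion rules; only the v-ditr row is taken from (5).
COMMON_RULES: Dict[str, Dict[str, str]] = {
    "v-ditr": {"SAS": "NP+NP+NP", "FCT": "ditrans", "SIT": "ternaryRel"},
}

# Hypothetical language-specific supplements: extra entry features to copy over.
SUPPLEMENTS: Dict[str, Tuple[str, ...]] = {
    "ga": ("PHON", "ENGL-GLOSS", "EXAMPLE", "GLOSS", "FREE-TRANSL"),
    "no": (),
}


def convert(lang: str, entry_id: str, lex_type: str,
            features: Mapping[str, str]) -> Dict[str, str]:
    """Rewrite one lexicon entry into a database record (illustrative only)."""
    if lex_type not in COMMON_RULES:
        raise KeyError(f"no conversion rule for lexical type {lex_type!r}")
    record = {"Language": lang, "Verb Id": entry_id, "Verb Type": lex_type}
    record.update(COMMON_RULES[lex_type])      # part common to all languages
    for feat in SUPPLEMENTS.get(lang, ()):     # part that differs per language
        if feat in features:
            record[feat] = features[feat]
    return record


if __name__ == "__main__":
    print(convert("ga", "bɔle_85", "v-ditr", {"ENGL-GLOSS": "expect"}))
```

In this sketch a Ga record keeps ENGL-GLOSS while a Norwegian record would not, which is one way the supplementary instructions could differ between languages.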

This rewrite instruction is part of the conversion rule for v-ditr in all of the languages; the instructions supplementing the part in (5), however, may differ from language to language, reflecting each language's grammar and lexicon as well as the resources available in the database. In the Ga LKB grammar, for instance, verb entries have the form instantiated in (6), reflecting their provenance from a Toolbox project:

(6)

 bɔle_85 := v-ditr & 
 [STEM <"bɔle">,
 PHON <"bɔ̀lè">,
 ENGL-GLOSS <"expect">,
 SYNSEM.LKEYS.KEYREL.PRED "_bɔle_v-ditr_rel",
 EXAMPLE "Wɔ-bɔle-ee bo nakai",
 GLOSS "1P.AOR-go.around-NEG.IMPERF 2S that",
 FREE-TRANSL "we didn't expect it of you, that you would behave in that manner."].

Entries in the other grammars lack PHON, ENGL-GLOSS and the three lines corresponding to a glossed example.
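Because the Ga entries follow a fixed layout, the fields in (6) can be extracted with two regular expressions. This is a rough sketch for entries shaped exactly like (6), not a TDL parser: it assumes quoted string values without embedded double quotes, and the function name parse_entry is hypothetical. For real grammar files, a full TDL reader such as the delphin.tdl module of the PyDelphin library would be the safer choice.

```python
import re
from typing import Dict

# Entry (6) from the Ga grammar, used as sample input.
ENTRY = '''bɔle_85 := v-ditr &
 [STEM <"bɔle">,
 PHON <"bɔ̀lè">,
 ENGL-GLOSS <"expect">,
 SYNSEM.LKEYS.KEYREL.PRED "_bɔle_v-ditr_rel",
 EXAMPLE "Wɔ-bɔle-ee bo nakai",
 GLOSS "1P.AOR-go.around-NEG.IMPERF 2S that",
 FREE-TRANSL "we didn't expect it of you, that you would behave in that manner."].
'''

# 'identifier := type &' at the start of an entry.
HEADER = re.compile(r"^\s*(?P<id>\S+)\s*:=\s*(?P<type>[^\s&]+)\s*&")
# FEATURE "value" or FEATURE <"value">; values are assumed to contain no '"'.
FEATURE = re.compile(r'(?P<feat>[A-Z][A-Z.-]*)\s+<?"(?P<val>[^"]*)">?')


def parse_entry(text: str) -> Dict[str, str]:
    """Extract id, type and quoted feature values from an entry shaped like (6)."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no 'identifier := type &' header found")
    fields = {"id": header["id"], "type": header["type"]}
    # Scan only after the header; each match consumes its whole quoted value.
    for m in FEATURE.finditer(text, header.end()):
        fields[m["feat"]] = m["val"]
    return fields


if __name__ == "__main__":
    for key, value in parse_entry(ENTRY).items():
        print(f"{key:26} {value}")
```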

A difference between the Spanish and the Norwegian grammars is that the former employs optionality marking in its lexical types, so that one and the same entry in the Spanish lexical resource can represent, e.g., both a transitive and an intransitive frame, whereas in the Norwegian (and Ga) resource an entry represents only one frame. This is reflected in the SAS and FCT inventories: for some Spanish verbs, the identifier of a single verb entry may be matched to two or more SASs, possibly corresponding to an FCT label containing the part 'Opt', whereas for Norwegian and Ga a given entry identifier corresponds to exactly one SAS.
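The effect of optionality marking on the SAS inventory can be illustrated by expanding a frame with optional slots into all the frames it covers. The parenthesis notation in NP+(NP) is a hypothetical stand-in for the Spanish grammar's actual optionality marking, and expand_optional is an illustrative helper; the point is only that one entry can yield two or more SASs.

```python
from itertools import product
from typing import List


def expand_optional(frame: str) -> List[str]:
    """Expand a frame with optional slots into all frames it covers.

    Hypothetical notation: '(NP)' marks an optional constituent, so
    'NP+(NP)' covers both 'NP' (intransitive) and 'NP+NP' (transitive).
    Constituent order is preserved, since '+' encodes linear order.
    """
    choices = []
    for slot in frame.split("+"):
        if slot.startswith("(") and slot.endswith(")"):
            choices.append(("", slot[1:-1]))  # optional: absent or present
        else:
            choices.append((slot,))           # obligatory: always present
    frames = {"+".join(s for s in combo if s) for combo in product(*choices)}
    frames.discard("")  # omitting every slot does not yield a frame
    # Shortest frames first, then alphabetically.
    return sorted(frames, key=lambda f: (f.count("+"), f))
```

For instance, expand_optional("NP+(NP)") returns ['NP', 'NP+NP'], i.e. an intransitive and a transitive frame from a single entry.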

The current list of conversion rules for Norwegian is given in ConversionListNo; the list of SAS specifications for Norwegian is given in SAS types No, and the list of FCT specifications for Norwegian in Funct types No.

The valence profile for Norwegian is given in Valence Profile Norwegian, and the valence profile for Ga in Ga Valence Profile. For both languages, the system used for defining Type is the Construction Labeling system, cf. Verbconstructions cross-linguistically - Introduction.




Download information from the Norwegian Valence project: Hellan, L.: Infinitive construction types in Norwegian. An inventory. Sept. 04, 2014.






Syntax of the World's Languages VI

Download Lars Hellan and Dorothee Beermann's presentation on ‘Infinitive constructions in Norwegian, in a comparative perspective’.








TO BE CONTINUED