
Difference between revisions of "Norwegian HPSG grammar NorSource"

(References)
(References)
 
(75 intermediate revisions by 2 users not shown)
Line 1: Line 1:
See the [http://regdili.idi.ntnu.no:8080/linguisticAce/parse  web demo] of the Norwegian HPSG grammar '''''Norsource'''''.
+
'''NorSource''' (Norwegian HPSG resource grammar) was developed within ''the Digital Linguistics Research Group'' at the Norwegian University of Science and Technology (NTNU), Trondheim, starting in 2001, and is still being maintained and further developed.
  
License
+
The application is licensed under the Lesser General Public License For Linguistic Resources.
  
[WWW] Lesser General Public License For Linguistic Resources
 
  
  
===History and Purpose of the grammar===
 
  
''NorSource'' is a so-called ‘deep’ computational grammar (‘DG’) of Norwegian, developed throughout the last 12 years.
+
See the  [http://regdili.hf.ntnu.no:8081/linguisticAce/parse  '''''WEB DEMO''''']  of the Norwegian HPSG grammar.
The grammar has been developed with a view to the following overall desiderata:
+
  
''Desideratum 1. Encoding of Linguistic Meaning''
 
  
As a ‘generic’ information repository, a DG should have a semantic component from which a Reasoning capacity ideally could be deduced for any domain of discourse – possibly with the addition of concepts for the specific domains. It should be like a Fregean ‘Sinn’, in acting as a function from domains of use to models of interpretation. However, contrary to most artificial ‘reasoning’ devices, a DG must span the full complexity of a natural language, reflecting the size of its vocabulary and the complexity of its grammar. In this respect, the DG can also be seen as the materialization of a Generative Grammar, in the original sense of that notion.
 
  
''Desideratum 2. Cross-grammar Generality''
 
  
The content of the DG should, to as large an extent as possible, be phrased in terms used in, or alignable with terms used in, other grammars and for other languages, thereby enabling linguistic comparison using the DG. By ‘content of the DG’ we mean both the content of the grammar files (formalism, notions used) and the content of its parse productions.
+
==Short history of the grammar==
  
''Desideratum 3. Interoperability''
+
''NorSource'' is a computational grammar of Norwegian, developed over the last 20 years. Its formal and theoretical framework is ''Head-Driven Phrase Structure Grammar'' (''HPSG'') (Pollard and Sag 1994, Sag et al. 2003). It started as a computational project through the ''LinGO'' initiative at CSLI, Stanford, using the ''LKB platform'' (Copestake 2002), a general platform based on typed feature structures (TFS) with an integrated format of semantic representation called ''Minimal Recursion Semantics'' (‘MRS’; cf. Copestake et al. 2005). Before the year 2000 there were three grammars in this framework, viz. the ''English Resource Grammar'' ('ERG'), the Japanese grammar 'Jacy', and the German grammar 'GG'. Essential to the development of further grammars of this type was the ''HPSG Grammar Matrix'' (‘the ''Matrix''’; see Bender et al. 2002, 2010), which was mainly based on ERG, and had its first phase of deployment during the EU project ''DeepThought'' (2002-4).
  
The DG should attain as much interoperability with other applications as possible. In general, what a digital ubiquitous research environment for linguistics should enable is an interconnectivity of data, researchers and processing facilities whereby from any point in an overall structure of components, a contribution can have its ramifications immediately implemented throughout the entire structure. Such interconnectivity will have to be manifested both on an ‘outer’ level enabling data flow and easy access, and on an ‘inner’ level ensuring information exchange from one system component to another. For a DG, thus, its files and productions (parses, etc.) should be transportable to other applications, and the codes in which its files are written should be readable by other applications, or able to  be mapped into other codes.
+
We can distinguish five main phases in the development of NorSource, with some of the key persons involved:
  
''Desideratum 4. Sustainability''
+
  Phase 1, the Grounding phase (2001-03; Lars Hellan, Petter Haugereid),
 +
  Phase 2, the Semantic Expansion phase (2004-07; Lars Hellan, Dorothee Beermann, Ben Waldron),
 +
  Phase 3, the Cross-Linguistic Coding phase (2008-10; Lars Hellan),
 +
  Phase 4, the Deployment and Interoperability phase (2010-15; Lars Hellan, Tore Bruland, Elias Aamot, Mads Sandøy).
 +
  Phase 5, further development and applications (2016-now, Lars Hellan, Tore Bruland).
  
The DG should be in such a format, and be situated in such an over-all environment, that as much as possible of its capacity can be retained, independently of particular persons maintaining it or particular physical environments.
+
Phase 1 consisted in building a basic core grammar around the Matrix skeleton (using Matrix versions 0.1 – 0.6 as they developed; this included the MRS system). This stage included the accommodation of an 80,000-entry lexicon imported from the previously established resources TROLL and NorKompLex, in which a verb valence code and a code for inflectional paradigms constituted major parts.
  
 +
Phase 2 consisted in the development of a fine-grained ontology and computing system of spatial and temporal relations, amenable to grammatical systems across languages and typologies, and a detailed semantics of comparative constructions. The grammar was also used as part of a small Norwegian-Japanese MT system. In this period, the inflectional system was thoroughly revised. Main publications from this period are: Hellan and Beermann (2004), Beermann et al. (2004), Beermann and Hellan (2005).
  
The first of the desiderata reflects a central concern throughout modern logic and philosophy of language, and in turn linguistics and Artificial Intelligence. Semantics being inevitably the basis for significant progress in cross-linguistic modelling, the desideratum has relevance also for desideratum 2. The grammar to be discussed belongs to a family of DGs whose design quite explicitly caters for this concern. This family of DGs has as its formal and theoretical framework HPSG (Pollard and Sag 1994, Sag et al. 2003), and started as a computational project through the LinGO initiative at CSLI, Stanford, using the LKB platform (Copestake 2002), which is a general platform with the format of typed feature-structures (TFS), and has integrated in it a format of semantic representation called Minimal Recursion Semantics (‘MRS’; cf. Copestake et al. 2005). Before year 2000 there were three grammars in this framework, the English Resource Grammar (ERG), the Japanese grammar Jacy, and a German grammar. Essential to the development of further grammars in the family was the ‘HPSG Grammar Matrix’ (‘the Matrix’; see Bender et al. 2002, 2010), which was mainly based on ERG, and had its first phase of deployment during the EU-project DeepThought (2002-4). The grammar family is currently developed within the frame of the DELPH-IN consortium, and will in the following be referred to as the ‘DELPH-IN grammars’.
+
Phase 3 was devoted to a thorough revision of the valence code, to accommodate a cross-linguistically defined classification system of valence and construction types. Main publications from this period are: Hellan (2008), Hellan and Dakubu (2009, 2010).
  
The DG to be discussed was started in 2001, by linguists versed in Generative Grammar since the late 1960s, and in formal semantics (‘Montague Grammar’) since the mid-1970s. From the mid-1980s the group developed a computational lexicon (under the acronym ‘TROLL’, see Hellan et al. 1989), mainly associated with research within ‘consolidated GB’. In the late 1990s the group reoriented itself towards HPSG, and started the DG as part of the LinGO initiative with the LKB platform. The DG was the first grammar to be built on the Matrix, during the EU project DeepThought (2002-4), and despite never receiving very substantial funding, it has retained a place among the medium-large DELPH-IN grammars. We can distinguish four main phases in its development:
+
Phase 4 was devoted to the use of the grammar in ‘external’ applications (see below). In this connection it involved updating the software and hardware basis of the grammar (in particular with ''ACE'') and constructing a server carrying web facilities for the grammar itself as well as for these applications and resources. There has also been a general regimentation of the grammar code, and an adaptation of the semantic system, MRS, according to the ''UTool'' restrictor on MRS objects, enabling the Reasoner mentioned below (see Bruland 2013). See Hellan and Bruland (2015) for a partial overview.
  
  Phase 1, the Grounding phase (2001-04),
+
=== ‘External’ applications: ===
  Phase 2, the Semantic Expansion phase (2005-07),
+
  Phase 3, the Cross-Linguistic Coding phase (2008-10),
+
  Phase 4, the Interoperability phase (2010-14).
+
  
  
Phase 1 consisted in building a basic core grammar around the Matrix skeleton (using Matrix versions 0.1 – 0.6 as they developed; this included the MRS system). This stage included the accommodation of an 80,000-entry lexicon imported from the previously established resources TROLL and NorKompLex, in which a verb valence code and a code for inflectional paradigms constituted major parts.
+
A. A ‘Grammar Sparrer’, as described in Hellan et al. 2013, accessible at [[A Norwegian Grammar Sparrer]].  This is a construct along the lines of Bender et al. 2004 and Suppes et al. 2014, falling within the overall initiatives described in Heift and Schultze 2007: specific types of grammatical mistakes are accommodated by ‘mal-rules’ in an extended ‘mal’-version of the grammar, and parses involving such mal-phenomena are reported to the user as tutoring instructions. This system has been running as a web demo since 2011.
  
 +
B. A Multilingual Valence repository, based on NorSource and three further LKB grammars: the Spanish Resource Grammar, the Bulgarian grammar BURGER, and a grammar of Ga. Information from these grammars is imported in a uniform format into the repository. In essence, the valence code used in verbal lexical types (cf. 3.2 below) is expanded to alternative and more easily inspectable formats, and the verb lexicons of the languages involved are imported into a database organized according to the newer codes, and searchable in terms of these codes. For web access, see [[Multilingual Verb Valence Lexicon]]; for a description, see Hellan and Bruland 2013 and Hellan et al. 2014.
  
Phase 2 consisted in the development of a fine-grained ontology and computing system of spatial and temporal relations, amenable to grammatical systems across languages and typologies, and a detailed semantics of comparative constructions. The grammar was also used as part of a small Norwegian-Japanese MT system. In this period, the inflectional system was thoroughly revised. Main publications from this period are: Hellan and Beermann (2004), Beermann et al. (2004), Beermann and Hellan (2005).
+
C. An initial version of a POS-tagger of Norwegian, reflecting the lexical inventory of the grammar, which amounts to approx. 90,000 lexical entries and a large number of proper names of various categories. The tagger currently offers all available POS-alternatives for a given word. See web access at http://regdili.hf.ntnu.no:8081/webtagger/tagger.
  
 +
D. A simple Reasoner over movement and spatial information exported from the MRS. (See Bruland 2013.)
  
Phase 3 was devoted to a thorough revision of the valence code, to accommodate a cross-linguistically defined classification system of valence and construction types. Main publications from this period are: Hellan (2008), Hellan and Dakubu (2009, 2010).
+
E. A corpus annotated for valence, aside from morphology and POS, produced through the grammar. (See Hellan et al. 2020.)
  
 +
F. A valence resource '''NorVal''' developed from the verb lexicon of NorSource and later in tandem with it, described in Hellan 2021.
  
Phase 4 can be divided into the following themes:
+
===Some remarks on purposes and adequacy of the grammar===
  
A. Deploying the grammar in ‘external’ applications: a ‘Grammar Sparrer’, as described in Hellan et al. 2013, accessed at [[A Norwegian Grammar Sparrer]].  This is a construct along the lines of Bender et al. 2004, and Suppes et al. 2014, falling within the overall initiatives described in Heift and Schultze 2007, where specific types of grammatical mistakes are accommodated by ‘mal-rules’ in an extended ‘mal’-version of the grammar, and parses involving such mal-phenomena are reported to the user as tutoring instructions. This system has been running as a webdemo since 2011.
+
As a grammar of Norwegian, NorSource's first and foremost purpose is of course ''coverage'' in the sense in which a descriptively adequate generative grammar is supposed to attain coverage, namely through recursive enumeration of all and only the strings that count as grammatical in the language, and assigning each string so recognized a morpho-syntactic and semantic analysis. Correctness of analysis is decided on empirical grounds, using standard linguistic criteria of adequacy.
  
B. Exporting information from the grammar to independent resources:
+
As a computational system, a further desideratum is that the grammar can serve as a component in one or more natural language processing applications, like those mentioned above. From the viewpoint of general linguistics, this concern has the proviso that for a grammar as a ‘generic’ device, its success in the usage domain should in principle be measured by how well it can go into a multitude of applications, not just a single one; this is also the ambition for the present grammar. In this respect the grammar is like a Fregean ‘Sinn’, in acting as a function from domains-of-use to deployable systems.
  
1. A valence bank, which, with the same exporting strategy as for Norwegian, contains also two other languages, constituting the first instance of an in depth Multilingual Valence repository. In essence, the valence code used in verbal lexical types (cf. 3.2 below) is expanded to alternative and more easily inspectable formats, and the verb lexicons of the languages involved are imported into a database organized according to the newer codes, and searchable in terms of these codes. (See Hellan and Bruland 2013, and a web access at [[Multilingual Verb Valence Lexicon]].)
 
  
2. A POS-tagger reflecting the lexical inventory of the grammar, useful for lexical acquisition from new text (http://regdili.idi.ntnu.no:8080/webtagger/tagger).
+
Moreover, the grammar has been developed with a view to the following overall desiderata:
  
3. A simple Reasoner over movement and spatial information exported from the MRS. (See Bruland 2013.)
+
''Cross-grammar Generality''
  
 +
The content of the grammar should, to as large an extent as possible, be phrased in terms used in, or alignable with terms used in, other grammars and for other languages, thereby enabling linguistic comparison using these grammars.
  
C. Updating the soft- and hardware basis of the grammar (in particular with ACE ), and the construction of a server carrying web-facilities for the grammar itself, as well as the applications and resources just mentioned.
+
''Interoperability''
  
D. General regimentation of the grammar code.
+
The grammar should attain as much interoperability with other applications as possible, manifested both on an ‘outer’ level enabling data flow and easy access, and on an ‘inner’ level ensuring information exchange from one system component to another. Thus, the grammar's files and productions (parses, etc.) should be transportable to other applications, and the codes in which its files are written should be readable by other applications, or able to  be mapped into other codes.
  
E. Adapting the semantic system – MRS – according to the UTool restrictor on MRS objects, enabling the Reasoner mentioned above. (See Bruland 2013)
+
''Sustainability''
  
F. Aligned with the latter point, exploring ways of simplifying the grammar definition code, which in its Matrix shape is a remnant from a far more complex AVM notation system (from Pollard and Sag 1994) than is currently motivated. (See Hellan, forthcoming)
+
The grammar should be in such a format, and be situated in such an over-all environment, that as much as possible of its capacity can be retained, independently of particular persons maintaining it or particular physical environments.
  
 +
''Adequacy of the grammar''
  
Point F is developed along a scheme of building small grammars for the exploration of specific features or alternative designs. A full DG as such is a very ‘stiff’ construct where no change can be made without repercussions elsewhere in the structure, hence no major change is recommended to be undertaken unless it has been explored in a smaller format.
+
The grammar must cover not only ‘core grammar’, but the whole array of constructs that can be used in texts of the language: ‘fragments’, abbreviations, interjections, and much more. At the same time, it must attain ‘analytic depth’, which includes at least the following factors of morpho-syntactic structure and parameters of functional and semantic analysis (in addition to a lexicon of significant size):
 
+
 
+
Through points A and B, phase 4 addresses the third desideratum mentioned above, interoperability, and also the fourth desideratum, sustainability, in that the porting of information from the grammar into other applications is in effect probably the most efficient way of securing a maintained ‘life’ of the information. The grammar files as such will be stored, and their linguistic content and the computational environment will be documented, still it is through being used that information can be consistently improved.
+
Points A and B are at the same time the ‘outreach’ aspects of the development, i.e., the respects in which the grammar makes itself useful. A given usage may well be the full motivation for the development of a grammar, although for DG grammars as ‘generic’ devices, their success in the usage domain should in principle be measured by how well they go into a multitude of applications; this is also the ambition behind the present grammar. The fact that such deployment of the grammar starts taking place seriously only after nearly 10 years of development, is to some extent a matter of coincidence, but attests to the complexity of such an object before it is ready for ‘deployment’.
+
 
+
 
+
More ‘immanently’ speaking, and returning to desideratum 1, the main duty of a DG is to accommodate in ‘analytic depth’ any sentence of the language for which it is defined, covering not only ‘core grammar’, but the whole array of constructs that can be used in texts of the language: ‘fragments’, abbreviations, interjections, and much more. With ‘in analytic depth’ one will normally include at least the following factors of morpho-syntactic structure and parameters of functional and semantic analysis (in addition comes a lexicon of significant size):
+
  
 
   - syntactic argument structure (or valence:  whether there is a subject, an object,  
 
Line 87: Line 80:
 
   whether it is dynamic/stative, continuous/instantaneous, completed/ongoing, etc.
 
  
If these, and other, factors can be addressed in a notional and formal system so cross-linguistically articulated that a grammar of a language L1, through its formal encoding, exposes a value or set of values relative to the above parameters within a matrix of corresponding values for languages L2, L3, …, etc., then one is on the track of attaining desideratum 2. That is, for every  parameter chosen, each specification within the grammar of L1 for that parameter would project  a value for L1 into a matrix where corresponding values for other languages are represented for that parameter. The field is not remotely close to reaching this stage, but it is again – like for desideratum 1 - an idea that one feels is reasonable. To a certain extent, the multilingual valence base mentioned under phase 4, point A, is an illustration of the point.
+
If these, and other, factors can be addressed in a notional and formal system so cross-linguistically articulated that a grammar of a language L1, through its formal encoding, exposes values relative to the above parameters within a matrix of corresponding values for languages L2, L3, …, etc., then one is on track to attain the desideratum of cross-grammar generality. To a certain extent, the multilingual valence repository mentioned above among the ‘external’ applications (point B) is an illustration of this point.
  
 +
''Evaluation''
 +
 +
A standard requirement on NLP applications is that they be amenable to ''evaluation''. However, there exist so far few criteria for measuring how well a grammar as a whole performs with regard to 'analytic depth' or the other desiderata mentioned. The closest a grammar can currently come to being transparent about its adequacy is through certain types of 'self-declaration', of which we mention two:
 +
 +
  - Test suites categorized for what analytic properties their parses should exhibit;
 +
  - Grammar internal categorization systems which carry agreed-upon general definitions,
 +
  the grammar thereby signaling that the constructs categorized have been designed
 +
  with the agreed-upon contents.
 +
 +
Both strategies are followed in the present grammar. The test suites listed below reflect the performance of the grammar for the phenomena contained in the various suites, and in the grammar-internal classification, the encoding of verb types follows the system laid out in [[Verbconstructions cross-linguistically - Introduction]].
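The first strategy, test suites categorized by analytic property, can be illustrated with a minimal sketch. It is not the NorSource test machinery: the class <code>SuiteItem</code>, the function <code>agreement_by_category</code>, the example sentences and the toy "accepted" set are all assumptions made for illustration, and a real setup would call the grammar itself where the sketch uses a set lookup.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class SuiteItem:
    sentence: str      # string submitted to the grammar
    category: str      # analytic property the item is meant to exercise
    grammatical: bool  # expected verdict: should the grammar accept it?


def agreement_by_category(
    items: Iterable[SuiteItem],
    accepts: Callable[[str], bool],
) -> Dict[str, float]:
    """Per category, the share of items where the parser's verdict matches the expected one."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for item in items:
        totals[item.category] += 1
        if accepts(item.sentence) == item.grammatical:
            hits[item.category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Toy data: the sentences and the accepted set are invented for illustration.
suite = [
    SuiteItem("Gutten sover.", "intransitive", True),
    SuiteItem("Gutten sover boka.", "intransitive", False),
    SuiteItem("Gutten leser boka.", "transitive", True),
]
# Hypothetical stand-in for a parser: it overgenerates on the second item
# and misses the third.
toy_accepted = {"Gutten sover.", "Gutten sover boka."}

scores = agreement_by_category(suite, lambda s: s in toy_accepted)
print(scores)
```

Including ungrammatical items lets the score reflect the "all and only" requirement, penalizing overgeneration as well as missing coverage.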
  
 
===References===
 
Line 113: Line 116:
  
 
Hellan, L., L. Johnsen and A. Pitz. 1989. TROLL. Ms., NTNU
 
 
Hellan, L. (2008). Enumerating Verb Constructions Cross-linguistically. In Proceedings from COLING Workshop on Grammar Engineering Across frameworks. Manchester. 
 
http://www.aclweb.org/anthology-new/W/W08/#1700 .
 
  
 
Hellan, L. and M.E.K. Dakubu (2009): A methodology for enhancing argument structure specification. In Proceedings from the 4th Language Technology Conference (LTC 2009), Poznan.
 
Line 124: Line 124:
  
 
Hellan, L., T. Bruland, M. Sandøy, E. Aamot. 2013. A Grammar Sparrer for Norwegian. NoDaLiDa, Oslo, 2013.
 
 +
 +
Lars Hellan and Tore Bruland. 2015. A cluster of applications around a Deep Grammar. In: Vetulani et al. (eds) ''Proceedings from The Language & Technology Conference (LTC) 2015'', Poznan.
 +
 +
Hellan, Lars, Dorothee Beermann, Tore Bruland, Tormod Haugland, Elias Aamot. 2020. Creating a Norwegian valence corpus from a deep grammar. In ''Human Language Technology. Challenges for Computer Science and Linguistics. 8th Language & Technology Conference, LTC2017''. Springer 2020, ISBN 978-3-030-66526-5 (web interface: https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus).
 +
 +
Hellan, Lars. 2021. A valence catalogue for Norwegian. In: Loukanova, R. (ed)  ''Natural Language Processing in Artificial Intelligence - NLPinAI 2021''. Studies in Computational Intelligence. Springer.
  
 
Pollard, C. and Sag, I. (1994). Head-Driven Phrase Structure Grammar. Chicago University Press.
 
Line 129: Line 135:
 
Suppes, P., T. Liang, E.E. Macken and D. Flickinger (2014) “Positive technological and negative pre-test-score effects in a four-year assessment of low socioeconomic status K-8 student learning in computer-based Math and Language Arts courses”, Computers & Education, 71, pp. 23-32.
 
  
===Test suites===
+
== Works with specific reference to NorSource ==
  
Complete test suites for basic verbal constructions are found in
+
 
 +
2014: Hellan, L.,  D. Beermann, T. Bruland, M.E.K. Dakubu, and M. Marimon (2014) MultiVal: Towards a multilingual valence lexicon. LREC 2014.
 +
 
 +
2013: Hellan, L., Tore Bruland, Elias Aamot, Mads H. Sandøy. A Grammar Sparrer for Norwegian. Proceedings of NoDaLiDa 2013.
 +
 
 +
2012: Hellan, L., D. Beermann. Semantics of Spatial Prepositions in the Grammar NorSource. Paper presented at ‘Meaning of P’, Ruhr-Universität Bochum, Nov 2012.
 +
 
 +
2008: Hellan, L. From Grammar-Independent Construction Enumeration to Lexical Types in Computational Grammars. Paper presented at COLING, Workshop on Grammar Engineering Across Frameworks (GEAF) Manchester, August 2008 (http://www.aclweb.org/anthology-new/W/W08/#1700).
 +
 
 +
2007a: Hellan, L. On 'Deep Evaluation' for Individual Computational Grammars and for Cross-Framework Comparison. In: T.H. King and E. M. Bender (eds) Proceedings of the GEAF 2007 Workshop. CSLI Studies in Computational Linguistics ONLINE. CSLI Publications. http://csli-publications.stanford.edu/
 +
 
 +
2007b: Hellan, L. Representing clause-internal binding in an HPSG/LKB grammar. In Branco, A. (ed) Proceedings from DARC 2007 (Discourse Anaphora Resolution Conference), Lagos.
 +
 
 +
2006: Hellan, L. and Dorothee Beermann. Word Sense and Semantic Disambiguation of Constructions in a Deep Processing Grammar. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). Paris, France: European Language Resources Association 2006 ISBN 2-9517408-2-4.
 +
 
 +
2005a: Hellan, L. and Dorothee Beermann. The `specifier' in an HPSG grammar implementation of Norwegian. Proceedings of the 15th NODALIDA conference, Joensuu 2005, ed. by S. Werner. Ling@JoY: University of Joensuu electronic publications in linguistics and language technology 1. Joensuu 2006.
 +
 
 +
2005b: Hellan, L. Implementing Norwegian Reflexives in an HPSG Grammar. In St. Müller (ed) Proceedings of the 12th International Conference on Head-Driven Phrase Structure Grammar. CSLI Publications, Stanford. (http://csli-publications.stanford.edu/)
 +
 
 +
2005c: Hellan, L. and Dorothee Beermann. Classification of Prepositional Senses for Deep Grammar Applications. In: Valia Kordoni and Aline Villavicencio (eds.): Proceedings of the 2nd ACL-Sigsem Workshop on The Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, Colchester, United Kingdom, ACL-Sigsem, 2005.
 +
 
 +
2004a: Dorothee Beermann, Jon Atle Gulla, Hellan, L., and Atle Prange. Trailfinder: a case study in extracting spatial information using deep language processing. In Ton van der Wouden, Michaela Poss, Hilke Reckman, and Crit Cremers (eds) Computational Linguistics in the Netherlands 2004: Selected papers from the fifteenth CLIN meeting, pp. 121-131, Leiden, Netherlands, 2004.
 +
 
 +
2004b: Dorothee Beermann and Hellan, L. A treatment of directionals in two implemented HPSG grammars. In Stefan Müller (ed) Proceedings of the HPSG04 Conference, Katholieke Universiteit Leuven. CSLI Publications, 355-377. (http://csli-publications.stanford.edu/)
 +
 
 +
2004c: Hellan, L., Dorothee Beermann, Berthold Crysmann, Petter Haugereid, Dario Gonella, Daniela Kurz, Giampaolo Mazzini, Oliver Plaehn, and Melanie Siegel. DEEPTHOUGHT deliverable 5.10. Technical report, The DEEPTHOUGHT consortium.
 +
 
 +
2003a: Hellan, L. NorSource: an introduction. Ms, NTNU.
 +
 
 +
2003b: Hellan, L. and P. Haugereid. The NorSource Grammar - an exercise in the Matrix Grammar building design. In: Emily Bender, Dan Flickinger, Frederik Fouvry, and Melanie Siegel (eds) Proceedings of Workshop on Ideas and Strategies for Multilingual Grammar Engineering, ESSLLI 2003.
 +
 
 +
1989: Hellan, L., L. Johnsen and A. Pitz. TROLL. Ms., Univ. of Trondheim.
 +
 
 +
==Test suites==
 +
 
 +
Test suites for basic verbal constructions are found in
  
 
* [[Verbconstructions Norwegian - intransitive]],
 
 
* [[Verbconstructions Norwegian - transitive]],  
 
 
* [[Verbconstructions Norwegian - ditransitive and copular]].
 
 +
 +
 +
 +
 +
 +
Download site for a 2015 version of the grammar, hosted at the Norwegian National Library, Språkbanken:
 +
 +
http://www.nb.no/sprakbanken/show?serial=sbr-32&lang=en
 +
 +
 +
 +
A file with locative and directional semantics for prepositions and adverbs, which can be added to any Matrix-style grammar, can be downloaded here: [[File:PrepSemantics for standardMtrx.zip]].
 +
 +
 +
 +
--[[User:Lars Hellan]] 13:55, 14 August 2021 (UTC)
 +
 +
 +
 +
 +
[[Category:Norwegian Grammar]]

Latest revision as of 14:08, 6 September 2021

NorSource (Norwegian HPSG resource grammar) was developed within the Digital Linguistics Research Group at the Norwegian University of Science and Technology (NTNU), Trondheim, starting in 2001, and is still being maintained and further developed.

The application is licensed under the Lesser General Public License For Linguistic Resources.



See the WEB DEMO of the Norwegian HPSG grammar.



Short history of the grammar

NorSource is a computational grammar of Norwegian, developed over the last 20 years. Its formal and theoretical framework is Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag 1994, Sag et al. 2003). It started as a computational project through the LinGO initiative at CSLI, Stanford, using the LKB platform (Copestake 2002), a general platform based on typed feature structures (TFS) with an integrated format of semantic representation called Minimal Recursion Semantics (‘MRS’; cf. Copestake et al. 2005). Before the year 2000 there were three grammars in this framework, viz. the English Resource Grammar ('ERG'), the Japanese grammar 'Jacy', and the German grammar 'GG'. Essential to the development of further grammars of this type was the HPSG Grammar Matrix (‘the Matrix’; see Bender et al. 2002, 2010), which was mainly based on ERG, and had its first phase of deployment during the EU project DeepThought (2002-4).

We can distinguish five main phases in the development of NorSource, with some of the key persons involved:

 Phase 1, the Grounding phase (2001-03; Lars Hellan, Petter Haugereid), 
 Phase 2, the Semantic Expansion phase (2004-07; Lars Hellan, Dorothee Beermann, Ben Waldron), 
 Phase 3, the Cross-Linguistic Coding phase (2008-10; Lars Hellan), 
 Phase 4, the Deployment and Interoperability phase (2010-15; Lars Hellan, Tore Bruland, Elias Aamot, Mads Sandøy).
 Phase 5, further development and applications (2016-now, Lars Hellan, Tore Bruland).

Phase 1 consisted in building a basic core grammar around the Matrix skeleton (using Matrix versions 0.1 – 0.6 as they developed; this included the MRS system). This stage included the accommodation of an 80,000-entry lexicon imported from the previously established resources TROLL and NorKompLex, in which a verb valence code and a code for inflectional paradigms constituted major parts.

Phase 2 consisted in the development of a fine-grained ontology and computing system of spatial and temporal relations, amenable to grammatical systems across languages and typologies, and of a detailed semantics of comparative constructions. The grammar was also used as part of a small Norwegian-Japanese MT system. In this period, the inflectional system was thoroughly revised. Main publications from this period are: Beermann and Hellan (2004), Beermann et al. (2004), Hellan and Beermann (2005).

Phase 3 was devoted to a thorough revision of the valence code, to accommodate a cross-linguistically defined classification system of valence and construction types. Main publications from this period are: Hellan (2008), Hellan and Dakubu (2009, 2010).

Phase 4 was devoted to the use of the grammar in ‘external’ applications (see below). In this connection the software and hardware basis of the grammar was updated (in particular with ACE), and a server was constructed carrying web facilities for the grammar itself as well as for these applications and resources. There was also a general regimentation of the grammar code, and the semantic system, MRS, was adapted to the UTool restrictor on MRS objects, which enables the Reasoner mentioned below (see Bruland 2013). See Hellan and Bruland (2015) for a partial overview.

‘External’ applications:

A. A ‘Grammar Sparrer’, described in Hellan et al. 2013 and accessible at A Norwegian Grammar Sparrer. It follows the approach of Bender et al. 2004 and Suppes et al. 2014, within the overall initiatives described in Heift and Schulze 2007: specific types of grammatical mistakes are accommodated by ‘mal-rules’ in an extended ‘mal’-version of the grammar, and parses involving such mal-phenomena are reported to the user as tutoring instructions. The system has been running as a web demo since 2011.
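The reporting step can be pictured with a minimal Python sketch. It assumes that a parse is available as a list of the names of the rules that were applied. The rule names, the `MAL_RULE_FEEDBACK` table and the messages are invented for illustration and are not taken from NorSource or the Grammar Sparrer.

```python
# Hypothetical illustration: mal-rule names and messages are invented,
# not taken from NorSource or the Grammar Sparrer.
MAL_RULE_FEEDBACK = {
    "mal-v3-word-order": "In a main clause the finite verb should come second.",
    "mal-gender-agreement": "The determiner and the noun should agree in gender.",
}


def tutoring_messages(applied_rules):
    """Return one feedback message per mal-rule used in a parse.

    `applied_rules` is the list of rule names in a parse; ordinary
    rules are ignored, and order of first occurrence is preserved.
    """
    messages = []
    for rule in applied_rules:
        message = MAL_RULE_FEEDBACK.get(rule)
        if message is not None and message not in messages:
            messages.append(message)
    return messages


parse = ["head-subject-rule", "mal-v3-word-order", "head-complement-rule"]
for line in tutoring_messages(parse):
    print(line)
```

A parse that uses only ordinary rules yields an empty list, which an interface of this kind could present as "no errors found".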

B. A Multilingual Valence repository, based on NorSource and three further LKB grammars: the Spanish Resource Grammar, the Bulgarian grammar BURGER, and a grammar of Ga. Information from these grammars is imported into the repository in a uniform format. In essence, the valence code used in verbal lexical types (cf. 3.2 below) is expanded to alternative and more easily inspectable formats, and the verb lexicons of the languages involved are imported into a database organized according to the newer codes and searchable in terms of these codes. See web access at Multilingual Verb Valence Lexicon, and for description, Hellan and Bruland 2013 and Hellan et al. 2014.
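As a rough picture of a store that is "searchable in terms of these codes", the following Python sketch groups verb entries by valence code and then by language. The entries and the codes `tr` and `intr` are placeholders chosen for illustration. They are not the actual codes or the data model of the repository.

```python
from collections import defaultdict

# Hypothetical entries: (language, verb lemma, valence code). The codes
# are placeholders, not the actual codes used in the repository.
ENTRIES = [
    ("Norwegian", "spise", "tr"),
    ("Norwegian", "sove", "intr"),
    ("Spanish", "comer", "tr"),
    ("Spanish", "dormir", "intr"),
]


def build_index(entries):
    """Group verbs by valence code, then by language."""
    index = defaultdict(lambda: defaultdict(list))
    for language, lemma, code in entries:
        index[code][language].append(lemma)
    return index


def search(index, code):
    """Return {language: [lemmas]} for one valence code (empty if unknown)."""
    return {lang: list(lemmas) for lang, lemmas in index.get(code, {}).items()}


index = build_index(ENTRIES)
print(search(index, "tr"))
```

Keying the index on the code first means that one lookup returns the verbs of all languages sharing a frame, which is the kind of cross-language comparison the repository is meant to support.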

C. An initial version of a POS-tagger of Norwegian, reflecting the lexical inventory of the grammar, which amounts to approx. 90,000 lexical entries and a large number of proper names of various categories. The tagger currently offers all available POS-alternatives for a given word. See web access at http://regdili.hf.ntnu.no:8081/webtagger/tagger.

D. A simple Reasoner over movement and spatial information exported from the MRS. (See Bruland 2013.)

E. A corpus annotated for valence, aside from morphology and POS, produced through the grammar. (See Hellan et al. 2020.)

F. A valence resource NorVal developed from the verb lexicon of NorSource and later in tandem with it, described in Hellan 2021.

Some remarks on purposes and adequacy of the grammar

As a grammar of Norwegian, NorSource's first and foremost purpose is coverage in the sense in which a descriptively adequate generative grammar is supposed to attain coverage, namely through recursive enumeration of all and only the strings that count as grammatical in the language, assigning each string so recognized a morpho-syntactic and semantic analysis. Correctness of analysis is decided on empirical grounds, using standard linguistic criteria of adequacy.

As a computational system, a further desideratum is that the grammar can serve as a component in one or more natural language processing applications, like those mentioned above. From the viewpoint of general linguistics, this concern comes with the proviso that for a grammar as a ‘generic’ device, its success in the usage domain should in principle be measured by how well it can be integrated into a multitude of applications, not just a single one; this is also the ambition for the present grammar. In this respect the grammar is like a Fregean ‘Sinn’, in acting as a function from domains-of-use to deployable systems.


Moreover, the grammar has been developed with a view to the following overall desiderata:

Cross-grammar Generality

The content of the grammar should to as large an extent as possible be phrased in terms used or alignable with terms used in other grammars and for other languages, thereby enabling linguistic comparison using these grammars.

Interoperability

The grammar should attain as much interoperability with other applications as possible, manifested both on an ‘outer’ level enabling data flow and easy access, and on an ‘inner’ level ensuring information exchange from one system component to another. Thus, the grammar's files and productions (parses, etc.) should be transportable to other applications, and the codes in which its files are written should be readable by other applications, or able to be mapped into other codes.

Sustainability

The grammar should be in such a format, and be situated in such an overall environment, that as much as possible of its capacity can be retained, independently of the particular persons maintaining it or the particular physical environments hosting it.

Adequacy of the grammar

The grammar must cover not only ‘core grammar’, but the whole array of constructs that can be used in texts of the language: ‘fragments’, abbreviations, interjections, and much more. At the same time it must attain ‘analytic depth’, which includes at least the following factors of morpho-syntactic structure and parameters of functional and semantic analysis (in addition to a lexicon of significant size):

 - syntactic argument structure (or valence: whether there is a subject, an object,
 a second/indirect object, etc., referred to as grammatical functions);
 - semantic argument structure, that is, how many participants are present in the situation depicted,
 and possibly also which roles they play (such as ‘agent’, ‘patient’, etc.);
 - linkage between syntactic and semantic argument structure, i.e., which grammatical functions
 express which roles;
 - identity relations, part-whole relations, etc., between arguments;
 - aspect and Aktionsart, that is, properties of the situation expressed by a sentence in terms of
 whether it is dynamic/stative, continuous/instantaneous, completed/ongoing, etc.
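The parameters listed above can be thought of as fields of a single record per clause. The Python sketch below is one hypothetical way of laying this out. The class `ClauseAnalysis`, its field names and the value labels are assumptions made for illustration and do not reproduce NorSource's feature structures or its MRS output.

```python
from dataclasses import dataclass, field


@dataclass
class ClauseAnalysis:
    """Hypothetical record of the analytic parameters for one clause."""
    predicate: str
    grammatical_functions: list        # syntactic argument structure
    roles: list                        # semantic argument structure
    linking: dict = field(default_factory=dict)  # function -> role
    aktionsart: str = "unspecified"    # e.g. "dynamic" or "stative"

    def unlinked_functions(self):
        """Grammatical functions that express no semantic role."""
        return [f for f in self.grammatical_functions if f not in self.linking]


# 'Gutten spiser eplet' ("The boy eats the apple"): both functions carry a role.
eat = ClauseAnalysis(
    predicate="spise",
    grammatical_functions=["subject", "object"],
    roles=["agent", "patient"],
    linking={"subject": "agent", "object": "patient"},
    aktionsart="dynamic",
)

# 'Det regner' ("It rains"): an expletive subject carrying no role.
rain = ClauseAnalysis(
    predicate="regne",
    grammatical_functions=["subject"],
    roles=[],
    aktionsart="dynamic",
)

print(eat.unlinked_functions())
print(rain.unlinked_functions())
```

Keeping syntactic functions, semantic roles and the linking between them as separate fields makes mismatches such as expletive subjects directly visible, which is the kind of distinction that ‘analytic depth’ is meant to capture.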

If these and other factors can be addressed in a notional and formal system so cross-linguistically articulated that a grammar of a language L1, through its formal encoding, exposes values relative to the above parameters within a matrix of corresponding values for languages L2, L3, etc., then one is on track to attain the desideratum of cross-grammar generality. To a certain extent, the multilingual valence repository mentioned above under phase 4, point B, is an illustration of this point.

Evaluation

A standard requirement on NLP applications is that they be amenable to evaluation. However, there exist so far few criteria for measuring how well a grammar as a whole performs with regard to 'analytic depth' or the other desiderata mentioned. The closest a grammar can currently come to being transparent about its adequacy is through certain types of 'self-declaration', of which we mention two:

 - Test suites categorized for what analytic properties their parses should exhibit;
 - Grammar internal categorization systems which carry agreed-upon general definitions, 
 the grammar thereby signaling that the constructs categorized have been designed 
 with the agreed-upon contents.
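The first strategy, test suites categorized for expected analytic properties, can be sketched in Python as follows. The suite format, the property name `valence` and the function `toy_analyse` are hypothetical. The toy analyser merely counts words and stands in for a call to a real parser.

```python
def evaluate_suite(suite, analyse):
    """Compare expected properties with what `analyse` returns.

    `suite` maps a sentence to the properties its parse should exhibit;
    `analyse` is any callable returning a dict of properties for a
    sentence (a stand-in for a real parser interface).
    Returns the list of (sentence, property, expected, found) mismatches.
    """
    mismatches = []
    for sentence, expected in suite.items():
        found = analyse(sentence)
        for prop, value in expected.items():
            if found.get(prop) != value:
                mismatches.append((sentence, prop, value, found.get(prop)))
    return mismatches


# Hypothetical suite and a toy analyser, for illustration only.
SUITE = {
    "Gutten sover.": {"valence": "intransitive"},
    "Gutten spiser eplet.": {"valence": "transitive"},
}


def toy_analyse(sentence):
    # Counts words as a crude proxy; a real grammar would parse the sentence.
    words = sentence.rstrip(".").split()
    return {"valence": "transitive" if len(words) > 2 else "intransitive"}


print(evaluate_suite(SUITE, toy_analyse))
```

An empty result means that every declared property was found; each mismatch names the sentence and the property that failed, so a suite of this kind doubles as a statement of what the grammar claims to analyse.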

Both strategies are followed in the present grammar. The Test suites shown below reflect the performance of the grammar for the phenomena contained in the various suites, and in the grammar internal classification, the encoding of verb types is defined following the system laid out in Verbconstructions cross-linguistically - Introduction.

References

Beermann, D. and L. Hellan. 2004. A treatment of directionals in two implemented HPSG grammars. In Stefan Müller (ed) Proceedings of the HPSG04 Conference, Katholieke Universiteit Leuven. CSLI Publications, 355-377. http://csli-publications.stanford.edu/

Bender, E. M., D. Flickinger, and S. Oepen. 2002. The Grammar Matrix: An open-source starter kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the Workshop on Grammar Engineering and Evaluation, Coling 2002, Taipei.

Bender, E. M., D. Flickinger, S. Oepen and A. Walsh (2004). "Arboretum: Using a precision grammar for grammar checking in CALL," in Proceedings of the InSTIL/ICALL Symposium 2004, Venice, Italy.

Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L. and Saleem, S. 2010. Grammar Customization. In Research on Language & Computation, Volume 8, Number 1, 23-72.

Bruland, T. 2013. Building World Event Representations From Linguistic Representations. PhD dissertation, NTNU.

Copestake, A. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.

Copestake, A., D. Flickinger, I. Sag and C. Pollard. 2005. Minimal Recursion Semantics: an Introduction. Journal of Research on Language and Computation. 281-332.

Heift, T., and M. Schulze. (2007). Errors and Intelligence in Computer-Assisted Language Learning: Parsers and Pedagogues. Routledge, New York.

Hellan, L. 1988. Anaphora in Norwegian and the Theory of Grammar. Kluwer.

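Hellan, L. and D. Beermann. 2005. Classification of Prepositional Senses for Deep Grammar Applications. In: Kordoni, V. and A. Villavicencio (eds.): Proceedings of the 2nd ACL-Sigsem Workshop on The Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, Colchester, United Kingdom.

Hellan, L., L. Johnsen and A. Pitz. 1989. TROLL. Ms., NTNU.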

Hellan, L. and M.E.K. Dakubu (2009): A methodology for enhancing argument structure specification. In Proceedings from the 4th Language Technology Conference (LTC 2009), Poznan.

Hellan, L. and M. E. K. Dakubu, 2010: Identifying Verb Constructions Cross-Linguistically. Studies in the Languages of the Volta Basin 6.3. Legon: Linguistics Dept., University of Ghana

Hellan, L. and T. Bruland 2013. Constructing a Multilingual Database of Verb Valence. NoDaLiDa, Oslo, 2013.

Hellan, L., T. Bruland, M. Sandøy and E. Aamot. 2013. A Grammar Sparrer for Norwegian. NoDaLiDa, Oslo, 2013.

Hellan, L. and T. Bruland. 2015. A cluster of applications around a Deep Grammar. In: Vetulani et al. (eds) Proceedings from The Language & Technology Conference (LTC) 2015, Poznan.

Hellan, L., D. Beermann, T. Bruland, T. Haugland and E. Aamot. 2020. Creating a Norwegian valence corpus from a deep grammar. In Human Language Technology. Challenges for Computer Science and Linguistics. 8th Language & Technology Conference, LTC2017. Springer 2020, ISBN 978-3-030-66526-5. Web interface: https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus

Hellan, L. 2021. A valence catalogue for Norwegian. In: Loukanova, R. (ed) Natural Language Processing in Artificial Intelligence - NLPinAI 2021. Studies in Computational Intelligence. Springer.

Pollard, C. and Sag, I. (1994). Head-Driven Phrase Structure Grammar. Chicago University Press.

Suppes, P., T. Liang, E.E. Macken and D. Flickinger (2014). "Positive technological and negative pre-test-score effects in a four-year assessment of low socioeconomic status K-8 student learning in computer-based Math and Language Arts courses", Computers & Education, 71, pp. 23-32.

Works with specific reference to NorSource

2014: Hellan, L., D. Beermann, T. Bruland, M.E.K. Dakubu, and M. Marimon (2014) MultiVal: Towards a multilingual valence lexicon. LREC 2014.

2013: Hellan, L., Tore Bruland, Elias Aamot, Mads H. Sandøy. A Grammar Sparrer for Norwegian. Proceedings of NoDaLiDa 2013.

2012: Hellan, L., D. Beermann. Semantics of Spatial Prepositions in the Grammar NorSource. Paper presented at ‘Meaning of P’, Ruhr-Universität Bochum, Nov 2012.

2008: Hellan, L. From Grammar-Independent Construction Enumeration to Lexical Types in Computational Grammars. Paper presented at COLING, Workshop on Grammar Engineering Across Frameworks (GEAF) Manchester, August 2008 (http://www.aclweb.org/anthology-new/W/W08/#1700).

2007a: Hellan, L. On 'Deep Evaluation' for Individual Computational Grammars and for Cross-Framework Comparison. In: T.H. King and E. M. Bender (eds) Proceedings of the GEAF 2007 Workshop. CSLI Studies in Computational Linguistics ONLINE. CSLI Publications. http://csli-publications.stanford.edu/

2007b: Hellan, L. Representing clause-internal binding in an HPSG/LKB grammar. In Branco, A. (ed) Proceedings from DARC 2007 (Discourse Anaphora Resolution Conference), Lagos.

2006: Hellan, L. and Dorothee Beermann. Word Sense and Semantic Disambiguation of Constructions in a Deep Processing Grammar. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). Paris, France: European Language Resources Association 2006 ISBN 2-9517408-2-4.

2005a: Hellan, L. and Dorothee Beermann. The `specifier' in an HPSG grammar implementation of Norwegian. Proceedings of the 15th NODALIDA conference, Joensuu 2005, ed. by S. Werner. Ling@JoY: University of Joensuu electronic publications in linguistics and language technology 1. Joensuu 2006.

2005b: Hellan, L. Implementing Norwegian Reflexives in an HPSG Grammar. In St. Müller (ed) Proceedings of the 12th International Conference on Head-Driven Phrase Structure Grammar. CSLI Publications, Stanford. (http://csli-publications.stanford.edu/)

2005c: Hellan, L. and Dorothee Beermann. Classification of Prepositional Senses for Deep Grammar Applications. In: Valia Kordoni and Aline Villavicencio (eds.): Proceedings of the 2nd ACL-Sigsem Workshop on The Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, Colchester, United Kingdom, ACL-Sigsem, 2005.

2004a: Dorothee Beermann, Jon Atle Gulla, Hellan, L., and Atle Prange. Trailfinder: a case study in extracting spatial information using deep language processing. In Ton van der Wouden, Michaela Poss, Hilke Reckman, and Crit Cremers (eds) Computational Linguistics in the Netherlands 2004: Selected papers from the fifteenth CLIN meeting, pp. 121-131, Leiden, Netherlands, 2004.

2004b: Dorothee Beermann and Hellan, L. A treatment of directionals in two implemented HPSG grammars. In Stefan Müller (ed) Proceedings of the HPSG04 Conference, Katholieke Universiteit Leuven. CSLI Publications, 355-377. (http://csli-publications.stanford.edu/)

2004c: Hellan, L., Dorothee Beermann, Berthold Crysmann, Petter Haugereid, Dario Gonella, Daniela Kurz, Giampaolo Mazzini, Oliver Plaehn, and Melanie Siegel. DEEPTHOUGHT deliverable 5.10. Technical report, The DEEPTHOUGHT consortium.

2003a: Hellan, L. NorSource: an introduction. Ms, NTNU.

2003b: Hellan, L. and P. Haugereid. The NorSource Grammar - an exercise in the Matrix Grammar building design. In: Emily Bender, Dan Flickinger, Frederik Fouvry, and Melanie Siegel (eds) Proceedings of Workshop on Ideas and Strategies for Multilingual Grammar Engineering, ESSLLI 2003.

1989: Hellan, L., L. Johnsen and A. Pitz. TROLL. Ms., Univ. of Trondheim.

Test suites

Test suites for basic verbal constructions are found in



Download site for a 2015-version of the grammar, situated at the Norwegian National Library, Språkbanken:

http://www.nb.no/sprakbanken/show?serial=sbr-32&lang=en


A file for locative and directional semantics for prepositions and adverbs, which can be added to any Matrix-style grammar, can be downloaded here: File:PrepSemantics for standardMtrx.zip.


--User:Lars Hellan 13:55, 14 August 2021 (UTC)