
Difference between revisions of "Computational Lexicography"

(Tools)
m (Data conversion)
 
(9 intermediate revisions by 3 users not shown)
Line 1: Line 1:
in the context of the '''Legon-Trondheim NUFU-funded Linguistics Project'''

by [[User:Hannes|Hannes Hirzel]]
 
= Tools =
 
== Dictionary production ==

Tools used in the projects to produce a dictionary are:

* Toolbox [http://www.sil.org/computing/download]
* LexiquePro [http://www.lexiquepro.com/]
* WeSay [http://www.wesay.org] help files http://wesay.org/wiki/Help_And_Contact , http://projects.palaso.org/repositories/changes/wesay-doc/WeSay%20Documentation%20Printable.pdf
* ATP3 - used in some projects for checking dictionaries and doing semantic tagging. (more info: hannes.hirzel@gmail.com)

Line 14: Line 22:

Toolbox has been used in the last 12 years in the NUFU project to produce dictionaries.

== Data conversion ==

[[Converting a Toolbox lexical database to LKB format]]

http://solid.palaso.org/

== Other Tools ==

* http://www.sil.org/computing/fieldworks/flex/
* http://www-nlp.stanford.edu/kirrkir
* http://kdictionaries.com/
* http://tshwanedje.com/tshwanelex/
  
 
= File formats =
 
Line 43: Line 63:

LIFT (Lexicon Interchange FormaT) is an XML format for storing lexical information, as used in the creation of dictionaries. It's not necessarily the format for your lexicon. That can be tied to whatever program you're using. But LIFT allows you to move that data between programs (hence the term 'interchange').

LIFT is also a decent archiving option. Not because it will be around in 50 years, but because people will still be able to read it with any text editor and easily make use of it, even then.

LIFT has been designed to have a long life but also to be relatively easy to convert to and from existing lexicon formats, particularly Multi-Dictionary Formatter (MDF) and FieldWorks Language Explorer.

http://code.google.com/p/lift-standard/

Use of LIFT in WeSay: http://www.wesay.org/wiki/LIFT (a subset)

Latest revision as of 09:07, 25 November 2012

in the context of the Legon-Trondheim NUFU-funded Linguistics Project

by Hannes Hirzel


Tools

Dictionary production

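Tools used in the projects to produce a dictionary are:

  • Toolbox http://www.sil.org/computing/download
  • LexiquePro http://www.lexiquepro.com/
  • WeSay http://www.wesay.org (help files: http://wesay.org/wiki/Help_And_Contact , http://projects.palaso.org/repositories/changes/wesay-doc/WeSay%20Documentation%20Printable.pdf)
  • ATP3, used in some projects for checking dictionaries and doing semantic tagging (more info: hannes.hirzel@gmail.com)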

All four of these programs use a file with MDF markup (see below) to encode the dictionary information.

The order of the markers is not fully defined in the MDF convention. LIFT (see below) is an XML format in which MDF-formatted dictionaries may be encoded after some fixes. WeSay is a dictionary program that is straightforward to use. It stores the lexical data in the LIFT format and includes an export function for MDF. LexiquePro may be used to print out a dictionary.

A particularly good combination for a new project is WeSay for data entry combined with LexiquePro for printout.

Toolbox has been used for the last 12 years in the NUFU project to produce dictionaries.

Data conversion

Converting a Toolbox lexical database to LKB format

http://solid.palaso.org/

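Other Tools

  • http://www.sil.org/computing/fieldworks/flex/
  • http://www-nlp.stanford.edu/kirrkir
  • http://kdictionaries.com/
  • http://tshwanedje.com/tshwanelex/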

File formats

2008: Conference on Language Resources and Evaluation (LREC). Trippel et al.: Lexicon Schemas and Related Data Models: when Standards Meet Users

http://www.lrec-conf.org/proceedings/lrec2008/slides/812.pdf

Conclusion:

  • All schemes are implementations of LMF (Lexical Markup Framework)
  • Interchange results in loss of implied information
  • Tools lack support for interchange


MDF

MDF = Multi-Dictionary Formatter. MDF was originally a DOS program for printing out dictionaries whose entries use markers such as \lx for the lexeme and \ge for the English gloss.

This way of encoding dictionary information has persisted, but the programs that do the printing have changed. These days LexiquePro offers good options for printing a dictionary. Choose 'File/Open' and point it to an MDF dictionary file; you are then guided through a series of questions to load the dictionary. The 'File/Export' menu lets you produce an MS Word or OpenOffice file with the printout.

http://www.sil.org/computing/shoebox/MDF.html

The e-book http://www.sil.org/computing/shoebox/MDF_2000.pdf gives lexicographic principles and explains the markup. http://wiki.lingtransoft.info/tutorials/mdf summarizes the use of individual markers.
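To make the marker convention concrete, here is a minimal Python sketch that reads MDF-style text into records. It is an illustration only, not how Toolbox or LexiquePro read their files. It assumes that every record starts at an \lx line and that every field fits on one line; the function name parse_mdf and the sample entries (wordone, wordtwo) are invented for this example. Besides \lx and \ge it uses \ps, the MDF marker for part of speech.

```python
from typing import Dict, List

Record = Dict[str, List[str]]


def parse_mdf(text: str) -> List[Record]:
    """Split MDF-style text into records (hypothetical helper).

    Assumption: a new record begins at each lx marker, and each
    field occupies exactly one line.
    """
    records: List[Record] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            current = {}
            records.append(current)
        if current is None:
            continue  # skip any header fields before the first lexeme
        # A marker may repeat within a record, so values are kept in a list.
        current.setdefault(marker, []).append(value.strip())
    return records


# Invented sample data, for illustration only.
SAMPLE = r"""
\lx wordone
\ps n
\ge coffee

\lx wordtwo
\ps n
\ge water
\ge liquid
"""

records = parse_mdf(SAMPLE)
for record in records:
    print(record["lx"][0], record.get("ge", []))
```

Keeping a list of values per marker is a deliberate simplification: it tolerates repeated markers, but it does not model the nesting of senses and subentries that the MDF documentation describes.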

LIFT

LIFT (Lexicon Interchange FormaT) is an XML format for storing lexical information, as used in the creation of dictionaries. It's not necessarily the format for your lexicon. That can be tied to whatever program you're using. But LIFT allows you to move that data between programs (hence the term 'interchange').

LIFT is also a decent archiving option. Not because it will be around in 50 years, but because people will still be able to read it with any text editor and easily make use of it, even then.

LIFT has been designed to have a long life but also to be relatively easy to convert to and from existing lexicon formats, particularly Multi-Dictionary Formatter (MDF) and FieldWorks Language Explorer.

http://code.google.com/p/lift-standard/

Use of LIFT in WeSay: http://www.wesay.org/wiki/LIFT (a subset)
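As a rough illustration of what a LIFT file holds, the Python sketch below writes one lexeme and one English gloss as XML. The element names (lift, entry, lexical-unit, form, text, sense, gloss) follow my understanding of the LIFT schema, and the version value "0.13" is an assumption; check both against the specification linked above before relying on them. The helper name entry_to_lift and the sample word are invented, and "und" is the ISO 639 code for an undetermined language.

```python
import xml.etree.ElementTree as ET


def entry_to_lift(lexeme: str, gloss: str,
                  vernacular_lang: str = "und",
                  gloss_lang: str = "en") -> ET.Element:
    """Build one simplified LIFT-style entry (hypothetical helper)."""
    entry = ET.Element("entry", id=lexeme)

    # The headword goes into lexical-unit/form/text.
    lexical_unit = ET.SubElement(entry, "lexical-unit")
    form = ET.SubElement(lexical_unit, "form", lang=vernacular_lang)
    ET.SubElement(form, "text").text = lexeme

    # Each sense carries its glosses, tagged with the gloss language.
    sense = ET.SubElement(entry, "sense")
    gloss_element = ET.SubElement(sense, "gloss", lang=gloss_lang)
    ET.SubElement(gloss_element, "text").text = gloss
    return entry


# The version attribute is an assumption; see the lead-in above.
root = ET.Element("lift", version="0.13")
root.append(entry_to_lift("wordone", "coffee"))

lift_xml = ET.tostring(root, encoding="unicode")
print(lift_xml)
```

The result is plain text that can be opened in any text editor, which is the archiving point made above. A real LIFT file carries more than this sketch shows, so treat the output as a reading aid and not as a valid import file.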