Typecraft v2.5

Help:How to annotate in TypeCraft - a practical guide

Revision as of 11:31, 9 October 2014 by Dorothee Beermann

TypeCraft 2.0 Editor

Since the fall of 2014, TypeCraft has offered a new editor which allows a better integration of text- and sentence-level annotations. Text-level annotation covers discourse properties, here called Discourse senses, or simply Senses. Sentence-level annotation covers Valence annotations on the global level and lexical and functional annotations on the word and morpheme level.

For more information about layered annotations in TypeCraft go to: 
Multi-level_linguistic_annotation_with_TypeCraft

Consider the screenshot of the TypeCraft 2.0 editor (TC2-editor) below.

Figure 1: Editor, illustrating different annotation levels


In the background of the TC2-editor screenshot (to the left), we see the editor's text area, which contains green text (phrases not yet instantiated) and blue text (instantiated phrases).

To the right of the text area, we also see the Metadata input matrix. We offer at present two Metadata templates for the input of Metadata information.

In the foreground of the screenshot (to the left), the Sense annotation viewer is shown. It visualises Sense scope through colour coding. The sense names appear when you point at words or text fragments to which senses have been assigned.

The tabular Interlinear Glossed Text (IGT) editor (shown in the centre of the screenshot) shows the different levels of annotation. Valence annotations are visualised as strings of annotations above the annotation table. Word- and morpheme-level annotations are hosted on specific tiers, which constitute the core units of the tabular annotation editor.

The TypeCraft 1.0 editor will be available until summer 2015, which means that there is ample time to convert texts created in the old editor to the new editor.


The next section describes how TypeCraft helps you to convert texts that have been created in the old editor to the new editor.



Using the TC2 editor for texts that have been created with the TC1 editor

When you open a text that contains or consists of material annotated in the TC1 editor, TypeCraft will pop up the following window:


TypeCraft has been upgraded with a new editor.
Since your text was created in the old editor, TypeCraft has to convert it to the new format before opening it in the new editor.
Note that the conversion cannot be reversed.
To learn more about the new editor please visit the TypeCraft QuickStart page.

You can then choose, by pressing YES or NO, whether you want to convert your text to the new editor or keep it in the old one.


If you press YES, the text will open in the TC2-editor; otherwise it will open in the old editor. We will maintain the old editor until summer 2015.

Create a new text with the TypeCraft 2.0 Editor

You enter the TC2 editor by clicking on New text on the TypeCraft navigation bar. The editor opens a dialogue window with the following message:

Your text will use the new and better TC editor. If for some reason you want to use the old editor choose No below.

We recommend that you continue at this point in order to annotate your text with the TypeCraft 2.0 Editor.

TypeCraft uses the term "Text" in the system message above to refer to any piece of digital text which is not yet linguistically annotated.

In order to annotate your text in the new editor, you respond to the system message by pressing "yes".

The editor text area: no text loaded yet

The editor's text area opens (see screenshot to the left). To the right of the text area, the Metadata template becomes accessible. We offer a default Metadata template and a template for the Norwegian Centre for Writing and Writing Research. (You find a *Change Metadata set* button at the end of the template.) New metadata templates can be requested by mailto ................

You start your work with the editor by filling the text area with the text that you would like to annotate, either by copying and pasting a text that you already have, or simply by creating a new text in the text area. Plain text in any script can be used, but notice that material loaded into TypeCraft must be Unicode. Concerning non-Latin scripts, we offer automatic Pinyin transliteration for Mandarin Chinese.
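If your source text is stored in a legacy encoding, it can be converted to Unicode before you paste it into the text area. The following is a minimal sketch and not part of TypeCraft: the function name `to_unicode_text`, the Latin-1 example and the optional NFC normalisation step are all assumptions chosen for illustration.

```python
import unicodedata


def to_unicode_text(raw: bytes, source_encoding: str) -> str:
    """Decode bytes from a known legacy encoding and normalise to NFC.

    Raises UnicodeDecodeError if the bytes do not match the encoding.
    """
    text = raw.decode(source_encoding)
    # NFC composes base letters and combining marks into single code points
    # where possible, which keeps copies of the same text consistent.
    return unicodedata.normalize("NFC", text)


# Example: bytes saved as Latin-1 (an assumed legacy encoding).
legacy_bytes = "blåbær".encode("latin-1")
clean_text = to_unicode_text(legacy_bytes, "latin-1")
```

The resulting string can be saved as UTF-8 or copied into the text area. You need to know the original encoding of your file; the sketch does not try to guess it.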


Before starting the tokenization of your text, you must select the text's language in the metadata template. You can provide metadata information at any time. Although it might be wise to fill in some information at the outset of your work with a text, it is always possible to come back and fill in missing information.

Back in the text area, you now define what we call Phrases for annotation. TypeCraft phrases may be sentences or fragments.

Notice that text does not need to be annotated continuously (sentence by sentence), although that is also possible.

Select text units for annotation

You select an element for annotation by highlighting it. You then press the New Phrase button which will put the selected element into the Phrase list. In the text area this phrase now appears in green.

Prepare text for sentence-by-sentence annotation

Annotation

Now it is time to start the core annotation. You can do that in two ways: you either double-click on one of the instantiated elements in the text area (the sentences shown in green), or you open the Phrase list by clicking on *View Phrase list*.

When the phrase opens, a dialogue window pops up with the following message:

 TypeCraft wants to know 
 which of the following options you prefer:
 1. to insert full forms into the table directly (recommended).
      Choosing this option allows you to separate affixes from word stems
      in the input mask below. Insert hyphens "-" or spaces " " to indicate morph 
      or word boundaries and then click OK.
 2. to manually insert words from your phrase into the table.
      For this option, click *Cancel* and an empty table will appear.

The tabular annotation editor

Once you have chosen between a pre-filled table, which reflects the morph boundaries you entered, and a clean table, the tabular editor opens either pre-filled or empty, according to your choice.
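The way an input mask with spaces and hyphens maps onto a pre-filled table can be sketched as follows. This is an illustration only, not TypeCraft's own code: the function name `segment_phrase` and the Norwegian example phrase are assumptions.

```python
def segment_phrase(mask: str) -> list[list[str]]:
    """Split an input mask into words (on spaces) and morphs (on hyphens)."""
    rows = []
    for word in mask.split():
        # Drop empty pieces so a stray or doubled hyphen is ignored.
        morphs = [m for m in word.split("-") if m]
        if morphs:
            rows.append(morphs)
    return rows


# Each inner list is one word; its items are the morphs of that word.
table = segment_phrase("hund-en løp-er fort")
```

Here `table` holds three words, the first two with two morphs each and the last with a single morph, which corresponds to the columns you would then annotate tier by tier.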

You annotate by navigating through the table. We recommend that you add annotations to the tiers vertically by making use of the space bar. This method is in our experience the fastest.

 To learn more about the tiers and annotation tags and levels go to
         Multi-level linguistic annotation with TypeCraft

The *WORD* and the *MORPH* tiers feature a menu bar which allows you to modify existing entries. The menu bar appears when you activate the field you want to change. From the menu you can also add words or change a word's segmentation. *Gloss* and *POS* tags are chosen from a predefined list. You find an overview of all Gloss and POS tags on your navigation bar. These lists are auto-generated and can be ordered by category at your convenience. The lists also provide short definitions for each tag.

The annotation table is supplemented by a large note field. Notice that the content of the note field can also be searched. If you use a designated marker to flag sentences, you can easily target them with a search later. For example, if the annotation of a phrase is questionable, you can add a question mark to the note field. A search for "?" in the note field will then return only the sentences with questionable annotations.
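The marker convention described above amounts to a simple substring filter over note fields. The sketch below uses a hypothetical `Phrase` record and made-up notes; it does not reflect TypeCraft's internal data model or its search implementation.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    note: str = ""


def phrases_with_note_marker(phrases: list[Phrase], marker: str) -> list[Phrase]:
    """Return the phrases whose note field contains the marker string."""
    return [p for p in phrases if marker in p.note]


phrases = [
    Phrase("first example phrase", note="? gloss of second word uncertain"),
    Phrase("second example phrase", note="checked"),
    Phrase("third example phrase"),
]
flagged = phrases_with_note_marker(phrases, "?")
```

Because the filter matches any occurrence of the marker, it helps to pick a marker that does not otherwise appear in your notes.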


Annotation for Discourse Senses

The new TypeCraft editor allows its users to annotate text fragments in the context of a text, and to annotate words or word sequences for discourse senses (here called for short Senses). For the QuickStart, we describe how you can define text fragments of different length, assign discourse senses, indicate their scope and view them in a text.

   To learn more about the discourse senses in TypeCraft go to
         Multi-level linguistic annotation with TypeCraft


SenseAnno3.jpg
In order to add discourse senses to your annotations, go to your tabular editor. At the bottom of the annotation table you find the function *Add Discourse Sense*. Clicking the function will add a new tier to your annotation table. In this way, the editor can be extended by three additional tiers for sense annotation. At present we offer an experimental set of annotation tags for discourse annotation which can be accessed from the TypeCraft navigation bar.

After you have added a sense tier, you can assign a sense tag. In the screenshot to the left, the sense tier appears in yellow in the leftmost field; a tier turns yellow when you activate it by clicking on it. Sense tags must be chosen from a list of predefined tags (see the sense tag list in the navigation bar). Available tags appear in a drop-down menu, which is activated as soon as you start typing into the field, as indicated in the screenshot. Here you see the letters "As" and, below them, the drop-down list with possible completions: ASSdesc and ASSnar are sense tags that start with "As".

A sense may span one or several words. Using your mouse, press the left mouse button and mark the scope of the sense in its tier. The scope is marked by colour. In the screenshot to the left you see a blue tier. It indicates the scope of the "ASSERTION" sense which the user was about to choose when we made the screenshot. The red tier is the scope of a KEYTERM, while the green line indicates the scope of a SPEC(ification). (The two latter sense tags are not visible in the screenshot because of the active drop-down menu for the ASSERTION.)
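One way to think about sense scope is as a span over word positions, with several spans allowed to overlap. The following sketch is a hypothetical model for illustration: the `SenseSpan` record, the word indices and the six-word phrase are assumptions, while the tag names are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseSpan:
    tag: str    # sense tag, e.g. "KEYTERM"
    start: int  # index of the first word in scope
    end: int    # index of the last word in scope (inclusive)


def senses_at(spans: list[SenseSpan], word_index: int) -> list[str]:
    """Return the tags of all senses whose scope covers the given word."""
    return [s.tag for s in spans if s.start <= word_index <= s.end]


# Three sense tiers over a six-word phrase (indices are illustrative).
spans = [
    SenseSpan("ASSdesc", 0, 5),
    SenseSpan("KEYTERM", 2, 2),
    SenseSpan("SPEC", 3, 5),
]
tags_on_third_word = senses_at(spans, 2)
```

Looking up a word position returns every sense that covers it, which is similar in spirit to seeing a sense name when pointing at a word in the sense viewer.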




View Senses
ViewSenses.jpg



This function allows you to view the sense annotations of a text by means of colour coding.

Sense annotations consist of a sense tag and a coloured bar that indicates the scope of the sense.

*View senses* shows the scope of a discourse sense through coloured lines. Colours are not tied to specific senses but are assigned freely. Instead, when you point to a word or a text fragment that instantiates a sense, the sense name becomes visible in the upper left corner of the *View senses* window.

This is illustrated in the screenshot to the right.









Annotation for Valence using the Valence description template

You enter the valence annotation mode from the tabular editor by pressing *Change* to the right of the label Valence, which you find above the word- and morph-level annotation table.

An additional annotation window, as shown in the screenshot to the left, appears and allows you to specify valency attributes using a predefined vocabulary.

Valence Annotation Schema

While the valence annotation schema is still under development, we allow at this point the input of the following attributes:

  • Syntactic Argument Structure
  • Situation Type
  • Diathesis
  • Adjunct of Interest
  • Salient Sentence Pattern
  • Force & Eventuality
  • Modality
  • Sentence Aspect

Drop-down window for the attribute Syntactic argument structure


Each of these attributes has a set of possible values. Some of the values for the attribute Syntactic Argument Structure are shown to the right in the way they appear in the drop-down menu.


To learn more about Valence annotation in TypeCraft go to Multi-level linguistic annotation with TypeCraft




When you have finished annotating valency, you return to the tabular annotation editor, where the valence values now appear as a hyphenated string exposing the valency specifications that you have chosen for the phrase under annotation. This is illustrated in the screenshot to the left.

The Valence annotation is highlighted in yellow




Export of data

Export of IGT

The TypeCraft Editor as well as the TypeCraft search-interface allow for several forms of export. In the editor go to the Phrase tab on the tool bar. The following IGT export functions are available:

  • Export to the TCwiki
  • Export to HTML (allows export to standard text editors)
  • Export to LaTeX (experimental)

In addition, we offer two versions of XML export from TypeCraft. Phrases (according to the user's selection) can be exported with or without the text that contains them.

Export to the TCwiki

For a description of the export to the TCwiki, please follow this link: Export to the TC wiki.

Export to your text editor

The export to Word or Open Office is done in a few simple steps. Go to the TypeCraft Editor by opening *My texts*, and select the text from which you would like to export phrases.


Export1.jpg

Go to the *View Phrase list* tab on the tool bar of the editor. Mark the phrases you want to export using the check boxes on the left of the list. Go to the tool tab Phrases and from there to -> Export -> HTML (as shown in the screenshot to the left). You can export IGT (tabular format) with or without a border. Make your choice and save the Tc-export file to your computer. (You will see a small pop-up window that asks you to either open or save the Tc-export file. You should save the file if you want to open it in your text editor.)

When you open the exported file on your machine, the default option that you will be presented with is to open the file in your browser (for example Firefox, Chrome, IE or Safari). In order to open the Tc-export file in your system's text editor, choose the option *Open with* > [your favourite text editor]. Notice that the imported examples can still be manipulated. You might want to change the font size, highlight certain glosses, or add colour or borders.
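To make the difference between the bordered and borderless variants concrete, the sketch below renders a small IGT as an HTML table. It is a hypothetical illustration only: the function `igt_to_html`, the example phrase and its glosses are assumptions, and the markup of the actual Tc-export file may differ.

```python
from html import escape


def igt_to_html(tiers: list[list[str]], border: bool = True) -> str:
    """Render tiers (one list of cells per tier) as an HTML table."""
    border_attr = ' border="1"' if border else ""
    rows = []
    for tier in tiers:
        # Escape cell content so characters like "<" or "&" stay literal.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in tier)
        rows.append(f"<tr>{cells}</tr>")
    return f"<table{border_attr}>{''.join(rows)}</table>"


# Illustrative two-word phrase with a word tier and a gloss tier (made-up data).
tiers = [["hund-en", "løp-er"], ["dog-DEF", "run-PRES"]]
html_with_border = igt_to_html(tiers, border=True)
html_without_border = igt_to_html(tiers, border=False)
```

A table of this kind keeps words and glosses aligned in columns when opened in a word processor, which is why the tabular format survives the import.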







Export of Wordlists

Word export works relative to a text. To export a wordlist, go to the tab *Phrase* in the upper left corner of your TypeCraft Editor window. Pressing *Phrase* opens a drop-down menu. Select the first option, *Export*.




Share your text with a group

This is a feature of your TypeCraft editor that must be set up by the system administrator. In your mail to the administrator, please state the name you would like for your group (for example the name of your project or of the language you work on) and the TypeCraft user names of all members of the group. The system administrator will create the group for you. This will not take very long. Once the group is created, you can start to assign texts to it by selecting your group from the drop-down menu of the *Share with group* tab in your TypeCraft editor.



Search

Search is operated from the navigation bar of the TypeCraft wiki. In the second information box from the top, labelled *typecraft search*, you have access to Text search and Phrase search.


Text search

Mandarin Text Wordlist export rendered in Pinyin

Text search allows you to find texts using the Metadata tagset, as well as the name of the owner of the text and the date it was last modified. The language can also be specified.


Strings and sub-strings from the text title or the title translation can be used directly as search terms.

All levels of annotation, from Valence through Sense to Gloss and Part of Speech, can be used to select texts that contain them. One or several tags in combination can be specified as search terms, and their scope can be defined.

Strings or sub-strings contained in the note field can also be used to search for texts.

You can further specify whether the search should consider only your own texts or all public texts, for a specific language or for all available languages.

The search result is displayed ordered by several columns. It specifies the number of phrases found and the number of instances found for each of the search terms.


Search1.jpg


The screenshot to the left shows a partial result for a search for texts that contain thematic annotations. The GLOSS tags BEN(eficiary) and GOAL were used as search terms, and 38 texts were found, with 127 instances of the search term GOAL and 154 instances of the search term BEN(eficiary).
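The kind of result reported here, a number of matching texts plus an instance count per search term, can be sketched as a simple tally. The code below is an illustration under stated assumptions: the function `search_texts`, the toy corpus and the choice to match a text when it contains any of the search tags are hypothetical and do not describe TypeCraft's search engine.

```python
from collections import Counter


def search_texts(texts: dict[str, list[str]], tags: set[str]):
    """Find texts containing any search tag and count instances per tag.

    `texts` maps a text title to the flat list of gloss tags used in it.
    Returns (matching titles, Counter of instances per search tag).
    """
    matching = []
    instances = Counter()
    for title, glosses in texts.items():
        hits = [g for g in glosses if g in tags]
        if hits:
            matching.append(title)
            instances.update(hits)
    return matching, instances


# Made-up corpus: titles and tag lists are placeholders.
corpus = {
    "Text A": ["BEN", "GOAL", "PL", "GOAL"],
    "Text B": ["PL", "DEF"],
    "Text C": ["BEN"],
}
found, counts = search_texts(corpus, {"BEN", "GOAL"})
```

In this toy corpus two texts match, and each of the two search tags is counted separately, mirroring the per-term instance counts in the search result.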

From the search result window, the user has direct access to WORD export. Word export targets one text at a time and shows, for each word, the number of occurrences as well as its Baseform, POS (Part of Speech), Wordform and the GLOSS the word received in the text.
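A wordlist of this shape can be thought of as a frequency count over annotated word entries. The following sketch assumes a hypothetical `WordEntry` record with the four fields named above; the example words and tags are made up, and the actual export format is not shown here.

```python
from collections import Counter
from typing import NamedTuple


class WordEntry(NamedTuple):
    wordform: str
    baseform: str
    pos: str
    gloss: str


def build_wordlist(words: list[WordEntry]) -> list[tuple[WordEntry, int]]:
    """Count identical entries and return them sorted by descending frequency."""
    return Counter(words).most_common()


# Placeholder annotations for a single text.
annotated = [
    WordEntry("hunden", "hund", "N", "DEF"),
    WordEntry("løper", "løpe", "V", "PRES"),
    WordEntry("hunden", "hund", "N", "DEF"),
]
wordlist = build_wordlist(annotated)
```

Two entries count as the same word only if all four fields agree, so the same wordform with a different gloss would be listed separately.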