Help:How to annotate in TypeCraft - a practical guide
|This page is currently under development. Thank you for your patience.|
- 1 TypeCraft 2.0 Editor
- 1.1 Using the TC2 editor for texts that have been created with the TC1 editor
- 1.2 Create a new text with the TypeCraft 2.0 Editor
- 1.2.1 Select text units for annotation
- 1.2.2 Prepare text for sentence-by-sentence annotation
- 1.2.3 Annotation
- 1.2.4 Annotation for Valence using the Valence description template
- 1.3 Delete Phrases from your list of initiated sentences.
- 1.4 Export of data
- 1.5 Share your text with a group
TypeCraft 2.0 Editor
Since the fall of 2014, TypeCraft has offered a new editor which allows a better integration of text-level and sentence-level annotations. Text-level annotation covers discourse properties, here called Discourse senses, or simply Senses. Sentence-level annotation covers Valence annotations at the global level and lexical and functional annotations at the word and morpheme level.
For more information about layered annotations in TypeCraft go to: Multi-level_linguistic_annotation_with_TypeCraft
Consider the screenshot of the TypeCraft 2.0 editor (TC2 editor) below.
In the background of the TC2 editor screenshot to the left, we see the editor's text area, which contains green text (instantiated phrases that have not yet been annotated) and blue text (annotated phrases).
To the right of the text area, we also see the Metadata input matrix. We offer at present two Metadata templates for the input of Metadata information.
In the foreground of the screenshot (to the left), the Sense annotation viewer is shown. It visualises Sense scope through colour coding. The sense names appear when you point at words or text fragments to which senses have been assigned.
The tabular Interlinear Glossed Text (IGT) editor (shown in the centre of the screenshot) shows the different levels of annotation: Valence annotations are visualised as strings of annotations which are shown above the annotation table. Word-level and morpheme-level annotations are hosted on specific tiers, and thus constitute the core unit of the tabular annotation editor.
The TypeCraft 1.0 editor will be available until summer 2015, which means that there is ample time to convert texts which have been created in the old editor to the new editor.
The next section describes how TypeCraft helps you to convert texts that have been created in the old editor to the new editor.
Using the TC2 editor for texts that have been created with the TC1 editor
When you open a text that contains or consists of material annotated in the TC1 editor, TypeCraft will display the following pop-up window:
TypeCraft has been upgraded with a new editor. Since your text was created in the old editor, TypeCraft has to convert it to the new format before opening it in the new editor. Note that the conversion cannot be reversed. To learn more about the new editor please visit the TypeCraft QuickStart page.
You can then choose, by pressing YES or NO, whether you want to convert your text to the new editor or keep it in the old one.
If you press YES, the text will open in the TC2 editor; otherwise the text will open in the old editor. We will maintain the old editor until summer 2015.
Create a new text with the TypeCraft 2.0 Editor
You enter the TC2 editor by clicking on New text on the TypeCraft navigation bar. The editor opens a dialogue window with the following message:
Your text will use the new and better TC editor. If for some reason you want to use the old editor choose No below.
We recommend that you continue at this point in order to annotate your text with the TypeCraft 2.0 Editor.
TypeCraft uses the term "Text" in the above system message to refer to any piece of digital text which is not yet linguistically annotated. To import already annotated text into TypeCraft, you use the TypeCraft Converter.
In order to annotate your text in the new editor, you respond to the system message by pressing "yes".
The editor's text area opens (see screenshot to the left). To the right of the text area, you now find the Metadata template. We offer a default Metadata template and a template for the Norwegian Centre for Writing and Writing Research. (You find a *Change Metadata set* button at the end of the template.) New metadata templates can be requested from the TypeCraft system administrator.
You start your work with the editor by filling the text area with the text that you would like to annotate, either by copying and pasting a text that you already have, or simply by creating a new text in the text area. Plain text in any script can be used, but notice that material loaded into TypeCraft must be Unicode-encoded. For non-Latin scripts, we offer automatic Pinyin transliteration for Mandarin Chinese.
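If you are unsure whether a file is Unicode-encoded, you can check it before pasting its content into the text area. The following Python sketch is not part of TypeCraft; the function name `to_unicode_text` and the choice of Latin-1 as fallback encoding are assumptions made for illustration, and the normalisation step is optional rather than a TypeCraft requirement.

```python
import unicodedata


def to_unicode_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw bytes as UTF-8, falling back to a legacy encoding.

    The fallback encoding is an assumption: replace it with the encoding
    your source file was actually saved in.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback)
    # Optional: NFC normalisation composes base letters and combining marks.
    return unicodedata.normalize("NFC", text)
```

To use it, read the file in binary mode (for example `open(path, "rb").read()`), pass the bytes to `to_unicode_text`, and paste or save the returned string as UTF-8.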
Before starting the tokenization of your text, you must select the text's language in the metadata template. You can provide metadata information at any time, and although it might be wise to fill in some information at the onset of your work with a text, it is always possible to come back and fill in missing information.
Back in the text area, you now define what we call Phrases for annotation. TypeCraft phrases may be sentences or fragments.
Notice that text does not need to be annotated continuously (sentence by sentence), although that is also possible.
Select text units for annotation
You select a sentence or a fragment (a text string not corresponding to a sentence) for annotation by highlighting it. TypeCraft does not expect a selected element to span two lines which are separated by a hard return. If you highlight a phrase that spans two sentences on two separate lines and then select *New Phrase* in order to add this element to your phrase list, the system will generate a message which states:
Alert: Phrases may not span multiple paragraphs.
You can then edit the selected element such that it does not contain a hard return, and then press New Phrase again. Now the fragment will be put into the Phrase list, and in the text area it then appears in green which means it is instantiated and ready for annotation.
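The restriction behind this alert can be sketched in a few lines of Python. This is an illustration of the rule only, not TypeCraft's own code; the function names `can_be_phrase` and `split_at_hard_returns` are hypothetical.

```python
from typing import List


def can_be_phrase(selection: str) -> bool:
    """Return True if the selection contains no hard return."""
    return "\n" not in selection and "\r" not in selection


def split_at_hard_returns(selection: str) -> List[str]:
    """Split a selection into pieces that each stay within one paragraph."""
    return [part.strip() for part in selection.splitlines() if part.strip()]
```

A selection for which `can_be_phrase` returns False would have to be reduced to one of the pieces returned by `split_at_hard_returns` before it could become a phrase.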
Instantiated phrases are either green (when not annotated yet) or blue (when annotated). Neither type can be highlighted again to be assigned to a different phrase. In order to change the instantiated phrase tokenization of your text, you have to go to your *View Phrase list* and select the phrase(s) that you want to reassign. You then select the option DELETE from the Phrase tab on the upper tool bar. DELETE means that you delete the instantiated phrase(s), not the underlying text, which remains part of your text.
After you have selected DELETE, the deleted phrase(s) will appear in your text in black again, and you can reassign them.
Prepare text for sentence-by-sentence annotation
When you wish to annotate a text sentence by sentence, here is how you proceed: Select *New Phrase* from your editor's tool bar without having selected a segment by highlighting it. TypeCraft will respond:
You haven't selected any text. Would you like to use the whole text and detect the phrases automatically?
You confirm by selecting "Yes".
The part of your text that was not instantiated yet (text in black) will now appear in green, and on top of your editor screen you see the following message:
TypeCraft has detected new phrases in the text body. They are highlighted this way. If you are satisfied with them just click Create phrases. If there is any phrase you don't want created just click on it so that its highlighting changes to this [highlighted in grey].
After you have applied *Create phrases* you can inspect the result of the automatic sentence tokenization in *View phrase list*.
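How TypeCraft detects phrases automatically is not documented on this page. To give an idea of what such a sentence tokenization involves, here is a deliberately naive Python sketch. It assumes that a phrase ends at a full stop, exclamation mark or question mark followed by whitespace, and that a phrase never crosses a hard return; both rules and the function name `detect_phrases` are assumptions for illustration.

```python
import re
from typing import List

# Naive rule (an assumption): a phrase ends at . ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def detect_phrases(text: str) -> List[str]:
    """Split a text into candidate phrases, paragraph by paragraph."""
    phrases: List[str] = []
    for paragraph in text.splitlines():  # phrases never cross a hard return
        for candidate in _SENTENCE_END.split(paragraph.strip()):
            if candidate:
                phrases.append(candidate)
    return phrases
```

A rule this simple will mis-split abbreviations and does not suit scripts with other sentence delimiters, which is one reason to inspect the result in *View phrase list*.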
Now it is time to start with the core annotation. You can do that in two ways: either you double-click on one of the instantiated elements in the text area (the sentences shown in green), or you open the Phrase list by clicking on *View Phrase list*.
When the phrase opens, a dialogue window pops up with the following message:
TypeCraft wants to know which of the following options you prefer: 1. to insert full forms into the table directly (recommended). Choosing this option allows you to separate affixes from word stems in the input mask below. Insert hyphens "-" or spaces " " to indicate morph or word boundaries and then click OK. 2. to manually insert words from your phrase into the table. For this option, click *Cancel* and an empty table will appear.
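The effect of the first option can be pictured as follows: spaces separate words and hyphens separate morphs within a word. The Python sketch below illustrates that convention only; how TypeCraft fills the table internally is not documented here, and the function name `split_full_forms` is hypothetical.

```python
from typing import List


def split_full_forms(phrase: str) -> List[List[str]]:
    """Split a phrase into words (at spaces) and morphs (at hyphens)."""
    return [word.split("-") for word in phrase.split()]
```

For the input `the dog-s bark-ed`, the function returns three words, the second and third of which consist of two morphs each.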
The tabular annotation editor
After you have decided whether you would like to work with a pre-filled table which realises your choices of morph boundaries, or you would like to start with a clean table, the tabular editor will open, respectively, pre-filled or empty.
You annotate by navigating through the table. We recommend that you add annotations to the tiers vertically by making use of the space bar. This method is in our experience the fastest.
To learn more about the tiers and annotation tags and levels go to Multi-level linguistic annotation with TypeCraft
The *WORD* and the *MORPH* tiers feature a menu bar which allows you to modify existing entries. The menu bar appears when you activate the field you want to change. From the menu you can also add words or change the word's segmentation. *Gloss* and *POS* tags are chosen from a predefined list. You find an overview of all Gloss and POS tags on your navigation bar. These lists are auto-generated and can be ordered by category at your convenience. The lists also provide short definitions for each tag.
The annotation table is supplemented by a large note field. Notice that the content of the note field can also be searched. If you use a designated marker to flag sentences, you can easily target them by a search. For example, if the annotation of a phrase is questionable, you could add a question mark to the Note field. A search for "?" in the note field will then return only the sentences with questionable annotations.
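The idea of flagging phrases with a marker in the note field can be sketched as a simple filter. The sketch below does not use the TypeCraft search interface; it assumes, purely for illustration, that phrases are held as dictionaries with a `text` and a `note` key.

```python
from typing import Dict, List


def phrases_with_marker(phrases: List[Dict[str, str]], marker: str = "?") -> List[Dict[str, str]]:
    """Return the phrases whose note field contains the marker."""
    return [p for p in phrases if marker in p.get("note", "")]
```

Choosing a marker that does not occur in ordinary notes keeps such a search precise.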
Annotation for Discourse Senses
The new TypeCraft editor allows its users to annotate text fragments in the context of a text, and to annotate words or word sequences for discourse senses (here called for short Senses). For the QuickStart, we describe how you can define text fragments of different length, assign discourse senses, indicate their scope and view them in a text.
To learn more about the discourse senses in TypeCraft go to Multi-level linguistic annotation with TypeCraft
After you have added a sense tier you can assign a sense tag. In the screenshot to the left, in the leftmost field, the sense tier appears in yellow when you activate it, which you do by clicking on it. Sense tags must be chosen from a list of predefined tags (see sense tag list in the navigation bar). Available tags appear in the drop-down menu which is activated as soon as you start typing into it, as indicated in the screenshot. Here you see the letters "As", and below them the drop-down list with possible completions: ASSdesc and ASSnar are sense tags that start with "As".
A sense may span one or several words. Press the left mouse button and mark the scope of the sense in its tier. The scope is marked by colour. In the screenshot to the left you see a blue tier. It indicates the scope of the "ASSERTION" sense which the user was about to choose when we made the screenshot. The red tier is the scope of a KEYTERM, while the green line indicates the scope of a SPEC(ification). (The two latter sense tags are not visible in the screenshot because of the active drop-down menu for the ASSERTION.)
Annotation for Sense in long sentences
The tabular editor window has a scroll bar at the bottom which is useful when working with long sentences. When annotating a sentence and indicating the sense scope by colour coding, first apply colour to the visible part of your sentence, then scroll to the right. Now first press the CTRL key (or the Command key on a Mac), then continue highlighting.
The *View senses* function allows you to view the sense annotations of a text by means of colour coding.
Sense annotations consist of a sense tag and a coloured bar that indicates the scope of the sense.
*View senses* reflects the scope of a discourse sense through coloured lines. Colours are not tied to specific senses, but assigned freely. To identify a sense, point to a word or a text fragment that instantiates it: the sense name then becomes visible in the upper left corner of the *View senses* window.
This is illustrated in the screenshot to the right.
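A sense annotation, as described above, pairs a sense tag with a scope. One way to picture this outside the editor is the following Python sketch. The class `SenseSpan`, the use of word indices for the scope, and the possibility of overlapping scopes are assumptions for illustration and do not describe TypeCraft's internal data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SenseSpan:
    tag: str    # sense tag, for example "ASSdesc" or "KEYTERM"
    start: int  # index of the first word in scope
    end: int    # index of the last word in scope (inclusive)


def senses_at(spans: List[SenseSpan], word_index: int) -> List[str]:
    """Return the tags of all senses whose scope covers the given word."""
    return [s.tag for s in spans if s.start <= word_index <= s.end]
```

Looking up the senses at a word index mirrors what happens when you point at a word in *View senses*: the colour shows the scope, and the lookup yields the name.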
Annotation for Valence using the Valence description template
You enter the valence annotation mode from the tabular editor by pressing *Change* to the right of the label Valence, which you find above the word-level and morph-level annotation table.
An additional annotation window, as shown in the screenshot to the left, appears and allows you to specify valency attributes using a predefined vocabulary.
While the valence annotation schema is still under development, we allow at this point the input of the following attributes:
- Syntactic Argument Structure
- Salient Sentence Pattern
- Situation Type
- Force & Eventuality
- Diathesis
- Modality
- Adjunct of Interest
- Sentence Aspect
Each of these attributes has a set of possible values. Some of the values for the attribute Syntactic Argument Structure are shown to the right in the way they appear in the drop-down menu.
To learn more about Valence annotation in TypeCraft go to Multi-level linguistic annotation with TypeCraft
When you have finished the annotation of valency, you return to your tabular annotation editor, where the valence values now appear as a hyphenated string exposing the valency specifications that you have chosen for the phrase under annotation. This is illustrated in the screenshot to the left, where the Valence annotation is highlighted in yellow.
Delete Phrases from your list of initiated sentences.
Not all sentences of a text need to be prepared for annotation. When you initiate sentences, they will appear in a list which you find under View Phrase List. You can go through the sentence list and select sentences for annotation in the annotation table, where you assign glosses in a morpheme-by-morpheme fashion. You can always delete sentences from this list by first marking them in your list. You then go to the Toolbar and select Phrases, and from Phrases you select Delete Phrases. A dialogue window will open. Follow the instructions! A deleted sentence will still be part of your text, where it now appears in black (rather than in blue).
Export of data
Export of IGT
The TypeCraft Editor as well as the TypeCraft search-interface allow for several forms of export. In the editor go to the Phrase tab on the tool bar. The following IGT export functions are available:
- Export to the TCwiki
- Export to HTML (allows export to standard text editors)
- Export to LaTeX (experimental)
In addition, we offer two versions of XML export from TypeCraft. Phrases (according to the user's selection) can be exported with or without the text that contains them.
Export to the TCwiki
For a description of the export to the TCwiki, please follow this link: Export to the TC wiki.
Export to your text editor
Export to Word or OpenOffice takes a few simple steps: Go to the TypeCraft Editor by opening *My texts*, and select the text from which you would like to export phrases.
Go to the *View Phrase list* tab on the tool bar of the editor. Mark the phrases you want to export using the check boxes on the left of the list. Go to the tool tab Phrases and from there to -> Export -> HTML (as shown in the screenshot to the left). You have the choice of exporting IGT (tabular format) with or without border. Make your choice and save the Tc-export file to your computer. (You will see a small pop-up window that asks you to either open or save the Tc-export file. You should save the file if you want to open it in your text editor.)
When you open the exported file on your machine, the default option that you will be presented with is to open the file in your browser (for example Firefox, Chrome, IE or Safari). In order to open the Tc-export file in your system's text editor, choose the option *Open with* > [your favourite text editor]. Notice that the exported examples can still be edited. You might want to change the font size or highlight certain glosses, add colour or borders.
Export of Wordlists
Word export works relative to a text. Wordlists can be exported by going to the Text tab on your editor's tool bar. Word export targets one text at a time. For each word, it shows the number of occurrences as well as the Baseform, the POS (Part of Speech), the Wordform and the GLOSS the word received in the text.
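The content of such a wordlist can be pictured with a short Python sketch that counts identical annotations within one text. The tuple layout and the function name `build_wordlist` are assumptions for illustration; they do not describe the format of the exported file.

```python
from collections import Counter
from typing import List, Tuple

# One annotated word: (wordform, baseform, POS, gloss). Layout is hypothetical.
Token = Tuple[str, str, str, str]


def build_wordlist(tokens: List[Token]) -> List[Tuple[str, str, str, str, int]]:
    """Count how often each annotated word occurs in one text."""
    counts = Counter(tokens)
    return [
        (wordform, baseform, pos, gloss, n)
        for (wordform, baseform, pos, gloss), n in counts.most_common()
    ]
```

Because the count is keyed on the full annotation, the same wordform with two different glosses would appear as two separate entries.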
Share your text with a group
Sharing texts with a group is a feature of your TypeCraft editor that must be set up by the system administrator. In your mail to the administrator, please state the name you wish to give your group (for example the name of your project or the name of the language you work on), and the TypeCraft user names of all members of the group that you would like to start. The system administrator will create the group for you. This will not take very long, and when the group is created, you can start to assign texts to your group by selecting your group from the drop-down menu of the *Share with group* tab in your TypeCraft editor.