Typecraft v2.5

Difference between revisions of "Parallel Annotation of Speech and Text"

(Project Description)
 
(52 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<span style="color:red"> '''This page is under construction'''</span>
 
 
 
== Project Description==
 
== Project Description==
Goal of this short pilot has been parallel sound and text annotation. The study has been conducted by Professor [[User:Wim van Dommelen|Wim van Dommelen]] and Assc.Professor [[User:Dorothee Beermann|Dorothee Beermann]]at the [http://www.ntnu.no/isk Institute of Languages and Communication Studies] at the [http://www.ntnu.no Norwegian University of Science and Technology].Scientific assistant for the project was [[User:Asger Hagerup|Asger Hagerup]]. The project has been funded by [http://www.ntnu.no/hf/satsing/sstl the SSTL].
 
  
The pilot investigated integrated presentations of linguistically annotated audio and text material, combining [http://www.fon.hum.uva.nl/praat/ Praat) and TypeCraft.
+
NTNU project 2010
  
'''Praat''' is a signal analysis software developed by Paul Boersma and David Weenink of the University of Amsterdam. It is a tool widely used for the annotation of sound objects.For the present study we have taken advantage of the fact that Praat annotation data resides in a TextGrid object that exists separately from the sound object.Specifying a sentence tier allowed us easy referencing of data across applications. At present our sound signal representations are static, and selective, that is, they focus on the presentation of one selected feature to illustrate interesting correlations across phonetic and linguistic categories.  
+
Goal of this short pilot has been parallel sound and text annotation. The study has been conducted by Professor [[User:Wim van Dommelen|Wim van Dommelen]] and Professor [[User:Dorothee Beermann|Dorothee Beermann]] at the former Department of Languages and Communication Studies at the [http://www.ntnu.no Norwegian University of Science and Technology]. Scientific assistant for the project was [[User:Asger Hagerup|Asger Hagerup]]. The project has been funded by [http://www.ntnu.no/hf/satsing/sstl the SSTL].  
  
 +
The pilot investigated how to integrate presentations of linguistically annotated audio and text material, combining [http://www.fon.hum.uva.nl/praat/ Praat] and [[Main Page|TypeCraft]].
 +
 +
'''Praat''' is a signal analysis software developed by [http://www.fon.hum.uva.nl/paul/ Paul Boersma] and [http://www.fon.hum.uva.nl/david/ David Weenink] from the [http://www.example.com University of Amsterdam]. It is a tool widely used for the annotation of sound objects. For the present study we have taken advantage of the fact that Praat annotation data resides in a TextGrid object that exists separately from the sound object. Using annotated tiers allows easy referencing of data across applications. At present our sound signal representations are static, and selective, that is, they focus on the presentation of one selected feature to illustrate interesting correlations across phonetic and linguistic categories. Further funding will allow us to develop an interactive representation of speech data.
 +
 +
On this page and the pages [[Parallel_Annotation_of_Speech_and_Text_-_Part_2|Parallel Annotation of Speech and Text - Part 2]] and [[Parallel_Annotation_of_Speech_and_Text_-_Part_3|Parallel Annotation of Speech and Text - Part 3]] we present some of our data. A sample collection of annotated text can be found by following this link: [http://typecraft.org/TCEditor/965/ Parallel speech text annotation].
 +
 +
The corresponding Praat annotations can be found on this and the following page - we have embedded sound and TextGrid files which can be downloaded for further inspection in Praat. The data presented here allows, for example, the inspection of '''Cliticalization''' (syntax), '''Vowel Reduction''' (phonology) and '''Voice Onset Time''' (phonetics) in Norwegian. We reflect three Norwegian dialects, a fact which might be of some interest, in particular in the context of dialectology. In each case morpho-syntactic and phonetic/phonological annotation are presented in parallel. On the basis of a larger data-set our approach to speech and text annotation will allow a comparison of dialects taking parameters from different fields of linguistics as well as the phonetic annotation into account.
  
 
===Description of the material===  
 
===Description of the material===  
For our study we selected 10 sentences which we selected from phonetic database of the "Sound to Sense" project.  
+
For our study we selected 10 sentences from the phonetic database of the [http://www.sound2sense.eu/ Sound to Sense] project.  
 +
 
 +
To illustrate some of the differences between Norwegian dialects we looked at both segmental and suprasegmental phenomena that divide Norwegian language into a Western and an Eastern dialect group. On the segment level we can examine the pronunciation of the phoneme /r/. As documented in the sound data presented here, the Bergen (Western) speaker pronounces /r/ as a voiced uvular fricative, while the two other speakers (Eastern) pronounce the phoneme as a voiced alveolar tap (although the segment may also appear as an approximant in rapid speech for all three speakers). In addition, the Eastern Norwegian speakers have an assimilation between /r/ and a following alveolar consonant: the consonant sequence surfaces as a retroflex version of the latter consonant. This is not the case for the Bergen speaker, where the two segments are preserved in the surface form.
 +
 
 +
To illustrate a suprasegmental phenomenon we can look at the pitch contour for bisyllabic words with initial stress. For these words there are two possible pitch contours in Norwegian, with either two or three tones. These two pitch contours are commonly called toneme 1 and toneme 2, respectively. In [[Parallel_Annotation_of_Speech_and_Text_-_Part_2|sentence 7]] we look more closely at how toneme 1 and 2 are used in inflection, but here we shall briefly look at how toneme 1 is realised in the different dialects.
 +
 
 +
 
 +
 
 +
{| border="1"
 +
|+'''Tone realisation across Norwegian dialects'''
 +
|-
 +
| valign="top"|
 +
[[Image:Peker.jpg|thumb|left|350px|Bergen]]
 +
| valign="bottom"|
 +
[[Image:Døra.jpg|thumb|left|350px| Trondheim]]
 +
|}
 +
 
 +
 
 +
 
 +
[[Image:Vasken.jpg|thumb|left|350px|Eastern Norwegian dialect (south of Trøndelag)]]
 +
 
 +
'''Description of picture material'''
 +
 
 +
The screenshots above and to the left illustrate three words represented using Praat. The data is taken from sentences 3, 7 and 9, respectively. The blue curve in the middle of each screenshot shows the fundamental frequency, or pitch, throughout the pronunciation of the word (it has gaps because unvoiced sounds do not have any pitch). Examining the pitch contour of the words we see that the Bergen speaker pronounces /pe:ker/ with an HL pitch contour, i.e. a high tone on the first syllable and a low tone on the last syllable, while the pattern is the opposite (LH) for the Trondheim pronunciation of /dø:ra/. The last screenshot illustrates the pitch of the speaker of an Eastern Norwegian dialect south of Trøndelag, also with an LH tone contour on /vaska/. Because of the high tone on the stressed syllable, western Norwegian dialects are often referred to as high-tone dialects and contrarily Eastern Norwegian dialects as low-tone dialects. However, there are differences between dialects in the same group as well, comparing the Trondheim speaker and the other Eastern dialect we see that the former has a gradual rise from L to H, while the latter has a more abrupt rise at the end of the word.
 +
 
 +
 
 +
 
  
Sentences 1 to 3
 
  
Speaker dialect: Bergen
+
==Speaker dialect: ''Bergen''==
  
 +
''Sentence 1''
 
<Phrase>10903</Phrase>
 
<Phrase>10903</Phrase>
 
<flashmp3>PSTA01.mp3</flashmp3>
 
<flashmp3>PSTA01.mp3</flashmp3>
  
Download files for viewing in the Praat application:[[File:PSTA01.mp3|Sound]], [[Media:PSTA01.txt| TextGrid]]
+
<br>
 +
<br>
 +
File download for viewing in the Praat ([[#Downloading Help|Downloading Help]]):
  
 +
*[[Media:PSTA01.mp3|Sound]]
 +
*[[Media:PSTA01.txt|TextGrid]]
  
 +
<br>
 +
<br>
 +
''Sentence 2''
 
<Phrase>10904</Phrase>
 
<Phrase>10904</Phrase>
 
<flashmp3>PSTA02.mp3</flashmp3>
 
<flashmp3>PSTA02.mp3</flashmp3>
  
Download files for viewing in the Praat application: [[Media:PSTA02.mp3|Sound]], [[Media:PSTA02.txt|TextGrid]]
+
<br>
 +
<br>
 +
File download for Praat ([[#Downloading Help|Downloading Help]]):
  
 +
* [[Media:PSTA02.mp3|Sound]],
 +
* [[Media:PSTA02.txt|TextGrid]]
  
<Phrase>10905</Phrase>
+
<br>
 
+
<br>
 +
''Sentence 3''
 +
<phrase>10905</phrase>
 
<flashmp3>PSTA03.mp3</flashmp3>
 
<flashmp3>PSTA03.mp3</flashmp3>
  
Download files for viewing in the Praat application:[[Media:PSTA03.mp3|Sound]], [[Media:PSTA03.txt|TextGrid]]
+
<br>
 +
<br>
 +
File download for Praat ([[#Downloading Help|Downloading Help]]):
 +
 
 +
*[[Media:PSTA03.mp3|Sound]],  
 +
*[[Media:PSTA03.txt|TextGrid]]
  
 
==Speaker Dialect: Trondheim==
 
==Speaker Dialect: Trondheim==
  
[[Parallel Processing of Speech and Text Data - Part 2]]
+
[[Parallel Annotation of Speech and Text - Part 2]]
  
==Speaker Dialect:   ==
+
==Speaker Dialect: Eastern Norway==
  
[[Parallel Processing of Speech and Text Data - Part 3]]
+
[[Parallel Annotation of Speech and Text - Part 3]]
  
 
==About the TextGrid files==
 
==About the TextGrid files==
  
The TextGrid files are opened together with the matching sound files for viewing in the Praat application. The TextGrid files consist of three tiers, 'Word' (rendered in Bokmål orthography) 'Phoneme' (shows underlying segments) and 'Note' (shows surface realisation with IPA symbols, and other notes).
+
The TextGrid files are opened together with the matching sound files for viewing in the Praat application. The TextGrid files consist of three tiers, 'Word' (rendered in Bokmål orthography) 'Phoneme' (shows underlying segments) and 'Note' (shows surface realisation with IPA symbols, and other notes). In the phoneme tier, a hash (#) represents a word boundary and a segment inside <angle brackets> is an underlying segment that is syncopated or otherwise missing in the surface form.
  
 
Here is a list of glosses used in the 'Note' tier:
 
Here is a list of glosses used in the 'Note' tier:
Line 62: Line 110:
 
ERR = The speaker errs and corrects himself<br>
 
ERR = The speaker errs and corrects himself<br>
 
HES = (Audible) hesitation from speaker<br>
 
HES = (Audible) hesitation from speaker<br>
 +
 +
The note tier may also show an IPA symbol inside square brackets, this represents the actual realisation of the underlying segment(s).
 +
 +
==Downloading Help==
 +
When clicking on the file links called '''Sound''' and '''TextGrid''' the files will open in a separate window in your browser.
 +
 +
Go to *FILE*, right click and select *Save this Page as*.
 +
 +
You now are able to save the file to a place of your choice in your home directory.
 +
 +
 +
[[Category:Projects]]

Latest revision as of 20:09, 18 February 2016

Project Description

NTNU project 2010

The goal of this short pilot was the parallel annotation of sound and text. The study was conducted by Professor Wim van Dommelen and Professor Dorothee Beermann at the former Department of Languages and Communication Studies at the Norwegian University of Science and Technology. The scientific assistant for the project was Asger Hagerup. The project was funded by the SSTL.

The pilot investigated how to integrate presentations of linguistically annotated audio and text material, combining Praat and TypeCraft.

Praat is signal analysis software developed by Paul Boersma and David Weenink from the University of Amsterdam. It is a tool widely used for the annotation of sound objects. For the present study we have taken advantage of the fact that Praat annotation data resides in a TextGrid object that exists separately from the sound object. Using annotated tiers allows easy referencing of data across applications. At present our sound signal representations are static and selective, that is, they focus on the presentation of one selected feature to illustrate interesting correlations across phonetic and linguistic categories. Further funding will allow us to develop an interactive representation of speech data.

On this page and the pages Parallel Annotation of Speech and Text - Part 2 and Parallel Annotation of Speech and Text - Part 3 we present some of our data. A sample collection of annotated text can be found by following this link: Parallel speech text annotation.

The corresponding Praat annotations can be found on this and the following page - we have embedded sound and TextGrid files which can be downloaded for further inspection in Praat. The data presented here allows, for example, the inspection of Cliticalization (syntax), Vowel Reduction (phonology) and Voice Onset Time (phonetics) in Norwegian. We reflect three Norwegian dialects, a fact which might be of some interest, in particular in the context of dialectology. In each case morpho-syntactic and phonetic/phonological annotation are presented in parallel. On the basis of a larger data-set our approach to speech and text annotation will allow a comparison of dialects taking parameters from different fields of linguistics as well as the phonetic annotation into account.

Description of the material

For our study we selected 10 sentences from the phonetic database of the Sound to Sense project.

To illustrate some of the differences between Norwegian dialects we looked at both segmental and suprasegmental phenomena that divide the Norwegian language into a Western and an Eastern dialect group. On the segmental level we can examine the pronunciation of the phoneme /r/. As documented in the sound data presented here, the Bergen (Western) speaker pronounces /r/ as a voiced uvular fricative, while the two other speakers (Eastern) pronounce the phoneme as a voiced alveolar tap (although the segment may also appear as an approximant in rapid speech for all three speakers). In addition, the Eastern Norwegian speakers have an assimilation between /r/ and a following alveolar consonant: the consonant sequence surfaces as a retroflex version of the latter consonant. This is not the case for the Bergen speaker, where the two segments are preserved in the surface form.

To illustrate a suprasegmental phenomenon we can look at the pitch contour for bisyllabic words with initial stress. For these words there are two possible pitch contours in Norwegian, with either two or three tones. These two pitch contours are commonly called toneme 1 and toneme 2, respectively. In sentence 7 we look more closely at how toneme 1 and 2 are used in inflection, but here we shall briefly look at how toneme 1 is realised in the different dialects.


[Figure: Tone realisation across Norwegian dialects. Three Praat screenshots: Bergen; Trondheim; Eastern Norwegian dialect (south of Trøndelag)]

Description of picture material

The screenshots above and to the left illustrate three words represented using Praat. The data is taken from sentences 3, 7 and 9, respectively. The blue curve in the middle of each screenshot shows the fundamental frequency, or pitch, throughout the pronunciation of the word (it has gaps because unvoiced sounds do not have any pitch). Examining the pitch contour of the words we see that the Bergen speaker pronounces /pe:ker/ with an HL pitch contour, i.e. a high tone on the first syllable and a low tone on the last syllable, while the pattern is the opposite (LH) for the Trondheim pronunciation of /dø:ra/. The last screenshot illustrates the pitch of the speaker of an Eastern Norwegian dialect south of Trøndelag, also with an LH tone contour on /vaska/. Because of the high tone on the stressed syllable, Western Norwegian dialects are often referred to as high-tone dialects and, conversely, Eastern Norwegian dialects as low-tone dialects. However, there are differences between dialects in the same group as well: comparing the Trondheim speaker and the other Eastern dialect we see that the former has a gradual rise from L to H, while the latter has a more abrupt rise at the end of the word.
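The HL/LH distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes that pitch has already been exported as a list of f0 values in Hz per syllable, with None standing for unvoiced frames (the gaps in the blue curve), and it compares the mean pitch of the first and last syllable. The function names and the sample values are hypothetical, and the comparison covers only a simple two-tone contour.

```python
from statistics import mean


def mean_f0(frames):
    """Mean pitch over voiced frames; None marks an unvoiced frame (no pitch)."""
    voiced = [f for f in frames if f is not None]
    return mean(voiced) if voiced else None


def classify_contour(first_syllable, last_syllable):
    """Label a bisyllabic word as 'HL', 'LH' or 'level' from per-syllable f0 frames.

    Returns None when either syllable has no voiced frames at all.
    """
    first = mean_f0(first_syllable)
    last = mean_f0(last_syllable)
    if first is None or last is None:
        return None
    if first > last:
        return "HL"
    if first < last:
        return "LH"
    return "level"


# Hypothetical f0 values (Hz), invented for illustration only.
high_then_low = classify_contour([None, 220.0, 210.0], [180.0, None, 170.0])
low_then_high = classify_contour([150.0, 155.0], [None, 200.0, 210.0])
```

A real analysis would take the f0 values from Praat's pitch analysis and would need a finer measure than syllable means to capture the difference between a gradual and an abrupt rise.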



Speaker dialect: Bergen

Sentence 1

Jeg ser bildet, kan du si, litt på skrått ned, ovenifra.
“I see the picture, say, somewhat diagonally downwards, from above.”
Word        Transcription   Gloss               POS
Jeg         e               1SG                 PN
ser         se:r            seePRES             V
bildet      bilde           pictureDEFSG        N
kan         kan:            canPRES             V
du          ʉ               2SG                 CL
si          si:             sayINF              V
litt        lit:            a.little            ADVm
på          po              onDIR               PREP
skrått      skro:t          diagonalADJ>ADV     ADVm
ned         ned             downDIR             ADVm
ovenifra    ovenifra        from.aboveDIRSRC    ADVm


File download for viewing in Praat (see Downloading Help): Sound (PSTA01.mp3), TextGrid (PSTA01.txt)

Sentence 2
Det dekker omtrent hele det venstre…mest…altså, venstreste kortsiden.
“It covers approximately the whole left…most…that is, the leftest short side.”
Word         Transcription   Gloss             POS
Det          de              3SGNEUT           PN
dekker       dek:er          coverPRES         V
omtrent      umtrent         approximately     ADVm
hele         he:le           wholeDEF          ADJ
det          de              DEFSGNEUT         ART
venstre      venstre         left              ADVm
mest         mest            mostSUP           ADJ
altså        aso             that.isDM         ADVm
venstreste   venstreste      leftSUPMUDEF      ADJ
kortsiden    kortsiden       shortsideDEFSG    N


File download for Praat (see Downloading Help): Sound (PSTA02.mp3), TextGrid (PSTA02.txt)

Sentence 3
Hun står med ryggen mot veggen opp og ser på han som skal kaste ballen som står utenfor og peker på boksene.
“She's standing with her back up against the wall and looking at him, who is standing outside and about to throw the ball, and pointing towards the boxes.”
Word       Transcription   Gloss          POS
Hun        hun             3SGFEM         PN
står       sto:r           standPRES      V
med        med             withMNR        PREP
ryggen     ryɡ:en          backDEFSG      N
mot        mut             againstDIR     PREP
veggen     veɡ:en          wallDEFSG      N
opp        up              upDIRMU        PREP
og         o               and            CONJC
ser        se:r            seePRES        V
på         po              atDIR          PREP
han        han             3SGMASC        PN
som        som             (no gloss)     PNrel
skal       skal:           shallPRES      V
kaste      kaste           throwINF       V
ballen     bal:en          ballDEFSG      N
som        som             (no gloss)     PNrel
står       sto:r           standPRES      V
utenfor    ʉtenfor         outside        ADVm
og         o               and            CONJC
peker      pe:ker          pointPRES      V
på         po              atDIR          PREP
boksene    boksene         boxDEFPL       N


File download for Praat (see Downloading Help): Sound (PSTA03.mp3), TextGrid (PSTA03.txt)

Speaker Dialect: Trondheim

Parallel Annotation of Speech and Text - Part 2

Speaker Dialect: Eastern Norway

Parallel Annotation of Speech and Text - Part 3

About the TextGrid files

The TextGrid files are opened together with the matching sound files for viewing in the Praat application. The TextGrid files consist of three tiers: 'Word' (rendered in Bokmål orthography), 'Phoneme' (shows underlying segments) and 'Note' (shows surface realisation with IPA symbols, and other notes). In the phoneme tier, a hash (#) represents a word boundary, and a segment inside <angle brackets> is an underlying segment that is syncopated or otherwise missing in the surface form.
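The phoneme-tier conventions just described can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the labels of the phoneme tier have already been read from the TextGrid and joined into a single string; how the labels are actually distributed over intervals in the files is not specified here. The function name and the sample string are hypothetical.

```python
import re


def parse_phoneme_tier(label):
    """Split a phoneme-tier string at '#' and separate out <bracketed> segments.

    For each word, returns the underlying form (brackets stripped), the list
    of segments marked as missing, and the string that remains once those
    segments are removed.
    """
    words = []
    for word in label.split("#"):
        if not word:
            continue  # skip empty pieces produced by leading or trailing '#'
        missing = re.findall(r"<([^<>]+)>", word)
        remaining = re.sub(r"<[^<>]+>", "", word)
        underlying = re.sub(r"[<>]", "", word)
        words.append(
            {"underlying": underlying, "missing": missing, "remaining": remaining}
        )
    return words


# Hypothetical label, invented to show the notation.
parsed = parse_phoneme_tier("kan:#<d>ʉ#si:")
```

Here the second word comes out with the underlying form "dʉ", the missing segment "d" and the remaining string "ʉ".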

Here is a list of glosses used in the 'Note' tier:

Phonology/Phonetics:
BrV = Segment realised with breathy voice
CrV = Segment realised with creaky voice
DV = Underlying voiced segment realised devoiced
EPN = Epenthesis
RD = Reduction of segment (e.g. corner vowel realised as schwa or plosive as fricative).
V = Underlying non-voiced segment realised voiced

Morphophonology/Syntax
CL = Clitic

Other
ERR = The speaker errs and corrects himself
HES = (Audible) hesitation from speaker

The note tier may also show an IPA symbol inside square brackets; this represents the actual realisation of the underlying segment(s).
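As a sketch of how a 'Note' tier label could be read programmatically, the following Python snippet looks up the glosses listed above and extracts any IPA symbol given in square brackets. The gloss table is copied from the list on this page; the function name, the sample labels, and the assumption that several items in one label are separated by spaces, commas or semicolons are hypothetical.

```python
import re

# Gloss abbreviations as listed for the 'Note' tier on this page.
NOTE_GLOSSES = {
    "BrV": "Segment realised with breathy voice",
    "CrV": "Segment realised with creaky voice",
    "DV": "Underlying voiced segment realised devoiced",
    "EPN": "Epenthesis",
    "RD": "Reduction of segment",
    "V": "Underlying non-voiced segment realised voiced",
    "CL": "Clitic",
    "ERR": "The speaker errs and corrects himself",
    "HES": "(Audible) hesitation from speaker",
}


def parse_note(label):
    """Return the bracketed IPA realisations and the known glosses in a note label."""
    realisations = re.findall(r"\[([^\]]+)\]", label)
    # Remove the bracketed parts, then split the rest on assumed separators.
    rest = re.sub(r"\[[^\]]+\]", " ", label).strip()
    tokens = re.split(r"[\s,;]+", rest)
    glosses = {t: NOTE_GLOSSES[t] for t in tokens if t in NOTE_GLOSSES}
    return {"realisations": realisations, "glosses": glosses}


# Hypothetical labels, invented for illustration.
reduced = parse_note("RD [ə]")
plain = parse_note("[ɾ]")
```

Tokens that are not in the table are simply ignored, so free-text notes do not cause errors.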

Downloading Help

When clicking on the file links called Sound and TextGrid the files will open in a separate window in your browser.

Go to *FILE*, right click and select *Save this Page as*.

You are now able to save the file to a location of your choice in your home directory.