Typecraft v2.5

Difference between revisions of "Parallel Annotation of Speech and Text"

Revision as of 20:59, 18 March 2010

This page is under construction

Project Description

In the following we present the results of parallel sound and text data annotation carried out in a short pilot study. The study was conducted by Professor Wim van Dommelen and Assoc. Professor Dorothee Beermann at the Institute of Languages and Communication Studies (http://www.ntnu.no/isk) at the Norwegian University of Science and Technology (http://www.ntnu.no). The project has been funded by the SSTL (http://www.ntnu.no/hf/satsing/sstl).

The goal of the pilot was to investigate the possibilities for an integrated presentation of linguistically annotated audio and text material, combining Praat (http://www.fon.hum.uva.nl/praat/) and TypeCraft.

Praat is signal analysis software developed by Paul Boersma and David Weenink of the University of Amsterdam. It is widely used for the annotation of sound objects. For the present study we have taken advantage of the fact that Praat annotation data resides in a TextGrid object that exists separately from the sound object. Specifying a sentence tier allowed us to reference data easily across applications. At present our sound signal representations are static and selective; that is, they focus on the presentation of one selected feature to illustrate interesting correlations across phonetic and linguistic categories.


Description of the material

For our study we selected 10 sentences from the phonetic database of the "Sound to Sense" project.

Sentences 1 to 3

Speaker Dialect: Bergen

Jeg ser bildet, kan du si, litt på skrått ned, ovenifra.
“I see the picture, say, somewhat diagonally downwards, from above.”
Word        Transcription  Gloss             POS
Jeg         e              1SG               PN
ser         se:r           seePRES           V
bildet      bilde          pictureDEFSG      N
kan         kan:           canPRES           V
du          ʉ              2SG               CL
si          si:            sayINF            V
litt        lit:           a.little          ADVm
på          po             onDIR             PREP
skrått      skro:t         diagonalADJ>ADV   ADVm
ned         ned            downDIR           ADVm
ovenifra    ovenifra       from.aboveDIRSRC  ADVm
Download files for viewing in the Praat application: PSTA01.mp3, TextGrid
Det dekker omtrent hele det venstre…mest…altså, venstreste kortsiden.
“It covers approximately the whole left…most…that is, the leftest short side.”
Word        Transcription  Gloss             POS
Det         de             3SGNEUT           PN
dekker      dek:er         coverPRES         V
omtrent     umtrent        approximately     ADVm
hele        he:le          wholeDEF          ADJ
det         de             DEFSGNEUT         ART
venstre     venstre        left              ADVm
mest        mest           mostSUP           ADJ
altså       aso            that.isDM         ADVm
venstreste  venstreste     leftSUPMUDEF      ADJ
kortsiden   kortsiden      shortsideDEFSG    N
Download files for viewing in the Praat application: Sound, TextGrid
Hun står med ryggen mot veggen opp og ser på han som skal kaste ballen som står utenfor og peker på boksene.
“She's standing with her back up against the wall and looking at him, who is standing outside and about to throw the ball, and pointing towards the boxes.”
Word        Transcription  Gloss             POS
Hun         hun            3SGFEM            PN
står        sto:r          standPRES         V
med         med            withMNR           PREP
ryggen      ryɡ:en         backDEFSG         N
mot         mut            againstDIR        PREP
veggen      veɡ:en         wallDEFSG         N
opp         up             upDIRMU           PREP
og          o              and               CONJC
ser         se:r           seePRES           V
på          po             atDIR             PREP
han         han            3SGMASC           PN
som         som            -                 PNrel
skal        skal:          shallPRES         V
kaste       kaste          throwINF          V
ballen      bal:en         ballDEFSG         N
som         som            -                 PNrel
står        sto:r          standPRES         V
utenfor     ʉtenfor        outside           ADVm
og          o              and               CONJC
peker       pe:ker         pointPRES         V
på          po             atDIR             PREP
boksene     boksene        boxDEFPL          N


Download files for viewing in the Praat application: Sound, TextGrid

Speaker Dialect: Trondheim

Parallel Processing of Speech and Text Data - Part 2

Speaker Dialect:

Parallel Processing of Speech and Text Data - Part 3

About the TextGrid files

The TextGrid files are meant to be opened together with the matching sound files in the Praat application. Each TextGrid file consists of three tiers: 'Word' (rendered in Bokmål orthography), 'Phoneme' (showing underlying segments), and 'Note' (showing surface realisation in IPA symbols, and other notes).
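For readers who want to inspect the tiers outside Praat, the following is a minimal Python sketch of reading interval tiers from a TextGrid saved in Praat's long text format. The sample file content, its labels and time stamps, and the function names `read_interval_tiers` and `labels_at` are invented for illustration and are not taken from the study's files. The sketch also assumes that all three tiers are interval tiers and that labels contain no quotation marks.

```python
import re

# Hypothetical miniature TextGrid (long text format); labels and times are invented.
SAMPLE = """File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.7
tiers? <exists>
size = 3
item []:
    item [1]:
        class = "IntervalTier"
        name = "Word"
        xmin = 0
        xmax = 0.7
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "kan"
        intervals [2]:
            xmin = 0.4
            xmax = 0.7
            text = "du"
    item [2]:
        class = "IntervalTier"
        name = "Phoneme"
        xmin = 0
        xmax = 0.7
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "kan"
        intervals [2]:
            xmin = 0.4
            xmax = 0.7
            text = "du"
    item [3]:
        class = "IntervalTier"
        name = "Note"
        xmin = 0
        xmax = 0.7
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = ""
        intervals [2]:
            xmin = 0.4
            xmax = 0.7
            text = "CL"
"""

INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.]+)\s*xmax = ([\d.]+)\s*text = "(.*?)"'
)


def read_interval_tiers(textgrid_text):
    """Return {tier name: [(start, end, label), ...]} for each tier."""
    tiers = {}
    # Split at every "item [n]:" header; the first chunk is the file header.
    for chunk in re.split(r"\n\s*item \[\d+\]:", textgrid_text)[1:]:
        name = re.search(r'name = "(.*?)"', chunk).group(1)
        tiers[name] = [
            (float(start), float(end), label)
            for start, end, label in INTERVAL.findall(chunk)
        ]
    return tiers


def labels_at(tiers, time):
    """Return the label found in every tier at the given time (None if no interval)."""
    return {
        name: next((lab for start, end, lab in ivs if start <= time < end), None)
        for name, ivs in tiers.items()
    }


tiers = read_interval_tiers(SAMPLE)
print(labels_at(tiers, 0.5))
```

Looking up one time point across all tiers mirrors how the tiers are read side by side in Praat: the word, its underlying segments and any note share the same time span.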

Here is a list of glosses used in the 'Note' tier:

Phonology/Phonetics:
BrV = Segment realised with breathy voice
CrV = Segment realised with creaky voice
DV = Underlying voiced segment realised devoiced
EPN = Epenthesis
RD = Reduction of segment (e.g. corner vowel realised as schwa or plosive as fricative).
V = Underlying non-voiced segment realised voiced

Morphophonology/Syntax:
CL = Clitic

Other:
ERR = The speaker errs and corrects himself
HES = (Audible) hesitation from speaker
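
The gloss list above can also be kept as a lookup table when processing 'Note' tier labels automatically. The Python sketch below is illustrative only: the function name `expand_note` is hypothetical, and it assumes that several abbreviations in one label are separated by spaces or commas, which the TextGrid files themselves may not follow.

```python
# Abbreviations and descriptions copied from the gloss list for the 'Note' tier.
NOTE_GLOSSES = {
    "BrV": "segment realised with breathy voice",
    "CrV": "segment realised with creaky voice",
    "DV": "underlying voiced segment realised devoiced",
    "EPN": "epenthesis",
    "RD": "reduction of segment",
    "V": "underlying non-voiced segment realised voiced",
    "CL": "clitic",
    "ERR": "the speaker errs and corrects himself",
    "HES": "(audible) hesitation from speaker",
}


def expand_note(note):
    """Pair each token of a Note label with its description, or None if unknown.

    Assumption: tokens are separated by whitespace or commas. Tokens that are
    not in the gloss list (for example IPA symbols) are returned with None.
    """
    tokens = note.replace(",", " ").split()
    return [(token, NOTE_GLOSSES.get(token)) for token in tokens]


for token, description in expand_note("CL, RD"):
    print(token, "=", description)
```

Unknown tokens are returned rather than dropped, so surface realisations written in IPA on the same tier are not lost.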