Experiences from the Spoken Dutch Corpus Project
Citations
196 citations
Cites background from "Experiences from the Spoken Dutch C..."
...Dutch tagged with the Spoken Dutch Corpus tagset (Van Eynde 2004): The approximately nine million words of the transcribed Spoken Dutch Corpus itself (Oostdijk et al. 2002), the ILK corpus with approximately 46 thousand part-of-speech tagged words, the D-Coi corpus with approximately 330 thousand...
152 citations
Cites background or methods from "Experiences from the Spoken Dutch C..."
...The names belonged to the 2200 most frequent lemmas in the CGN (Oostdijk et al., 2002)....
...For Part 1, 9 words were selected from each frequency band of 1000 words between words ranked 1 to 10,000 according to the Corpus of Spoken Dutch (CGN; Oostdijk et al., 2002)....
70 citations
67 citations
Cites methods from "Experiences from the Spoken Dutch C..."
...The systems used in this study were trained on the read speech parts of the Spoken Dutch Corpus (CGN) (Oostdijk et al. 2002) and the CoGeN corpus (Demuynck et al. 1997)....
63 citations
Cites background or methods from "Experiences from the Spoken Dutch C..."
...The CGN corpus (Oostdijk et al., 2002) does not contain sufficient speech data to train acoustic models for these sounds....
...We transformed the transcriptions to the standards developed in the CGN project (Oostdijk et al., 2002)....
...The acoustic phone models used for all alignments presented here were 37 32-Gaussian tristate monophone acoustic models (Hämäläinen, Gubian, ten Bosch, & Boves, 2009) that had been trained on 396,187 word tokens of the Dutch Library of the Blind of the Spoken Dutch Corpus (CGN, Oostdijk et al., 2002)....
...It was compiled by merging lexical resources such as CELEX (Baayen, Piepenbrock, & Gulikers, 1995), RBN (van der Vliet, 2007) and CGN (Oostdijk et al., 2002)....
References
34,965 citations
861 citations
"Experiences from the Spoken Dutch C..." refers to methods in this paper
...The design of the headers has been inspired by the guidelines of the Text Encoding Initiative (Sperberg-McQueen and Burnard, 1994) and the Corpus Encoding Standard (Ide, 1996)....
394 citations
367 citations
"Experiences from the Spoken Dutch C..." refers to methods in this paper
...For the part-of-speech distinction we employ the classical classification into ten parts of speech, which is also used in the standard reference grammar for Dutch Algemene Nederlandse Spraakkunst (ANS; Haeseryn et al., 1997)....
244 citations