
Showing papers by "Paavo Alku published in 2010"


Journal ArticleDOI
TL;DR: The data suggest that detrimental effects of prematurity on language skills are based on the low degree of specialization to native language early in development, and delayed or atypical perceptual narrowing was associated with slower language acquisition.
Abstract: Early auditory experiences are a prerequisite for speech and language acquisition. In healthy children, phoneme discrimination abilities improve for native and degrade for unfamiliar, socially irrelevant phoneme contrasts between 6 and 12 months of age as the brain tunes itself to, and specializes in, the native spoken language. This process is known as perceptual narrowing and has been found to predict normal native language acquisition. Prematurely born infants are known to be at an elevated risk for later language problems, but it remains unclear whether these problems relate to early perceptual narrowing. To address this question, we investigated early neurophysiological phoneme discrimination abilities and later language skills in prematurely born infants and in healthy, full-term infants. Our follow-up study shows for the first time that the perceptual narrowing for non-native phoneme contrasts found in the healthy controls at 12 months was not observed in very prematurely born infants. An electric mismatch response of the brain indicated that whereas full-term infants gradually lost their ability to discriminate non-native phonemes from 6 to 12 months of age, prematurely born infants retained this ability. Language performance tested at the age of 2 years showed a significant delay in the prematurely born group. Moreover, those infants who had not become specialized in native phonemes at the age of one year performed worse in the communicative language test (MacArthur Communicative Development Inventories) at the age of two years. Thus, the decline in sensitivity to non-native phonemes served as a predictor of further language development. Our data suggest that the detrimental effects of prematurity on language skills are based on a low degree of specialization to the native language early in development. Moreover, delayed or atypical perceptual narrowing was associated with slower language acquisition. The results hence suggest that language problems related to prematurity may partially originate from this early tuning stage of language acquisition.

93 citations


Journal ArticleDOI
TL;DR: Two temporally weighted variants of linear predictive modeling are introduced to speaker verification and compared to FFT, which is normally used in computing MFCCs, and to conventional linear prediction; the effect of speech enhancement (spectral subtraction) on system performance with each of the four feature representations is also investigated.
Abstract: Text-independent speaker verification under additive noise corruption is considered. In the popular mel-frequency cepstral coefficient (MFCC) front-end, the conventional Fourier-based spectrum estimation is substituted with weighted linear predictive methods, which have earlier shown success in noise-robust speech recognition. Two temporally weighted variants of linear predictive modeling are introduced to speaker verification and compared to FFT, which is normally used in computing MFCCs, and to conventional linear prediction. The effect of speech enhancement (spectral subtraction) on system performance with each of the four feature representations is also investigated. Experiments by the authors on the NIST 2002 SRE corpus indicate that the accuracies of the conventional and proposed features are close to each other on clean data. For factory noise at the 0 dB SNR level, baseline FFT and the better of the proposed features give EERs of 17.4% and 15.6%, respectively. These accuracies improve to 11.6% and 11.2%, respectively, when spectral subtraction is included as a preprocessing method. The new features hold a promise for noise-robust speaker verification.
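As a rough illustration of the front-end modification described above (not the authors' implementation; plain numpy/scipy, with the mel filter bank matrix `mel_fb` assumed to be given), the FFT-based power spectrum in the MFCC pipeline can be replaced by an all-pole (LP) spectrum as follows:

```python
import numpy as np
from scipy.fftpack import dct

def lp_coefficients(frame, order=20):
    """Conventional autocorrelation-method LP on a (pre-windowed) frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))            # A(z) = 1 - sum_k a_k z^-k

def lp_power_spectrum(a, n_fft=512):
    """All-pole power spectrum 1 / |A(e^jw)|^2 (gain omitted for cepstral features)."""
    A = np.fft.rfft(a, n_fft)
    return 1.0 / (np.abs(A) ** 2 + 1e-12)

def mfcc_from_power_spectrum(power_spec, mel_fb, n_ceps=12):
    """Standard MFCC steps: mel filter bank, log compression, DCT."""
    mel_energies = mel_fb @ power_spec            # mel_fb: (n_mels, n_fft // 2 + 1)
    return dct(np.log(mel_energies + 1e-12), norm="ortho")[:n_ceps]
```

Swapping `lp_power_spectrum` for the usual |FFT|^2 of the frame reproduces the baseline FFT-MFCC features; the temporally weighted LP variants studied in the paper change only how the coefficient vector `a` is estimated.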

69 citations


Journal ArticleDOI
TL;DR: It appeared to be possible to identify valences from vowel samples of short duration (approximately 150 ms), and the normalized amplitude quotient (NAQ) tended to differentiate between the valences and activity levels perceived in both genders.

47 citations


Journal ArticleDOI
TL;DR: The current MMN results imply enhanced processing of linguistically relevant information at the pre-attentive stage and in this way support the domain-specific model of speech perception.

46 citations


01 Jan 2010
TL;DR: GlottHMM, described in this paper, is a hidden Markov model (HMM) based speech synthesis system that utilizes glottal inverse filtering to separate the vocal tract from the glottal source.
Abstract: This paper describes the GlottHMM speech synthesis entry for Blizzard Challenge 2010. GlottHMM is a hidden Markov model (HMM) based speech synthesis system that utilizes glottal inverse filtering for separating the vocal tract from the glottal source. The source and the filter characteristics are modeled separately in the framework of HMM. In the synthesis stage, natural glottal flow pulses are used to generate the excitation signal, and the excitation signal is further modified according to the desired voice source characteristics generated by the HMM. In order to prevent the over-smoothing of the vocal tract filter parameters, a new formant enhancement method is used to make the vocal tract resonances sharper. Finally, speech is synthesized by filtering the glottal excitation by the vocal tract filter. Index Terms: speech synthesis, hidden Markov model, glottal inverse filtering
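The synthesis-stage idea described above can be sketched roughly as follows; this is an illustrative simplification, not the GlottHMM code, and it omits pulse-library selection, spectral matching of the source, and formant enhancement:

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_frame(glottal_pulse, f0, frame_len, fs, vocal_tract_a, gain=1.0):
    """Build an excitation by repeating a natural glottal flow pulse at the target F0
    and filter it with the all-pole vocal tract model 1/A(z)."""
    period = int(round(fs / f0))                                  # pitch period in samples
    # Stretch the stored pulse to the target period (linear interpolation placeholder).
    t = np.linspace(0, len(glottal_pulse) - 1, period)
    pulse = np.interp(t, np.arange(len(glottal_pulse)), glottal_pulse)
    excitation = np.zeros(frame_len)
    for start in range(0, frame_len - period + 1, period):        # pulse train at F0
        excitation[start:start + period] += pulse
    return gain * lfilter([1.0], vocal_tract_a, excitation)       # vocal tract filtering
```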

36 citations


Journal ArticleDOI
TL;DR: The results of an MEG study utilizing realistic spatial sound stimuli presented in a stimulus-specific adaptation paradigm support a population rate code model where neurons in the right hemisphere are more often tuned to the left than to the right of the perceiver, while in the left hemisphere these two neuronal populations are of equal size.

34 citations


Journal ArticleDOI
TL;DR: It is proposed that the increased activity of AEFs reflects cortical processing of acoustic properties common to both speech and non-speech stimuli, and is most likely caused by spectral changes brought about by the decrease of amplitude resolution.
Abstract: Recent studies have shown that the human right-hemispheric auditory cortex is particularly sensitive to reduction in sound quality, with an increase in distortion resulting in an amplification of the auditory N1m response measured in the magnetoencephalography (MEG). Here, we examined whether this sensitivity is specific to the processing of acoustic properties of speech or whether it can be observed also in the processing of sounds with a simple spectral structure. We degraded speech stimuli (vowel /a/), complex non-speech stimuli (a composite of five sinusoidals), and sinusoidal tones by decreasing the amplitude resolution of the signal waveform. The amplitude resolution was impoverished by reducing the number of bits to represent the signal samples. Auditory evoked magnetic fields (AEFs) were measured in the left and right hemisphere of sixteen healthy subjects. We found that the AEF amplitudes increased significantly with stimulus distortion for all stimulus types, which indicates that the right-hemispheric N1m sensitivity is not related exclusively to degradation of acoustic properties of speech. In addition, the P1m and P2m responses were amplified with increasing distortion similarly in both hemispheres. The AEF latencies were not systematically affected by the distortion. We propose that the increased activity of AEFs reflects cortical processing of acoustic properties common to both speech and non-speech stimuli. More specifically, the enhancement is most likely caused by spectral changes brought about by the decrease of amplitude resolution, in particular the introduction of periodic, signal-dependent distortion to the original sound. Converging evidence suggests that the observed AEF amplification could reflect cortical sensitivity to periodic sounds.
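One straightforward way to implement the amplitude-resolution reduction described above is uniform requantization of the waveform to a given number of bits; the sketch below is an assumption about the procedure, not necessarily the exact quantizer used in the study:

```python
import numpy as np

def reduce_amplitude_resolution(signal, n_bits):
    """Uniformly requantize a signal scaled to [-1, 1) to n_bits per sample."""
    levels = 2 ** n_bits
    q = np.clip(np.floor((signal + 1.0) * 0.5 * levels), 0, levels - 1)
    return (q + 0.5) / levels * 2.0 - 1.0         # reconstruct mid-step values
```

Decreasing `n_bits` introduces the periodic, signal-dependent quantization distortion referred to in the abstract.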

30 citations


Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, Paavo Alku
01 Jan 2010
TL;DR: Experiments indicate that formant enhancement prior to HMM training improves the quality of synthetic speech by providing sharper formants, and that the performance of the new formant enhancement method is similar to that of the existing method.
Abstract: Hidden Markov model (HMM) based speech synthesis has a tendency to over-smooth the spectral envelope of speech, which makes the speech sound muffled. One means to compensate for the over-smoothing is to enhance the formants of the spectral model. This paper compares the performance of different formant enhancement methods, and studies the enhancement of the formants prior to HMM training in order to preemptively compensate for the over-smoothing. A new method for enhancing the formants of an all-pole model is also introduced. Experiments indicate that the formant enhancement prior to HMM training improves the quality of synthetic speech by providing sharper formants, and the performance of the new formant enhancement method is similar to the existing method.
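For context, a textbook way to sharpen the formants of an all-pole model is to scale its pole radii toward the unit circle; the sketch below shows that classical variant and is not a reconstruction of the new method proposed in the paper:

```python
import numpy as np

def sharpen_formants(a, gamma=1.02):
    """Scale the poles of the all-pole model radially by `gamma` (> 1 moves the poles
    toward the unit circle and narrows the formant bandwidths).
    `a` is the coefficient vector [1, a_1, ..., a_p] of A(z)."""
    a = np.asarray(a, dtype=float)
    enhanced = a * gamma ** np.arange(len(a))     # A(z/gamma): roots move from p_k to gamma*p_k
    if np.max(np.abs(np.roots(enhanced))) >= 1.0: # keep the enhanced model stable
        raise ValueError("gamma too large: enhanced all-pole model is unstable")
    return enhanced
```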

27 citations


Proceedings Article
01 Jan 2010
TL;DR: A generalized formulation of linear prediction (LP), including both conventional and temporally weighted LP analysis methods as special cases, is introduced and shown to lead to performance improvements in several cases involving channel distortion and additive noise mismatch between the training and recognition conditions.
Abstract: This paper introduces a generalized formulation of linear prediction (LP), including both conventional and temporally weighted LP analysis methods as special cases. The temporally weighted methods have recently been successfully applied to noise robust spectrum analysis in speech and speaker recognition applications. In comparison to those earlier methods, the new generalized approach allows more versatility in weighting different parts of the data in the LP analysis. Two such weighted methods are evaluated and compared to the conventional spectrum modeling methods FFT and LP, as well as the temporally weighted methods WLP and SWLP, by substituting each of them in turn as the spectrum estimation method of the MFCC feature extraction stage of a GMM-UBM based speaker verification system. The new methods are shown to lead to performance improvement in several cases involving channel distortion and additive noise mismatch between the training and recognition conditions.
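A compact way to see how temporal weighting enters the LP analysis is through the weighted normal equations; the sketch below uses a short-time-energy (STE) weight as one plausible choice, and its exact conventions (windowing, normalization) may differ from those in the paper:

```python
import numpy as np

def weighted_lp(frame, order=20, weights=None):
    """Weighted LP: minimize sum_n w[n] * e[n]^2, where e[n] is the prediction error.
    weights == ones reproduces conventional LP; the default below uses a short-time
    energy (STE) weight computed from the `order` preceding samples."""
    N = len(frame)
    if weights is None:
        padded = np.concatenate((np.zeros(order), frame))
        weights = np.array([np.sum(padded[n:n + order] ** 2) for n in range(N)]) + 1e-8
    # Weighted normal equations (X^T W X) a = X^T W x.
    X = np.zeros((N, order))
    for k in range(1, order + 1):
        X[k:, k - 1] = frame[:N - k]              # delayed copies x[n - k]
    R = X.T @ (weights[:, None] * X)
    r = X.T @ (weights * frame)
    a = np.linalg.solve(R, r)
    return np.concatenate(([1.0], -a))            # A(z) = 1 - sum_k a_k z^-k
```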

26 citations


Journal ArticleDOI
TL;DR: The results suggest that the Turkish-German children have not yet fully acquired the German phonetic inventory despite living in Germany since birth and being immersed in a German-speaking environment.

26 citations


Journal ArticleDOI
TL;DR: The latency of the transient brain response was prolonged in aged compared to young subjects, and the accuracy of behavioral responses to sinusoids was diminished among the aged.

Proceedings ArticleDOI
26 Sep 2010
TL;DR: The results supported the hypothesis – formed by an earlier study of voice quality changes in running speech – that more prominent syllables are produced with a less tense voice quality and less prominent ones with a more tense phonation.
Abstract: Prominence relations in speech are signaled by various ways including such phonetic means as voice fundamental frequency, intensity, and duration. A less studied acoustic feature affecting prominence is the so called voice quality which is determined by changes in the airflow caused by different laryngeal settings. We investigated the changes in voice quality with respect to linguistic prosodic signaling of focus in simple three word utterances. We used inverse filtering based methods for calculating and parametrizing the glottal flow in several different vowels and focus conditions. The results supported our hypothesis – formed by an earlier study of voice quality changes in running speech – that more prominent syllables are produced with a less tense voice quality and less prominent ones with a more tense phonation. We provide both physiological and linguistic explanations for the phenomena.
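A glottal flow parameter commonly used in this line of work (and mentioned in the emotion-perception paper above) is the normalized amplitude quotient (NAQ); a simplified frame-level computation, assuming the glottal flow has already been estimated by inverse filtering, might look like this:

```python
import numpy as np

def normalized_amplitude_quotient(glottal_flow, f0, fs):
    """Frame-level NAQ: peak-to-peak glottal flow divided by the product of the
    negative peak of the flow derivative and the fundamental period. Smaller NAQ
    values correspond to a more pressed (tense) phonation."""
    flow_ac = np.max(glottal_flow) - np.min(glottal_flow)    # AC flow amplitude
    d_peak = abs(np.min(np.diff(glottal_flow) * fs))         # negative peak of dU/dt
    return flow_ac / (d_peak * (1.0 / f0))                   # NAQ = f_ac / (d_peak * T)
```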

Proceedings ArticleDOI
01 Aug 2010
TL;DR: A new method for the bandwidth extension of telephone speech, using only the information in the narrowband speech, is presented; listening tests show that it improves speech quality compared with a previously published bandwidth extension method.
Abstract: The limited audio bandwidth used in telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4–8 kHz using only the information in the narrowband speech. First, a wideband excitation is generated by spectral folding from the narrowband linear prediction residual. The highband of this signal is divided into four subbands with a filter bank, and a neural network is used to weight the subbands based on features calculated from the narrowband speech. Bandwidth-extended speech is obtained by summing the weighted subbands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with a previously published bandwidth extension method.
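To make the spectral-folding step concrete, the sketch below upsamples the narrowband LP residual by zero insertion, which mirrors the 0-4 kHz spectrum into the 4-8 kHz band; the filter-bank split and neural-network weighting described above are not reproduced, and the filter parameters are illustrative:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def spectral_fold_excitation(nb_residual):
    """Upsample the narrowband LP residual by zero insertion; this mirrors the
    0-4 kHz spectrum into 4-8 kHz, giving a spectrally flat wideband excitation."""
    wb = np.zeros(2 * len(nb_residual))
    wb[::2] = nb_residual
    return wb

def highband(wb_excitation, fs_wb=16000, cutoff=4000.0, numtaps=101):
    """Isolate the 4-8 kHz band on which the subband weighting would operate."""
    hp = firwin(numtaps, cutoff, fs=fs_wb, pass_zero=False)  # linear-phase highpass FIR
    return lfilter(hp, [1.0], wb_excitation)
```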

Journal ArticleDOI
TL;DR: A temporal window of integration for the periodicity of speech sounds in the F0 range of typical male speech is defined; its length is 3-5 cycles, or 30-50 ms, and response latencies were shorter for the periodic than for the aperiodic stimuli.
Abstract: Cortical sensitivity to the periodicity of speech sounds has been evidenced by larger, more anterior responses to periodic than to aperiodic vowels in several non-invasive studies of the human brain. The current study investigated the temporal integration underlying the cortical sensitivity to speech periodicity by studying the increase in periodicity-specific cortical activation with growing stimulus duration. Periodicity-specific activation was estimated from magnetoencephalography as the differences between the N1m responses elicited by periodic and aperiodic vowel stimuli. The duration of the vowel stimuli with a fundamental frequency (F0=106 Hz) representative of typical male speech was varied in units corresponding to the vowel fundamental period (9.4 ms) and ranged from one to ten units. Cortical sensitivity to speech periodicity, as reflected by larger and more anterior responses to periodic than to aperiodic stimuli, was observed when stimulus duration was 3 cycles or more. Further, for stimulus durations of 5 cycles and above, response latency was shorter for the periodic than for the aperiodic stimuli. Together the current results define a temporal window of integration for the periodicity of speech sounds in the F0 range of typical male speech. The length of this window is 3-5 cycles, or 30-50 ms.

Journal ArticleDOI
TL;DR: Directly observable, non-invasive brain measures can be used to assess effects of stroke that are related to the behavioral symptoms patients manifest; left-hemispheric ischemic stroke impairs the processing of sinusoidal and speech sounds.

01 Jan 2010
TL;DR: Two temporally weighted variants of linear predictive (LP) modeling are introduced to speaker verification and compared to FFT, which is normally used in computing MFCCs, and to conventional LP; the effect of speech enhancement (spectral subtraction) on system performance is investigated with each of the four feature representations.
Abstract: We consider text-independent speaker verification under additive noise corruption. In the popular mel-frequency cepstral coefficient (MFCC) front-end, we substitute the conventional Fourier-based spectrum estimation with weighted linear predictive methods, which have earlier shown success in noise-robust speech recognition. We introduce two temporally weighted variants of linear predictive (LP) modeling to speaker verification and compare them to FFT, which is normally used in computing MFCCs, and to conventional LP. We also investigate the effect of speech enhancement (spectral subtraction) on the system performance with each of the four feature representations. Our experiments on the NIST 2002 SRE corpus indicate that the accuracies of the conventional and proposed features are close to each other on clean data. At the 0 dB SNR level, baseline FFT and the better of the proposed features give EERs of 17.4% and 15.6%, respectively. These accuracies improve to 11.6% and 11.2%, respectively, when spectral subtraction is included as a pre-processing method. The new features hold a promise for noise-robust speaker verification.