Author

Ville Pulkki

Bio: Ville Pulkki is an academic researcher at Aalto University. The author has contributed to research on topics including Audio signal and Loudspeaker. The author has an h-index of 32 and has co-authored 247 publications, which have received 4923 citations. Previous affiliations of Ville Pulkki include the Technical University of Denmark and Helsinki University of Technology.


Papers
Journal Article
TL;DR: A vector-based reformulation of amplitude panning is derived, leading to simple and computationally efficient equations for virtual sound source positioning. The method makes it possible to create two- or three-dimensional sound fields in which any number of loudspeakers can be placed arbitrarily.
Abstract: A vector-based reformulation of amplitude panning is derived, which leads to simple and computationally efficient equations for virtual sound source positioning. Using the method, vector base amplitude panning (VBAP), it is possible to create two- or three-dimensional sound fields where any number of loudspeakers can be placed arbitrarily. The method produces virtual sound sources that are as sharp as is possible with current loudspeaker configuration and amplitude panning methods. A digital tool that implements two- and three-dimensional VBAP with eight inputs and outputs has been realized.
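
The abstract does not reproduce the equations, so the following is only a minimal sketch of the two-dimensional case under the usual VBAP formulation: the source direction is written as a weighted sum of the two loudspeaker unit vectors, and the weights are then scaled to constant power. The function name `vbap_2d_gains` and the angle-in-degrees interface are assumptions made for illustration, not taken from the paper.

```python
import math


def vbap_2d_gains(source_deg, spk1_deg, spk2_deg):
    """Return (g1, g2) for one loudspeaker pair (hypothetical helper).

    Solves p = g1 * l1 + g2 * l2, where p, l1 and l2 are unit vectors
    pointing at the virtual source and the two loudspeakers, then
    normalises the gains so that g1**2 + g2**2 == 1.
    """
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    # Determinant of the 2x2 base matrix whose columns are l1 and l2.
    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("loudspeakers are collinear with the listener")

    # Cramer's rule for the two unnormalised gains.
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det

    # Constant-power normalisation.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


if __name__ == "__main__":
    # Source midway between loudspeakers at +30 and -30 degrees.
    print(vbap_2d_gains(0.0, 30.0, -30.0))
```

A negative gain from this sketch indicates that the source direction lies outside the arc spanned by the chosen pair, in which case a different pair would be selected.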

933 citations

Journal Article
TL;DR: Directional audio coding (DirAC) is a method for spatial sound representation, applicable to different sound reproduction systems. In the analysis part, the diffuseness and direction of arrival of sound are estimated at a single location as functions of time and frequency.
Abstract: Directional audio coding (DirAC) is a method for spatial sound representation, applicable for different sound reproduction systems. In the analysis part, the diffuseness and direction of arrival of sound are estimated in a single location depending on time and frequency. In the synthesis part, microphone signals are first divided into nondiffuse and diffuse parts, and are then reproduced using different strategies. DirAC is developed from an existing technology for impulse response reproduction, spatial impulse response rendering (SIRR), and implementations of DirAC for different applications are described.
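
The analysis step can be illustrated with a rough sketch that is not the authors' implementation. It works on one broadband time block rather than per frequency band, covers the horizontal plane only, and assumes an omnidirectional signal `w` plus two dipole signals `x` and `y` scaled so that a plane wave from azimuth `a` gives `x = w*cos(a)` and `y = w*sin(a)`. The function name and this scaling convention are assumptions for illustration.

```python
import math


def doa_and_diffuseness(w, x, y):
    """Estimate (azimuth in degrees, diffuseness in [0, 1]) for one block.

    Hypothetical helper: the time-averaged product of the pressure-like
    signal w with the velocity-like signals (x, y) gives a vector that,
    under the scaling assumed above, points towards the source. Comparing
    its length with the mean energy gives a diffuseness estimate.
    """
    n = len(w)
    ix = sum(wi * xi for wi, xi in zip(w, x)) / n
    iy = sum(wi * yi for wi, yi in zip(w, y)) / n
    energy = sum(wi * wi + xi * xi + yi * yi for wi, xi, yi in zip(w, x, y)) / (2 * n)
    if energy == 0.0:
        raise ValueError("block is silent; direction is undefined")

    azimuth = math.degrees(math.atan2(iy, ix))
    diffuseness = 1.0 - math.hypot(ix, iy) / energy
    # Clamp to guard against rounding just outside [0, 1].
    return azimuth, max(0.0, min(1.0, diffuseness))


if __name__ == "__main__":
    s = [1.0, -1.0, 0.5]
    # Plane wave from 90 degrees: x is zero and y equals w.
    print(doa_and_diffuseness(s, [0.0] * 3, s))
```

In this sketch a single plane wave yields a diffuseness near 0, while signals whose dipole components are uncorrelated with `w` yield a value near 1.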

408 citations

03 Aug 2001
TL;DR: The vector base amplitude panning (VBAP) method, which is a reformulation of the existing pair-wise panning method to position virtual sources in arbitrary 2-D or 3-D loudspeaker setups, is introduced.
Abstract: Spatial audio aims to recreate or synthesize spatial attributes when reproducing audio over loudspeakers or headphones. Such spatial attributes include, for example, locations of perceived sound sources and an auditory sense of space. This thesis focuses on new methods of spatial audio for loudspeaker listening and on measuring the quality of spatial audio by subjective and objective tests. In this thesis the vector base amplitude panning (VBAP) method, which is an amplitude panning method to position virtual sources in arbitrary 2-D or 3-D loudspeaker setups, is introduced. In amplitude panning the same sound signal is applied to a number of loudspeakers with appropriate non-zero amplitudes. With 2-D setups VBAP is a reformulation of the existing pair-wise panning method. However, differing from earlier solutions it can be generalized for 3-D loudspeaker setups as a triplet-wise panning method. A sound signal is then applied to one, two, or three loudspeakers simultaneously. VBAP has certain advantages compared to earlier virtual source positioning methods in arbitrary layouts. Previous methods either used all loudspeakers to produce virtual sources, which results in some artefacts, or they used loudspeaker triplets with a non-generalizable 2-D user interface. The virtual sources generated with VBAP are investigated. The human directional hearing is simulated with a binaural auditory model adapted from the literature. The interaural time difference (ITD) cue and the interaural level difference (ILD) cue which are the main localization cues are simulated for amplitude-panned virtual sources and for real sources. Psychoacoustic listening tests are conducted to study the subjective quality of virtual sources. Statistically significant phenomena found in listening test data are explained by auditory model simulation results. 
To obtain a generic view of directional quality in arbitrary loudspeaker setups, directional cues are simulated for virtual sources with loudspeaker pairs and triplets in various setups. The directional qualities of virtual sources generated with VBAP can be stated as follows. Directional coordinates used for this purpose are the angle between a position vector and the median plane (θcc), and the angle between a projection of a position vector to the median plane and frontal direction (φcc). The perceived θcc direction of a virtual source coincides well with the VBAP panning direction when a loudspeaker set is near the median plane. When the loudspeaker set is moved towards a side of a listener, the perceived θcc direction is biased towards the median plane. The perceived φcc direction of an amplitude-panned virtual source is individual and cannot be predicted with any panning law.

179 citations

Journal Article
TL;DR: Spatial impulse response rendering (SIRR) analyzes the time-dependent direction of arrival and diffuseness of measured room responses within frequency bands to synthesize a multichannel response suitable for reproduction with any chosen surround loudspeaker setup.
Abstract: Spatial impulse response rendering (SIRR) is a recent technique for the reproduction of room acoustics with a multichannel loudspeaker system. SIRR analyzes the time-dependent direction of arrival and diffuseness of measured room responses within frequency bands. Based on the analysis data, a multichannel response suitable for reproduction with any chosen surround loudspeaker setup is synthesized. When loaded to a convolving reverberator, the synthesized responses create a very natural perception of space corresponding to the measured room. A technical description of the analysis-synthesis method is provided. Results of formal subjective evaluation and further analysis of SIRR are presented in a companion paper to be published in JAES in 2006 Jan./Feb.

166 citations

01 Jan 1995
TL;DR: In the experiments reported in this work the source data consisted of the raw text of Grimm fairy tales without any prior syntactic or semantic categorization of the words, and the algorithm was able to create diagrams that seem to comply reasonably well with the traditional syntactical categorizations and human intuition about the semantics.
Abstract: Semantic roles of words in natural languages are reflected by the contexts in which they occur. These roles can explicitly be visualized by the Self-Organizing Map (SOM). In the experiments reported in this work the source data consisted of the raw text of Grimm fairy tales without any prior syntactic or semantic categorization of the words. The algorithm was able to create diagrams that seem to comply reasonably well with the traditional syntactical categorizations and human intuition about the semantics of the words. It has earlier been shown that the Self-Organizing Map (SOM) can be applied to the visualization of contextual roles of words, i.e., similarities in their usage in short contexts formed of adjacent words [4]. This paper demonstrates that such relations or roles are also statistically reflected in unrestricted, even quaint natural expressions. The source material chosen for this experiment consisted of 200 Grimm tales (English translation). In most practical applications of the SOM, the input to the map algorithm is derived from some measurements, usually after their preprocessing. In such cases, the input vectors are supposed to have metric relations. Interpretation of languages, on the contrary, must be based on the processing of sequences of discrete symbols. If the words were encoded numerically, the ordered sets formed of them could also be compared mutually as well as with reference expressions. However, as no numerical value of the code should imply any order to the words themselves, it will be necessary to use uncorrelated vectors for encoding. The simplest method to introduce uncorrelated codes is to assign a unit vector for each word. When all different words in the input material are listed, a code vector can be defined to have as many components as there are words in the list. This method, however, is only practicable in very small experiments.
If the vocabulary is large as in the present experiments, we may then encode the words by quasi-orthogonal random vectors of a much smaller dimensionality [4]. To create a map of discrete symbols that occur within the sentences, each symbol must be presented in the due context. The context may consist of the immediate surroundings of the word in the text. Application of the self-organizing maps to natural language processing has been described earlier in, e.g., [2], [3], [4], [5], and [6].
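
The encoding described above can be sketched as follows. This is an illustrative reading of the abstract, not the paper's code: each word gets a random unit vector of small dimension, and each word's context is taken to be the concatenated codes of its immediate predecessor and successor, averaged over all occurrences. The function names, the Gaussian sampling, the dimension, and the averaging step are assumptions, and the SOM training itself is not shown.

```python
import random
from collections import defaultdict


def random_codes(vocab, dim=16, seed=0):
    """Assign each word a random unit vector (quasi-orthogonal for larger dim)."""
    rng = random.Random(seed)
    codes = {}
    for word in vocab:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(v * v for v in vec) ** 0.5
        codes[word] = [v / norm for v in vec]
    return codes


def average_contexts(tokens, codes):
    """Average [code(previous word) + code(next word)] over each word's occurrences.

    The first and last tokens are skipped because they lack one neighbour.
    """
    dim = len(next(iter(codes.values())))
    sums = defaultdict(lambda: [0.0] * (2 * dim))
    counts = defaultdict(int)
    for i in range(1, len(tokens) - 1):
        context = codes[tokens[i - 1]] + codes[tokens[i + 1]]  # list concatenation
        acc = sums[tokens[i]]
        for k, value in enumerate(context):
            acc[k] += value
        counts[tokens[i]] += 1
    return {word: [v / counts[word] for v in acc] for word, acc in sums.items()}


if __name__ == "__main__":
    text = "the king saw the queen and the king saw the prince".split()
    codes = random_codes(sorted(set(text)), dim=8)
    contexts = average_contexts(text, codes)
    print(len(contexts["king"]))
```

The resulting averaged context vectors are the kind of input that could then be fed to a map algorithm; words used in similar surroundings end up with similar vectors.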

131 citations


Cited by
Proceedings Article
11 Jul 2010
TL;DR: This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of near state-of-the-art supervised baselines.
Abstract: If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/

2,243 citations

Journal Article
01 Oct 1980

1,565 citations

Journal Article
TL;DR: These results support a neurobiological model of language in the Hebbian tradition and provide evidence for processing differences between words and matched meaningless pseudowords, and between word classes, such as concrete content and abstract function words, and words evoking visual or motor associations.
Abstract: If the cortex is an associative memory, strongly connected cell assemblies will form when neurons in different cortical areas are frequently active at the same time. The cortical distributions of these assemblies must be a consequence of where in the cortex correlated neuronal activity occurred during learning. An assembly can be considered a functional unit exhibiting activity states such as full activation ("ignition") after appropriate sensory stimulation (possibly related to perception) and continuous reverberation of excitation within the assembly (a putative memory process). This has implications for cortical topographies and activity dynamics of cell assemblies forming during language acquisition, in particular for those representing words. Cortical topographies of assemblies should be related to aspects of the meaning of the words they represent, and physiological signs of cell assembly ignition should be followed by possible indicators of reverberation. The following postulates are discussed in detail: (1) assemblies representing phonological word forms are strongly lateralized and distributed over perisylvian cortices; (2) assemblies representing highly abstract words such as grammatical function words are also strongly lateralized and restricted to these perisylvian regions; (3) assemblies representing concrete content words include additional neurons in both hemispheres; (4) assemblies representing words referring to visual stimuli include neurons in visual cortices; and (5) assemblies representing words referring to actions include neurons in motor cortices. 
Two main sources of evidence are used to evaluate these proposals: (a) imaging studies focusing on localizing word processing in the brain, based on stimulus-triggered event-related potentials (ERPs), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI), and (b) studies of the temporal dynamics of fast activity changes in the brain, as revealed by high-frequency responses recorded in the electroencephalogram (EEG) and magnetoencephalogram (MEG). These data provide evidence for processing differences between words and matched meaningless pseudowords, and between word classes, such as concrete content and abstract function words, and words evoking visual or motor associations. There is evidence for early word class-specific spreading of neuronal activity and for equally specific high-frequency responses occurring later. These results support a neurobiological model of language in the Hebbian tradition. Competing large-scale neuronal theories of language are discussed in light of the data summarized. Neurobiological perspectives on the problem of serial order of words in syntactic strings are considered in closing.

1,009 citations

Journal Article
TL;DR: Alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.
Abstract: The yield of alkenols and cycloalkenols is substantially improved by carrying out the reaction of olefins with formaldehyde in the presence of selected catalysts. In accordance with one embodiment, alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.

851 citations