Showing papers by "Teuvo Kohonen" published in 2011


Journal Article
TL;DR: A corpus-based algorithm is applied to derive semantic representations of words based on analyses of contextual information extracted from a text corpus—specifically, analyses of word co-occurrences in a large-scale electronic database of text.
Abstract: In this article, we introduce a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on analyses of contextual information extracted from a text corpus, specifically analyses of word co-occurrences in a large-scale electronic database of text. Here, a target word is represented as the combination of the average of all words preceding the target and all words following it in a text corpus. The semantic representation of the target words can be further processed by a self-organizing map (SOM; Kohonen, Self-organizing maps, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Due to its topography-preserving features, the SOM projects the statistical structure of the context onto a 2-D space, such that words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such a representation system has its applications in a variety of contexts, including computational modeling of language acquisition and processing. In this report, we present specific examples from two languages (English and Chinese) to demonstrate how the method is applied to extract the semantic representations of words.

20 citations
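
The word representation described in the abstract above can be illustrated with a minimal sketch. It is not the software package the article introduces. It assumes that each word is coded as a one-hot vector, that "combination" means concatenation, and that only the immediately preceding and following word count as context; the function name `context_vectors` is hypothetical.

```python
import numpy as np

def context_vectors(tokens):
    """Return (vocab, matrix); row i is the context representation of vocab[i].

    Assumptions (not from the source): one-hot word codes, a context window
    of one word on each side, and concatenation as the "combination".
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    prev_sum = np.zeros((n, n))  # row = target, column = word preceding it
    next_sum = np.zeros((n, n))  # row = target, column = word following it

    for left, right in zip(tokens, tokens[1:]):
        prev_sum[index[right], index[left]] += 1.0
        next_sum[index[left], index[right]] += 1.0

    def row_average(m):
        # Average of the one-hot context vectors; rows without context stay zero.
        totals = m.sum(axis=1, keepdims=True)
        return np.divide(m, totals, out=np.zeros_like(m), where=totals > 0)

    return vocab, np.hstack([row_average(prev_sum), row_average(next_sum)])
```

Words that occur in the same contexts receive the same row, which is the property a SOM can then use to place them close together on the map.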


Book Chapter
13 Jun 2011
TL;DR: Differing from previous approaches, in which individual words were mapped onto the SOM, in this work histograms of various word classes or otherwise defined subsets of words were formed on the SOM array, and it was found that the words are not only clustered according to the word classes, but joint or overlapping clusters of words from different classes can also be formed according to the role of the words as sentence constituents.
Abstract: Contextual SOMs of Chinese words have been constructed in this work. Differing from previous approaches, in which individual words were mapped onto the SOM, in this work histograms of various word classes or otherwise defined subsets of words were formed on the SOM array. It was found that the words are not only clustered according to the word classes, but joint or overlapping clusters of words from different classes can also be formed according to the role of the words as sentence constituents. A further new effect was found. When the histograms were formed using test words restricted to certain intervals of word frequencies, the histograms were found to depend on the frequency, and the corresponding partial clusters were often very compact.

16 citations
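
One way to picture a histogram of a word subset on a SOM array is as a count of best-matching-unit hits. The sketch below rests on that assumption; the abstract does not state how the histograms were computed. The function name `hit_histogram`, the `(rows, cols, dim)` codebook layout, and the use of squared Euclidean distance are all illustrative choices.

```python
import numpy as np

def hit_histogram(codebook, vectors):
    """Count how many input vectors select each SOM unit as best match.

    codebook: array of shape (rows, cols, dim) holding trained unit weights.
    vectors:  array of shape (n, dim), e.g. the vectors of one word class.
    Returns an integer array of shape (rows, cols).
    """
    rows, cols, dim = codebook.shape
    flat = codebook.reshape(rows * cols, dim)
    # Squared Euclidean distance from every input vector to every unit.
    d2 = ((vectors[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)  # index of the best-matching unit per vector
    counts = np.bincount(bmu, minlength=rows * cols)
    return counts.reshape(rows, cols)
```

Calling the function once per word class, or once per word-frequency interval, gives one histogram per subset that can be compared across the same map.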


Teuvo Kohonen
01 Jan 2011
TL;DR: A new version of the MDS, called the nearest-neighbors multidimensional scaling (NN-MDS), is introduced. It represents the local data structures more accurately and converges fast, but two amendments had to be added in order to describe the global structures as well.
Publication: New Developments of Nonlinear Projections for the Visualization of Structures in Nonvectorial Data Sets. Author: Teuvo Kohonen. Publisher: School of Science, Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Unit: Department of Information and Computer Science. Series: Aalto University publication series SCIENCE + TECHNOLOGY 8/2011. Field of research: Computer science.
Abstract: New nonlinear projections for the visualization of structures in nonvectorial data sets are suggested. Since there exist problems with the convergence of the traditional multidimensional scaling (MDS) when the data are nonvectorial, a new version of the MDS, called the nearest-neighbors multidimensional scaling (NN-MDS), is introduced. While it represents the local data structures more accurately and converges fast, two amendments had to be added, in order to describe the global structures as well. A new initialization method called the GENINIT is also introduced. It is very fast and may be used as a nonlinear projection, too, but it is more suitable for the initialization of the more accurate learning algorithms.

1 citation