Open Access · Book Chapter · DOI


Cogito componentiter ergo sum
Lars Kai Hansen and Ling Feng
Informatics and Mathematical Modelling,
Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
lkh,lf@imm.dtu.dk, www.imm.dtu.dk
Abstract. Cognitive component analysis (COCA) is defined as the process of unsupervised grouping of data such that the ensuing group structure is well-aligned with that resulting from human cognitive activity. We present evidence that independent component analysis of abstract data such as text, social interactions, music, and speech leads to low level cognitive components.
1 Introduction
During evolution, human and animal visual, auditory, and other primary sensory systems have adapted to a broad ecological ensemble of natural stimuli. This long, on-going adaptation process has resulted in representations in human and animal perceptual systems which closely resemble the information-theoretically optimal representations obtained by independent component analysis (ICA); see e.g., [1] on visual contrast representation, [2] on visual features involved in color and stereo processing, and [3] on representations of sound features. For a general discussion consult also the textbook [4]. The human perceptual system can model complex multi-agent scenery. Human cognition uses a broad spectrum of cues for analyzing perceptual input and separating individual signal-producing agents, such as speakers, gestures, affections etc. Humans seem able to readily adapt strategies from one perceptual domain to another, and furthermore to apply these information processing strategies, such as object grouping, to environments that are both more abstract and more complex than those present during evolution. Given our present, and rather detailed, understanding of the ICA-like representations in primary sensory systems, it seems natural to pose the question: Are such information optimal representations rooted in independence also relevant for modeling higher cognitive functions? We are currently pursuing a research programme trying to understand the limitations of the ecological hypothesis for higher level cognitive processes, such as grouping abstract objects, navigating social networks, understanding multi-speaker environments, and understanding the representational differences between self and environment.
Wagensberg has pointed to the importance of independence for successful 'life forms' [5]:

    A living individual is part of the world with some identity that tends to
    become independent of the uncertainty of the rest of the world

Thus natural selection favors innovations that increase independence of the agent in the face of environmental uncertainty, while maximizing the gain from the predictable aspects of the niche. This view is a refinement of the classical Darwinian formulation that natural selection simply favors adaptation to given conditions. Wagensberg points out that recent biological innovations, such as nervous systems and brains, are means to decrease the sensitivity to unpredictable fluctuations. An important aspect of environmental analysis is the ability to recognize events induced by the self and other agents. Wagensberg also points out that by creating alliances agents can give up independence for the benefit of a group, which in turn may increase independence for the group as an entity. Both in its simple one-agent form and in the more tentative analysis of the group model, Wagensberg's theory emphasizes the crucial importance of statistical independence for the evolution of perception, semantics, and indeed cognition. While cognition may be hard to quantify, its direct consequence, human behavior, has a rich phenomenology which is becoming increasingly accessible to modeling. The digitalization of everyday life as reflected, say, in telecommunication, commerce, and media usage allows quantification and modeling of human patterns of activity, often at the level of individuals. Grouping of events or objects in categories is
[Figure 1: two scatter plots — left: feature axes x_1 vs x_2; right: LATENT COMPONENT 2 vs LATENT COMPONENT 4.]
Fig. 1. Generic feature distribution produced by a linear mixture of sparse sources (left) and a typical 'latent semantic analysis' scatter plot of principal component projections of a text database (right). The characteristic of a sparse signal is that it consists of relatively few large magnitude samples on a background of small signals. Latent semantic analysis of the so-called MED text database reveals that the semantic components are indeed very sparse and do follow the latent directions (principal components). Topics are indicated by the different markers. In [6] an ICA analysis of this data set, post-processed with a simple heuristic classifier, showed that manually defined topics were very well aligned with the independent components, hence constituting an example of cognitive component analysis: unsupervised learning leads to a label structure corresponding to that of human cognitive activity.
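The 'ray' signature of a sparse linear mixture described in the caption can be reproduced with a few lines of synthetic data. This is only a minimal sketch: the mixing matrix and source distribution are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent sparse (heavy-tailed) sources: a few large-magnitude
# samples on a background of small signals, as described in the caption.
S = rng.laplace(size=(2, 5000)) ** 3  # cubing exaggerates the sparsity

# Linear mixture X = A S, the generative model assumed by ICA.
A = np.array([[1.0, 0.4], [0.3, 1.0]])
X = A @ S

# Sparsity shows up as heavy tails: the excess kurtosis of the mixtures is
# strongly positive, and a scatter plot of X[0] vs X[1] shows 'rays' along
# the directions given by the columns of A.
def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

print(excess_kurtosis(X[0]), excess_kurtosis(X[1]))
```

Plotting `X[0]` against `X[1]` with such sources yields the star-shaped scatter of Figure 1 (left).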
fundamental to human cognition. In machine learning, classification is a rather well-understood task when based on labelled examples [7]. In this case classification belongs to the class of supervised learning problems. Clustering is a closely related unsupervised learning problem, in which we use general statistical rules to group objects without a priori providing a set of labelled examples. It is a fascinating finding in many real world data sets that the label structure discovered by unsupervised learning closely coincides with labels obtained by letting a human, or a group of humans, perform classification: labels derived from human cognition. We thus define cognitive component analysis (COCA) as unsupervised grouping of data such that the ensuing group structure is well-aligned with that resulting from human cognitive activity [8]. This presentation is based on our earlier results using ICA for abstract data such as text, dynamic text (chat), and web pages including text and images, see e.g., [9-13].
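The COCA definition can be illustrated with a deliberately simple sketch: unsupervised grouping recovering a 'human' label structure without ever seeing the labels. Everything here is invented for illustration (the data, the cluster count, the seeding), and a hand-rolled 2-means loop stands in for a real clustering method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data with a 'cognitive' label structure: two well-separated groups,
# as if labelled by humans.
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
human = np.array([0] * 100 + [1] * 100)

# Minimal 2-means: unsupervised grouping that never sees the labels.
centers = X[[0, 100]]  # one seed from each half, purely to keep the sketch stable
for _ in range(20):
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # point-center distances
    assign = d.argmin(axis=1)
    centers = np.array([X[assign == k].mean(axis=0) for k in range(2)])

# Alignment with the human labels (up to permutation of cluster ids).
agreement = max(np.mean(assign == human), np.mean(assign != human))
print(agreement)
```

When the agreement is close to 1.0, the unsupervised group structure is well-aligned with the human label structure, which is exactly the COCA criterion.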
2 Where have we found cognitive components?
Text analysis. Symbol manipulation as in text is a hallmark of human cognition. Salton proposed the so-called vector space representation for statistical modeling of text data; for a review see [14]. A term set is chosen and a document is represented by the vector of term frequencies. A document database then forms a so-called term-document matrix. The vector space representation can be used for classification and retrieval by noting that similar documents are expected to be 'close' in the vector space. A metric can be based on the simple Euclidean distance if document vectors are properly normalized; otherwise angular distance may be useful. This approach is principled, fast, and language independent. Deerwester and co-workers developed the concept of latent semantics based on principal component analysis of the term-document matrix [15]. The fundamental observation behind the latent semantic indexing (LSI) approach is that similar documents use similar vocabularies; hence, the vectors of a given topic could appear as produced by a stochastic process with highly correlated term-entries. By projecting the term-frequency vectors onto a relatively low dimensional subspace, say determined by the maximal amount of variance, one is able to filter out the inevitable 'noise'. Noise should here be thought of as individual document differences in term usage within a specific context. For well-defined topics, one could simply hope that a given context would have a stable core term set that would come out as an eigen-'direction' in the term vector space. The orthogonality constraint on co-variance matrix eigenvectors, however, often limits the interpretability of the LSI representation, and LSI is therefore more often used as a dimensionality reduction tool. The representation can be post-processed to reveal cognitive components, e.g., by interactive visualization schemes [16]. In Figure 1 (right) we show the scatter plot of a small text database. The database consists of documents with overlapping vocabulary but five different (high level cognitive) labels. The 'ray' structure signaling a sparse linear mixture is evident.
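The vector space model and the LSI projection just described can be sketched in a few lines. The four-document corpus and the choice k = 2 are invented for illustration; real analyses would use a large term-document matrix such as the MED database above.

```python
import numpy as np

# Tiny hypothetical corpus with two topics.
docs = [
    "heart attack blood pressure", "blood pressure heart disease",
    "kidney stone treatment", "kidney treatment renal stone",
]
terms = sorted({w for d in docs for w in d.split()})

# Term-document matrix of raw term frequencies (Salton's vector space model).
A = np.array([[d.split().count(t) for d in docs] for t in terms], float)

# Latent semantic indexing: project documents onto the top-k singular
# directions (the maximal-variance subspace) to filter term-usage 'noise'.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T  # documents in the latent subspace

# Same-topic documents end up close (high cosine), cross-topic documents far.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(doc_coords[0], doc_coords[1]), cos(doc_coords[0], doc_coords[2]))
```

The angular distance mentioned in the text corresponds to comparing these cosine values.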

Social networks. The ability to understand social networks is critical to humans. Is it possible that the simple unsupervised scheme for identification of independent components could play a role in this human capacity? To investigate this issue we have initiated an analysis of a well-known social network of some practical importance. The so-called actor network is a quantitative representation of the co-participation of actors in movies; for a discussion of this network, see e.g., [17]. The observation model for the network is not too different from that of text. Each movie is represented by its cast, i.e., the list of actors. We have converted the table of the about T = 128,000 movies with a total of J = 382,000 individual actors into a sparse J x T matrix. For visualization we have projected the data onto principal components (LSI) of the actor-actor co-variance matrix. The eigenvectors of this matrix are called 'eigencasts' and represent characteristic communities of actors that tend to co-appear in movies. The sparsity and magnitude of the network mean that the components are dominated by communities with very small intersections; however, a closer look at such scatter plots reveals detail suggesting that a simple linear mixture model indeed provides a reasonable representation of the (small) coupling between these relatively trivial disjoint subsets, see Figure 2. Such insight may be used for computer-assisted navigation of collaborative, peer-to-peer networks, for example in the context of search and retrieval.

[Figure 2: scatter plot of EIGENCAST 3 vs EIGENCAST 5.]

Fig. 2. The so-called actor network quantifies the collaborative pattern of 382,000 actors participating in almost 128,000 movies. For visualization we have projected the data onto principal components (LSI) of the actor-actor co-variance matrix. The eigenvectors of this matrix are called 'eigencasts' and they represent characteristic communities of actors that tend to co-appear in movies. The network is extremely sparse, so the most prominent variance components are related to near-disjoint sub-communities of actors with many common movies. However, a close-up of the coupling between two latent semantic components (the region around (0,0)) reveals the ubiquitous signature of a sparse linear mixture: a pronounced 'ray' structure emanating from (0,0). The ICA components are color coded. We speculate that the cognitive machinery developed for handling of independent events can also be used to locate independent sub-communities, hence, to navigate complex social networks.
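A miniature version of the eigencast computation might look as follows. The cast lists and actor names are invented; the real input would be the sparse 382,000 x 128,000 actor-movie matrix described above.

```python
import numpy as np

# Hypothetical miniature actor network: two sub-communities ({a, b} and
# {d, e}) weakly coupled by actor_c, who appears in one movie of each.
movies = {
    "m1": ["actor_a", "actor_b"], "m2": ["actor_a", "actor_b", "actor_c"],
    "m3": ["actor_d", "actor_e"], "m4": ["actor_d", "actor_e", "actor_c"],
}
actors = sorted({a for cast in movies.values() for a in cast})
M = np.array([[1.0 if a in cast else 0.0 for cast in movies.values()]
              for a in actors])  # actor-movie incidence matrix

# 'Eigencasts': eigenvectors of the actor-actor co-variance matrix, i.e. the
# left singular vectors of the row-centered actor-movie matrix.
Mc = M - M.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
eigencasts = U.T

# The leading eigencast separates the two near-disjoint sub-communities with
# opposite signs; the bridging actor_c loads near zero on it.
print(dict(zip(actors, np.round(eigencasts[0], 2))))
```

On real data the same projections, plotted pairwise, produce the scatter plots discussed in Figure 2.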
Musical genre. The growing market for digital music and intelligent music services creates an increasing interest in modeling of music data. It is now feasible to estimate consensus musical genre by supervised learning from rather short music segments, say 5-10 seconds, see e.g., [18], thus enabling computerized handling of music requests at a high cognitive complexity level. To understand the possibilities and limitations for unsupervised modeling of music data we here visualize a small music sample using the latent semantic analysis framework. The intended use is for a music search engine function; hence, we envision that
[Figure 3: 5 x 5 scatter-plot matrix of the data in the first five latent dimensions, PC 1-PC 5.]
Fig. 3. We represent three music tunes (genre labels: heavy metal, jazz, classical) by their spectral content in overlapping small time frames (w = 30 msec, with an overlap of 10 msec; see [18] for details). To make the visualization relatively independent of 'pitch', we use the so-called mel-cepstral representation (MFCC, K = 13 coefficients per frame). To reduce noise in the visualization we have 'sparsified' the amplitudes. This was achieved simply by keeping coefficients that belonged to the upper 5% magnitude percentile. The total number of frames in the analysis was F = 10^5. Latent semantic analysis provided unsupervised subspaces with maximal variance for a given dimension. We show the scatter plots of the data for the first 1-5 latent dimensions. The scatter plots below the diagonal have been 'zoomed' to reveal more details of the ICA 'ray' structure. For interpretation we have coded the data points with signatures of the three genres involved: classical (), heavy metal (diamond), jazz (+). The ICA ray structure is striking; however, note that the situation is not one-to-one (ray to genre) as in the small text databases. A component (ray) quantifies a characteristic musical 'theme' at the temporal level of a frame (30 msec), i.e., an entity similar to the 'phoneme' in speech.
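The sparsification and LSI steps from the caption can be sketched with synthetic data standing in for real MFCC frames. The Gaussian stand-in and the frame count are invented for the sketch; the real input would be K = 13 mel-cepstral coefficients per 30 msec frame.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for MFCC data: K = 13 coefficients per frame.
K, F = 13, 1000
mfcc = rng.normal(size=(K, F))

# 'Sparsify' as in the caption: keep only coefficients whose magnitude lies
# in the upper 5% percentile, zeroing the rest.
threshold = np.quantile(np.abs(mfcc), 0.95)
sparse = np.where(np.abs(mfcc) >= threshold, mfcc, 0.0)

# Latent semantic analysis step: maximal-variance subspace via SVD of the
# row-centered data; keep the first five latent dimensions as in Figure 3.
U, s, Vt = np.linalg.svd(sparse - sparse.mean(axis=1, keepdims=True),
                         full_matrices=False)
frames_5d = Vt[:5].T * s[:5]  # frame coordinates in latent dimensions 1-5

print(np.mean(sparse != 0))  # fraction of surviving coefficients, about 0.05
```

Pairwise scatter plots of the columns of `frames_5d` correspond to the panels of Figure 3.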
a largely text-based query has resulted in a few music entries, and the algorithm is going to find the group structure inherent in the retrieval for the user. We

References (partial)

- Multimedia Image and Video Processing.
- Signal Detection using ICA: Application to Chat Room Topic Spotting.
- Independent component analysis for understanding multimedia content.
- On Independent Component Analysis for Multimedia Signals.
- Decision time horizon for music genre classification using short time features.