A Vision Architecture for Unconstrained
and Incremental Learning of Multiple
Categories
Stephan Kirstein (1,2), Alexander Denecke (1,3), Stephan Hasler (1), Heiko Wersing (1), Horst-Michael Gross (2) and Edgar Körner (1)

(1) Honda Research Institute Europe GmbH
Carl-Legien-Str. 30, 63073 Offenbach am Main, Germany
{stephan.kirstein, stephan.hasler, heiko.wersing, edgar.koerner}@honda-ri.de

(2) Ilmenau University of Technology
Neuroinformatics and Cognitive Robotics Lab
P.O.B. 10 05 65, 98684 Ilmenau, Germany
horst-michael.gross@tu-ilmenau.de

(3) Bielefeld University
CoR-Lab
P.O.B. 10 01 31, 33501 Bielefeld, Germany
adenecke@cor-lab.uni-bielefeld.de
Abstract
We present an integrated vision architecture capable of incrementally learning several visual categories based on natural hand-held objects. Additionally we focus on interactive learning, which requires real-time image processing methods and a fast learning algorithm. The overall system is composed of a figure-ground segregation part, several feature extraction methods and a life-long learning approach combining incremental learning with category-specific feature selection. In contrast to most visual categorization approaches, where typically each view is assigned to a single category, we allow labeling with an arbitrary number of shape and color categories. We also impose no restrictions on the viewing angle of presented objects, relaxing the common constraint on canonical views.

1 Introduction
An amazing capability of the human visual system is the ability to learn an enormous repertoire of visual categories. This large number of categories is acquired incrementally during our life and requires, at least partially, direct interaction with a tutor. Inspired by this child-like knowledge acquisition we propose an architecture for learning several visual categories in an incremental and interactive fashion. The architecture is composed of several building blocks including figure-ground segregation, feature extraction, a category learning module and user interaction. Together these modules allow the training of categories based on natural objects presented in hand.

The learning system proposed in this paper is partly based on earlier work dealing with online object identification in cluttered scenes (Wersing et al., 2007). For our learning system a novel incremental category learning method is proposed that combines a learning vector quantization (LVQ) (Kohonen 1989) network, to approach the "stability-plasticity dilemma", with a category-specific forward feature selection. Based on this combination we are able to interactively learn a category-specific long-term memory (LTM) representation, where previous LTM models proposed by Kirstein, Wersing, & Körner (2008) could only be learned offline. Other major contributions are the integration of an enhanced figure-ground segregation method and the extraction of parts-based features. In the following, further related work with respect to categorization frameworks, online learning methods and life-long learning architectures is discussed in more detail.
1.1 Visual Categorization Architectures
In the past few years many architectures dealing with object detection and categorization tasks have been proposed in the computer vision community. Interestingly, many of those approaches are based on local parts-based features, which are extracted around some defined interest points, e.g. (Leibe et al., 2004; Willamowski et al., 2004; Agarwal et al., 2004), or on agglomerative clustering (Mikolajczyk, Leibe, & Schiele 2006) to build up object models for categories like faces or cars. The advantages of these approaches are their robustness against partial occlusion, scale changes, and the ability to deal with cluttered environments. One drawback is that such methods are typically restricted to the canonical view of a certain category. Thomas et al. (2006) try to overcome this limitation by training several pose-specific implicit shape models (ISM) (Leibe, Leonardis, & Schiele 2004) for each category. Afterwards, detected parts from neighboring pose-dependent ISMs are linked by so-called "activation links". This allows the detection of categories from many viewpoints. Such categorization architectures, however, are designed for offline usage only, where the required training time is not important. This makes them unsuitable for our desired online and interactive training. A recent work of Fritz, Kruijff, & Schiele (2007) addresses this issue and proposes a semi-supervised and incremental clustering method for interactive category learning. This approach is, however, restricted to the canonical view of the categories.
1.2 Online and Interactive Learning Systems
The development of online and interactive learning systems has become more and more popular in recent years, see e.g. (Roth et al., 2006), (Steels & Kaplan, 2001), (Arsenio, 2004) or (Wersing et al., 2007). All these systems are able to identify several objects in cluttered environments, but are not applicable to categorization tasks, because their learning methods cannot extract a more variable category representation. Nonetheless those models are useful as a short-term memory (STM) representation. Afterwards this representation is consolidated into a more abstract LTM representation of categories, allowing a higher generalization performance compared to the object-specific STM representation. Of particular interest with respect to online and interactive learning of categories is the work of Skočaj et al. (2007). It enables learning of several simple color and shape categories by selecting a single feature which describes the particular category most consistently. The category itself is then represented by the mean and variance of this selected feature (Skočaj et al., 2007) or, more recently, by an incremental kernel density estimation using mixtures of Gaussians (Skočaj et al., 2008). This feature selection in particular enhances the categorization performance, but the restriction to a single feature allows only the representation of simple categories with little appearance variation. Therefore we propose a feature selection process that can incrementally select an arbitrary number of features, if they are required for the representation of a particular category.
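As a rough illustration of such a category-specific forward selection loop, the following Python sketch greedily adds the feature dimension that most improves a per-category score and stops when no candidate helps. The scoring function, the stopping rule and the nearest-mean toy scorer are assumptions made for illustration; they are not the cLVQ selection criteria used later in the paper.

```python
# Minimal sketch of category-specific forward feature selection (illustrative only;
# the scoring function and stopping rule are assumptions, not the paper's exact criteria).
import numpy as np

def forward_select(X, y, score_fn, max_features=None):
    """Greedily pick feature dimensions for one category.

    X: (n_samples, n_features) feature vectors
    y: (n_samples,) binary labels for a single category
    score_fn: callable(X_subset, y) -> validation score (higher is better)
    """
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    limit = max_features or n_features
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # score every candidate when added to the current subset
        scores = {f: score_fn(X[:, selected + [f]], y) for f in candidates}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:      # stop when no candidate improves the score
            break
        selected.append(f_best)
        best_score = scores[f_best]
    return selected

# toy scorer (hypothetical stand-in for the real category-specific criterion)
def nearest_mean_score(Xs, y):
    mu_pos, mu_neg = Xs[y == 1].mean(0), Xs[y == 0].mean(0)
    pred = np.linalg.norm(Xs - mu_pos, axis=1) < np.linalg.norm(Xs - mu_neg, axis=1)
    return (pred == (y == 1)).mean()
```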
1.3 Life-Long Learning Architectures
Based on the STM representation, which is assumed to be limited in capacity, we propose an incremental and life-long learning method to acquire a category-specific long-term memory (LTM) representation. For the LTM we approach the so-called "stability-plasticity dilemma". This dilemma occurs when neural networks are trained with a limited and changing training ensemble, causing the well-known "catastrophic forgetting effect" (French 1999). A common strategy for life-long learning architectures, e.g. (Hamker, 2001; Furao & Hasegawa, 2006; Kirstein et al., 2008), is the usage of a node-specific learning rate combined with an incremental node insertion rule. This permits plasticity of newly inserted neurons, while the stability of matured neurons is preserved. The major drawback of those architectures, which are commonly used for identification tasks, is the inefficient separation of co-occurring categories. For natural objects, which typically belong to several different categories (e.g. a red-white car), a decoupled representation for each category (here for the categories red, white and car) should be learned. This decoupling leads to a more condensed representation and higher generalization performance compared to object identification architectures.
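The following minimal sketch illustrates this common strategy of a node-specific learning rate combined with an incremental node insertion rule. The insertion threshold and the 1/age decay are assumptions made for illustration; they are not the exact rules of the cited architectures or of the method proposed here.

```python
# Illustrative sketch of incremental node insertion with node-specific learning rates
# (the insertion threshold and the 1/age decay are assumptions, not the paper's exact rule).
import numpy as np

class IncrementalLVQ:
    def __init__(self, insert_threshold=1.0):
        self.prototypes = []              # list of (weight vector, label) pairs
        self.ages = []                    # per-node update counters
        self.insert_threshold = insert_threshold

    def train_step(self, x, label):
        if not self.prototypes:
            self._insert(x, label)
            return
        dists = [np.linalg.norm(x - w) for w, _ in self.prototypes]
        k = int(np.argmin(dists))
        w_k, label_k = self.prototypes[k]
        if label_k != label and dists[k] > self.insert_threshold:
            # unfamiliar input: add a new, plastic node instead of overwriting old knowledge
            self._insert(x, label)
        else:
            # node-specific learning rate: young nodes move a lot, matured nodes barely change
            self.ages[k] += 1
            theta = 1.0 / self.ages[k]
            sign = 1.0 if label_k == label else -1.0
            self.prototypes[k] = (w_k + sign * theta * (x - w_k), label_k)

    def _insert(self, x, label):
        self.prototypes.append((x.copy(), label))
        self.ages.append(1)
```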
Another approach to the "stability-plasticity dilemma" was proposed by Ozawa et al. (2005). Here representative input-output pairs are stored in a long-term memory for stabilizing an incrementally learning radial basis function (RBF)-like network. It additionally accounts for a feature selection mechanism based on incremental principal component analysis, but no class-specific feature selection is applied. Therefore this method is unsuitable for categorization tasks without modification.

Figure 1: Category Learning System. Based on an object hypothesis extracted from the depth map a figure-ground segregation is performed. The detected foreground is used to extract color and shape features. Color features are represented as histogram bins in the RGB color space. In contrast to most other categorization approaches we combine general, category-independent features obtained from a detection hierarchy with parts-based features. All extracted features are concatenated into a single structureless vector. This vector, together with the category labels provided by a human tutor, is the input to the incremental category learning module. (The figure shows the processing pipeline: input image and depth map, foreground segment, color histogram, holistic C2 features and parts-based features, the concatenated feature vector, and the incremental category learning module receiving category labels through user interaction.)
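To make the data flow of Fig. 1 concrete, the sketch below assembles such a structureless feature vector from an RGB histogram of the foreground pixels plus placeholder shape features, together with a multi-hot label vector for an arbitrary number of categories. The bin count, feature dimensionalities and helper names are illustrative assumptions rather than the actual system parameters.

```python
# Sketch of the feature concatenation in Fig. 1 (bin counts, dimensionalities and the
# multi-hot label encoding are illustrative assumptions, not the values used in the paper).
import numpy as np

def rgb_histogram(segment, mask, bins=8):
    """Color features: histogram bins in RGB space, computed on foreground pixels only."""
    fg = segment[mask > 0].reshape(-1, 3)          # (n_pixels, 3) RGB values
    hist, _ = np.histogramdd(fg, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / max(fg.shape[0], 1)      # normalize by number of foreground pixels

def build_feature_vector(segment, mask, c2_features, parts_features):
    """Concatenate color and shape features into a single structureless vector."""
    return np.concatenate([rgb_histogram(segment, mask), c2_features, parts_features])

def encode_labels(view_labels, all_categories):
    """Multi-hot label vector: a view may carry several shape and color categories at once."""
    return np.array([1.0 if c in view_labels else 0.0 for c in all_categories])

# toy usage with placeholder shape features
segment = np.random.randint(0, 256, (144, 144, 3))
mask = np.ones((144, 144))
x = build_feature_vector(segment, mask, c2_features=np.zeros(50), parts_features=np.zeros(30))
y = encode_labels({"red", "white", "car"}, ["red", "white", "car", "cup", "bottle"])
```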
In the following we describe step by step the building blocks of our learning system illustrated in Fig. 1. The first processing block extracts the object hypothesis from cluttered scenes. This hypothesis is further refined by a figure-ground segregation method as described in Section 2. Additionally we describe all used feature extraction methods in Section 3. The extracted shape and color information is combined and used to train the proposed life-long learning vector quantization approach described in Section 4, which is trained in direct interaction with a human tutor. The target of our system is interactive and life-long learning of categories. Therefore, in Section 5 the learning results of our proposed methods are shown for databases of different complexity. Additionally we show the interactive learning capability of the proposed learning system under real-world constraints. Finally we discuss the results and related work in Section 6.

2 Preprocessing and Figure-ground Segregation
One of the essential problems when dealing with learning in unconstrained environments is the definition of a shared attention concept between the learning system and the human tutor. Specifically, this is necessary to decide what and when to learn. In our architecture we use the peri-personal space concept (Goerick et al., 2006), which basically is defined as the manipulation range around an active vision system. Everything in this short distance range is of particular interest to the system with respect to interaction and learning. Therefore we use a stereo camera system with a pan-tilt unit and parallel-aligned cameras, which delivers a stream of image pairs. Depth information is calculated after the correction of lens distortions. This depth information is used to generate an interaction hypothesis in cluttered scenes, which after its initial detection is actively tracked until it disappears from the peri-personal attention range. Additionally we apply a color constancy method (Pomierski & Gross 1996) and a size normalization of the hypothesis. Both operations ensure invariances, which are beneficial for any kind of recognition system, but are essential for fast online and interactive learning in unconstrained environments. Finally a region of interest (ROI) of an object view is extracted and scaled to a fixed segment size of 144×144 pixels.
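A minimal sketch of this ROI extraction and size normalization step is given below, assuming a simple depth-range threshold for the peri-personal space and nearest-neighbor scaling; the actual system uses stereo disparity, active tracking and a color constancy method that are not reproduced here.

```python
# Sketch of ROI extraction from the depth map and size normalization to 144x144
# (the depth range of the peri-personal space and the nearest-neighbor scaling are assumptions).
import numpy as np

def extract_segment(image, depth, near=0.2, far=0.8, size=144):
    """Crop the region inside the peri-personal depth range and rescale it to a fixed size."""
    mask = (depth > near) & (depth < far)          # interaction hypothesis from depth
    if not mask.any():
        return None
    rows, cols = np.where(mask)
    r0, r1, c0, c1 = rows.min(), rows.max() + 1, cols.min(), cols.max() + 1
    roi = image[r0:r1, c0:c1]
    # nearest-neighbor resize to the fixed segment size
    ri = np.linspace(0, roi.shape[0] - 1, size).astype(int)
    ci = np.linspace(0, roi.shape[1] - 1, size).astype(int)
    return roi[np.ix_(ri, ci)]
```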
The extracted segment j_i contains the object view, but also a substantial amount of background clutter, as can be seen in Fig. 2. For the incremental build-up of category representations it is beneficial to suppress such clutter, because otherwise it would slow down the learning process and considerably more training examples would be necessary. Therefore we apply an additional figure-ground segregation as proposed by Denecke et al. (2009) to reduce this influence. The basic idea of this segregation method, illustrated in Fig. 2, is to train for each segment j_i a learning vector quantization (LVQ) network based on a predefined number of distinct prototypes for foreground and background. As an initial hypothesis for the foreground the noisy depth information belonging to the extracted segment is used. The noise of this hypothesis is caused by the ill-posed problem of disparity calculation and is basically located at the corner of the corresponding object view. Furthermore, "holes" at textureless object parts are common. Because the objects are presented by hand, skin-colored parts in the segment are systematic noise, which we remove from the initial foreground hypothesis based on the detection method proposed by Fritsch et al. (2002). Due to this skin color removal, faces and gestures cannot be learned with this preprocessing. Nevertheless, with a modified preprocessing as proposed in Wersing et al. (2007) a combined learning of objects and faces can be achieved. The learning of each LVQ prototype is based on feature maps consisting of RGB-color features as well as the pixel positions. Instead of the standard Euclidean metric for the distance computation, an extended version of the generalized matrix LVQ approach (Schneider, Biehl, & Hammer 2007) is used. This metric adaptation is used to learn relevance factors for each prototype and feature dimension. These local relevance factors are adapted online and dynamically weight the different feature maps to discriminate between foreground and background. For the

Citations
Journal ArticleDOI
TL;DR: To achieve the life-long learning ability for a cognitive system, a new learning vector quantization approach is combined with a category-specific feature selection method to allow several metrical "views" on the representation space of each individual vector quantization node.

53 citations


Cites methods from "A vision architecture for unconstra..."

  • ...Additionally Θmin is the node-dependent learning rate as proposed by Kirstein et al. (2008):...

    [...]

  • ...Finally the long-term stability of these incrementally learned representation nodes is considered as proposed by Kirstein et al. (2008). Additionally for our learning approach a category-specific forward feature selection method is used to enable the separation of co-occurring categories, because it defines category-specific metrical “views” on the nodes of the exemplar-based network....

    [...]

  • ...Furthermore we recently could show that our proposed cLVQ learning method can be integrated into a larger vision system that allows online learning of categories based on hand-held and complex-shaped objects under full rotation (Kirstein et al., 2008, 2009)....

    [...]

Journal ArticleDOI
TL;DR: An architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, termed self-extension, are presented; the contributions include representations of gaps and uncertainty for specific kinds of knowledge.
Abstract: There are many different approaches to building a system that can engage in autonomous mental development. In this paper, we present an approach based on what we term self-understanding, by which we mean the explicit representation of and reasoning about what a system does and does not know, and how that knowledge changes under action. We present an architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, which we term self-extension. The contributions include: representations of gaps and uncertainty for specific kinds of knowledge, and a goal management and planning system for setting and achieving learning goals.

41 citations


Cites background from "A vision architecture for unconstra..."

  • ...Different systems focus on different aspects of the problem, such as the system architecture and integration [68], [69], [71], learning [66], [67], [71], or social interaction [70]....

    [...]

Proceedings ArticleDOI
05 Dec 2011
TL;DR: Representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor and the implemented robot system are presented and demonstrated.
Abstract: In this paper we present representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor and show the implemented robot system. We present how beliefs about the world are created by processing visual and linguistic information and show how they are used for planning system behaviour with the aim at satisfying its internal drive - to extend its knowledge. The system facilitates different kinds of learning initiated by the human tutor or by the system itself. We demonstrate these principles in the case of learning about object colours and basic shapes.

41 citations


Cites background from "A vision architecture for unconstra..."

  • ...Different systems focus on different aspects of this problem, such as the system architecture and integration [3], [4], [6], learning [1], [2], [6], [7], or social interaction [5]....

    [...]

Journal ArticleDOI
TL;DR: This work proposes a metric learning scheme which allows for an autonomous learning of parameters (such as the underlying scoring matrix in sequence alignments) according to a given discriminative task in relational LVQ, and offers an increased interpretability of the results by pointing out structural invariances for the given task.

35 citations


Cites background from "A vision architecture for unconstra..."

  • ...Because of the intuitive definition of models in terms of prototypical representatives, prototype-based methods like LVQ enjoy a wide popularity in application domains, particularly if human inspection and interaction are necessary, or life-long model adaptation is considered [28, 20, 18]....

    [...]

Journal ArticleDOI
TL;DR: A collection of mechanisms that enable the integration of heterogeneous competencies in a principled way is described; the resulting system is capable of engaging in different kinds of learning interactions, e.g. those initiated by a tutor or by the system itself.
Abstract: This article presents an integrated robot system capable of interactive learning in dialogue with a human. Such a system needs to have several competencies and must be able to process different typ...

20 citations


Cites background from "A vision architecture for unconstra..."

  • ...Different systems focus on different aspects of this problem, such as the system architecture and integration (Bauckhage et al., 2001; Billard & Hayes, 1999; Briggs & Scheutz, 2012; Bolder et al., 2008; Hawes et al., 2010; Karaoguz, Rodemann, Wrede, & Goerick, 2012; Kirstein et al., 2009; Lallee et al., 2012; Lutkebohle et al., 2009; Mason & Lopes, 2011; Sun, 2007); learning and symbol grounding (Salvi, Montesano, Bernardino, & Santos-Victor, 2012; Roy & Pentland, 2002; Billard & Hayes, 1999; Steels & Kaplan, 2000; Kirstein et al....

    [...]

  • ...…and symbol grounding (Salvi, Montesano, Bernardino, & Santos-Victor, 2012; Roy & Pentland, 2002; Billard & Hayes, 1999; Steels & Kaplan, 2000; Kirstein et al., 2009; de Greeff, Delaunay, & Belpaeme, 2009; Chernova & Veloso, 2009; Belpaeme & Morse, 2012; Briggs & Scheutz, 2012; Tellex,…...

    [...]

  • ..., 2009; Mason & Lopes, 2011; Sun, 2007); learning and symbol grounding (Salvi, Montesano, Bernardino, & Santos-Victor, 2012; Roy & Pentland, 2002; Billard & Hayes, 1999; Steels & Kaplan, 2000; Kirstein et al., 2009; de Greeff, Delaunay, & Belpaeme, 2009; Chernova & Veloso, 2009; Belpaeme & Morse, 2012; Briggs & Scheutz, 2012; Tellex, Thaker, Joseph, & Roy, 2014; Perera & Allen, 2013; Schiebener, Morimoto, Asfour, & Ude, 2013; Deits et al., 2013); motivation (Lutkebohle et al....

    [...]

  • ...…et al., 2001; Billard & Hayes, 1999; Briggs & Scheutz, 2012; Bolder et al., 2008; Hawes et al., 2010; Karaoguz, Rodemann, Wrede, & Goerick, 2012; Kirstein et al., 2009; Lallee et al., 2012; Lutkebohle et al., 2009; Mason & Lopes, 2011; Sun, 2007); learning and symbol grounding (Salvi,…...

    [...]

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

14,509 citations


"A vision architecture for unconstra..." refers methods in this paper

  • ...For this learning method we propose a combination of an incremental exemplar-based network and a forward feature selection method (see (Guyon & Elissee 2003) for an introduction to feature selection methods)....

    [...]

Book
01 Jan 1984
TL;DR: The purpose and nature of biological memory and various aspects of memory modelling are explained.
Abstract: Book covering various aspects of memory, pattern mathematics, classical learning systems, adaptive filters, self-organizing feature maps, optimal associative mappings, pattern recognition (including learning vector quantization and the subspace methods of classification), biological memory, neural computing, and optical associative memories.

8,197 citations


"A vision architecture for unconstra..." refers methods in this paper

  • ...(10) Each w^{k_min(c)}(r_l) is updated based on the standard LVQ learning rule (Kohonen 1989), but is restricted to feature dimensions f ∈ S_c: w_f^{k_min(c)} := w_f^{k_min(c)} + µ Θ^{k_min(c)} (r_{lf} − w_f^{k_min(c)}) ∀f ∈ S_c, (11) where µ = 1 if the categorization decision for r_l was correct, otherwise µ = −1 and…...

    [...]

  • ...For our learning system a novel incremental category learning method is proposed that combines a learning vector quantization (LVQ) (Kohonen 1989) network to approach the “stability-plasticity dilemma” with a category-specific forward feature selection....

    [...]

Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.

5,672 citations

Journal ArticleDOI
TL;DR: A neural network model for a mechanism of visual pattern recognition that is self-organized by “learning without a teacher”, and acquires an ability to recognize stimulus patterns based on the geometrical similarity of their shapes without affected by their positions.
Abstract: A neural network model for a mechanism of visual pattern recognition is proposed in this paper. The network is self-organized by “learning without a teacher”, and acquires an ability to recognize stimulus patterns based on the geometrical similarity (Gestalt) of their shapes without affected by their positions. This network is given a nickname “neocognitron”. After completion of self-organization, the network has a structure similar to the hierarchy model of the visual nervous system proposed by Hubel and Wiesel. The network consits of an input layer (photoreceptor array) followed by a cascade connection of a number of modular structures, each of which is composed of two layers of cells connected in a cascade. The first layer of each module consists of “S-cells”, which show characteristics similar to simple cells or lower order hypercomplex cells, and the second layer consists of “C-cells” similar to complex cells or higher order hypercomplex cells. The afferent synapses to each S-cell have plasticity and are modifiable. The network has an ability of unsupervised learning: We do not need any “teacher” during the process of self-organization, and it is only needed to present a set of stimulus patterns repeatedly to the input layer of the network. The network has been simulated on a digital computer. After repetitive presentation of a set of stimulus patterns, each stimulus pattern has become to elicit an output only from one of the C-cell of the last layer, and conversely, this C-cell has become selectively responsive only to that stimulus pattern. That is, none of the C-cells of the last layer responds to more than one stimulus pattern. The response of the C-cells of the last layer is not affected by the pattern's position at all. Neither is it affected by a small change in shape nor in size of the stimulus pattern.

4,713 citations


"A vision architecture for unconstra..." refers methods in this paper

  • ...We use a feed-forward feature extraction architecture inspired by the Neocognitron (Fukushima 1980) to extract shape features....

    [...]

Frequently Asked Questions (16)
Q1. What have the authors contributed in "A vision architecture for unconstrained and incremental learning of multiple categories" ?

The authors present an integrated vision architecture capable of incrementally learning several visual categories based on natural hand-held objects. The authors also impose no restrictions on the viewing angle of presented objects, relaxing the common constraint on canonical views. 

The forward feature selection method is used to find low-dimensional subsets of category-specific features by predominantly selecting features which occur almost exclusively for a certain category. 

Due to the fact that the objects are presented by hand, skin color parts in the segment are systematic noise, which the authors remove from the initial foreground hypothesis based on the detection method proposed by Fritsch et al. (2002). 

The major drawback of those architectures commonly used for identification tasks is the inefficient separation of co-occurring categories. 

It seems that for their categorization task the independent representation of categories somehow weakens the forgetting effect of SLP networks. 

A common strategy for life-long learning architectures, e.g. (Hamker, 2001; Furao & Hasegawa, 2006; Kirstein et al., 2008), is the usage of a node-specific learning rate combined with an incremental node insertion rule. 

The advantages of these approaches are their robustness against partial occlusion, scale changes, and the ability to deal with cluttered environments. 

For color categories the effect of imprecise foreground masks on the categorization performance also seems to be minor; otherwise the performance would be considerably lower. 

This allows object views to be used first to test the STM and LTM representations; after confirmed labels are provided, the same views can also be used to enhance the representation by transferring them into the STM, even if they were recorded before the confirmation. 

Based on the currently available feature vectors, the learning methods are used to incorporate this STM knowledge into the LTM by applying the learning dynamics of the cLVQ method described in Section 4.2.3. 

To relax this separation and to make the most efficient use of object views, the authors introduce a sensory memory concept for temporarily remembering views of the currently attended object, by using the same one-shot learning method as used for the STM. 
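A minimal sketch of such a sensory memory buffer, assuming a fixed capacity and a hypothetical `stm.add` interface (the paper's one-shot learning method is not reproduced here):

```python
# Sketch of the sensory-memory idea described above: views of the currently attended object are
# buffered and transferred to the STM once the tutor confirms the labels (capacity is an assumption).
from collections import deque

class SensoryMemory:
    def __init__(self, capacity=50):
        self.buffer = deque(maxlen=capacity)   # recent feature vectors of the attended object

    def observe(self, feature_vector):
        self.buffer.append(feature_vector)     # store every incoming view of the attended object

    def confirm(self, labels, stm):
        # after label confirmation, even views recorded before it are used for training
        for view in self.buffer:
            stm.add(view, labels)              # hypothetical STM interface
        self.buffer.clear()
```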

(1) This computation of local edge responses is restricted to the positions in the foreground mask with ξi(x, y) > 0, where ∗ denotes the inner product of two vectors. 
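The following sketch illustrates the idea of restricting local edge responses to foreground positions, using a simple 3×3 edge filter and an explicit inner product; the filter and patch size are assumptions, since the actual shape features come from a Neocognitron-inspired feature hierarchy.

```python
# Sketch of local edge responses restricted to the foreground mask (filter and patch size are
# assumptions; the paper's shape feature extraction is a Neocognitron-inspired hierarchy).
import numpy as np

def edge_responses(gray, mask, filt):
    """Inner product of each local 3x3 patch with an edge filter, only where mask > 0."""
    h, w = gray.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mask[y, x] > 0:                         # restriction to foreground positions
                patch = gray[y - 1:y + 2, x - 1:x + 2].ravel()
                out[y, x] = patch @ filt.ravel()       # inner product of two vectors
    return out

# example horizontal edge filter (illustrative choice)
horizontal_edge = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=float)
```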

One of the essential problems when dealing with learning in unconstrained environments is the definition of a shared attention concept between the learning system and the human tutor. 

Additionally the hypothesis list is repeatedly communicated to the user (in 5 second intervals), while newly acquired segments are also used to refine this list. 

Additionally this constraint strongly reduces the appearance variations of the presented objects and therefore makes the category learning task much easier. 

The authors could show that their learning system can efficiently perform all necessary processing steps including figure-ground segregation, feature extraction and incremental learning.