
Showing papers on "3D single-object recognition published in 1996"


Book
17 Jul 1996
TL;DR: Object recognition as shape-based recognition: what recognition is, why object recognition is difficult, the main approaches (invariant properties and feature spaces, parts and structural descriptions, and alignment), and which approach is correct.
Abstract: Object recognition: shape-based recognition; what is recognition?; why object recognition is difficult; approaches to object recognition (invariant properties and feature spaces; parts and structural descriptions; the alignment approach); which is the correct approach? The alignment of pictorial descriptions: using corresponding features; the use of multiple models for 3-D objects; aligning pictorial descriptions; transforming the image or the models?; before and after alignment. The alignment of smooth bounding contours: the curvature method; accuracy of the curvature method; empirical testing. Recognition by the combination of views: modelling objects by view combinations; objects with sharp edges; using two views only; using a single view; the use of depth values; summary of the basic scheme; objects with smooth boundaries; recognition by image combinations; extensions to the view-combination scheme; psychophysical and physiological evidence; interim conclusions: recognition by multiple views. Classifications: classification and identification; the role of object classification; class-based processing; using class prototypes; pictorial classification; evidence from psychology and biology; are classes in the world or in our head?; the organization of recognition memory. Image and model correspondence: feature correspondence; contour matching; correspondence-less methods; correspondence processes in human vision; model construction; compensating for illumination changes. Segmentation and saliency: is segmentation feasible?; bottom-up and top-down segmentation; extracting globally salient structures; saliency, selection, and completion; what can bottom-up segmentation achieve? Visual cognition and visual routines: perceiving "inside" and "outside"; spatial analysis by visual routines; conclusions and open problems; the elemental operations; the assembly and storage of routines; routines and recognition.
Sequence seeking and counter streams, a model for visual cortex: the sequence-seeking scheme; biological embodiment; summary. Appendices: alignment by feature; the curvature method; errors of the curvature method; locally affine matching; definitions.

461 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new (to computer vision) experimental framework which allows us to make quantitative comparisons using subjective ratings made by people, which avoids the issue of pixel-level ground truth.
Abstract: The purpose of this paper is to describe a new (to computer vision) experimental framework which allows us to make quantitative comparisons using subjective ratings made by people. This approach avoids the issue of pixel-level ground truth. As a result, it does not allow us to make statements about the frequency of false positive and false negative errors at the pixel level. Instead, using experimental design and statistical techniques borrowed from psychology, we make statements about whether the outputs of one edge detector are rated statistically significantly higher than the outputs of another. This approach offers itself as a nice complement to signal-based quantitative measures. Also, the evaluation paradigm in this paper is goal oriented; in particular, we consider edge detection in the context of object recognition. The human judges rate the edge detectors based on how well they capture the salient features of real objects. So far, edge detection modules have been designed and evaluated in isolation, except for the recent work by Ramesh and Haralick (1992). The only prior work (that we are aware of) which also uses humans to rate image algorithms is that of Reeves and Higdon (1995), who use human ratings to decide on regularization parameters for image restoration. Fram and Deutch (1975) also used human subjects; however, their focus was on human versus machine performance rather than on using human ratings to compare different edge detectors. The use of human judges to rate image outputs must be approached systematically: experiments must be designed and conducted carefully, and results interpreted with appropriate statistical tools. The use of statistical analysis in vision system performance characterization has been rare. The only prior work in the area that we are aware of is that of Nair et al. (1995), who used statistical ranking procedures to compare neural-network-based object recognition systems.

321 citations


BookDOI
01 Jan 1996
TL;DR: Stereo, motion, and object recognition are redefined via epipolar geometry; multiple rigid motions are handled through correspondence and segmentation.
Abstract: Foreword Olivier Faugeras. Foreword Saburo Tsuji. Preface. 1. Introduction. 2. Camera Models and Epipolar Geometry. 3. Recovery of Epipolar Geometry From Points. 4. Recovery of Epipolar Geometry from Line Segments or Lines. 5. Redefining Stereo, Motion and Object Recognition via Epipolar Geometry. 6. Image Matching and Uncalibrated Stereo. 7. Multiple Rigid Motions: Correspondence and Segmentation. 8. 3D Object Recognition and Localization with Model Views. 9. Concluding Remarks. References. Index.

310 citations


Book ChapterDOI
16 Jul 1996
TL;DR: Two view-based object recognition algorithms are compared: a heuristic algorithm based on oriented filters, and a support vector learning machine trained on low-resolution images of the objects.
Abstract: Two view-based object recognition algorithms are compared: (1) a heuristic algorithm based on oriented filters, and (2) a support vector learning machine trained on low-resolution images of the objects. Classification performance is assessed using a high number of images generated by a computer graphics system under precisely controlled conditions. Training- and test-images show a set of 25 realistic three-dimensional models of chairs from viewing directions spread over the upper half of the viewing sphere. The percentage of correct identification of all 25 objects is measured.

217 citations


Journal ArticleDOI
TL;DR: The role of individual objects in the recognition of complete figures and the influence of contextual information on the identification of ambiguous objects were investigated and proper spatial relations among the objects of a scene decreased response times and error rates.
Abstract: In recognizing objects and scenes, partial recognition of objects or their parts can be used to guide the recognition of other objects. Here, the role of individual objects in the recognition of complete figures and the influence of contextual information on the identification of ambiguous objects were investigated. Configurations of objects that were placed in either proper or improper spatial relations were used, and response times and error rates in a recognition task were measured. Two main results were obtained. First, proper spatial relations among the objects of a scene decrease response times and error rates in the recognition of individual objects. Second, the presence of objects that have a unique interpretation improves the identification of ambiguous objects in the scene. Ambiguous objects were recognized faster and with fewer errors in the presence of clearly recognized objects compared with the same objects in isolation or in improper spatial relations. The implications of these findings for the organization of recognition memory are discussed.

213 citations


Proceedings ArticleDOI
25 Aug 1996
TL;DR: The method for histogram matching is extended to compute the probability of the presence of an object in an image and shows that receptive field histograms provide a technique for object recognition which is robust, has low computational cost and a computational complexity which is linear with the number of pixels.
Abstract: This paper describes a probabilistic object recognition technique which does not require correspondence matching of images. This technique is an extension of our earlier work (1996) on object recognition using matching of multi-dimensional receptive field histograms. In the earlier paper we showed that multi-dimensional receptive field histograms can be matched to provide object recognition which is robust in the face of changes in viewing position and independent of image-plane rotation and scale. In this paper we extend this method to compute the probability of the presence of an object in an image. The paper begins with a review of the method and previously presented experimental results. We then extend the method for histogram matching to obtain a genuine probability of the presence of an object. We present experimental results on a database of 100 objects showing that the approach is capable of recognizing all objects correctly by using only a small portion of the image. Our results show that receptive field histograms provide a technique for object recognition which is robust, has low computational cost, and has a computational complexity which is linear in the number of pixels.
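The core comparison step this abstract describes, matching multi-dimensional histograms of local receptive-field responses, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the choice of x/y image gradients as the two receptive-field dimensions, the bin count, and the function names are all assumptions.

```python
import numpy as np

def receptive_field_histogram(image, bins=16):
    """Build a 2D histogram of local receptive-field responses.
    Here the two dimensions are simply the x- and y-gradient
    responses at each pixel, a stand-in for the filter pairs
    (e.g. Gaussian derivatives) used in such systems."""
    gy, gx = np.gradient(image.astype(float))
    hist, _, _ = np.histogram2d(gx.ravel(), gy.ravel(),
                                bins=bins, range=[[-1, 1], [-1, 1]])
    return hist / hist.sum()  # normalise to a probability distribution

def histogram_intersection(h1, h2):
    """Swain-Ballard style intersection: 1.0 for identical histograms,
    smaller values for dissimilar response distributions."""
    return float(np.minimum(h1, h2).sum())
```

Histogram intersection returns 1.0 for identical distributions; a probabilistic variant, as in the paper, would normalise match scores across the model database to obtain a probability of the object's presence.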

194 citations


Proceedings ArticleDOI
22 Apr 1996
TL;DR: A real-time vision system is described that can recognize 100 complex three-dimensional objects and its recognition rate was found to be 100% and object pose was estimated with a mean absolute error of 2.02 degrees and standard deviation of 1.67 degrees.
Abstract: A real-time vision system is described that can recognize 100 complex three-dimensional objects. In contrast to traditional strategies that rely on object geometry and local image features, the present system is founded on the concept of appearance matching. Appearance manifolds of the 100 objects were automatically learned using a computer-controlled turntable. The entire learning process was completed in 1 day. A recognition loop has been implemented that performs scene change detection, image segmentation, region normalizations, and appearance matching, in less than 1 second. The hardware used by the recognition system includes no more than a CCD color camera and a workstation. The real-time capability and interactive nature of the system have allowed numerous observers to test its performance. To quantify performance, we have conducted controlled experiments on recognition and pose estimation. The recognition rate was found to be 100% and object pose was estimated with a mean absolute error of 2.02 degrees and standard deviation of 1.67 degrees.
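The appearance-matching pipeline described above (learn appearance manifolds offline, then project and match at run time) is commonly implemented with an eigenspace built by principal component analysis. A minimal sketch under that assumption, with hypothetical function names; the real system also performs scene change detection, segmentation, and normalisation before this step:

```python
import numpy as np

def build_eigenspace(train_images, k=8):
    """train_images: (n, h*w) rows of brightness-normalised images.
    Returns the mean image and the top-k principal axes."""
    X = np.asarray(train_images, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the mean-centred image matrix gives the eigenvectors
    # of the image covariance as rows of Vt.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(image, mean, axes):
    """Project an image into the low-dimensional eigenspace."""
    return axes @ (np.ravel(image) - mean)

def recognise(image, mean, axes, train_coords, labels):
    """Nearest neighbour among the training projections."""
    q = project(image, mean, axes)
    d = np.linalg.norm(train_coords - q, axis=1)
    return labels[int(np.argmin(d))]
```

In the parametric-eigenspace variant, the training projections for each object form a continuous manifold parameterised by pose, so the nearest manifold point also yields the pose estimate.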

188 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new framework for recognizing planar object classes is presented, which is based on local feature detectors and a probabilistic model of the spatial arrangement of the features, and the allowed object deformations are represented through shape statistics, which are learned from examples.
Abstract: We present a new framework for recognizing planar object classes, which is based on local feature detectors and a probabilistic model of the spatial arrangement of the features. The allowed object deformations are represented through shape statistics, which are learned from examples. Instances of an object in an image are detected by finding the appropriate features in the correct spatial configuration. The algorithm is robust with respect to partial occlusion, detector false alarms, and missed features. A 94% success rate was achieved for the problem of locating quasi-frontal views of faces in cluttered scenes.

128 citations


Proceedings ArticleDOI
25 Aug 1996
TL;DR: A correlation-based face recognition approach based on the analysis of maximum and minimum principal curvatures and their directions shows that shape information from surface curvatures provides vital clues in distinguishing and identifying such fine surface structure as human faces.
Abstract: In this paper, we present a correlation-based face recognition approach based on the analysis of maximum and minimum principal curvatures and their directions. We treat the face recognition problem as a 3D shape recognition problem for free-form curved surfaces. Our approach is based on a 3D vector-set correlation method which requires neither face feature extraction nor surface segmentation. Each face, in both the input images and the model database, is represented as an extended Gaussian image (EGI), constructed by mapping principal curvatures and their directions at each surface point onto two unit spheres, representing ridge and valley lines respectively. An individual face is then recognized by evaluating its similarity to the others using Fisher's spherical correlation on the EGIs of faces. The method was tested for simplicity and robustness and applied to each of the face range images in the NRCC (National Research Council of Canada) 3D image data files. Results show that shape information from surface curvatures provides vital clues for distinguishing and identifying fine surface structure such as human faces.

100 citations


Journal ArticleDOI
TL;DR: A novel method based on the automatic search of features that characterize a certain object class using a training set consisting of both positive and negative examples, falling under the general problems of texture recognition, texture defect detection, and shape recognition.

93 citations


25 Oct 1996
TL;DR: Novel algorithms to automatically construct object-localization models from many images of the object are presented, along with a consensus-search approach to determine which parts of the image justifiably warrant inclusion in the model.
Abstract: Being able to accurately estimate an object's pose (location) in an image is important for practical implementations and applications of object recognition. Recognition algorithms often trade off accuracy of the pose estimate for efficiency -- usually resulting in brittle and inaccurate recognition. One solution is object localization -- a local search for the object's true pose given a rough initial estimate of the pose. Localization is made difficult by the unfavorable characteristics (for example, noise, clutter, occlusion and missing data) of real images. In this thesis, we present novel algorithms for localizing 3D objects in 3D range-image data (3D-3D localization) and for localizing 3D objects in 2D intensity-image data (3D-2D localization). Our localization algorithms utilize robust statistical techniques to reduce the sensitivity of the algorithms to the noise, clutter, missing data, and occlusion which are common in real images. Our localization results demonstrate that our algorithms can accurately determine the pose in noisy, cluttered images despite significant errors in the initial pose estimate. Acquiring accurate object models that facilitate localization is also of great practical importance for object recognition. In the past, models for recognition and localization were typically created by hand using computer-aided design (CAD) tools. Manual modeling suffers from expense and accuracy limitations. In this thesis, we present novel algorithms to automatically construct object-localization models from many images of the object. We present a consensus-search approach to determine which parts of the image justifiably warrant inclusion in the model. Using this approach, our modeling algorithms are relatively insensitive to the imperfections and noise typical of real image data. Our results demonstrate that our modeling algorithms can construct very accurate geometric models from rather noisy input data.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: A comparison between an off-line and an on-line recognition system using the same databases and system design is presented, which uses a sliding window technique which avoids any segmentation before recognition.
Abstract: Off-line handwriting recognition has wider applications than on-line recognition, yet it seems to be a harder problem. While on-line recognition is based on pen trajectory data, off-line recognition has to rely on pixel data only. We present a comparison between an off-line and an on-line recognition system using the same databases and system design. Both systems use a sliding window technique which avoids any segmentation before recognition. The recognizer is a hybrid system containing a neural network and a hidden Markov model. New normalization and feature extraction techniques for the off-line recognition are presented, including a connectionist approach for non-linear core height estimation. Results for uppercase, cursive and mixed case word recognition are reported. Finally a system combining the on- and off-line recognition is presented.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: Current computer vision systems whose basic methodology is open-loop or filter type typically use image segmentation followed by object recognition algorithms, but the system presented here achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters.
Abstract: Current computer vision systems whose basic methodology is open-loop or filter type typically use image segmentation followed by object recognition algorithms. These systems are not robust for most real-world applications. In contrast, the system presented here achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters. This is accomplished by using the confidence level of model matching as a reinforcement signal for a team of learning automata to search for segmentation parameters during training. The use of the recognition algorithm as part of the evaluation function for image segmentation gives rise to significant improvement of the system performance by automatic generation of recognition strategies. The system is verified through experiments on sequences of color images with varying external conditions.

Journal ArticleDOI
TL;DR: It is shown that the motion of an object, when combined with information about the object and its normal uses, provides us with strong constraints on possible functions that the object might be performing.
Abstract: In order for a robot to operate autonomously in its environment, it must be able to perceive its environment and take actions based on these perceptions. Recognizing the functionalities of objects is an important component of this ability. In this paper, we look into a new area of functionality recognition: determining the function of an object from its motion. Given a sequence of images of a known object performing some function, we attempt to determine what that function is. We show that the motion of an object, when combined with information about the object and its normal uses, provides us with strong constraints on possible functions that the object might be performing.

Dissertation
03 Oct 1996
TL;DR: It is possible to account for human 3D shape recovery performance in constrained domains by positing the existence of a few biases towards specific kinds of shapes, and in a general setting, the perception of 3D shapes involves learning and is likely to be mediated by recognition processes.
Abstract: When presented with a single two-dimensional picture of a three-dimensional object, the human visual system is often able to: (1) interpret the 2D projection as a 3D shape, and (2) recognize the 3D object that produced the 2D image. How these twin tasks of perception and recognition are accomplished remains one of the most debated questions in the field of vision. This thesis examines this question in a few domains, both computationally and experimentally, and arrives at the following conclusions: (1) It is possible to account for human 3D shape recovery performance in constrained domains by positing the existence of a few biases towards specific kinds of shapes. (2) In a general setting, the perception of 3D shapes involves learning and is likely to be mediated by recognition processes. This conclusion runs counter to the traditional notion of a perception to recognition processing hierarchy. (3) The processes subserving the recognition of 3D shapes may use highly viewpoint-dependent internal representations for at least some classes of objects. (4) The memory requirements of view-dependent representation schemes can be greatly reduced by the use of quasi-invariants comprising sets of qualitative measurements. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

01 Dec 1996
TL;DR: A new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features, and uses a large set of complex features that are learned from experience with model objects.
Abstract: We have developed a new Bayesian framework for visual object recognition which is based on the insight that images of objects can be modeled as a conjunction of local features. This framework can be used to both derive an object recognition algorithm and an algorithm for learning the features themselves. The overall approach, called complex feature recognition or CFR, is unique for several reasons: it is broadly applicable to a wide range of object types, it makes constructing object models easy, it is capable of identifying either the class or the identity of an object, and it is computationally efficient--requiring time proportional to the size of the image. Instead of a single simple feature such as an edge, CFR uses a large set of complex features that are learned from experience with model objects. The response of a single complex feature contains much more class information than does a single edge. This significantly reduces the number of possible correspondences between the model and the image. In addition, CFR takes advantage of a type of image processing called "oriented energy". Oriented energy is used to efficiently pre-process the image to eliminate some of the difficulties associated with changes in lighting and pose.
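The "oriented energy" pre-processing this abstract mentions is conventionally computed from a quadrature pair of filters: the squared responses of an even-phase and an odd-phase filter are summed, giving a measure that is insensitive to the phase of local structure (and hence to some lighting and contrast-polarity changes). A minimal 1D sketch, with the filter size and frequency chosen arbitrarily for illustration:

```python
import numpy as np

def oriented_energy(signal, freq=0.2):
    """Sum of squared responses of a Gabor-like quadrature pair:
    an even (cosine-phase) and an odd (sine-phase) filter under a
    Gaussian window. The result is phase-invariant local energy."""
    n = 21
    t = np.arange(n) - n // 2
    window = np.exp(-t**2 / (2 * 5.0**2))
    even = window * np.cos(2 * np.pi * freq * t)
    odd = window * np.sin(2 * np.pi * freq * t)
    e = np.convolve(signal, even, mode="same")
    o = np.convolve(signal, odd, mode="same")
    return e**2 + o**2  # energy, independent of local phase
```

In 2D, the same construction is repeated at several orientations, producing the oriented-energy channels from which features are taken.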

Book ChapterDOI
15 Apr 1996
TL;DR: A testbed for automatic face recognition shows that an eigenface coding of shape-free texture, with manually coded landmarks, was more effective than correctly shaped faces, the advantage depending on a high-quality representation of facial variation by a shape-free ensemble.
Abstract: A testbed for automatic face recognition shows that an eigenface coding of shape-free texture, with manually coded landmarks, was more effective than correctly shaped faces, being dependent upon a high-quality representation of the facial variation by a shape-free ensemble. Configuration alone also allowed recognition; combining the two measures improved performance and allowed automatic measurement of the face shape. Caricaturing further increased performance. Correlation of contours of shape-free images also increased recognition, suggesting extra information was available. A natural model considers faces as lying in a manifold, linearly approximated by the two factors, with a separate system for local features.

Book ChapterDOI
13 Apr 1996
TL;DR: Experiments show the method capable of learning to recognize complex objects in cluttered images, acquiring models that represent those objects using relatively few views.
Abstract: We describe how to model the appearance of an object using multiple views, learn such a model from training images, and recognize objects with it. The model uses probability distributions to characterize the significance, position, and intrinsic measurements of various discrete features of appearance; it also describes topological relations among features. The features and their distributions are learned from training images depicting the modeled object. A matching procedure, combining qualities of both alignment and graph subisomorphism methods, uses feature uncertainty information recorded by the model to guide the search for a match between model and image. Experiments show the method capable of learning to recognize complex objects in cluttered images, acquiring models that represent those objects using relatively few views.


Patent
Daiki Masumoto1
06 Feb 1996
TL;DR: In this article, a system for recognizing an object state (the kind, position, attitude) gets observation object information and recognizes a naturally occurring object for which simple rules cannot be found.
Abstract: A system for recognizing an object's state (kind, position, attitude) acquires observation information about the object and recognizes naturally occurring objects for which simple rules cannot be found. The system outputs a state prediction value for a targeted object from cognitive object observation data, an observation data prediction value for targeted partial features according to this output, and a partial feature prediction position in the targeted observation results according to a state evaluation value output; it recognizes the object by modifying the output of the state prediction value output unit. Many naturally occurring objects, for which simple rules cannot be found, can thus be modeled, and a recognition system that can be used in more realistic circumstances can be constructed. Both an object-based view and an observer-based view are provided as 3D object model representation methods, and the learning patterns necessary for recognition are obtained effectively by using a visible-invisible determination module when automatically acquiring the 3D object model.

Proceedings ArticleDOI
07 May 1996
TL;DR: Writer-dependent recognition experiments demonstrate that both the recognition rates and the reliability of the results are improved by the proposed recognition system.
Abstract: This paper addresses the problem of recognizing on-line sampled handwritten symbols. Within the proposed symbol recognition system, based on hidden Markov models, different kinds of feature extraction algorithms are used, analysing on-line as well as off-line features and combining the classification results. By conducting writer-dependent recognition experiments, it is demonstrated that both the recognition rates and the reliability of the results are improved by the proposed recognition system. Furthermore, when handwriting data not representing symbols from the given alphabet is applied, an increase in the rejection rate is obtained.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: This work proposes a methodology for the generation of learning samples in appearance-based object recognition that learns object models from a large number of generated samples derived from a small number of actually observed images.
Abstract: We propose a methodology for the generation of learning samples in appearance-based object recognition. In many practical situations, it is not easy to obtain a large number of learning samples. The proposed method learns object models from a large number of generated samples derived from a small number of actually observed images. The learning algorithm has two steps: 1) generation of a large number of images by image interpolation, or image deformation, and 2) compression of the large sample sets using parametric eigenspace representation. We compare our method with the previous methods that interpolate sample points in eigenspace, and show the performance of our method to be superior. Experiments were conducted for 432 image samples for 4 objects to demonstrate the effectiveness of the method.

Journal ArticleDOI
TL;DR: The results suggest that human object recognition (as opposed to face recognition) may be difficult to approximate by models that do not posit hidden units for explicit representation of intermediate entities such as edges, viewpoint invariant classifiers, axes, shocks and object parts.
Abstract: A number of recent successful models of face recognition posit only two layers, an input layer consisting of a lattice of spatial filters and a single subsequent stage by which those descriptor values are mapped directly onto an object representation layer by standard matching methods such as stochastic optimization. Is this approach sufficient for modeling human object recognition? We tested whether a highly efficient version of such a two-layer model would manifest effects similar to those shown by humans when given the task of recognizing images of objects that had been employed in a series of psychophysical experiments. System accuracy was quite high overall, but was qualitatively different from that evidenced by humans in object recognition tasks. The discrepancy between the system's performance and human performance is likely to be revealed by all models that map filter values directly onto object units. These results suggest that human object recognition (as opposed to face recognition) may be difficult to approximate by models that do not posit hidden units for explicit representation of intermediate entities such as edges, viewpoint invariant classifiers, axes, shocks and object parts.

Book ChapterDOI
16 Jul 1996
TL;DR: The advantages of both paradigms are combined: at the low level, adaptivity and the ability to learn from examples are realized by a neural network, whereas the high-level analysis is performed by representing structured knowledge in a semantic network.
Abstract: We present an architecture for 3D-object recognition based on the integration of neural and semantic networks. The architecture consists of two main components: a neural object recognition system generates object hypotheses, which are verified or rejected by a semantic network. Thus the advantages of both paradigms are combined: at the low level, adaptivity and the ability to learn from examples are realized by a neural network, whereas the high-level analysis is performed by representing structured knowledge in a semantic network.

Proceedings ArticleDOI
04 Nov 1996
TL;DR: This paper describes the statistical framework from which a network of salient points for an object is obtained and may be used for fixation control in the context of active object recognition.
Abstract: The authors (1996) introduced the use of multidimensional receptive field histograms for probabilistic object recognition. In this paper we reverse the object recognition problem by asking the question "where should we look?", when we want to verify the presence of an object, to track an object or to actively explore a scene. This paper describes the statistical framework from which we obtain a network of salient points for an object. This network of salient points may be used for fixation control in the context of active object recognition.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: A multiple-view approach to 3D recognition based on spatially tolerant contour representations is shown to be both straightforward and tractable, and discrimination experiments demonstrate the system's ability to separate similar objects.
Abstract: We show that a multiple-view approach to 3D recognition based on spatially tolerant contour representations is both straightforward and tractable. The approach is straightforward with respect to the ease of extending an existing distance- and orientation-invariant 2D robot-vision system to a 3D recognition system. It is tractable with respect to the number of views of an object required for fast and reliable recognition from arbitrary viewpoints. Depending on the choice of system parameters, 20-30 views of an object with relatively high structural complexity (e.g. an aeroplane) must be learnt. Small sets of prototypical views can be generated by means of a simple and efficient heuristic procedure. For fast recognition and pose estimation a single view is usually sufficient. Discrimination experiments demonstrate the system's ability to separate similar objects. An object's pose can be estimated using a vector of selected similarity measures as input to a pose estimator.

Book ChapterDOI
13 Apr 1996
TL;DR: An experimental investigation of the recognition performance of two approaches to representing objects for recognition: appearance modelling, which constructs an eigenvector space to compute efficiently the distance between a new image and any image in the database, and invariant geometric descriptions of projected object boundaries.
Abstract: This paper describes an experimental investigation of the recognition performance of two approaches to the representation of objects for recognition. The first representation, generally known as appearance modelling, describes an object by a set of images. The image set is acquired for a range of views and illumination conditions which are expected to be encountered in subsequent recognition. This image database provides a description of the object. Recognition is carried out by constructing an eigenvector space to compute efficiently the distance between a new image and any image in the database. The second representation is a geometric description based on the projected boundary of an object. General object classes such as planar objects, surfaces of revolution and repeated structures support the construction of invariant descriptions and invariant index functions for recognition.
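
The appearance-modelling half of the comparison can be sketched as follows (toy data and my own function names; a minimal stand-in for the eigenspace construction, not the paper's implementation):

```python
# Hedged sketch: build an eigenvector space from the stored image set so
# that the distance between a new image and any database image can be
# approximated cheaply by a low-dimensional subspace distance.
import numpy as np

def build_eigenspace(images, k):
    """images: (N, P) matrix of flattened views; returns mean, top-k basis."""
    mean = images.mean(axis=0)
    # Rows of vt are the eigenvectors ("eigenimages") of the image set.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    return basis @ (image - mean)

# Toy database: noisy variants of two underlying 8x8 views.
rng = np.random.default_rng(1)
v1, v2 = rng.normal(size=64), rng.normal(size=64)
db = np.stack([v1 + 0.05 * rng.normal(size=64) for _ in range(5)] +
              [v2 + 0.05 * rng.normal(size=64) for _ in range(5)])
labels = [0] * 5 + [1] * 5
mean, basis = build_eigenspace(db, k=4)
coords = np.stack([project(x, mean, basis) for x in db])
query = project(v1 + 0.05 * rng.normal(size=64), mean, basis)
nearest = labels[int(np.argmin(np.linalg.norm(coords - query, axis=1)))]
```

Distances in the k-dimensional coordinate space approximate image-space distances at a fraction of the cost of comparing full images, which is the efficiency argument the abstract makes for the eigenvector space.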

Book ChapterDOI
15 Apr 1996
TL;DR: A new recognition method that uses a subspace representation to approximate the comparison of binary images (e.g. intensity edges) using the Hausdorff fraction is described, which enables an image to be searched efficiently for any of the objects in an image database.
Abstract: In this paper we describe a new recognition method that uses a subspace representation to approximate the comparison of binary images (e.g. intensity edges) using the Hausdorff fraction. The technique is robust to outliers and occlusion, and thus can be used for recognizing objects that are partly hidden from view and occur in cluttered backgrounds. We report some simple recognition experiments in which novel views of objects are classified using both a standard SSD-based eigenspace method and our Hausdorff-based method. These experiments illustrate how our method performs better when the background is unknown or the object is partially occluded. We then consider incorporating the method into an image search engine, for locating instances of objects under translation in an image. Results indicate that all but a small percentage of image locations can be ruled out using the eigenspace, without eliminating correct matches. This enables an image to be searched efficiently for any of the objects in an image database.
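
For reference, the Hausdorff fraction that the subspace representation approximates can be computed exactly on small binary edge maps (toy example and code of my own, not the paper's approximation):

```python
# Hedged sketch: the (directed) Hausdorff fraction counts the proportion of
# model edge pixels lying within distance tau of some image edge pixel, so
# occluded or missing pixels merely lower the score instead of breaking the
# match outright.
import numpy as np

def hausdorff_fraction(model_pts, image_pts, tau=1.5):
    """model_pts, image_pts: (N, 2) arrays of edge-pixel coordinates."""
    if len(model_pts) == 0 or len(image_pts) == 0:
        return 0.0
    d = np.linalg.norm(model_pts[:, None, :].astype(float)
                       - image_pts[None, :, :], axis=2)
    return float((d.min(axis=1) <= tau).mean())

# Toy example: a square outline whose right side is occluded in the image.
model = np.zeros((20, 20), bool)
model[5, 5:15] = model[14, 5:15] = True   # top and bottom edges
model[5:15, 5] = model[5:15, 14] = True   # left and right edges
image = model.copy()
image[5:15, 14] = False                   # occlusion removes the right edge
frac = hausdorff_fraction(np.argwhere(model), np.argwhere(image))
# frac stays high: only pixels deep inside the occluded edge fall outside tau
```

This robustness to partial occlusion is what distinguishes the Hausdorff fraction from an SSD comparison, where the occluded pixels would contribute large errors directly.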

Proceedings ArticleDOI
25 Aug 1996
TL;DR: This MOROFA (moving object recognition by optical flow analysis) method can be applied to many industrial areas; for example, an intelligent machine surveillance system or an obstacle detection system for an autonomous vehicle.
Abstract: This paper presents a new method that effectively recognizes moving objects by analyzing optical flow information acquired from dynamic images. This MOROFA (moving object recognition by optical flow analysis) method can be applied in many industrial areas, for example as an intelligent machine-surveillance system or an obstacle-detection system for an autonomous vehicle. First, the optical flow field is detected in image sequences from a camera on a moving observer, and moving-object candidates are extracted using the residual error computed while estimating the focus of expansion. Next, the optical flow directions and intensity values are stored for the pixels in each candidate region to calculate the directions and proportion values of the principal components. Finally, each candidate is classified into one of the object categories expected in the scene by comparing its direction and proportion values with standard data ranges determined for each object in preliminary experiments. Experiments on real outdoor scenes have shown the effectiveness of the proposed method.
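
The first stage of the pipeline (focus-of-expansion estimation and residual-based candidate extraction) can be sketched under a pure-translation assumption; the variable names and the threshold below are mine, not the paper's:

```python
# Hedged sketch: for a purely translating observer, every flow vector (u, v)
# at pixel (x, y) points along the ray from the focus of expansion (FOE),
# giving the linear constraint v*fx - u*fy = v*x - u*y. Pixels whose flow
# violates the least-squares FOE are moving-object candidates.
import numpy as np

def estimate_foe(xy, uv):
    """Least-squares FOE from pixel positions xy (N, 2) and flow uv (N, 2)."""
    a = np.stack([uv[:, 1], -uv[:, 0]], axis=1)     # rows [v, -u]
    b = uv[:, 1] * xy[:, 0] - uv[:, 0] * xy[:, 1]   # v*x - u*y
    foe, *_ = np.linalg.lstsq(a, b, rcond=None)
    return foe

def residuals(xy, uv, foe):
    """Per-pixel deviation of the flow direction from the FOE ray."""
    r = uv[:, 1] * (xy[:, 0] - foe[0]) - uv[:, 0] * (xy[:, 1] - foe[1])
    return np.abs(r) / np.maximum(np.linalg.norm(uv, axis=1), 1e-9)

# Toy scene: flow expanding from FOE (0, 0), plus one independent mover.
rng = np.random.default_rng(2)
xy = rng.uniform(-10, 10, (100, 2))
xy[0] = (5.0, 5.0)
uv = 0.1 * xy                    # ego-motion flow radiates from the FOE
uv[0] = (1.0, 0.0)               # pixel 0 moves on its own
foe = estimate_foe(xy, uv)
candidates = np.nonzero(residuals(xy, uv, foe) > 1.0)[0]
```

The subsequent classification stage described in the abstract (principal-component directions and proportions per candidate region, compared against per-object data ranges) would then operate on the pixels flagged here.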