scispace - formally typeset

Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence" in 2001


Journal Article
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves; both algorithms allow important cases of discontinuity-preserving energies.
Abstract: Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. We consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity-preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.

7,413 citations
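Each expansion or swap move reduces to a two-label subproblem that a minimum s-t cut solves exactly. The sketch below illustrates only that core step, under assumptions of our own (not the authors' code): a 1D chain of pixels, nonnegative data costs, and a Potts smoothness penalty `lam`. Pixels left on the source side of the cut take label 0, pixels on the sink side take label 1, and Edmonds-Karp max-flow finds the cut.

```python
from collections import deque


def max_flow_min_cut(n, edges, s, t):
    """Edmonds-Karp max-flow; returns the source side of a minimum s-t cut."""
    cap = [dict() for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual reverse edge
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:  # BFS for a shortest augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return set(parent)  # nodes still reachable from s in the residual graph
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def binary_potts_labeling(data_costs, lam):
    """data_costs[p] = (cost of label 0, cost of label 1) for pixel p of a 1D chain."""
    n = len(data_costs)
    s, t = n, n + 1
    edges = []
    for p, (c0, c1) in enumerate(data_costs):
        edges.append((s, p, c1))  # cut (paid) when p ends on the sink side: label 1
        edges.append((p, t, c0))  # cut (paid) when p stays on the source side: label 0
    for p in range(n - 1):  # Potts term: pay lam when neighbours are separated
        edges.append((p, p + 1, lam))
        edges.append((p + 1, p, lam))
    source_side = max_flow_min_cut(n + 2, edges, s, t)
    return [0 if p in source_side else 1 for p in range(n)]


def energy(labels, data_costs, lam):
    """Data term plus Potts smoothness term for a labeling of the chain."""
    data = sum(dc[l] for dc, l in zip(data_costs, labels))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth
```

For a handful of pixels the cut's energy can be checked against brute force. A full expansion algorithm builds a more elaborate graph of this kind for every move and repeats until no move lowers the energy.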


Journal Article
Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters controls modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

6,200 citations
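The key idea, learning how parameter perturbations map to image errors, can be sketched as a linear regression followed by iterative correction. This is a toy under loud assumptions: `render` is a hypothetical *linear* synthesis function standing in for a real shape-and-texture model, and all sizes are illustrative. Real appearance models are nonlinear, which is why the update is iterated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_params = 50, 3
A = rng.normal(size=(n_pixels, n_params))  # hypothetical linear "appearance model"


def render(p):
    """Toy synthesis: model parameters -> image vector (an assumption, not a real model)."""
    return A @ p


def learn_update_matrix(p_ref, n_samples=200, scale=0.5):
    """Regress parameter perturbations dp on the image residuals r they induce."""
    target = render(p_ref)
    dps = rng.normal(scale=scale, size=(n_samples, n_params))
    residuals = np.array([render(p_ref + dp) - target for dp in dps])
    # Solve residuals @ R.T ~= dps in the least-squares sense, so that dp ~= R @ r.
    R_T, *_ = np.linalg.lstsq(residuals, dps, rcond=1e-10)
    return R_T.T


def fit(image, p0, R, n_iter=10):
    """Iteratively correct the parameters using the learned residual-to-update map."""
    p = p0.copy()
    for _ in range(n_iter):
        r = render(p) - image
        p = p - R @ r
    return p
```

Usage: `R = learn_update_matrix(np.zeros(n_params))` once offline, then `fit(image, np.zeros(n_params), R)` for each new image. The expensive learning happens once; each matching step is a single matrix-vector product.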


Journal Article
TL;DR: A generative appearance-based method for recognizing human faces under variation in lighting and viewpoint that exploits the fact that the set of images of an object in fixed pose, but under all possible illumination conditions, is a convex cone in the space of images.
Abstract: We present a generative appearance-based method for recognizing human faces under variation in lighting and viewpoint. Our method exploits the fact that the set of images of an object in fixed pose, but under all possible illumination conditions, is a convex cone in the space of images. Using a small number of training images of each face taken with different lighting directions, the shape and albedo of the face can be reconstructed. In turn, this reconstruction serves as a generative model that can be used to render (or synthesize) images of the face under novel poses and illumination conditions. The pose space is then sampled and, for each pose, the corresponding illumination cone is approximated by a low-dimensional linear subspace whose basis vectors are estimated using the generative model. Our recognition algorithm assigns to a test image the identity of the closest approximated illumination cone. Test results show that the method performs almost without error, except on the most extreme lighting directions.

5,027 citations
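The recognition step, which assigns a test image to the closest approximated illumination cone, can be sketched with one low-dimensional linear subspace per identity. As a simplifying assumption, the basis here comes from an SVD of sample images rather than from images rendered by the paper's generative model, and the dimension `k` is a free parameter.

```python
import numpy as np


def subspace_basis(images, k):
    """Orthonormal basis: the top-k left singular vectors of the image matrix.

    images: array of shape (n_pixels, n_images), one vectorized image per column.
    """
    U, _, _ = np.linalg.svd(images, full_matrices=False)
    return U[:, :k]


def distance_to_subspace(x, B):
    """Euclidean distance from x to its orthogonal projection onto span(B)."""
    return np.linalg.norm(x - B @ (B.T @ x))


def recognize(x, bases):
    """Return the identity whose subspace lies closest to the test image x."""
    return min(bases, key=lambda name: distance_to_subspace(x, bases[name]))
```

A true illumination cone also restricts combinations to nonnegative coefficients. The linear-subspace approximation drops that constraint in exchange for a closed-form distance.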


Journal Article
TL;DR: The authors show that when the training data set is small, PCA can outperform LDA, and that PCA is less sensitive to different training data sets.
Abstract: In the context of the appearance-based paradigm for object recognition, it is generally believed that algorithms based on LDA (linear discriminant analysis) are superior to those based on PCA (principal components analysis). In this communication, we show that this is not always the case. We present our case first by using intuitively plausible arguments and, then, by showing actual results on a face database. Our overall conclusion is that when the training data set is small, PCA can outperform LDA and, also, that PCA is less sensitive to different training data sets.

3,102 citations


Journal Article
TL;DR: A view-based approach to the representation and recognition of human movement is presented, and a recognition method matching temporal templates against stored instances of views of known actions is developed.
Abstract: A view-based approach to the representation and recognition of human movement is presented. The basis of the representation is a temporal template: a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using aerobics exercises as a test domain, we explore the representational power of a simple, two-component version of the templates: the first value is a binary value indicating the presence of motion, and the second value is a function of the recency of motion in a sequence. We then develop a recognition method matching temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real time on standard platforms.

2,932 citations
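The two-component template can be computed frame by frame. This sketch assumes a commonly used update rule: moving pixels are set to `tau` and all others decay by one per frame. It also assumes motion is detected by simple frame differencing against a hypothetical `diff_threshold`. The binary component is then "history > 0" and the recency component is the history itself.

```python
import numpy as np


def temporal_templates(frames, tau, diff_threshold):
    """Compute a binary motion-presence image and a motion-recency image.

    frames: sequence of 2D grayscale arrays of identical shape.
    tau: number of frames over which motion is remembered.
    """
    history = np.zeros(frames[0].shape, dtype=float)
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr.astype(float) - prev.astype(float)) > diff_threshold
        # Fresh motion gets the maximum value; older motion decays by one per frame.
        history = np.where(moving, float(tau), np.maximum(history - 1.0, 0.0))
    presence = history > 0  # motion occurred within the last tau frames
    return presence, history
```

In the recency image, brighter pixels moved more recently, so a single static image encodes both where and roughly when motion happened.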


Journal Article
TL;DR: It is found that the technique of seeding a Fisher Projection with the results of sequential floating forward search improves the performance of the Fisher Projection and provides the highest recognition rates reported to date for classification of affect from physiology: 81 percent recognition accuracy on eight classes of emotion, including neutral.
Abstract: The ability to recognize emotion is one of the hallmarks of emotional intelligence, an aspect of human intelligence that has been argued to be even more important than mathematical and verbal intelligences. This paper proposes that machine intelligence needs to include emotional intelligence and demonstrates results toward this goal: developing a machine's ability to recognize the human affective state given four physiological signals. We describe difficult issues unique to obtaining reliable affective data and collect a large set of data from a subject trying to elicit and experience each of eight emotional states, daily, over multiple weeks. This paper presents and compares multiple algorithms for feature-based recognition of emotional state from this data. We analyze four physiological signals that exhibit problematic day-to-day variations: The features of different emotions on the same day tend to cluster more tightly than do the features of the same emotion on different days. To handle the daily variations, we propose new features and algorithms and compare their performance. We find that the technique of seeding a Fisher Projection with the results of sequential floating forward search improves the performance of the Fisher Projection and provides the highest recognition rates reported to date for classification of affect from physiology: 81 percent recognition accuracy on eight classes of emotion, including neutral.

2,172 citations


Journal Article
TL;DR: SIMPLIcity (semantics-sensitive integrated matching for picture libraries) is an image retrieval system that uses semantics classification methods, a wavelet-based approach for feature extraction, and integrated region matching based upon image segmentation to improve retrieval.
Abstract: We present here SIMPLIcity (semantics-sensitive integrated matching for picture libraries), an image retrieval system that uses semantics classification methods, a wavelet-based approach for feature extraction, and integrated region matching based upon image segmentation. An image is represented by a set of regions, roughly corresponding to objects, which are characterized by color, texture, shape, and location. The system classifies images into semantic categories. Potentially, the categorization enhances retrieval by permitting semantically adaptive searching methods and narrowing down the searching range in a database. A measure for the overall similarity between images is developed using a region-matching scheme that integrates properties of all the regions in the images. The application of SIMPLIcity to several databases has demonstrated that our system performs significantly better and faster than existing ones. The system is fairly robust to image alterations.

2,117 citations


Journal Article
TL;DR: An Automatic Face Analysis (AFA) system is developed to analyze facial expressions based on both permanent facial features and transient facial features in a nearly frontal-view face image sequence; multistate face and facial component models are proposed for tracking and modeling the various facial features.
Abstract: Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions, such as happiness, anger, surprise, and fear. Such prototypic expressions, however, occur rather infrequently. Human emotions and intentions are more often communicated by changes in one or a few discrete facial features. In this paper, we develop an automatic face analysis (AFA) system to analyze facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal-view face image sequence. The AFA system recognizes fine-grained changes in facial expression as action units (AUs) of the Facial Action Coding System (FACS), instead of a few prototypic expressions. Multistate face and facial component models are proposed for tracking and modeling the various facial features, including lips, eyes, brows, cheeks, and furrows. During tracking, detailed parametric descriptions of the facial features are extracted. With these parameters as the inputs, a set of action units (neutral expression, six upper-face AUs, and 10 lower-face AUs) is recognized, whether the AUs occur alone or in combination. The system has achieved average recognition rates of 96.4 percent (95.4 percent if neutral expressions are excluded) for upper-face AUs and 96.7 percent (95.6 percent with neutral expressions excluded) for lower-face AUs. The generalizability of the system has been tested by using independent image databases collected and FACS-coded for ground truth by different research teams.

1,773 citations


Journal Article
TL;DR: The focus of this work is on spatial segmentation, where a criterion for "good" segmentation using the class-map is proposed; applying the criterion to local windows in the class-map results in the "J-image," in which high and low values correspond to possible boundaries and interiors of color-texture regions.
Abstract: A method for unsupervised segmentation of color-texture regions in images and video is presented. This method, which we refer to as JSEG, consists of two independent steps: color quantization and spatial segmentation. In the first step, colors in the image are quantized to several representative classes that can be used to differentiate regions in the image. The image pixels are then replaced by their corresponding color class labels, thus forming a class-map of the image. The focus of this work is on spatial segmentation, where a criterion for "good" segmentation using the class-map is proposed. Applying the criterion to local windows in the class-map results in the "J-image," in which high and low values correspond to possible boundaries and interiors of color-texture regions. A region growing method is then used to segment the image based on the multiscale J-images. A similar approach is applied to video sequences. An additional region tracking scheme is embedded into the region growing process to achieve consistent segmentation and tracking results, even for scenes with nonrigid object motion. Experiments show the robustness of the JSEG algorithm on real images and video.

1,476 citations
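The abstract does not spell out the segmentation criterion. The sketch below assumes the J value is defined over the pixel *positions* z inside a window: J = (S_T - S_W) / S_W, where S_T is the total spread of positions around their mean and S_W sums each class's spread around its own class mean. That definition is our recollection of the JSEG paper, so treat it as an assumption. Under it, J is large when classes occupy separate areas (a likely boundary) and near zero when they are intermixed (a region interior).

```python
import numpy as np


def j_value(class_map):
    """Assumed J criterion for one window of a class-map: (S_T - S_W) / S_W."""
    rows, cols = np.indices(class_map.shape)
    z = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    labels = class_map.ravel()
    s_t = np.sum((z - z.mean(axis=0)) ** 2)  # spread of all positions
    s_w = 0.0
    for label in np.unique(labels):  # spread of each class around its own mean
        zi = z[labels == label]
        s_w += np.sum((zi - zi.mean(axis=0)) ** 2)
    if s_w == 0:
        return float("inf")  # degenerate window: every class is a single point
    return (s_t - s_w) / s_w
```

Sliding this over local windows and writing each result back to the window center yields a J-image. A split 4x4 window (two classes side by side) scores 2/3, while a checkerboard scores 0.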


Journal Article
TL;DR: Results suggest that the improvement in performance is due to the component-based approach and the ACC data classification architecture; the system is also capable of locating partially occluded views of people and people whose body parts have little contrast with the background.
Abstract: We present a general example-based framework for detecting objects in static images by components. The technique is demonstrated by developing a system that locates people in cluttered scenes. The system is structured with four distinct example-based detectors that are trained to separately find the four components of the human body: the head, legs, left arm, and right arm. After ensuring that these components are present in the proper geometric configuration, a second example-based classifier combines the results of the component detectors to classify a pattern as either a "person" or a "nonperson." We call this type of hierarchical architecture, in which learning occurs at multiple stages, an adaptive combination of classifiers (ACC). We present results that show that this system performs significantly better than a similar full-body person detector. This suggests that the improvement in performance is due to the component-based approach and the ACC data classification architecture. The algorithm is also more robust than the full-body person detection method in that it is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.

1,115 citations


Journal Article
TL;DR: The authors address class-based, image-based recognition and rendering under varying illumination, using the definition of an illumination-invariant signature image that enables analytic generation of the image space under varying illumination.
Abstract: The paper addresses the problem of "class-based" image-based recognition and rendering with varying illumination. The rendering problem is defined as follows: Given a single input image of an object and a sample of images with varying illumination conditions of other objects of the same general class, re-render the input image to simulate new illumination conditions. The class-based recognition problem is similarly defined: Given a single image of an object in a database of images of other objects, some of them multiply sampled under varying illumination, identify (match) any novel image of that object under varying illumination with the single image of that object in the database. We focus on Lambertian surface classes and, in particular, the class of human faces. The key result in our approach is based on a definition of an illumination-invariant signature image, which enables an analytic generation of the image space with varying illumination. We show that a small database of objects (in our experiments, as few as two objects) is sufficient for generating the image space with varying illumination of any new object of the class from a single input image of that object. In many cases, the recognition results outperform conventional methods by far, and the re-rendering is of remarkable quality considering the size of the database of example images and the mild preprocessing required for making the algorithm work.

Journal Article
TL;DR: A correlation framework for illuminant estimation is developed; a new probabilistic instantiation of it is shown to deliver very good color constancy on both synthetic and real images, and the framework is rich enough to allow many existing algorithms to be expressed within it.
Abstract: The paper considers the problem of illuminant estimation: how, given an image of a scene, recorded under an unknown light, we can recover an estimate of that light. Obtaining such an estimate is a central part of solving the color constancy problem. Thus, the work presented will have applications in fields such as color-based object recognition and digital photography. Rather than attempting to recover a single estimate of the illuminant, we instead set out to recover a measure of the likelihood that each of a set of possible illuminants was the scene illuminant. We begin by determining which image colors can occur (and how these colors are distributed) under each of a set of possible lights. We discuss how, for a given camera, we can obtain this knowledge. We then correlate this information with the colors in a particular image to obtain a measure of the likelihood that each of the possible lights was the scene illuminant. Finally, we use this likelihood information to choose a single light as an estimate of the scene illuminant. Computation is expressed and performed in a generic correlation framework which we develop. We propose a new probabilistic instantiation of this correlation framework and show that it delivers very good color constancy on both synthetic and real images. We further show that the proposed framework is rich enough to allow many existing algorithms to be expressed within it: the gray-world and gamut-mapping algorithms are presented in this framework and we also explore the relationship of these algorithms to other probabilistic and neural network approaches to color constancy.
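The correlation step described above can be sketched as a matrix-vector product. We assume a precomputed table `prob_table[k, b]` approximating how likely chromaticity bin b is under candidate light k; how to build it for a given camera is outside this sketch. The image is reduced to an indicator of which bins occur. Each light's score is the sum of its log-probabilities over the occurring bins (an independence assumption), and the single estimate is the best-scoring light.

```python
import numpy as np


def illuminant_scores(prob_table, image_bins, eps=1e-6):
    """Correlate per-illuminant log-probabilities with the bins present in an image.

    prob_table: (n_illuminants, n_bins) array; row k approximates P(bin | light k).
    image_bins: 1D array of chromaticity-bin indices observed in the image.
    """
    log_table = np.log(prob_table + eps)  # eps keeps impossible colors finite
    present = np.zeros(prob_table.shape[1])
    present[np.unique(image_bins)] = 1.0  # occurrence indicator, each bin counted once
    return log_table @ present  # one likelihood-style score per candidate light


def estimate_illuminant(prob_table, image_bins):
    """Choose the single most likely scene illuminant."""
    return int(np.argmax(illuminant_scores(prob_table, image_bins)))
```

Keeping the full score vector, rather than only the argmax, preserves the per-light likelihood information the abstract emphasizes.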

Journal Article
TL;DR: The Gaussian scale-space paradigm for color images is exploited to define a framework for the robust measurement of object reflectance from color images, and the proposed invariants are considered more adequate for the measurement of invariant color features than existing methods.
Abstract: This paper presents the measurement of colored object reflectance, under different, general assumptions regarding the imaging conditions. We exploit the Gaussian scale-space paradigm for color images to define a framework for the robust measurement of object reflectance from color images. Object reflectance is derived from a physical reflectance model based on the Kubelka-Munk theory for colorant layers. Illumination and geometrical invariant properties are derived from the reflectance model. The invariance and discriminative power of the color invariants are experimentally investigated, showing the invariants to be successful in discounting shadow, illumination, highlights, and noise. Extensive experiments show the different invariants to be highly discriminative, while maintaining invariance properties. The presented framework for color measurement is well-founded in the physics of color as well as in measurement science. Hence, the proposed invariants are considered more adequate for the measurement of invariant color features than existing methods.

Journal Article
TL;DR: A system is presented that takes as input a video stream obtained from an airborne moving platform and produces an analysis of the behavior of the moving objects in the scene; it relies on two modular blocks to achieve this functionality.
Abstract: We present a system which takes as input a video stream obtained from an airborne moving platform and produces an analysis of the behavior of the moving objects in the scene. To achieve this functionality, our system relies on two modular blocks. The first one detects and tracks moving regions in the sequence. It uses a set of features at multiple scales to stabilize the image sequence, that is, to compensate for the motion of the observer, then extracts regions with residual motion and uses an attribute graph representation to infer their trajectories. The second module takes as input these trajectories, together with user-provided information in the form of geospatial context and goal context to instantiate likely scenarios. We present details of the system, together with results on a number of real video sequences and also provide a quantitative analysis of the results.

Journal Article
TL;DR: A class of computationally inexpensive linear dimension reduction criteria is derived by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA).
Abstract: We derive a class of computationally inexpensive linear dimension reduction criteria by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA). It can be seen that LDA weights contributions of individual class pairs according to the Euclidean distance of the respective class means. We generalize upon LDA by introducing a different weighting function.
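The pairwise view of LDA in this abstract can be made concrete. With priors p_i summing to one, the usual between-class scatter equals the sum over class pairs of p_i p_j (m_i - m_j)(m_i - m_j)^T, so a weight w(d_ij) can be inserted per pair. In this sketch the distance is the Euclidean distance between class means. The weight function is a caller-supplied parameter: `lambda d: 1.0` recovers plain LDA, and any other choice (for example a hypothetical `1 / d**2` that down-weights well-separated pairs) is illustrative, not necessarily the paper's.

```python
import numpy as np


def weighted_between_scatter(means, priors, weight):
    """Sum over class pairs of p_i p_j w(d_ij) (m_i - m_j)(m_i - m_j)^T."""
    means = np.asarray(means, dtype=float)
    dim = means.shape[1]
    S_b = np.zeros((dim, dim))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = means[i] - means[j]
            S_b += priors[i] * priors[j] * weight(np.linalg.norm(diff)) * np.outer(diff, diff)
    return S_b


def projection(S_w, S_b, n_components):
    """Leading eigenvectors of S_w^{-1} S_b, the usual Fisher-type solution."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real
```

Only the construction of S_b changes, so the weighted criterion costs about the same as LDA: one generalized eigenproblem.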

Journal Article
TL;DR: A randomized tracking algorithm adapted from an existing probabilistic data association filter (PDAF) that is resistant to clutter and follows agile motion is introduced and a related technique that allows mixed tracker modalities and handles object overlaps robustly is derived.
Abstract: We describe a framework that explicitly reasons about data association to improve tracking performance in many difficult visual environments. A hierarchy of tracking strategies results from ascribing ambiguous or missing data to: 1) noise-like visual occurrences, 2) persistent, known scene elements (i.e., other tracked objects), or 3) persistent, unknown scene elements. First, we introduce a randomized tracking algorithm adapted from an existing probabilistic data association filter (PDAF) that is resistant to clutter and follows agile motion. The algorithm is applied to three different tracking modalities (homogeneous regions, textured regions, and snakes) and is defined extensibly for straightforward inclusion of other methods. Second, we add the capacity to track multiple objects by adapting to vision a joint PDAF which oversees correspondence choices between same-modality trackers and image features. We then derive a related technique that allows mixed tracker modalities and handles object overlaps robustly. Finally, we represent complex objects as conjunctions of cues that are diverse both geometrically (e.g., parts) and qualitatively (e.g., attributes). Rigid and hinge constraints between part trackers and multiple descriptive attributes for individual parts render the whole object more distinctive, reducing susceptibility to mistracking. Results are given for diverse objects such as people, microscopic cells, and chess pieces.

Journal Article
TL;DR: The widely used three-step edge detection procedure - gradient estimation, non-maxima suppression, hysteresis thresholding - is generalized to include the information provided by the confidence measure and experiments show the ability of the new procedure to detect weak edges.
Abstract: Computing the weighted average of the pixel values in a window is a basic module in many computer vision operators. The process is reformulated in a linear vector space and the role of the different subspaces is emphasized. Within this framework, well-known artifacts of gradient-based edge detectors, such as large spurious responses, can be explained quantitatively. It is also shown that template matching with a template derived from the input data is meaningful since it provides an independent measure of confidence in the presence of the employed edge model. The widely used three-step edge detection procedure (gradient estimation, non-maxima suppression, hysteresis thresholding) is generalized to include the information provided by the confidence measure. The additional amount of computation is minimal, and experiments with several standard test images show the ability of the new procedure to detect weak edges.
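For reference, the third step of the classic procedure, hysteresis thresholding, can be sketched as a flood fill. Pixels at or above `high` seed edges, and pixels at or above `low` are kept only if they are 8-connected to a seed. This is the standard gradient-magnitude version; the paper's generalization, which also feeds in the confidence measure, is not reproduced here.

```python
from collections import deque

import numpy as np


def hysteresis_threshold(magnitude, low, high):
    """Keep weak pixels (>= low) only if 8-connected to a strong pixel (>= high)."""
    strong = magnitude >= high
    weak = magnitude >= low
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))  # start the fill from every strong pixel
    rows, cols = magnitude.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and weak[nr, nc] and not edges[nr, nc]:
                    edges[nr, nc] = True
                    queue.append((nr, nc))
    return edges
```

Weak responses that never touch a strong one are discarded. A confidence-aware variant would change how `weak` and `strong` are decided, not the connectivity search.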

Journal Article
TL;DR: This work studies the motion correspondence problem for which a diversity of qualitative and statistical solutions exist, and presents a tracking algorithm that satisfies these (possibly constrained) models in a greedy matching sense, including an effective way to handle detection errors and occlusion.
Abstract: We study the motion correspondence problem for which a diversity of qualitative and statistical solutions exist. We concentrate on qualitative modeling, especially in situations where assignment conflicts arise either because multiple features compete for one detected point or because multiple detected points fit a single feature point. We leave out the possibility of point track initiation and termination because that principally conflicts with allowing for temporary point occlusion. We introduce individual, combined, and global motion models and fit existing qualitative solutions in this framework. Additionally, we present a tracking algorithm that satisfies these (possibly constrained) models in a greedy matching sense, including an effective way to handle detection errors and occlusion. The performance evaluation shows that the proposed algorithm outperforms existing greedy matching algorithms. Finally, we describe an extension to the tracker that enables automatic initialization of the point tracks. Several experiments show that the extended algorithm is efficient, hardly sensitive to its few parameters, and qualitatively better than other algorithms, including the presumed optimal statistical multiple hypothesis tracker.
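Greedy matching resolves assignment conflicts by committing to the cheapest remaining (track, point) pair first. In this minimal sketch the cost is a plain Euclidean distance from each track's predicted position, and `max_cost` is a gating threshold. Both are assumptions, not the paper's motion models. Tracks left unmatched can be treated as temporarily occluded.

```python
import math


def greedy_match(predicted, detections, max_cost):
    """Greedily assign detections to tracks in order of increasing cost.

    predicted: dict track_id -> (x, y) predicted position.
    detections: list of (x, y) detected points.
    Returns dict track_id -> detection index; unmatched tracks are omitted.
    """
    candidates = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in predicted.items()
        for j, det in enumerate(detections)
    )
    assignment, used = {}, set()
    for cost, tid, j in candidates:
        if cost > max_cost:
            break  # every remaining pair is at least this expensive
        if tid not in assignment and j not in used:
            assignment[tid] = j  # the cheaper pair wins any conflict
            used.add(j)
    return assignment
```

When two tracks compete for one point, the lower-cost pair claims it and the loser falls through to its next-best candidate, which is the conflict case the abstract highlights.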

Journal Article
TL;DR: In this article, a modified version of the K-means algorithm is proposed to cluster data, which adopts a novel nonmetric distance measure based on the idea of "point symmetry", which can be applied in data clustering and human face detection.
Abstract: We propose a modified version of the K-means algorithm to cluster data. The proposed algorithm adopts a novel nonmetric distance measure based on the idea of "point symmetry". This kind of "point symmetry distance" can be applied in data clustering and human face detection. Several data sets are used to illustrate its effectiveness.
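The abstract names the "point symmetry distance" without defining it. This sketch assumes the following form: for point x_j and center c, take the minimum over all other points x_i of ||(x_j - c) + (x_i - c)|| / (||x_j - c|| + ||x_i - c||). The value is zero when some x_i is the exact mirror image of x_j through c. This is our assumption about the paper's definition, and the full clustering loop is omitted.

```python
import numpy as np


def point_symmetry_distance(j, center, points):
    """Assumed symmetry-based distance of points[j] to a cluster center.

    Small when another point lies close to the mirror image of points[j] through center.
    """
    v_j = points[j] - center
    best = np.inf
    for i, x_i in enumerate(points):
        if i == j:
            continue
        v_i = x_i - center
        denom = np.linalg.norm(v_j) + np.linalg.norm(v_i)
        if denom == 0:
            continue  # both points coincide with the center; ratio undefined
        best = min(best, np.linalg.norm(v_j + v_i) / denom)
    return best
```

The measure is nonmetric, as the abstract notes: it rewards symmetric shapes around a center rather than compactness.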

Journal Article
TL;DR: This paper describes an efficient algorithm for inexact graph matching that uses only the edge or connectivity structure of the graph and does not draw on node or edge attributes, and demonstrates that the method offers comparable performance to more computationally demanding methods.
Abstract: This paper describes an efficient algorithm for inexact graph matching. The method is purely structural, that is, it uses only the edge or connectivity structure of the graph and does not draw on node or edge attributes. We make two contributions: 1) commencing from a probability distribution for matching errors, we show how the problem of graph matching can be posed as maximum-likelihood estimation using the apparatus of the EM algorithm; and 2) we cast the recovery of correspondence matches between the graph nodes in a matrix framework. This allows one to efficiently recover correspondence matches using the singular value decomposition. We experiment with the method on both real-world and synthetic data. Here, we demonstrate that the method offers comparable performance to more computationally demanding methods.

Journal Article
TL;DR: The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real world application: the indexing of images and videos.
Abstract: Detecting faces in images with complex backgrounds is a difficult task. Our approach, which obtains state-of-the-art results, is based on a neural network model: the constrained generative model (CGM). It is generative, since the goal of the learning process is to evaluate the probability that the model has generated the input data, and constrained, since some counter-examples are used to increase the quality of the estimation performed by the model. To detect side-view faces and to decrease the number of false alarms, a conditional mixture of networks is used. To decrease the computational time cost, a fast search algorithm is proposed. The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real-world application: the indexing of images and videos.

Journal Article
TL;DR: This work presents two possibilities for capturing omnistereo panoramas using optics without any moving parts, and introduces a special mirror such that viewing the scene through this mirror creates the same rays as those used with the rotating cameras.
Abstract: An omnistereo panorama consists of a pair of panoramic images, where one panorama is for the left eye and another panorama is for the right eye. The panoramic stereo pair provides a stereo sensation up to a full 360 degrees. Omnistereo panoramas can be constructed by mosaicing images from a single rotating camera. This approach also enables the control of stereo disparity, giving larger baselines for faraway scenes, and a smaller baseline for closer scenes. Capturing panoramic omnistereo images with a rotating camera makes it impossible to capture dynamic scenes at video rates and limits omnistereo imaging to stationary scenes. We present two possibilities for capturing omnistereo panoramas using optics without any moving parts. A special mirror is introduced such that viewing the scene through this mirror creates the same rays as those used with the rotating cameras. The lens used for omnistereo panorama is also introduced, together with the design of the mirror. Omnistereo panoramas can also be rendered by computer graphics methods to represent virtual environments.
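The rotating-camera construction can be sketched as strip mosaicing. From every frame, one column a fixed `offset` to one side of the center goes into one panorama, and the mirror column goes into the other. Which side feeds which eye depends on the rotation direction; the assignment below is an assumption, to be swapped for the opposite direction. A larger `offset` corresponds to a larger effective baseline, which is how disparity can be controlled.

```python
import numpy as np


def omnistereo_panoramas(frames, offset):
    """Build two panoramas from frames of a camera rotating about a vertical axis.

    Assumes one column per frame suffices (dense, evenly spaced rotation steps).
    """
    center = frames[0].shape[1] // 2
    # Assumed rotation direction: strips right of center feed the left-eye panorama.
    left = np.concatenate([f[:, center + offset:center + offset + 1] for f in frames], axis=1)
    right = np.concatenate([f[:, center - offset:center - offset + 1] for f in frames], axis=1)
    return left, right
```

Because every output column comes from a different moment in time, the scene must stay still during capture. That limitation is what motivates the mirror-based optics described above.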

Journal Article
TL;DR: A fresh look is taken at the potential role of the holistic paradigm in handwritten word recognition and an attempt is made to interpret well-known paradigms of word recognition in this framework.
Abstract: The holistic paradigm in handwritten word recognition treats the word as a single, indivisible entity and attempts to recognize words from their overall shape, as opposed to their character contents. In this survey, we have attempted to take a fresh look at the potential role of the holistic paradigm in handwritten word recognition. The survey begins with an overview of studies of reading which provide evidence for the existence of a parallel holistic reading process, in both developing and skilled readers. In what we believe is a fresh perspective on handwriting recognition, approaches to recognition are characterized as forming a continuous spectrum based on the visual complexity of the unit of recognition employed, and an attempt is made to interpret well-known paradigms of word recognition in this framework. An overview of features, methodologies, representations, and matching techniques employed by holistic approaches is presented.

Journal Article
TL;DR: The Trace transform, a generalization of the Radon transform, consists of tracing an image with straight lines along which certain functionals of the image function are calculated.
Abstract: The proposed Trace transform, a generalization of the Radon transform, consists of tracing an image with straight lines along which certain functionals of the image function are calculated. Different functionals can be used, and these may be invariant to different transformations of the image. The paper presents the properties the functionals must have in order to be useful in three different applications of the method: construction of features invariant to rotation, translation, and scaling of the image; construction of features sensitive to the parameters of rotation, translation, and scaling of the image; and construction of features that may correlate well with a certain phenomenon we wish to monitor.
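Tracing the image with lines and applying a functional along each line can be sketched directly. Lines are parameterized by the angle `phi` of their normal and a signed distance `p` from the image center. Values along each line are taken by nearest-neighbour sampling, a simplification of our own. Passing `np.sum` as the functional gives an approximate Radon transform; other functionals such as `np.max` or `np.median` give other trace transforms.

```python
import numpy as np


def trace_transform(image, functional, n_angles=180):
    """Apply `functional` to the image values sampled along every line (phi, p).

    Returns an array of shape (n_angles, number of p values).
    """
    rows, cols = image.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    radius = int(np.ceil(np.hypot(rows, cols) / 2))
    ts = np.linspace(-radius, radius, 2 * radius + 1)  # positions along a line
    ps = np.arange(-radius, radius + 1)  # signed distances from the center
    result = np.zeros((n_angles, len(ps)))
    for a, phi in enumerate(np.linspace(0.0, np.pi, n_angles, endpoint=False)):
        normal = np.array([np.cos(phi), np.sin(phi)])
        direction = np.array([-np.sin(phi), np.cos(phi)])
        for k, p in enumerate(ps):
            xs = cx + p * normal[0] + ts * direction[0]
            ys = cy + p * normal[1] + ts * direction[1]
            xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
            inside = (xi >= 0) & (xi < cols) & (yi >= 0) & (yi < rows)
            values = image[yi[inside], xi[inside]]
            result[a, k] = functional(values) if values.size else 0.0
    return result
```

Invariant features, as the abstract describes, would then come from further functionals applied to the columns and rows of this result.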

Journal Article
TL;DR: This work extends the median concept to the domain of graphs and introduces the novel concepts of set median and generalized median of a set of graphs, and studies properties of both types of median graphs.
Abstract: In object prototype learning and similar tasks, median computation is an important technique for capturing the essential information of a given set of patterns. We extend the median concept to the domain of graphs. In terms of graph distance, we introduce the novel concepts of set median and generalized median of a set of graphs. We study properties of both types of median graphs. For the more complex task of computing generalized median graphs, a genetic search algorithm is developed. Experiments conducted on randomly generated graphs demonstrate the advantage of generalized median graphs compared to set median graphs and the ability of our genetic algorithm to find approximate generalized median graphs in reasonable time. Application examples with both synthetic and nonsynthetic data are shown to illustrate the practical usefulness of the concept of median graphs.
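The set median is the simpler of the two concepts: it is the member of the given set that minimizes the sum of distances to all members. The generalized median instead searches over all graphs, which is why the paper needs a genetic algorithm for it. This sketch covers only the set median and uses a hypothetical stand-in distance, the size of the symmetric difference of edge sets over shared node labels, in place of whatever graph distance is actually used.

```python
def set_median(items, distance):
    """Return the element of `items` minimizing the sum of distances to all elements."""
    return min(items, key=lambda g: sum(distance(g, h) for h in items))


def edge_distance(g, h):
    """Hypothetical graph distance: number of edges present in exactly one graph.

    Graphs are frozensets of edges over a shared set of node labels.
    """
    return len(g ^ h)
```

Because `set_median` takes the distance as a parameter, a proper graph edit distance can be dropped in without changing the search.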

Journal ArticleDOI
TL;DR: A novel approach is proposed that reformulates Fisher's discriminant ratio as a quadratic optimization problem subject to a set of inequality constraints, by combining statistical pattern recognition and support vector machines.
Abstract: A novel method for enhancing the performance of elastic graph matching in frontal face authentication is proposed. The starting point is to weight the local similarity values at the nodes of an elastic graph according to their discriminatory power. Powerful and well-established optimization techniques are used to derive the weights of the linear combination. More specifically, we propose a novel approach that reformulates Fisher's discriminant ratio as a quadratic optimization problem subject to a set of inequality constraints by combining statistical pattern recognition and support vector machines (SVM). Both linear and nonlinear SVM are then constructed to yield the optimal separating hyperplanes and the optimal polynomial decision surfaces, respectively. The method has been applied to frontal face authentication on the M2VTS database. Experimental results indicate that the performance of morphological elastic graph matching is substantially improved by using the proposed weighting technique.
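To make the node-weighting idea concrete, the sketch below weights each graph node by a per-node Fisher ratio. The ratio is computed from client (genuine) and impostor training similarities, and a probe is then scored by the weighted sum of its local similarities. This is a deliberately simpler stand-in, assumed here for illustration. It is not the paper's SVM-based quadratic reformulation of Fisher's ratio. The names `fisher_node_weights` and `weighted_score` and the array layout are hypothetical.

```python
import numpy as np

def fisher_node_weights(client, impostor, eps=1e-12):
    """Weight each graph node by a Fisher ratio of its local similarity values.

    client, impostor: arrays of shape (n_samples, n_nodes) holding local
    similarities for genuine and impostor training comparisons (hypothetical layout).
    """
    client = np.asarray(client, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    between = (client.mean(axis=0) - impostor.mean(axis=0)) ** 2
    within = client.var(axis=0) + impostor.var(axis=0)
    ratio = between / (within + eps)
    total = ratio.sum()
    if total == 0:
        # No node discriminates at all: fall back to uniform weights.
        return np.full(ratio.shape, 1.0 / ratio.size)
    return ratio / total  # weights sum to 1

def weighted_score(local_similarities, weights):
    """Combine the per-node similarities of one comparison into a single score."""
    return float(np.dot(local_similarities, weights))
```

Nodes whose similarities barely differ between clients and impostors receive weights near zero, so they hardly affect the final score.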

Journal ArticleDOI
TL;DR: This work proposes an error-tolerant subgraph isomorphism algorithm, formulated in terms of region adjacency graphs, that can compute matchings under distorted inputs and reach a solution in near-polynomial time.
Abstract: We propose an error-tolerant subgraph isomorphism algorithm formulated in terms of region adjacency graphs (RAG). A set of edit operations to transform one RAG into another is defined; regions are represented by polylines, and string matching techniques are used to measure their similarity. The algorithm follows a branch-and-bound approach driven by the RAG edit operations. This formulation allows matching to be computed under distorted inputs and reaches a solution in near-polynomial time. The algorithm has been used to recognize symbols in hand-drawn diagrams.
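The abstract does not specify how polylines are turned into strings or which string matching technique is used. One common choice, assumed here only for illustration, encodes a polyline as a string of quantized segment directions and compares two such strings with the Levenshtein edit distance. The helper names `direction_codes` and `edit_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def direction_codes(polyline: Sequence[Point], bins: int = 8) -> List[int]:
    """Quantize each segment's direction into one of `bins` codes (0 = +x axis)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        codes.append(int(round(angle / (2 * math.pi / bins))) % bins)
    return codes

def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]
```

For example, a closed unit square traced counter-clockwise encodes as `[0, 2, 4, 6]`. The same square with its last side missing is one edit away from it.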

Journal ArticleDOI
TL;DR: Analytic comparison and experimental results show that the proposed look-ahead improves on the state of the art in state-space search methods, and that the combined use of the proposed matching and indexing scheme makes it possible to manage the complexity of a typical application of retrieval by spatial arrangement.
Abstract: In retrieval from image databases, evaluating similarity based both on the appearance of spatial entities and on their mutual relationships depends on a content representation based on attributed relational graphs. This kind of modeling entails complex matching and indexing, which currently prevents its use in comprehensive applications. In this paper, we provide a graph-theoretical formulation for the problem of retrieval based on the joint similarity of individual entities and of their mutual relationships, and we expound its implications for indexing and matching. In particular, we propose the use of metric indexing to organize large archives of graph models, and we propose an original look-ahead method that provides an efficient solution to the (sub)graph error-correcting isomorphism problem needed to compute object distances. Analytic comparison and experimental results show that the proposed look-ahead improves on the state of the art in state-space search methods, and that the combined use of the proposed matching and indexing scheme makes it possible to manage the complexity of a typical application of retrieval by spatial arrangement.
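Metric indexing pays off because of the triangle inequality. Suppose the distances from every archived model to a few pivot models are precomputed. Then |d(q, p) - d(o, p)| is a lower bound on d(q, o), and many objects can be discarded without evaluating the expensive graph distance. The sketch below is a generic pivot-table range query built on that idea. The abstract does not say which metric index the paper uses, and `PivotIndex` is a hypothetical name. The test uses absolute difference on numbers as a stand-in for a graph distance.

```python
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

class PivotIndex(Generic[T]):
    """Hypothetical pivot table for range queries in any metric space."""

    def __init__(self, items: Sequence[T], pivots: Sequence[T],
                 dist: Callable[[T, T], float]) -> None:
        self.items = list(items)
        self.pivots = list(pivots)
        self.dist = dist
        # Offline step: distance from every archived object to every pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.items]
        self.last_evaluations = 0  # distance computations made by the last query

    def range_query(self, query: T, radius: float) -> List[Tuple[T, float]]:
        q_to_p = [self.dist(query, p) for p in self.pivots]
        self.last_evaluations = len(self.pivots)
        hits: List[Tuple[T, float]] = []
        for obj, row in zip(self.items, self.table):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
            lower = max((abs(a - b) for a, b in zip(q_to_p, row)), default=0.0)
            if lower > radius:
                continue  # pruned without computing d(q, o)
            self.last_evaluations += 1
            d = self.dist(query, obj)
            if d <= radius:
                hits.append((obj, d))
        return hits
```

In the test case, only two of the six archived objects need a direct distance evaluation. Every other object is eliminated by the pivot bounds alone.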

Journal ArticleDOI
TL;DR: A thresholding method is introduced that accounts for both intensity-based class uncertainty (a histogram-based property) and region homogeneity (an image morphology-based property). Its superiority was observed both qualitatively on clinical medical images and quantitatively on 250 realistic phantom images generated by adding different degrees of blurring, noise, and background variation to real objects segmented from clinical images.
Abstract: Thresholding is a popular image segmentation method that converts a gray-level image into a binary image. The selection of optimum thresholds has remained a challenge for decades. Besides being a segmentation tool in its own right, thresholding is often also a step in many advanced image segmentation techniques in spaces other than the image space. We introduce a thresholding method that accounts for both intensity-based class uncertainty (a histogram-based property) and region homogeneity (an image morphology-based property). A scale-based formulation is used to compute region homogeneity. At any threshold, intensity-based class uncertainty is computed by fitting a Gaussian to the intensity distribution of each of the two regions segmented at that threshold. The theory of the optimum thresholding method is based on the postulate that objects manifest themselves with fuzzy boundaries in any digital image acquired by an imaging device. The main idea is to select the threshold at which pixels with high class uncertainty accumulate mostly around object boundaries. To achieve this, a threshold energy criterion is formulated using class uncertainty and region homogeneity such that, at any image location, high energy is created when class uncertainty and region homogeneity are both high or both low. The method then selects the threshold that corresponds to the minimum overall energy. The method has been compared to a maximum segmented image information method. The proposed method was observed to be superior both qualitatively on clinical medical images and quantitatively on 250 realistic phantom images generated by adding different degrees of blurring, noise, and background variation to real objects segmented from clinical images.
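The threshold selection loop can be sketched as follows. This minimal illustration rests on several assumptions that do not come from the abstract. First, class uncertainty is taken as the binary entropy of the posterior class probability derived from the two fitted Gaussians. Second, the paper's scale-based homogeneity is replaced by a simpler 3x3 local-variance measure with a hypothetical parameter `sigma_h`. Third, the per-pixel energy u*h + (1 - u)*(1 - h) is just one function that is high when both terms are high or both are low. All function names are illustrative.

```python
import numpy as np

def local_homogeneity(img, sigma_h):
    """Homogeneity in [0, 1] from 3x3 local variance (stand-in for the scale-based measure)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    window = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    return np.exp(-window.var(axis=0) / (2.0 * sigma_h ** 2))

def _log_gauss(x, mu, sigma):
    # Log of a Gaussian density, dropping the -0.5*log(2*pi) term shared by both classes.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

def class_uncertainty(img, t, min_sigma=1.0):
    """Binary entropy of the posterior class probability after fitting one Gaussian per class."""
    lo, hi = img[img <= t], img[img > t]
    log_lo = np.log(lo.size / img.size) + _log_gauss(img, lo.mean(), max(lo.std(), min_sigma))
    log_hi = np.log(hi.size / img.size) + _log_gauss(img, hi.mean(), max(hi.std(), min_sigma))
    post = np.exp(log_lo - np.logaddexp(log_lo, log_hi))  # P(lower class | intensity)
    post = np.clip(post, 1e-12, 1.0 - 1e-12)
    return -(post * np.log2(post) + (1.0 - post) * np.log2(1.0 - post))

def select_threshold(img, sigma_h=10.0):
    """Return the candidate threshold with minimum total energy (None for a constant image)."""
    img = np.asarray(img, dtype=float)
    hom = local_homogeneity(img, sigma_h)
    best_t, best_energy = None, np.inf
    for t in np.unique(img)[:-1]:  # every candidate leaves both classes non-empty
        u = class_uncertainty(img, t)
        # High energy where uncertainty and homogeneity are both high or both low.
        energy = np.sum(u * hom + (1.0 - u) * (1.0 - hom))
        if energy < best_energy:
            best_t, best_energy = float(t), energy
    return best_t
```

On a synthetic 8x16 image whose left half is a 45/55 checkerboard and whose right half is a 195/205 checkerboard, this sketch selects 55, the threshold that separates the two regions. Any other threshold puts uncertain pixels inside homogeneous areas, which raises the energy.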

Journal ArticleDOI
TL;DR: This work describes a system that detects and constructs 3D models for rectilinear buildings with either flat or symmetric gable roofs from multiple aerial images; the multiple images need not be stereo pairs (i.e., they may be acquired at different times).
Abstract: Automatic detection and description of cultural features, such as buildings, from aerial images is becoming increasingly important for a number of applications. This task also offers an excellent domain for studying the general problems of scene segmentation, 3D inference, and shape description under highly challenging conditions. We describe a system that detects and constructs 3D models for rectilinear buildings with either flat or symmetric gable roofs from multiple aerial images; the multiple images, however, need not be stereo pairs (i.e., they may be acquired at different times). Hypotheses for rectangular roof components are generated by grouping lines in the images hierarchically; the hypotheses are verified by searching for the presence of predicted walls and shadows. The hypothesis generation process combines hierarchical grouping with matching at successive stages. Overlap and containment relations between 3D structures are analyzed to resolve conflicts. The system has been tested on a large number of real examples with good results, some of which are included in the paper along with their evaluations.