scispace - formally typeset
Search or ask a question
Author

Chad Carson

Other affiliations: University of California
Bio: Chad Carson is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: Image retrieval & Automatic image annotation. The author has an hindex of 9, co-authored 13 publications receiving 3694 citations. Previous affiliations of Chad Carson include University of California.

Papers
More filters
Journal ArticleDOI
TL;DR: Results indicating that querying for images using Blobworld produces higher precision than does querying using color and texture histograms of the entire image in cases where the image contains distinctive objects are presented.
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem We present a new image representation that provides a transformation from the raw pixel data to a small set of image regions that are coherent in color and texture This "Blobworld" representation is created by clustering pixels in a joint color-texture-position feature space The segmentation algorithm is fully automatic and has been run on a collection of 10,000 natural images We describe a system that uses the Blobworld representation to retrieve images from this collection An important aspect of the system is that the user is allowed to view the internal representation of the submitted image and the query results Similar systems do not offer the user this view into the workings of the system; consequently, query results from these systems can be inexplicable, despite the availability of knobs for adjusting the similarity metrics By finding image regions that roughly correspond to objects, we allow querying at the level of objects rather than global image properties We present results indicating that querying for images using Blobworld produces higher precision than does querying using color and texture histograms of the entire image in cases where the image contains distinctive objects

1,574 citations

Book ChapterDOI
TL;DR: This work indexes the blob descriptions using a lower-rank approximation to the high-dimensional distance to make large-scale retrieval feasible, and shows encouraging results for both querying and indexing.
Abstract: Blobworld is a system for image retrieval based on finding coherent image regions which roughly correspond to objects. Each image is automatically segmented into regions ("blobs") with associated color and texture descriptors. Queryingi s based on the attributes of one or two regions of interest, rather than a description of the entire image. In order to make large-scale retrieval feasible, we index the blob descriptions usinga tree. Because indexing in the high-dimensional feature space is computationally prohibitive, we use a lower-rank approximation to the high-dimensional distance. Experiments show encouraging results for both queryinga nd indexing.

896 citations

Proceedings ArticleDOI
04 Jan 1998
TL;DR: A new image representation is presented which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space based on segmentation using the expectation-maximization algorithm on combined color andtexture features.
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called "blobworld" representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection. We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.

548 citations

Proceedings ArticleDOI
20 Jun 1997
TL;DR: A new image representation is presented which provides a transformation from the raw pixel data to a small set of localized coherent regions in color and texture space based on segmentation using the expectation-maximization algorithm on combined color andtexture features.
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem In this paper, we present a new image representation which provides a transformation from the raw pixel data to a small set of localized coherent regions in color and texture space This so-called lblobworldr representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features The texture features we use for the segmentation arise from a new approach to texture description and scale selection We describe a system that uses the blobworld representation to retrieve images An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric

318 citations

Book ChapterDOI
13 Apr 1996
TL;DR: The approach to object recognition is described, which is structured around a sequence of increasingly specialized grouping activities that assemble coherent regions of image that can be shown to satisfy increasingly stringent constraints.
Abstract: Retrieving images from very large collections, using image content as a key, is becoming an important problem. Users prefer to ask for pictures using notions of content that are strongly oriented to the presence of abstractly defined objects. Computer programs that implement these queries automatically are desirable, but are hard to build because conventional object recognition techniques from computer vision cannot recognize very general objects in very general contexts. This paper describes our approach to object recognition, which is structured around a sequence of increasingly specialized grouping activities that assemble coherent regions of image that can be shown to satisfy increasingly stringent constraints. The constraints that are satisfied provide a form of object classification in quite general contexts. This view of recognition is distinguished by: far richer involvement of early visual primitives, including color and texture; hierarchical grouping and learning strategies in the classification process; the ability to deal with rather general objects in uncontrolled configurations and contexts. We illustrate these properties with four case-studies: one demonstrating the use of color and texture descriptors; one showing how trees can be described by fusing texture and geometric properties; one learning scenery concepts using grouped features; and one showing how this view of recognition yields a program that can tell, quite accurately, whether a picture contains naked people or not.

219 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected closed together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.

6,882 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

6,505 citations

Journal ArticleDOI
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations

Journal ArticleDOI
TL;DR: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation and presents state-of-the-art algorithms for both of these tasks.
Abstract: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.

5,068 citations

Journal ArticleDOI
TL;DR: This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.
Abstract: We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.

4,593 citations