
Showing papers by "Andrew Zisserman published in 2005"


Journal ArticleDOI
TL;DR: A snapshot of the state of the art in affine covariant region detectors that compares their performance on a set of test images under varying imaging conditions, and establishes a reference test set of images and performance software so that future detectors can be evaluated in the same framework.
Abstract: The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris (Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002) and Hessian points (Mikolajczyk and Schmid, 2002); a detector of 'maximally stable extremal regions', proposed by Matas et al. (2002); an edge-based region detector (Tuytelaars and Van Gool, 1999); a detector based on intensity extrema (Tuytelaars and Van Gool, 2000); and a detector of 'salient regions', proposed by Kadir, Zisserman and Brady (2004). The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.

3,359 citations
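
The 'maximally stable extremal regions' detector compared here is now available in standard libraries. A minimal sketch using OpenCV (an assumed stand-in, not the paper's evaluation code; the image path is hypothetical):

```python
# Minimal sketch: detect 'maximally stable extremal regions' (MSER),
# one of the six affine covariant detector types compared in the paper.
# OpenCV is an assumed stand-in; the image path is hypothetical.
import cv2

img = cv2.imread("test_image.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)

# Each region is a set of pixel coordinates; fitting an ellipse gives an
# affine covariant region of the kind compared across detectors.
ellipses = [cv2.fitEllipse(r) for r in regions if len(r) >= 5]
print(f"{len(ellipses)} affine regions detected")
```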


Journal ArticleDOI
TL;DR: A method of reliably measuring relative orientation co-occurrence statistics in a rotationally invariant manner is presented, and whether incorporating such information can enhance the classifier’s performance is discussed.
Abstract: We investigate texture classification from single images obtained under unknown viewpoint and illumination. A statistical approach is developed where textures are modelled by the joint probability distribution of filter responses. This distribution is represented by the frequency histogram of filter response cluster centres (textons). Recognition proceeds from single, uncalibrated images and the novelty here is that rotationally invariant filters are used and the filter response space is low dimensional.

1,145 citations
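
The texton pipeline is compact enough to sketch end to end. The filter bank, cluster count and distance below are illustrative assumptions, not the paper's rotationally invariant bank:

```python
# Sketch of texton-based texture classification: filter responses are
# clustered into textons, and each image is represented by its texton
# frequency histogram. The filter bank here is a small stand-in, not the
# paper's rotationally invariant bank.
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def filter_responses(img):
    # img: 2-D grayscale array. Stack per-pixel responses of Gaussians
    # and Laplacians at several scales (rotation-insensitive choices).
    feats = [ndimage.gaussian_filter(img, s) for s in (1, 2, 4)]
    feats += [ndimage.gaussian_laplace(img, s) for s in (1, 2, 4)]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

def texton_dictionary(train_imgs, k=40):
    # Cluster pooled filter responses; cluster centres are the textons.
    X = np.vstack([filter_responses(im) for im in train_imgs])
    return KMeans(n_clusters=k, n_init=4).fit(X)

def texton_histogram(img, kmeans):
    labels = kmeans.predict(filter_responses(img))
    h = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    return h / h.sum()

def chi2(h1, h2, eps=1e-10):
    # Classification: nearest training histogram under chi-squared distance.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```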


Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and applies a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).
Abstract: We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. (2003) on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.

1,129 citations
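
pLSA itself reduces to a short EM loop over a word-document count matrix, where documents are images and words are quantized descriptors. A minimal numpy sketch (initialization and iteration counts are arbitrary choices):

```python
# Minimal pLSA via EM on an (n_words x n_docs) count matrix n[w, d].
# Topics play the role of object categories; documents are images and
# words are vector-quantized region descriptors.
import numpy as np

def plsa(n, n_topics, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    W, D = n.shape
    p_w_z = rng.random((W, n_topics)); p_w_z /= p_w_z.sum(0)
    p_z_d = rng.random((n_topics, D)); p_z_d /= p_z_d.sum(0)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|w,d) for every (word, doc) pair.
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]       # (W, Z, D)
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
        nz = n[:, None, :] * joint                          # (W, Z, D)
        p_w_z = nz.sum(2); p_w_z /= p_w_z.sum(0, keepdims=True)
        p_z_d = nz.sum(0); p_z_d /= p_z_d.sum(0, keepdims=True)
    return p_w_z, p_z_d
```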


Proceedings ArticleDOI
17 Oct 2005
TL;DR: A new model, TSI-pLSA, is developed, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner, and can handle the high intra-class variability and large proportion of unrelated images returned by search engines.
Abstract: Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand-prepared datasets.

807 citations
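
One way to picture the spatial extension is to fold a coarse location bin, measured relative to a candidate object window, into each visual word, so that the topic model captures appearance and position jointly. The sketch below shows only this word-augmentation idea and is a loose illustration, not the paper's TSI-pLSA formulation, which marginalizes over candidate windows:

```python
# Illustrative only: fold a coarse location bin (relative to a candidate
# object bounding box) into each visual word, so a topic model can learn
# where on the object an appearance tends to occur. The actual TSI-pLSA
# model treats location as a latent variable over candidate windows;
# this shows the word augmentation step alone.
import numpy as np

def spatial_words(word_ids, xy, bbox, n_bins=4):
    # word_ids: (N,) appearance word indices; xy: (N, 2) region centres.
    x0, y0, x1, y1 = bbox                     # candidate object window
    u = np.clip((xy[:, 0] - x0) / max(x1 - x0, 1e-9), 0, 0.999)
    v = np.clip((xy[:, 1] - y0) / max(y1 - y0, 1e-9), 0, 0.999)
    loc = (u * n_bins).astype(int) * n_bins + (v * n_bins).astype(int)
    return word_ids * (n_bins * n_bins) + loc  # joint appearance-location word
```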


25 Feb 2005
TL;DR: Given a set of images containing multiple object categories, this work seeks to discover those categories and their image locations without supervision using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA).
Abstract: Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of [7]. (This work was sponsored in part by the EU Project CogViSys, the University of Oxford, Shell Oil, and the National Geospatial-Intelligence Agency.)

524 citations
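
Both topic models are now available off the shelf. A hedged sketch fitting LDA to a bag-of-visual-words count matrix with scikit-learn (a modern stand-in for the paper's implementation; the input file name is hypothetical):

```python
# Discover K topics (object categories) from an images-by-visual-words
# count matrix using scikit-learn's LDA. sklearn is a stand-in here,
# not the implementation used in the paper; the file name is hypothetical.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

X = np.load("bow_counts.npy")          # (n_images, n_words) visual-word counts
lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(X)      # per-image topic distribution
top_words = lda.components_.argsort(axis=1)[:, ::-1][:, :10]
print(doc_topics.argmax(axis=1)[:20])  # most probable category per image
```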


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A principled Bayesian method for detecting and segmenting instances of a particular object category within an image, providing a coherent methodology for combining top-down and bottom-up cues, together with an efficient method, OBJ CUT, for obtaining segmentations under this model.
Abstract: In this paper, we present a principled Bayesian method for detecting and segmenting instances of a particular object category within an image, providing a coherent methodology for combining top-down and bottom-up cues. The work draws together two powerful formulations: pictorial structures (PS) and Markov random fields (MRFs), both of which have efficient algorithms for their solution. The resulting combination, which we call the object category specific MRF, suggests a solution to the problem that has long dogged MRFs, namely that they provide a poor prior for specific shapes. In contrast, our model provides a prior that is global across the image plane using the PS. We develop an efficient method, OBJ CUT, to obtain segmentations using this model. Novel aspects of this method include an efficient algorithm for sampling the PS model, and the observation that the expected log likelihood of the model can be increased by a single graph cut. Results are presented on two object categories, cows and horses. We compare our methods to the state of the art in object category specific image segmentation and demonstrate significant improvements.

386 citations
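
The 'single graph cut' at the heart of OBJ CUT can be sketched with a generic max-flow library. Here PyMaxflow is an assumed stand-in, and the pictorial-structure shape prior is taken as already folded into the per-pixel unary costs:

```python
# Sketch of the MRF segmentation step: unary terms combine an appearance
# likelihood with a shape prior (assumed precomputed per pixel, standing
# in for the sampled pictorial structure), and a single graph cut yields
# the segmentation. PyMaxflow is an assumed stand-in library.
import numpy as np
import maxflow

def segment(fg_cost, bg_cost, smoothness=1.0):
    # fg_cost/bg_cost: (H, W) per-pixel negative log-likelihoods.
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(fg_cost.shape)
    g.add_grid_edges(nodes, smoothness)          # Potts pairwise terms
    g.add_grid_tedges(nodes, fg_cost, bg_cost)   # data + shape-prior unaries
    g.maxflow()
    return g.get_grid_segments(nodes)            # boolean segmentation mask
```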


Book ChapterDOI
11 Apr 2005
TL;DR: The PASCAL Visual Object Classes (VOC) Challenge ran from February to March 2005, with the goal of recognizing objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects).
Abstract: The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.

381 citations
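
Evaluation for detection and classification challenges of this kind typically reduces to a precision-recall summary. A generic average-precision sketch follows; it is illustrative only, since the exact criteria used in the 2005 challenge are specified in the chapter:

```python
# Generic average precision over a ranked list of detections.
# Illustrative only; the precise evaluation criteria used in the 2005
# challenge are those given in the chapter.
import numpy as np

def average_precision(scores, labels):
    # scores: confidence per prediction; labels: 1 if correct, else 0.
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    cum_tp = np.cumsum(labels)
    precision = cum_tp / (np.arange(len(labels)) + 1)
    return float((precision * labels).sum() / max(labels.sum(), 1))
```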


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A person detector that quite accurately detects and localizes limbs of people in lateral walking poses is built, and an algorithm for finding and kinematically tracking multiple people in long sequences is developed.
Abstract: We develop an algorithm for finding and kinematically tracking multiple people in long sequences. Our basic assumption is that people tend to take on certain canonical poses, even when performing unusual activities like throwing a baseball or figure skating. We build a person detector that quite accurately detects and localizes limbs of people in lateral walking poses. We use the estimated limbs from a detection to build a discriminative appearance model; we assume the features that discriminate a figure in one frame will discriminate the figure in other frames. We then use the models as limb detectors in a pictorial structure framework, detecting figures in unrestricted poses in both previous and successive frames. We have run our tracker on hundreds of thousands of frames, and present and apply a methodology for evaluating tracking on such a large scale. We test our tracker on real sequences including a feature-length film, an hour of footage from a public park, and various sports sequences. We find that we can quite accurately automatically find and track multiple people interacting with each other while performing fast and unusual motions.

364 citations
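
The discriminative appearance model can be pictured as learning per-limb color statistics from the confident lateral-pose detections and scoring pixels in other frames. The histogram likelihood ratio below is a simple assumed stand-in for the paper's actual classifier:

```python
# Illustrative appearance model: build foreground/background color
# histograms from pixels inside/outside detected limb boxes, then score
# new frames by the log likelihood ratio. A simple stand-in for the
# paper's discriminative model.
import numpy as np

def quantize(img, bins=8):
    # img: (H, W, 3) uint8; map each pixel to one of bins**3 color cells.
    return (img // (256 // bins)).astype(int) @ np.array([bins * bins, bins, 1])

def color_model(frame, limb_mask, bins=8, eps=1.0):
    q = quantize(frame, bins)
    fg = np.bincount(q[limb_mask], minlength=bins ** 3) + eps
    bg = np.bincount(q[~limb_mask], minlength=bins ** 3) + eps
    return np.log(fg / fg.sum()) - np.log(bg / bg.sum())

def score_frame(frame, model, bins=8):
    return model[quantize(frame, bins)]   # per-pixel limb log-odds
```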


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A "parts and structure" model for object category recognition that can be learnt efficiently and in a semi-supervised manner is presented, learnt from example images containing category instances, without requiring segmentation from background clutter.
Abstract: We present a "parts and structure" model for object category recognition that can be learnt efficiently and in a semi-supervised manner: the model is learnt from example images containing category instances, without requiring segmentation from background clutter. The model is a sparse representation of the object, and consists of a star topology configuration of parts modeling the output of a variety of feature detectors. The optimal choice of feature types (whose repertoire includes interest points, curves and regions) is made automatically. In recognition, the model may be applied efficiently in an exhaustive manner, bypassing the need for feature detectors, to give the globally optimal match within a query image. The approach is demonstrated on a wide variety of categories, and delivers both successful classification and localization of the object within the image.

333 citations
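
For a star topology the globally optimal match factorizes: each part's cost map is softened by the deformation cost and accumulated at the root. The sketch below uses a box deformation window via a minimum filter, a simplification of the quadratic-cost generalized distance transform:

```python
# Simplified star-model matching: each part may displace within a window
# around its ideal offset from the root (a box deformation cost via a
# minimum filter, standing in for the generalized distance transform);
# softened part costs are summed into the root score map.
import numpy as np
from scipy.ndimage import minimum_filter, shift

def match_star(root_cost, part_costs, offsets, window=9):
    # root_cost, part_costs[i]: (H, W) appearance costs (lower is better);
    # offsets[i]: ideal (dy, dx) of part i relative to the root.
    total = root_cost.astype(float).copy()
    for cost, (dy, dx) in zip(part_costs, offsets):
        softened = minimum_filter(cost, size=window)  # cheapest nearby placement
        # Align the part map with the root by its ideal offset
        # (sign convention assumed).
        total += shift(softened, (-dy, -dx), order=0, mode="nearest")
    return np.unravel_index(np.argmin(total), total.shape)  # best root location
```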


Book ChapterDOI
20 Jul 2005
TL;DR: Progress is described in harnessing multiple exemplars of each person, in a form that can easily be associated automatically using straightforward visual tracking, in order to retrieve humans in videos given a query face in a shot.
Abstract: Matching people based on their imaged face is hard because of the well known problems of illumination, pose, size and expression variation. Indeed these variations can exceed those due to identity. Fortunately, videos of people have the happy benefit of containing multiple exemplars of each person in a form that can easily be associated automatically using straightforward visual tracking. We describe progress in harnessing these multiple exemplars in order to retrieve humans automatically in videos, given a query face in a shot. There are three areas of interest: (i) the matching of sets of exemplars provided by “tubes” of the spatial-temporal volume; (ii) the description of the face using a spatial orientation field; and, (iii) the structuring of the problem so that retrieval is immediate at run time. The result is a person retrieval system, able to retrieve a ranked list of shots containing a particular person in the manner of Google. The method has been implemented and tested on two feature length movies.

243 citations
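
Matching 'tubes' reduces to comparing sets of exemplar descriptors. The min-min set distance below is one simple choice, assumed for illustration rather than taken from the paper:

```python
# Rank shots by comparing the query face track against stored tracks,
# using the minimum pairwise distance between exemplar descriptor sets
# (a simple set-to-set choice; the paper's matching may differ).
import numpy as np

def track_distance(track_a, track_b):
    # track_*: (n_exemplars, d) arrays of face descriptors.
    d2 = ((track_a[:, None, :] - track_b[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2).min())

def rank_shots(query_track, shot_tracks):
    # shot_tracks: one list of face tracks per shot.
    dists = [min(track_distance(query_track, t) for t in tracks)
             for tracks in shot_tracks]
    return np.argsort(dists)              # best matching shots first
```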


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A recognition method based on a cascade of processing steps that normalize for the effects of the changing imaging environment is developed, and it is demonstrated that high recall rates (over 92%) can be achieved whilst maintaining good precision (over 93%).
Abstract: The objective of this work is to recognize all the frontal faces of a character in the closed world of a movie or situation comedy, given a small number of query faces. This is challenging because faces in a feature-length film are relatively uncontrolled with a wide variability of scale, pose, illumination, and expressions, and also may be partially occluded. We develop a recognition method based on a cascade of processing steps that normalize for the effects of the changing imaging environment. In particular there are three areas of novelty: (i) we suppress the background surrounding the face, enabling the maximum area of the face to be retained for recognition rather than a subset; (ii) we include a pose refinement step to optimize the registration between the test image and face exemplar; and (iii) we use robust distance to a sub-space to allow for partial occlusion and expression change. The method is applied and evaluated on several feature length films. It is demonstrated that high recall rates (over 92%) can be achieved whilst maintaining good precision (over 93%).
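
The 'robust distance to a sub-space' can be read as a PCA reconstruction residual with large per-pixel errors capped, so that occluded pixels do not dominate. The sketch below is an illustrative reading of that idea, not the paper's exact formulation:

```python
# Illustrative robust subspace distance: project a registered face onto
# a PCA subspace of exemplars and truncate large per-pixel residuals so
# that partial occlusion and expression change are tolerated.
import numpy as np

def robust_subspace_distance(x, mean, basis, clip=3.0):
    # basis: (d, k) orthonormal columns from PCA of registered faces.
    r = (x - mean) - basis @ (basis.T @ (x - mean))   # residual off the subspace
    s = np.std(r) + 1e-12
    r = np.clip(r / s, -clip, clip)                   # cap occluded-pixel errors
    return float(np.sqrt((r ** 2).mean()))
```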

Proceedings ArticleDOI
17 Oct 2005
TL;DR: An unsupervised approach for learning a generative layered representation of a scene from a video for motion segmentation, using efficient loopy belief propagation to obtain an initial estimate of the model and αβ-swap and α-expansion algorithms to refine it.
Abstract: We present an unsupervised approach for learning a generative layered representation of a scene from a video for motion segmentation. The learnt model is a composition of layers, which consist of one or more segments. Included in the model are the effects of image projection, lighting, and motion blur. The two main contributions of our method are: (i) a novel algorithm for obtaining the initial estimate of the model using efficient loopy belief propagation; and (ii) using αβ-swap and α-expansion algorithms, which guarantee a strong local minimum, for refining the initial estimate. Results are presented on several classes of objects with different types of camera motion. We compare our method with the state of the art and demonstrate significant improvements.
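
The refinement step minimizes a multi-label Potts-style energy; αβ-swap and α-expansion are graph-cut move algorithms with strong guarantees. To keep the sketch dependency-free, the code below minimizes the same kind of energy with plain ICM (iterated conditional modes), a much weaker optimizer used here only to make the energy concrete:

```python
# The paper refines layer assignments with graph-cut moves (alpha-beta
# swap, alpha expansion). This sketch minimizes the same Potts-style
# energy with simple ICM local moves instead: a far weaker optimizer,
# shown only to make the objective concrete.
import numpy as np

def potts_icm(layer_costs, smoothness=1.0, n_sweeps=5):
    # layer_costs: (L, H, W) per-pixel negative log-likelihood per layer.
    L, H, W = layer_costs.shape
    labels = layer_costs.argmin(axis=0)            # independent initialization
    for _ in range(n_sweeps):
        for y in range(H):
            for x in range(W):
                nb = [labels[yy, xx] for yy, xx in
                      ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                      if 0 <= yy < H and 0 <= xx < W]
                cost = layer_costs[:, y, x].copy()
                for l in range(L):                 # add Potts disagreement penalty
                    cost[l] += smoothness * sum(l != n for n in nb)
                labels[y, x] = cost.argmin()
    return labels
```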

Proceedings ArticleDOI
17 Oct 2005
TL;DR: Two areas of innovation are described: the first is to capture the 3-D appearance of the entire head, rather than just the face region, so that visual features such as the hairline can be exploited, and the second is to combine discriminative and 'generative' approaches for detection and recognition.
Abstract: The objective of this work is automatic detection and identification of individuals in unconstrained consumer video, given a minimal number of labelled faces as training data. Whilst much work has been done on (mainly frontal) face detection and recognition, current methods are not sufficiently robust to deal with the wide variations in pose and appearance found in such video. These include variations in scale, illumination, expression, partial occlusion, motion blur, etc. We describe two areas of innovation: the first is to capture the 3-D appearance of the entire head, rather than just the face region, so that visual features such as the hairline can be exploited. The second is to combine discriminative and 'generative' approaches for detection and recognition. Images rendered using the head model are used to train a discriminative tree-structured classifier giving efficient detection and pose estimates over a very wide pose range with three degrees of freedom. Subsequent verification of the identity is obtained using the head model in a 'generative' framework. We demonstrate excellent performance in detecting and identifying three characters and their poses in a TV situation comedy.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: It is shown how a 3D model of a complex curved object can be easily extracted from a single 2D image, and that finding the smoothest 3D surface which projects exactly to a user-defined silhouette can be expressed as a quadratic optimization, a result which has not previously appeared in the large literature on the shape-from-silhouette problem.
Abstract: We show how a 3D model of a complex curved object can be easily extracted from a single 2D image. A user-defined silhouette is the key input; and we show that finding the smoothest 3D surface which projects exactly to this silhouette can be expressed as a quadratic optimization, a result which has not previously appeared in the large literature on the shape-from-silhouette problem. For simple models, this process can immediately yield a usable 3D model; but for more complex geometries the user will wish to further shape the surface. We show that a variety of editing operations—which can be defined either in the image or in 3D—can also be expressed as linear constraints on the 3D shape parameters. We extend the system to fit higher genus surfaces. Our method has several advantages over the system of Zhang et al. [ZDPSS01] and over systems such as SKETCH and Teddy.
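
The paper's central observation, that the smoothest surface consistent with a silhouette is a quadratic optimization, can be illustrated in one dimension: minimize a discrete smoothness energy subject to linear interpolation constraints via the KKT system. A toy analogue, not the paper's surface parameterization:

```python
# Toy analogue of 'smoothest shape subject to silhouette constraints':
# minimize ||L z||^2 (a discrete smoothness energy) subject to A z = b
# (linear constraints pinning some values), solved via the KKT system.
# The paper's actual surface parameterization and constraints differ.
import numpy as np

def smoothest(n, constraints):
    # Second-difference operator as the smoothness energy.
    L = np.diff(np.eye(n), n=2, axis=0)
    Q = 2.0 * L.T @ L
    idx, vals = zip(*constraints)          # e.g. [(0, 0.0), (25, 1.0), ...]
    A = np.zeros((len(idx), n))
    A[np.arange(len(idx)), list(idx)] = 1.0
    b = np.array(vals, dtype=float)
    # KKT system: [[Q, A^T], [A, 0]] [z; lam] = [0; b]
    K = np.block([[Q, A.T], [A, np.zeros((len(idx), len(idx)))]])
    rhs = np.concatenate([np.zeros(n), b])
    return np.linalg.solve(K, rhs)[:n]

curve = smoothest(50, [(0, 0.0), (25, 1.0), (49, 0.0)])  # smooth bump
```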

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A system for automatic people tracking and activity recognition that builds a model of limb appearance from sparse stylized detections and reprocesses the video, using the learned appearance models to find people in unrestricted configurations.
Abstract: We present a system for automatic people tracking and activity recognition. Our basic approach to people-tracking is to build an appearance model for the person in the video. The video illustrates our method of using a stylized-pose detector. Our system builds a model of limb appearance from those sparse stylized detections. Our algorithm then reprocesses the video, using the learned appearance models to find people in unrestricted configurations. We can use our tracker to recover 3D configurations and activity labels. We assume we have a motion capture library where the 3D poses have been labeled offline with activity descriptions.

Journal ArticleDOI
24 Oct 2005
TL;DR: The identity of a target face can be determined by first proposing faces with similar pose, and then classifying the target face as one of the proposed faces or not, and the texture maps of the model can be automatically updated as new poses and expressions are detected.
Abstract: Progress in the automatic detection and identification of humans in video, given a minimal number of labelled faces as training data, is described. This is an extremely challenging problem owing to the many sources of variation in a person's imaged appearance: pose variation, scale, facial expression, illumination, partial occlusion, motion blur, etc. The method developed in this work combines approaches from computer vision, for detection and pose estimation, with those from machine learning for classification. A ‘generative’ model of a person's head is defined consisting of a coarse 3-D model and multiple texture maps. This allows faces to be rendered with a variety of facial expressions and at poses differing from those of the training data. It is shown that the identity of a target face can then be determined by first proposing faces with similar pose, and then classifying the target face as one of the proposed faces or not. Furthermore, the texture maps of the model can be automatically updated as new poses and expressions are detected. Results of detecting three characters in a TV situation comedy are demonstrated.

01 Jan 2005
TL;DR: In this paper, the authors explore the use of computer graphics and computer vision techniques in the history of art, focusing on analyzing the geometry of perspective paintings to learn about the perspectival skills of artists and explore the evolution of linear perspective in history.
Abstract: This paper explores the use of computer graphics and computer vision techniques in the history of art. The focus is on analysing the geometry of perspective paintings to learn about the perspectival skills of artists and explore the evolution of linear perspective in history. Algorithms for a systematic analysis of the two- and three-dimensional geometry of paintings are drawn from the work on “single-view reconstruction” and applied to interpreting works of art from the Italian Renaissance and later periods. Since a perspectival painting is not a photograph of an actual subject but an artificial construction subject to imaginative manipulation and inadvertent inaccuracies, the internal consistency of its geometry must be assessed before carrying out any geometric analysis. Some simple techniques to analyse the consistency and perspectival accuracy of the geometry of a painting are discussed. Moreover, this work presents new algorithms for generating new views of a painted scene or portions of it, analysing shapes and proportions of objects, filling in occluded areas, performing a complete three-dimensional reconstruction of a painting and a rigorous analysis of possible reconstruction ambiguities. The validity of the techniques described here is demonstrated on a number of historical paintings and frescoes. Whenever possible, the computer-generated results are compared to those obtained by art historians through careful manual analysis. This research represents a further attempt to build a constructive dialogue between two very different disciplines: computer science and history of art. Despite their fundamental differences, science and art can learn and be enriched by each other’s procedures. A longer and more detailed version of this paper may be found in [5].
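
A core primitive in this kind of perspective analysis is estimating a vanishing point from painted parallel edges. A minimal homogeneous-coordinates sketch (an illustrative primitive only; the paper's single-view reconstruction pipeline goes much further):

```python
# Estimate a vanishing point as the least-squares intersection of a set
# of (roughly) parallel painted edges, in homogeneous coordinates.
# Illustrative primitive only; the paper's pipeline goes much further.
import numpy as np

def vanishing_point(segments):
    # segments: list of ((x1, y1), (x2, y2)) image line segments.
    lines = [np.cross([x1, y1, 1.0], [x2, y2, 1.0])
             for (x1, y1), (x2, y2) in segments]
    A = np.array(lines)
    # The vanishing point v minimizes ||A v|| with ||v|| = 1: the right
    # singular vector of A with the smallest singular value.
    v = np.linalg.svd(A)[2][-1]
    return v[:2] / v[2]   # inhomogeneous coordinates (undefined at infinity)
```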