scispace - formally typeset
Search or ask a question

Showing papers by "Alexander C. Berg published in 2004"


Proceedings ArticleDOI
27 Jun 2004
TL;DR: It is shown quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images, obtained by applying a face finder to approximately half a million captioned news images.
Abstract: We show quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images. Our dataset is 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured "in the wild" in a variety of configurations with respect to the camera, taking a variety of expressions, and under illumination of widely varying color. Each face image is associated with a set of names, automatically extracted from the associated caption. Many, but not all such sets contain the correct name. We cluster face images in appropriate discriminant coordinates. We use a clustering procedure to break ambiguities in labelling and identify incorrectly labelled faces. A merging procedure then identifies variants of names that refer to the same individual. The resulting representation can be used to label faces in news images or to organize news pictures by individuals present. An alternative view of our procedure is as a process that cleans up noisy supervised data. We demonstrate how to use entropy measures to evaluate such procedures.

392 citations


Proceedings Article
01 Dec 2004
TL;DR: This work obtains 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces and improves results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context.
Abstract: The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.

129 citations


Proceedings Article
01 Jan 2004

34 citations


01 Jan 2004
TL;DR: It is shown that a large and realistic face dataset can be built from news photographs and their associated captions, and an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation are shown.
Abstract: We show that a large and realistic face dataset can be built from news photographs and their associated captions. Our dataset consists of 44,773 face images, obtained by applying a face nder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured iin the wildi in a variety of congurations with respect to the camera, taking a variety of expressions, and under illumination of widely varying color. Faces are extracted from the images and names from the associated caption. Our system uses a clustering procedure to nd the correspondence between faces and associated names in news picture-caption pairs. The context in which a name appears in a caption provides powerful cues as to whether it is depicted in the associated image. By incorporating simple natural language techniques, we are able to improve our name assignment signicantly . Once the procedure is complete, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.

28 citations