scispace - formally typeset
Search or ask a question
Author

Paul Over

Other affiliations: IBM
Bio: Paul Over is an academic researcher from National Institute of Standards and Technology. The author has contributed to research in topics: TRECVID & Automatic summarization. The author has an hindex of 34, co-authored 76 publications receiving 5381 citations. Previous affiliations of Paul Over include IBM.


Papers
More filters
Proceedings ArticleDOI
26 Oct 2006
TL;DR: An introduction to information retrieval (IR) evaluation from both a user and a system perspective is given, high-lighting that system evaluation is by far the most prevalent type of evaluation carried out.
Abstract: The TREC Video Retrieval Evaluation (TRECVid)is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations 1 interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus,automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, high-lighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation bench-marking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress.

1,395 citations

01 Jan 2013
TL;DR: The TREC Video Retrieval Evaluation (TRECVID) 2012 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation as mentioned in this paper.
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2012 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last ten years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by the NIST and other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.

582 citations

Journal ArticleDOI
TL;DR: An overview of the TRECVid shot boundary detection task, a high-level overview ofThe most significant of the approaches taken, and a comparison of performances are presented, focussing on one year (2005) as an example.

319 citations

01 Nov 2014
TL;DR: The TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation as mentioned in this paper.
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better under- standing of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by the NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.

312 citations

Book ChapterDOI
01 Jan 2009
TL;DR: This chapter summarizes the work done on the TRECVid high-level feature task, showing the progress made year-on-year, and provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task.
Abstract: Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one that determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarize the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.

207 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations

Journal ArticleDOI
TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, and the spawning of related subfields are discussed, to discuss the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

Proceedings ArticleDOI
02 Sep 2015
TL;DR: The authors propose a local attention-based model that generates each word of the summary conditioned on the input sentence, which shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.
Abstract: Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.

2,339 citations

Proceedings ArticleDOI
20 Jun 2009
TL;DR: The experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes, and assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes.
Abstract: We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only a very few of them image, collections have been formed and annotated with suitable class labels. In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.

2,228 citations

Journal ArticleDOI
TL;DR: From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition and the usefulness of invariance is category-specific.
Abstract: Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been proposed. Because many different descriptors exist, a structured overview is required of color invariant descriptors in the context of image category recognition. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors (software to compute the color descriptors from this paper is available from http://www.colordescriptors.com) in a structured way. The analytical invariance properties of color descriptors are explored, using a taxonomy based on invariance properties with respect to photometric transformations, and tested experimentally using a data set with known illumination conditions. In addition, the distinctiveness of color descriptors is assessed experimentally using two benchmarks, one from the image domain and one from the video domain. From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results further reveal that, for light intensity shifts, the usefulness of invariance is category-specific. Overall, when choosing a single descriptor and no prior knowledge about the data set and object and scene categories is available, the OpponentSIFT is recommended. Furthermore, a combined set of color descriptors outperforms intensity-based SIFT and improves category recognition by 8 percent on the PASCAL VOC 2007 and by 7 percent on the Mediamill Challenge.

2,071 citations