Author

Daniel Keysers

Bio: Daniel Keysers is an academic researcher from Google. The author has contributed to research in topics: Image retrieval & Word error rate. The author has an h-index of 41, co-authored 150 publications receiving 6297 citations. Previous affiliations of Daniel Keysers include German Research Centre for Artificial Intelligence & RWTH Aachen University.


Papers
Journal ArticleDOI
TL;DR: An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented and the often used, but very simple, color histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.
Abstract: An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented. Many of the papers describing new techniques and descriptors for content-based image retrieval describe their newly proposed methods as most appropriate without giving an in-depth comparison with all methods that were proposed earlier. In this paper, we first give an overview of a large variety of features for content-based image retrieval and compare them quantitatively on four different tasks: stock photo retrieval, personal photo collection retrieval, building retrieval, and medical image retrieval. For the experiments, five different, publicly available image databases are used and the retrieval performance of the features is analyzed in detail. This allows for a direct comparison of all features considered in this work and furthermore will allow a comparison of newly proposed features to these in the future. Additionally, the correlation of the features is analyzed, which opens the way for a simple and intuitive method to find an initial set of suitable features for a new task. The article concludes with recommendations which features perform well for what type of data. Interestingly, the often used, but very simple, color histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.
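A color histogram baseline of the kind recommended above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the 8 bins per channel, and the use of histogram intersection as the similarity measure are all assumptions made for the example.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Joint RGB histogram of an (H, W, 3) image with values 0..255, normalized to sum to 1.

    The bin count is an illustrative choice, not a value from the paper.
    """
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; equals 1.0 for identical normalized histograms.

    Assumed similarity measure for this sketch; other histogram distances work too.
    """
    return float(np.minimum(h1, h2).sum())


def rank_by_similarity(query, database, bins=8):
    """Return database indices ordered from most to least similar to the query image."""
    q = color_histogram(query, bins)
    scores = [histogram_intersection(q, color_histogram(img, bins)) for img in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
```

Calling `rank_by_similarity(query, database)` with a list of images returns the retrieval order, which is enough to serve as a reference point when evaluating a more elaborate descriptor.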

641 citations

Book ChapterDOI
11 Apr 2005
TL;DR: The PASCAL Visual Object Classes Challenge (PASCAL VOC), which ran from February to March 2005, aimed to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects).
Abstract: The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.

381 citations

Journal ArticleDOI
TL;DR: The proposed architecture is suitable for content-based image retrieval in medical applications and improves current picture archiving and communication systems that still rely on alphanumerical descriptions, which are insufficient for image retrieval of high recall and precision.
Abstract: Objectives: To develop a general structure for semantic image analysis that is suitable for content-based image retrieval in medical applications and an architecture for its efficient implementation. Methods: Stepwise content analysis of medical images results in six layers of information modeling incorporating medical expert knowledge (raw data layer, registered data layer, feature layer, scheme layer, object layer, knowledge layer). A reference database with 10,000 images categorized according to the image modality, orientation, body region, and biological system is used. By means of prototypes in each category, identification of objects and their geometrical or temporal relationships are handled in the object and the knowledge layer, respectively. A distributed system designed with only three core elements is implemented: (i) the central database holds program sources, processing scheme descriptions, images, features, and administrative information about the workstation cluster; (ii) the scheduler balances distributed computing; and (iii) the web server provides graphical user interfaces for data entry and retrieval, which can be easily adapted to a variety of applications for content-based image retrieval in medicine. Results: Leaving-one-out experiments were distributed by the scheduler and controlled via corresponding job lists offering transparency regarding the viewpoints of a distributed system and the user. The proposed architecture is suitable for content-based image retrieval in medical applications. It improves current picture archiving and communication systems that still rely on alphanumerical descriptions, which are insufficient for image retrieval of high recall and precision.

328 citations

Proceedings ArticleDOI
27 Jan 2008
TL;DR: A fast adaptive binarization algorithm is described that yields the same quality of binarization as the Sauvola method but runs in time close to that of global thresholding methods (like Otsu's method), independent of the window size.
Abstract: Adaptive binarization is an important first step in many document analysis and OCR processes. This paper describes a fast adaptive binarization algorithm that yields the same quality of binarization as the Sauvola method, but runs in time close to that of global thresholding methods (like Otsu's method), independent of the window size. The algorithm combines the statistical constraints of Sauvola's method with integral images. Testing on the UW-1 dataset demonstrates a 20-fold speedup compared to the original Sauvola algorithm.
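The combination of Sauvola's threshold with integral images can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name, the default window size, and the values of `k` and `R` are illustrative choices. The threshold itself is the standard Sauvola formula T = m * (1 + k * (s / R - 1)), where m and s are the local mean and standard deviation.

```python
import numpy as np


def sauvola_binarize(image, window=15, k=0.2, R=128.0):
    """Binarize a 2-D grayscale image (values 0..255) with Sauvola's local threshold.

    Local means and variances come from two integral images, so the cost per
    pixel does not depend on the window size. The defaults for window, k and R
    are illustrative assumptions, not values taken from the paper.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape

    # Integral images of the pixel values and their squares, with a zero
    # row and column prepended so window sums need no special cases.
    ii = np.zeros((h + 1, w + 1))
    ii2 = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    ii2[1:, 1:] = (img ** 2).cumsum(axis=0).cumsum(axis=1)

    # Window bounds for every row and column, clipped at the image border.
    r = window // 2
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]

    def window_sum(table):
        # Four lookups per pixel: the usual integral-image rectangle sum.
        return table[y1][:, x1] - table[y0][:, x1] - table[y1][:, x0] + table[y0][:, x0]

    mean = window_sum(ii) / area
    var = np.maximum(window_sum(ii2) / area - mean ** 2, 0.0)  # guard against rounding
    threshold = mean * (1.0 + k * (np.sqrt(var) / R - 1.0))

    # True marks background (brighter than the local threshold), False marks ink.
    return img > threshold
```

Because each window sum needs only four table lookups, enlarging `window` changes the result but not the running time, which is the property the abstract highlights.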

317 citations

Journal ArticleDOI
TL;DR: It is shown experimentally that the proposed nonlinear image deformation models perform very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity.
Abstract: We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that, among the discussed models, there is one approach that combines simplicity of implementation, low computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54 percent on the MNIST benchmark is achieved, as well as the lowest reported error rate, 12.6 percent, in the 2005 international ImageCLEF evaluation of medical image categorization.

257 citations


Cited by
Journal ArticleDOI
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, with an analysis of whether the methods are statistically different, what they are learning from the images, and what they find easy or confuse.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations

01 Jan 2002

9,314 citations

01 Jan 2011
TL;DR: A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed, finding that they are convincingly superior on benchmarks.
Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations

01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

Journal ArticleDOI
TL;DR: A large collection of images with ground truth labels is built for object detection and recognition research; such data is useful for supervised learning and quantitative evaluation.
Abstract: We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.

3,501 citations