Author

Ville Viitaniemi

Bio: Ville Viitaniemi is an academic researcher from Aalto University. The author has contributed to research on topics including image retrieval and TRECVID. The author has an h-index of 12 and has co-authored 51 publications receiving 762 citations. Previous affiliations of Ville Viitaniemi include Helsinki University of Technology.

Papers published on a yearly basis

Papers
Book Chapter DOI
11 Apr 2005
TL;DR: The PASCAL Visual Object Classes (VOC) Challenge ran from February to March 2005; the goal was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects).
Abstract: The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.

381 citations

Proceedings Article DOI
08 Jul 2009
TL;DR: Experiments in the VOC2007 benchmark confirm that the performance of a Bag of Visual Words (BoV) system can be greatly enhanced by taking the descriptors' spatial distribution into account; two ways of tiling images geometrically are compared: the soft tiling approach proposed here and the traditional hard tiling technique.
Abstract: The Bag of Visual Words (BoV) paradigm has successfully been applied to image content analysis tasks such as image classification and object detection. The basic BoV approach overlooks the spatial distribution of descriptors within images. Here we describe spatial extensions to BoV and experimentally compare them in the VOC2007 benchmark image category detection task. In particular, we compare two ways of tiling images geometrically: the soft tiling approach proposed here and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of the images: post-classifier fusion and fusion on the level of an SVM kernel. The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the descriptors' spatial distribution into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance in the case of hard tiling. The evaluated fusion mechanisms performed approximately equally well.

39 citations
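The hard versus soft tiling contrast described above can be illustrated with a minimal sketch. The helper names, the 2x2 grid, and the Gaussian soft-weighting rule are assumptions made here for illustration; the paper's actual tiling masks and weights may differ.

```python
import numpy as np

def hard_tiling_histograms(points, words, vocab_size, grid=(2, 2)):
    """Hard tiling: each descriptor falls into exactly one tile of a
    grid over the unit square; one visual-word histogram per tile.
    `points` are (x, y) positions in [0, 1); `words` are vocabulary indices."""
    gx, gy = grid
    hists = np.zeros((gx * gy, vocab_size))
    for (x, y), w in zip(points, words):
        tile = int(x * gx) * gy + int(y * gy)
        hists[tile, w] += 1.0
    return hists

def soft_tiling_histograms(points, words, vocab_size, centers, sigma=0.25):
    """Soft tiling: each descriptor contributes to every tile with a
    weight that decays with distance to the tile centre, so descriptors
    near tile borders are not cut off sharply."""
    hists = np.zeros((len(centers), vocab_size))
    for (x, y), w in zip(points, words):
        d2 = np.array([(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers])
        weights = np.exp(-d2 / (2 * sigma ** 2))
        hists[:, w] += weights / weights.sum()  # unit mass per descriptor
    return hists
```

The per-tile histograms are then concatenated, or fed to separate kernels, for classification; with soft tiling each descriptor's unit mass is spread over all tiles instead of assigned to exactly one.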

Book Chapter DOI
11 Sep 2008
TL;DR: Describes the techniques used to participate in the PASCAL NoE VOC Challenge 2007 image analysis performance evaluation campaign; the method produced comparatively good classification performance, and its segmentation accuracy was the best of all submissions.
Abstract: In this paper we outline the techniques which we used to participate in the PASCAL NoE VOC Challenge 2007 image analysis performance evaluation campaign. We took part in three of the image analysis competitions: image classification, object detection and object segmentation. In the classification task of the evaluation our method produced comparatively good performance, the 4th best of 19 submissions. In contrast, our detection results were quite modest. Our method's segmentation accuracy was the best of all submissions. Our approach for the classification task is based on fused classifications by numerous global image features, including histograms of local features. The object detection combines similar classification of automatically extracted image segments and the previously obtained scene type classifications. The object segmentations are obtained in a straightforward fashion from the detection results.

34 citations

Book Chapter DOI
10 Sep 2006
TL;DR: In this article, the interaction between different semantic levels in still image scene classification and object detection problems is considered, where a neural method is used to produce a tentative higher-level semantic scene representation from low-level statistical visual features in a bottom-up fashion, which is then used to refine the lower-level object detection results.
Abstract: In this paper we consider the interaction between different semantic levels in still image scene classification and object detection problems. We present a method where a neural method is used to produce a tentative higher-level semantic scene representation from low-level statistical visual features in a bottom-up fashion. This emergent representation is then used to refine the lower-level object detection results. We evaluate the proposed method with data from the Pascal VOC Challenge 2006 image classification and object detection competition. The proposed techniques for exploiting global classification results are found to significantly improve the accuracy of local object detection.

30 citations
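The paper's core idea, letting a global scene-level classification refine local detection scores, can be sketched as a simple score fusion. The geometric weighting rule and the `alpha` parameter are illustrative assumptions made here, not the authors' actual fusion method.

```python
def refine_detections(segment_scores, scene_score, alpha=0.5):
    """Fuse a global scene-level class score into local segment-level
    detection scores by geometric weighting: detections for a class are
    boosted when the whole scene looks likely to contain that class.
    All scores are assumed to lie in [0, 1]."""
    return [s ** (1 - alpha) * scene_score ** alpha for s in segment_scores]
```

For example, a weak segment score of 0.25 in a scene strongly classified as containing the class (scene score 1.0) is lifted to 0.5, while the same segment in a scene classified as not containing the class is suppressed.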

Journal Article DOI
TL;DR: This work considers two traditional metrics for evaluating performance in automatic image annotation, the normalised score (NS) and the precision/recall (PR) statistics, particularly in connection with a de facto standard 5000 Corel image benchmark annotation task.
Abstract: In this work we consider two traditional metrics for evaluating performance in automatic image annotation, the normalised score (NS) and the precision/recall (PR) statistics, particularly in connection with a de facto standard 5000-image Corel benchmark annotation task. We also motivate and describe another performance measure, de-symmetrised termwise mutual information (DTMI), as a principled compromise between the two traditional extremes. In addition to discussing the measures theoretically, we correlate them experimentally for a family of annotation system configurations derived from the PicSOM image content analysis framework. Looking at the obtained performance figures, we notice that such a system, based on adaptive fusion of numerous global image features, clearly outperforms the methods considered in the literature.

27 citations
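The termwise precision/recall statistics discussed above are computed per keyword over the whole test set. A minimal sketch, with a toy dict-based data format assumed here for illustration:

```python
def termwise_pr(annotations, truth, word):
    """Precision/recall for one annotation keyword over a test set.
    `annotations` and `truth` map an image id to a set of keywords."""
    predicted = {im for im, kws in annotations.items() if word in kws}
    relevant = {im for im, kws in truth.items() if word in kws}
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```

The figures usually reported for the Corel task are means of these values over all keywords, often alongside the number of keywords with non-zero recall.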


Cited by
Journal Article DOI
TL;DR: Reviews the state of the art in evaluated methods for both classification and detection, analysing whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations
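The challenge's classification and detection rankings are based on average precision; the early VOC evaluations used an 11-point interpolated form, sketched here from the (recall, precision) points of a ranked output:

```python
def interpolated_ap(recalls, precisions):
    """11-point interpolated average precision: at each recall level
    t in {0.0, 0.1, ..., 1.0}, take the maximum precision achieved at
    any recall >= t, then average the eleven values."""
    ap = 0.0
    for i in range(11):
        t = i / 10
        ps = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += (max(ps) if ps else 0.0) / 11
    return ap
```

A perfect ranking (precision 1.0 at recall 1.0) scores 1.0; a method that reaches only recall 0.5 at precision 1.0 scores 6/11, since the five recall levels above 0.5 contribute zero.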

Journal Article DOI
TL;DR: In this article, a large collection of images with ground-truth labels was built for object detection and recognition research; such data is useful for supervised learning and quantitative evaluation.
Abstract: We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.

3,501 citations

01 Jan 2006

3,012 citations

Proceedings Article DOI
25 Oct 2008
TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
Abstract: Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this fashion at a fraction of the usual expense.

2,237 citations
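Aggregating multiple non-expert labels of the kind evaluated above can be sketched with simple and accuracy-weighted voting. The weighted variant is a simplified stand-in for the paper's bias-correction scheme, not its actual formulation; in practice annotator accuracies would be estimated on a small gold-labelled calibration set.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Unweighted majority vote over one item's non-expert labels
    (ties broken by first-seen order)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(annotator_labels, annotator_accuracy):
    """Vote over (annotator, label) pairs, weighted by each annotator's
    estimated accuracy; unknown annotators get chance-level weight 0.5."""
    scores = defaultdict(float)
    for annotator, label in annotator_labels:
        scores[label] += annotator_accuracy.get(annotator, 0.5)
    return max(scores, key=scores.get)
```

With accuracy weighting, a single reliable annotator can outvote several unreliable ones, which is the intuition behind correcting for annotator bias.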

Journal Article DOI
17 Jun 2006
TL;DR: A large-scale evaluation of an approach that represents images as distributions of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ2 distance.
Abstract: Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ2 distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on 4 texture and 5 object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance.

1,863 citations
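One common exponentiated form of the χ2 kernel used with such histogram representations can be sketched as follows; note that the scaling convention for `gamma` (and whether a factor of 1/2 appears in the distance) varies between implementations.

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    with zero-denominator bins skipped. Rows of H1 and H2 are
    non-negative histograms; the result is a kernel (Gram) matrix."""
    K = np.zeros((H1.shape[0], H2.shape[0]))
    for i, x in enumerate(H1):
        for j, y in enumerate(H2):
            denom = x + y
            mask = denom > 0
            d = np.sum((x[mask] - y[mask]) ** 2 / denom[mask])
            K[i, j] = np.exp(-gamma * d)
    return K
```

A precomputed Gram matrix of this form can be handed to an SVM trained with a precomputed-kernel option, which is the usual way of combining non-standard histogram kernels with off-the-shelf SVM solvers.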