Proceedings ArticleDOI

Combining multiple evidence for video classification

TL;DR: The efficacy of the performance-based fusion method is demonstrated by applying it to classification of short video clips into six popular TV broadcast genres, namely cartoon, commercial, news, cricket, football, and tennis.
Abstract: In this paper, we investigate the problem of video classification into predefined genres by combining the evidence from multiple classifiers. It is well known in the pattern recognition community that the accuracy of classification obtained by combining decisions made by independent classifiers can be substantially higher than the accuracy of the individual classifiers. The conventional method for combining individual classifiers weighs each classifier equally (sum or vote rule fusion). In this paper, we study a method that estimates the performances of the individual classifiers and combines the individual classifiers by weighting them according to their estimated performance. We demonstrate the efficacy of the performance-based fusion method by applying it to classification of short video clips (20 seconds) into six popular TV broadcast genres, namely cartoon, commercial, news, cricket, football, and tennis. The individual classifiers are trained using different spatial and temporal features derived from the video sequences, and two different classifier methodologies, namely hidden Markov models (HMMs) and support vector machines (SVMs). The experiments were carried out on more than 3 hours of video data. A classification rate of 93.12% for all the six classes and 97.14% for the sports category alone has been achieved, which is significantly higher than the performance of the individual classifiers.
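As a concrete illustration of the performance-based fusion idea, here is a minimal Python sketch (hypothetical names; the paper's exact weighting formula is not reproduced) that fuses per-classifier class scores by a weighted sum, with each weight proportional to that classifier's estimated accuracy on held-out data. Setting all weights equal recovers the conventional sum-rule fusion mentioned above.

import numpy as np

GENRES = ["cartoon", "commercial", "news", "cricket", "football", "tennis"]

def performance_weighted_fusion(class_scores, estimated_accuracies):
    # class_scores: (n_classifiers, n_classes) array of per-classifier scores (e.g. posteriors)
    # estimated_accuracies: (n_classifiers,) estimated accuracy of each classifier
    scores = np.asarray(class_scores, dtype=float)
    weights = np.asarray(estimated_accuracies, dtype=float)
    weights = weights / weights.sum()      # normalise weights to sum to 1
    fused = weights @ scores               # performance-weighted sum of score vectors
    return GENRES[int(np.argmax(fused))]

# Equal weights reduce this to the usual unweighted sum rule:
# performance_weighted_fusion(scores, np.ones(len(scores)))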
Citations
Journal ArticleDOI
TL;DR: This article proposes a methodology for classifying the genre of television programmes, in which features from four informative sources are used to train a parallel neural network system able to distinguish between seven video genres, reaching a classification accuracy rate of 95%.
Abstract: Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable to efficiently access and retrieve desired content from such data. In this context, automatic genre classification provides a simple and effective solution to describe multimedia contents in a structured and well understandable way. We propose in this article a methodology for classifying the genre of television programmes. Features are extracted from four informative sources, which include visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot clusters duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used for training a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy rate of 95%.
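As a rough sketch of how the structural stream mentioned above could be derived, the Python fragment below (hypothetical; not taken from the article) computes simple shot-based statistics such as shot lengths and a crude shot-rhythm proxy from a list of shot-boundary timestamps.

import numpy as np

def structural_features(shot_boundaries_sec, clip_duration_sec):
    # shot_boundaries_sec: sorted timestamps (seconds) at which a new shot starts
    bounds = np.asarray([0.0] + list(shot_boundaries_sec) + [clip_duration_sec])
    shot_lengths = np.diff(bounds)                        # duration of every shot
    return {
        "num_shots": len(shot_lengths),
        "mean_shot_length": float(shot_lengths.mean()),
        "std_shot_length": float(shot_lengths.std()),     # crude shot-rhythm proxy
        "shots_per_minute": 60.0 * len(shot_lengths) / clip_duration_sec,
    }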

60 citations

Journal ArticleDOI
TL;DR: The objective of video data mining is to discover and describe interesting patterns from the huge amount of video data, as it is one of the core problem areas of the data-mining research community.
Abstract: Data mining is a process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data. Thanks to the extensive use of information technology and the recent developments in multimedia systems, the amount of multimedia data available to users has increased exponentially. Video is an example of multimedia data, as it contains several kinds of data such as text, image, meta-data, visual and audio. It is widely used in many major potential applications like security and surveillance, entertainment, medicine, education programs and sports. The objective of video data mining is to discover and describe interesting patterns from the huge amount of video data, as it is one of the core problem areas of the data-mining research community. Compared to the mining of other types of data, video data mining is still in its infancy. Many challenging research problems remain in video mining. Beginning with an overview of the video data-mining literature, this paper concludes with the applications of video mining.

51 citations


Cites background from "Combining multiple evidence for video classification"

  • ...A feature is defined as a descriptive parameter extracted from an image or a video stream [92]....


Journal ArticleDOI
TL;DR: This work extracts domain knowledge about sport events recorded by multiple users by classifying the sport type into soccer, American football, basketball, tennis, ice-hockey, or volleyball, using a multi-user and multimodal approach.
Abstract: The recent proliferation of mobile video content has emphasized the need for applications such as automatic organization and automatic editing of videos. These applications could greatly benefit from domain knowledge about the content. However, extracting semantic information from mobile videos is a challenging task, due to their unconstrained nature. We extract domain knowledge about sport events recorded by multiple users, by classifying the sport type into soccer, American football, basketball, tennis, ice-hockey, or volleyball. We adopt a multi-user and multimodal approach, where each user simultaneously captures audio-visual content and auxiliary sensor data (from magnetometers and accelerometers). Firstly, each modality is separately analyzed; then, analysis results are fused for obtaining the sport type. The auxiliary sensor data is used for extracting more discriminative spatio-temporal visual features and efficient camera motion features. The contribution of each modality to the fusion process is adapted according to the quality of the input data. We performed extensive experiments on data collected at public sport events, showing the merits of using different combinations of modalities and fusion methods. The results indicate that analyzing multimodal and multi-user data, coupled with adaptive fusion, improves classification accuracies in most tested cases, up to 95.45%.
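A minimal sketch of such quality-adaptive late fusion, with invented names (the paper's actual quality measures and weighting scheme are not reproduced here): each modality contributes its class-score vector, scaled by a per-recording quality estimate in [0, 1].

import numpy as np

SPORTS = ["soccer", "american_football", "basketball", "tennis", "ice_hockey", "volleyball"]

def quality_adaptive_fusion(modality_scores, modality_quality):
    # modality_scores: dict mapping modality name -> (n_classes,) score vector
    # modality_quality: dict mapping modality name -> quality estimate in [0, 1]
    fused = np.zeros(len(SPORTS))
    total_weight = 0.0
    for name, scores in modality_scores.items():
        q = float(modality_quality.get(name, 0.0))        # low-quality input gets little say
        fused += q * np.asarray(scores, dtype=float)
        total_weight += q
    fused /= max(total_weight, 1e-9)                      # guard against all-zero quality
    return SPORTS[int(np.argmax(fused))]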

31 citations


Cites background from "Combining multiple evidence for video classification"

  • ...In [13], the authors successfully applied late fusion for discriminating among six TV broadcast genres and sub-genres: cartoon, commercial, news, cricket, football, and tennis....


Proceedings ArticleDOI
13 Dec 2007
TL;DR: This paper investigates the problem of automatic video classification using static and dynamic features, with a hidden Markov model (HMM) as the classifier, and demonstrates the efficiency of the system by applying it to a broad range of video data.
Abstract: Automatic classification of video content is receiving increased attention in multimedia information processing. This paper investigates the problem of automatic video classification using static and dynamic features. Five different genres, namely cartoon, sports, commercials, news and TV serial, are studied for assessment. The approach exploits edge information and color histogram as static features and motion information as the dynamic feature, with a hidden Markov model (HMM) as the classifier. The results are evaluated by constructing an individual HMM for each of the features, and the outputs are finally combined to determine the genre. The method demonstrates the efficiency of the system by applying it on a broad range of video data: 3 hours of video is used for training purpose and a further 1 hour of video as test set. An overall classification accuracy of 95.6% is accomplished.
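One plausible way to realise the per-feature HMM combination described above, sketched in Python (assumption: hmmlearn-style models whose score() method returns a log-likelihood; the authors' exact configuration is not reproduced), is to sum the log-likelihoods of a clip's feature streams under each genre's models and pick the best-scoring genre.

# models[genre][feature] is a trained HMM exposing score(obs) -> log-likelihood,
# e.g. an hmmlearn.hmm.GaussianHMM fitted on that genre's training sequences.
def classify_clip(models, observations):
    # observations: dict mapping feature name ("edge", "color_hist", "motion")
    #               to a (T, d) array of per-frame feature vectors
    best_genre, best_loglik = None, float("-inf")
    for genre, feature_models in models.items():
        loglik = sum(feature_models[f].score(obs) for f, obs in observations.items())
        if loglik > best_loglik:
            best_genre, best_loglik = genre, loglik
    return best_genre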

25 citations


Cites methods from "Combining multiple evidence for video classification"

  • ...The method demonstrates the efficiency of the system by applying it on a broad range of video data: 3 hours of video is used for training purpose and a further 1 hour of video as test set....


Book ChapterDOI
01 Jan 2010
TL;DR: Experimental results show good performance of the scheme for detecting video scenes of a given sport discipline in TV sports news, using the Automatic Video Indexer.
Abstract: Similarly to text, video is hierarchically structured. The analogies between text and video structures are discussed. Then a juxtaposition of two indexing processes is presented, i.e. text and video indexing based on content analysis of their structural units. Several frameworks for automatic detection and categorisation of video shots and scenes reporting sport events of a given discipline in TV sports news have already been proposed. It has been observed that many sport videos, such as archery, diving, soccer, and tennis, have repetitive structure patterns. In tests performed using the Automatic Video Indexer (AVI), shots and then scenes have been detected in the tested TV news videos. Experimental results show good performance of the scheme for detecting video scenes of a given sport discipline in TV sports news. The Automatic Video Indexer is a research project investigating tools and techniques of automatic video indexing for retrieval systems.

24 citations

References
Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations, and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions, the sum rule, outperforms other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
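For reference, the sum rule from this framework is usually stated as follows (standard formulation, written here in LaTeX; notation: R classifiers observing distinct representations x_1, ..., x_R of a pattern Z, and m classes):

\text{assign } Z \rightarrow \omega_j \quad \text{if} \quad
(1-R)\,P(\omega_j) + \sum_{i=1}^{R} P(\omega_j \mid x_i)
= \max_{k=1,\dots,m}\Big[(1-R)\,P(\omega_k) + \sum_{i=1}^{R} P(\omega_k \mid x_i)\Big].

With equal class priors this reduces to choosing the class with the largest average posterior across the R classifiers, i.e. the equal-weight fusion that the performance-based weighting of the main paper generalises.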

5,670 citations

Journal ArticleDOI
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
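As a small illustration of the models the tutorial covers, the Python sketch below (illustrative code, not taken from the tutorial) evaluates the log-likelihood of a discrete observation sequence under an HMM with the scaled forward algorithm.

import numpy as np

def forward_loglik(pi, A, B, obs):
    # pi: (N,) initial state probabilities; A: (N, N) state transition matrix
    # B: (N, M) emission probabilities; obs: list of observation symbol indices
    alpha = pi * B[:, obs[0]]                  # initialisation
    log_prob = 0.0
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]     # induction step
        scale = alpha.sum()                    # rescale to avoid numerical underflow
        alpha /= scale
        log_prob += np.log(scale)
    return log_prob + np.log(alpha.sum())      # log P(obs | model)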

4,546 citations

Journal ArticleDOI
01 May 1992
TL;DR: On applying these methods to combine several classifiers for recognizing totally unconstrained handwritten numerals, the experimental results show that the performance of individual classifiers can be improved significantly.
Abstract: Possible solutions to the problem of combining classifiers can be divided into three categories according to the levels of information available from the various classifiers. Four approaches based on different methodologies are proposed for solving this problem. One is suitable for combining individual classifiers such as Bayesian, k-nearest-neighbor, and various distance classifiers. The other three could be used for combining any kind of individual classifiers. On applying these methods to combine several classifiers for recognizing totally unconstrained handwritten numerals, the experimental results show that the performance of individual classifiers can be improved significantly. For example, on the US zipcode database, 98.9% recognition with 0.90% substitution and 0.2% rejection can be obtained, as well as high reliability with 95% recognition, 0% substitution, and 5% rejection.

2,389 citations

Proceedings ArticleDOI
01 Feb 1997
TL;DR: A histogram-based method for comparing images that incorporates spatial information is described, and it is shown that CCVs can give superior results to color histograms for image retrieval.
Abstract: Color histograms are used to compare images in many applications. Their advantages are efficiency and insensitivity to small changes in camera viewpoint. However, color histograms lack spatial information, so images with very different appearances can have similar histograms. For example, a picture of fall foliage might contain a large number of scattered red pixels; this could have a similar color histogram to a picture with a single large red object. We describe a histogram-based method for comparing images that incorporates spatial information. We classify each pixel in a given color bucket as either coherent or incoherent, based on whether or not it is part of a large similarly-colored region. A color coherence vector (CCV) stores the number of coherent versus incoherent pixels with each color. By separating coherent pixels from incoherent pixels, CCVs provide finer distinctions than color histograms. CCVs can be computed at over 5 images per second on a standard workstation. A database with 15,000 images can be queried for the images with the most similar CCVs in under 2 seconds. We show that CCVs can give superior results to color histograms for image retrieval.
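A compact sketch of the color coherence vector computation described above (assumptions: scipy is available, a simple uniform colour quantisation is used, and the original paper's blurring step and exact bucketing are not reproduced):

import numpy as np
from scipy import ndimage

def color_coherence_vector(image, buckets_per_channel=4, tau_fraction=0.01):
    # image: (H, W, 3) uint8 RGB array
    h, w, _ = image.shape
    tau = int(tau_fraction * h * w)                # region-size threshold for "coherent"
    step = 256 // buckets_per_channel
    q = (image // step).astype(int)                # per-channel uniform quantisation
    buckets = (q[..., 0] * buckets_per_channel + q[..., 1]) * buckets_per_channel + q[..., 2]
    n_buckets = buckets_per_channel ** 3
    coherent = np.zeros(n_buckets, dtype=int)
    incoherent = np.zeros(n_buckets, dtype=int)
    for b in range(n_buckets):
        mask = buckets == b
        if not mask.any():
            continue
        labels, n_regions = ndimage.label(mask)                 # connected components
        sizes = ndimage.sum(mask, labels, range(1, n_regions + 1))
        coherent[b] = int(sizes[sizes >= tau].sum())            # pixels in large regions
        incoherent[b] = int(sizes[sizes < tau].sum())           # pixels in small regions
    return coherent, incoherent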

931 citations

01 Jan 2000
Research report EPFL-REPORT-82604 (keywords: learning). URL: http://publications.idiap.ch/downloads/reports/2000/rr00-17.pdf

904 citations