Proceedings ArticleDOI

Multimedia ontology learning for automatic annotation and video browsing

TL;DR: This work uses MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved.
Abstract: In this work, we offer an approach to combine standard multimedia analysis techniques with knowledge drawn from conceptual metadata provided by domain experts of a specialized scholarly domain, to learn a domain-specific multimedia ontology from a set of annotated examples. A standard Bayesian network learning algorithm that learns the structure and parameters of a Bayesian network is extended to include media observables in the learning. An expert group provides domain knowledge to construct a basic ontology of the domain as well as to annotate a set of training videos. These annotations help derive the associations between high-level semantic concepts of the domain and low-level MPEG-7 based features representing the audio-visual content of the videos. We construct a more robust and refined version of this ontology by learning from this set of conceptually annotated videos. To encode this knowledge, we use MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved. We use the ontology-specified knowledge for recognizing concepts relevant to a video, and annotate fresh additions to the video database with relevant concepts from the ontology. These conceptual annotations are used to create hyperlinks in the video collection, providing an effective video browsing interface to the user.
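The abstract describes learning associations between high-level domain concepts and low-level MPEG-7 observables from annotated videos, then using them to annotate fresh additions. The paper itself extends full Bayesian network structure-and-parameter learning; the sketch below is a much simpler, self-contained stand-in (plain counting with a naive-Bayes factorization over discretized observables), meant only to illustrate the learn-then-annotate idea. The concept labels, observable names, and toy data are hypothetical.

```python
# Minimal sketch (not the paper's implementation): estimating the association
# between a high-level concept and discretized low-level media observables
# from annotated examples, then using Bayes' rule to score a new video.
# All variable names and the toy data are hypothetical.
from collections import Counter, defaultdict

# Each annotated training example: (concept_label, {observable: discrete value})
training = [
    ("Pallavi", {"dominant_color": "red",  "motion_activity": "high"}),
    ("Pallavi", {"dominant_color": "red",  "motion_activity": "medium"}),
    ("Moksha",  {"dominant_color": "blue", "motion_activity": "low"}),
    ("Moksha",  {"dominant_color": "blue", "motion_activity": "medium"}),
]

# Learn P(concept) and P(observable value | concept) by counting, with add-one smoothing.
concept_counts = Counter(label for label, _ in training)
cond_counts = defaultdict(Counter)   # (concept, observable) -> Counter over values
values_seen = defaultdict(set)       # observable -> set of discrete values
for label, obs in training:
    for name, value in obs.items():
        cond_counts[(label, name)][value] += 1
        values_seen[name].add(value)

def posterior(observations):
    """P(concept | observations) under a naive-Bayes factorization."""
    total = sum(concept_counts.values())
    scores = {}
    for concept, c_count in concept_counts.items():
        p = c_count / total
        for name, value in observations.items():
            counts = cond_counts[(concept, name)]
            p *= (counts[value] + 1) / (sum(counts.values()) + len(values_seen[name]))
        scores[concept] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Annotate a fresh video segment from its discretized MPEG-7-style observables.
print(posterior({"dominant_color": "red", "motion_activity": "high"}))
```

In the paper, the structure linking concepts and observables is itself learned rather than fixed as above, and the ontology supplies additional relations between the concepts.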
Citations
Journal ArticleDOI
TL;DR: The efficacy of the ontology-based approach is demonstrated by constructing an ontology for the cultural heritage domain of Indian classical dance, and a browsing application is developed for semantic access to the heritage collection of Indian dance videos.
Abstract: Preservation of intangible cultural heritage, such as music and dance, requires encoding of background knowledge together with digitized records of the performances. We present an ontology-based approach for designing a cultural heritage repository for that purpose. Since dance and music are recorded in multimedia format, we use Multimedia Web Ontology Language (MOWL) to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with a labeled set of training data and use of the ontology to automatically annotate new instances of digital heritage artifacts. The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have demonstrated the efficacy of our approach by constructing an ontology for the cultural heritage domain of Indian classical dance, and have developed a browsing application for semantic access to the heritage collection of Indian dance videos.
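MOWL encodes domain concepts together with their media properties and the uncertainty of observing them. As a rough, hypothetical illustration only (MOWL is an OWL extension with its own syntax, not reproduced here), the snippet below represents a few probabilistic concept-to-media-property associations of the kind such an ontology would carry.

```python
# Rough illustration (not MOWL syntax): domain concepts with probabilistic
# media-property associations, of the kind a MOWL ontology would encode.
# Concept names, property names, and probabilities are hypothetical.
from dataclasses import dataclass, field

@dataclass
class MediaAssociation:
    property_name: str   # e.g. an MPEG-7 descriptor or a trained detector
    probability: float   # P(property observed | concept present)

@dataclass
class Concept:
    name: str
    parents: list = field(default_factory=list)   # taxonomic relations
    media: list = field(default_factory=list)     # media observables

odissi = Concept(
    name="Odissi",
    parents=["IndianClassicalDance"],
    media=[
        MediaAssociation("tribhangi_posture_detector", 0.8),
        MediaAssociation("mardala_percussion_audio", 0.7),
    ],
)
print(odissi)
```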

66 citations

Book ChapterDOI
06 Sep 2014
TL;DR: This paper introduces Multi-Entity Bayesian Networks (MEBNs) as the means to combine first-order logic with probabilistic inference and facilitate the semantic analysis of Intangible Cultural Heritage content.
Abstract: In this paper we introduce Multi-Entity Bayesian Networks (MEBNs) as the means to combine first-order logic with probabilistic inference and facilitate the semantic analysis of Intangible Cultural Heritage (ICH) content. First, we mention the need to capture and maintain ICH manifestations for the safeguarding of cultural treasures. Second, we present the MEBN models and stress their key features that can be used as a powerful tool for the aforementioned cause. Third, we present the methodology followed to build a MEBN model for the analysis of a traditional dance. Finally, we compare the efficiency of our MEBN model with that of a simple Bayesian network and demonstrate its superiority in cases that demand situation-specific treatment.
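A fixed-structure Bayesian network has one node set decided in advance, whereas an MEBN instantiates parameterized fragments for whatever entities the situation contains. The sketch below is a conceptual illustration of that instantiation step only, not the authors' dance-analysis model; all names and numbers are hypothetical.

```python
# Conceptual sketch of the MEBN idea (not the authors' model): a parameterized
# network fragment is instantiated once per entity found in the scene, yielding
# a situation-specific Bayesian network whose size depends on the situation.
# Entity names, node names, and probabilities are hypothetical.

def instantiate_fragment(dancer_id):
    """One fragment per dancer: Posture(d) influences StepRecognized(d)."""
    return {
        f"Posture({dancer_id})": {"parents": [], "cpt": {(): 0.6}},
        f"StepRecognized({dancer_id})": {
            "parents": [f"Posture({dancer_id})"],
            "cpt": {(True,): 0.9, (False,): 0.2},
        },
    }

def situation_specific_network(dancers):
    network = {}
    for d in dancers:
        network.update(instantiate_fragment(d))
    return network

# A duet and a group dance produce networks of different sizes, which a single
# fixed-structure Bayesian network cannot express.
print(len(situation_specific_network(["d1", "d2"])))        # 4 nodes
print(len(situation_specific_network(["d1", "d2", "d3"])))  # 6 nodes
```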

9 citations


Cites methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...In another closely related work [9], a semi-automatic ontology construction methodology is proposed for combining Bayesian networks with probabilistic inference....

Book ChapterDOI
15 Dec 2009
TL;DR: A scheme based on an ontological framework to recognize concepts in multimedia data, providing effective content-based access to a closed, domain-specific multimedia collection and an effective video browsing interface to the user.
Abstract: In this paper, we propose a scheme based on an ontological framework, to recognize concepts in multimedia data, in order to provide effective content-based access to a closed, domain-specific multimedia collection. The ontology for the domain is constructed from high-level knowledge of the domain lying with the domain experts, and further fine-tuned and refined by learning from multimedia data annotated by them. MOWL, a multimedia extension to OWL, is used to encode the concept-to-media-feature associations in the ontology as well as the uncertainties linked with observation of the perceptual multimedia data. Media feature classifiers help recognize low-level concepts in the videos, but the novelty of our work lies in the discovery of high-level concepts in video content using the power of ontological relations between the concepts. This framework is used to provide rich, conceptual annotations to the video database, which can further be used to create hyperlinks in the video collection, to provide an effective video browsing interface to the user.
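The novelty claimed here is inferring high-level concepts from low-level detections through ontological relations. Below is a minimal sketch of that lifting step, assuming detector posteriors for low-level concepts and a noisy-OR combination; the paper uses MOWL-derived Bayesian inference, and the concept names and link strengths here are hypothetical.

```python
# Minimal sketch (not the paper's algorithm): lifting low-level concept
# detections to a high-level concept through ontology relations, using a
# noisy-OR combination. Concept names and probabilities are hypothetical.

# Strength of the ontological link from the high-level concept to each
# low-level, directly observable concept.
ontology_links = {
    "Bharatanatyam": {"AramandiPosture": 0.85, "CarnaticMusic": 0.7},
}

def high_level_belief(concept, detections, links=ontology_links, leak=0.05):
    """Noisy-OR: the concept is supported if any linked low-level concept is observed."""
    p_not = 1.0 - leak
    for low_level, strength in links[concept].items():
        p_not *= 1.0 - strength * detections.get(low_level, 0.0)
    return 1.0 - p_not

# Detector outputs (posterior probabilities) for a new video segment.
detections = {"AramandiPosture": 0.9, "CarnaticMusic": 0.4}
print(round(high_level_belief("Bharatanatyam", detections), 3))
```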

7 citations


Cites background or methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...(Sec. 3, Annotation Generation) The input to our concept-recognition scheme is an initial multimedia ontology of the domain constructed with the help of domain knowledge provided by a group of domain experts, and fine-tuned by learning from the training set of annotated videos [3]....

  • ...This approach to concept learning has been detailed in our earlier work [3]....

Proceedings ArticleDOI
15 Dec 2011
TL;DR: A novel dance-posture-based annotation model that combines features using Multiple Kernel Learning (MKL) is proposed, together with a novel feature representation that captures the local texture properties of the image.
Abstract: We present a novel dance posture based annotation model by combining features using Multiple Kernel Learning (MKL). We have proposed a novel feature representation which represents the local texture properties of the image. The annotation model is defined in a directed acyclic graph structure using the binary MKL algorithm. The bag-of-words model is applied for image representation. The experiments have been performed on an image collection belonging to two Indian classical dances (Bharatnatyam and Odissi). The annotation model has been tested using SIFT and the proposed feature individually, and by optimally combining both features. The experiments have shown promising results.
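Multiple Kernel Learning combines base kernels (here, one from SIFT bag-of-words histograms and one from the proposed texture feature) and learns their weights inside the SVM objective. The sketch below only shows the mechanics of combining two precomputed kernels with fixed weights and training a standard SVM on the result; it is not an MKL solver, and the data are random placeholders.

```python
# Simplified sketch: combining two bag-of-words kernels with *fixed* weights and
# a standard SVM. True MKL (as in the paper) learns the combination weights;
# this stand-in only illustrates the precomputed-kernel mechanics.
# Feature matrices and labels below are random placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_sift = rng.random((40, 200))      # bag-of-words histograms from SIFT
X_texture = rng.random((40, 64))    # histograms from a local texture feature
y = rng.integers(0, 2, size=40)     # binary posture label

def linear_kernel(A, B):
    return A @ B.T

# Weighted sum of the two base kernels (weights fixed here, learned in MKL).
w_sift, w_texture = 0.6, 0.4
K = w_sift * linear_kernel(X_sift, X_sift) + w_texture * linear_kernel(X_texture, X_texture)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))   # training accuracy on the toy data
```

In the paper, binary classifiers obtained this way are arranged in a directed acyclic graph to produce multi-class posture annotations.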

7 citations


Cites methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...images shown in figure I) as experimental dataset [6]....

Proceedings ArticleDOI
25 Oct 2010
TL;DR: This work presents an ontology-based approach to capture and preserve the knowledge associated with digital heritage artefacts, and proposes the use of Multimedia Web Ontology (MOWL), which supports probabilistic reasoning with media properties of domain concepts, to encode the domain knowledge.
Abstract: Cultural heritage is encoded in a variety of forms. The task of preserving heritage involves preserving the tangible and intangible resources that broadly define that heritage. A significant aspect of intangible heritage resources is the performing arts, which include classical dance and music. Digital heritage resources include heritage artefacts in digitized form as well as the background knowledge that puts them in perspective. We present an ontology-based approach to capture and preserve the knowledge associated with digital heritage artefacts. Since the artefacts are generally preserved in multimedia format, we propose the use of Multimedia Web Ontology (MOWL), which supports probabilistic reasoning with media properties of domain concepts, to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with a labelled set of training data and use of the ontology to automatically annotate new instances of digital heritage artefacts. The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have realized a proof of concept in the domain of Indian Classical Dance and present some results.

6 citations


Cites background from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...Figure 1 (captioned "Architecture for Ontology based Navigation of an eHeritage Digital Collection") depicts the architecture of our ontological framework....

References
Proceedings ArticleDOI
04 Oct 1998
TL;DR: A framework in which scenes can be indexed at a semantic level, together with a probabilistic framework to encode the higher-level relationships between multijects, which enhance or reduce the probabilities of concurrent existence of various multijects.
Abstract: This paper proposes a novel scheme for bridging the gap between low-level media features and high-level semantics using a probabilistic framework. We propose a framework in which scenes can be indexed at a semantic level. The fundamental components of the framework are sites, objects and events. Detection of the presence of an instance of one of these influences the probability of the presence of instances of the other classes. Detection of instances is done using probabilistic multimedia objects: multijects. Indexing using multijects can handle queries posed at a semantic level. Multijects are built in a Markovian framework. Two ways of building the multijects from low-level features, fusing features from multiple modalities, are presented. A probabilistic framework is also envisioned to encode the higher-level relationships between multijects, which enhance or reduce the probabilities of concurrent existence of various multijects. An actual implementation is presented by developing multijects representing the higher-level concepts of "explosion" and "waterfall". The models are evaluated by using the multijects to detect explosions and waterfalls in movies. Results reveal that the multijects detect the aforementioned events with greater accuracy and are able to segment the video into scenes containing explosions and waterfalls.
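The multiject framework scores semantic concepts from multimodal low-level features and lets detected concepts influence each other's probabilities. Below is a toy sketch of the fusion step, assuming independent audio and visual likelihoods and a naive-Bayes combination; the paper builds multijects in a Markovian framework, and the numbers here are hypothetical.

```python
# Illustrative sketch (not the paper's multiject models): fusing audio and
# visual evidence for an "explosion" multiject with a naive-Bayes combination.
# Likelihood values and the prior are hypothetical.

prior_explosion = 0.05

# P(feature | explosion) and P(feature | no explosion) from per-modality models.
likelihoods = {
    "audio_burst":  (0.8, 0.1),    # loud, impulsive audio
    "visual_flash": (0.7, 0.05),   # sudden bright, high-motion frames
}

def fuse(observed):
    p_yes, p_no = prior_explosion, 1.0 - prior_explosion
    for feature in observed:
        l_yes, l_no = likelihoods[feature]
        p_yes *= l_yes
        p_no *= l_no
    return p_yes / (p_yes + p_no)

# Evidence from both modalities raises the posterior far above the prior,
# mirroring how co-occurring detections reinforce each other.
print(round(fuse(["audio_burst"]), 3))
print(round(fuse(["audio_burst", "visual_flash"]), 3))
```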

202 citations

Posted Content
TL;DR: In this paper, the authors examined the sample complexity of MDL based learning procedures for Bayesian networks and showed that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3) log(1/epsilon) log(1/delta) loglog(1/delta)).
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3)log(1/epsilon)log(1/delta)loglog (1/delta)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.

125 citations

Proceedings Article
01 Aug 1996
TL;DR: The sample complexity of MDL based learning procedures for Bayesian networks is examined and the number of samples needed to learn an ε-close approximation with confidence δ is shown, which means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound.
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an ε-close approximation (in terms of entropy distance) with confidence δ is O((1/ε)^(4/3) log(1/ε) log(1/δ) log log(1/δ)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.
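Written out with N the number of samples, ε the entropy-distance error, and δ the confidence parameter, the bound stated in the abstract is:

```latex
N(\epsilon,\delta) \;=\; O\!\left(\left(\frac{1}{\epsilon}\right)^{4/3}
  \log\frac{1}{\epsilon}\;\log\frac{1}{\delta}\;\log\log\frac{1}{\delta}\right)
```

The dependence on 1/ε is a low-order polynomial, while the dependence on 1/δ is only polylogarithmic, which is what the abstract means by "sub-linear in the confidence bound".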

113 citations

Proceedings ArticleDOI
25 Jun 2006
TL;DR: This paper presents a novel, efficient decision tree learning algorithm, which is also effective in the context of FBC learning and demonstrates better performance in both classification and ranking compared with other state-of-the-art learning algorithms.
Abstract: The structure of a Bayesian network (BN) encodes variable independence. Learning the structure of a BN, however, is typically of high computational complexity. In this paper, we explore and represent variable independence in learning conditional probability tables (CPTs), instead of in learning structure. A full Bayesian network is used as the structure and a decision tree is learned for each CPT. The resulting model is called a full Bayesian network classifier (FBC). In learning an FBC, learning the decision trees for the CPTs essentially captures both variable independence and context-specific independence. We present a novel, efficient decision tree learning algorithm, which is also effective in the context of FBC learning. In our experiments, the FBC learning algorithm demonstrates better performance in both classification and ranking compared with other state-of-the-art learning algorithms. In addition, its reduced effort on structure learning makes its time complexity quite low as well.
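The FBC idea is to keep a fixed, fully connected network structure and move the modeling effort into the conditional probability tables, each represented by a decision tree so that context-specific independence is captured. Below is a simplified stand-in for that representation using scikit-learn; it is not the paper's learning algorithm, and the variable ordering, tree depth, and toy data are arbitrary.

```python
# Simplified stand-in for the FBC idea (not the paper's algorithm): with a full
# network structure assumed, each variable's conditional probability table is
# replaced by a decision tree over its parents, which captures context-specific
# independence. Uses scikit-learn; the toy dataset is a random placeholder.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(200, 4))   # columns: X1, X2, X3, class C
C = data[:, 3]

# In a full BN over (C, X1, X2, X3) with the class first in the ordering,
# each Xi has C and the preceding Xj as parents; learn one small tree per CPT.
cpt_trees = {}
for i in range(3):
    parents = np.column_stack([C, data[:, :i]]) if i > 0 else C.reshape(-1, 1)
    cpt_trees[f"X{i+1}"] = DecisionTreeClassifier(max_depth=3).fit(parents, data[:, i])

print({name: tree.get_depth() for name, tree in cpt_trees.items()})
```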

89 citations

Proceedings ArticleDOI
06 Nov 2005
TL;DR: A novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework is proposed, which demonstrates over 14% improvement in IR performance over the best reported text-only baseline and ranks amongst the best results reported on this corpus.
Abstract: In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework. In the state-of-the-art systems, a late combination between two independent systems, one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing, is the norm. Such systems rarely exceed the performance of any single modality (i.e. text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in a significant improvement in performance over any single modality. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news videos. Our results demonstrate over 14% improvement in IR performance over the best reported text-only baseline and rank amongst the best results reported on this corpus.
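For contrast with the joint model the abstract describes, the sketch below shows the late-fusion baseline it refers to, in which two independently produced result lists are merged only at the score level; the paper's joint framework, which lets the visual component leverage knowledge from text processing, is not reproduced here. Document names, scores, and weights are hypothetical.

```python
# Toy illustration of the late-fusion baseline described in the abstract:
# two independent retrieval systems are combined only at the score level.
# Scores and the fusion weights below are hypothetical.
import numpy as np

docs = ["shot_01", "shot_02", "shot_03"]
text_scores = np.array([0.9, 0.2, 0.5])     # text-based retrieval (e.g. ASR transcripts)
visual_scores = np.array([0.3, 0.8, 0.6])   # visual-concept retrieval

def normalize(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

late_fused = 0.7 * normalize(text_scores) + 0.3 * normalize(visual_scores)
ranking = [docs[i] for i in np.argsort(-late_fused)]
print(ranking)
```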

50 citations