Proceedings ArticleDOI

Multimedia ontology learning for automatic annotation and video browsing

TL;DR: This work uses MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved.
Abstract: In this work, we offer an approach to combine standard multimedia analysis techniques with knowledge drawn from conceptual metadata provided by domain experts of a specialized scholarly domain, to learn a domain-specific multimedia ontology from a set of annotated examples. A standard Bayesian network learning algorithm that learns the structure and parameters of a Bayesian network is extended to include media observables in the learning. An expert group provides domain knowledge to construct a basic ontology of the domain as well as to annotate a set of training videos. These annotations help derive the associations between high-level semantic concepts of the domain and low-level MPEG-7 based features representing the audio-visual content of the videos. We construct a more robust and refined version of this ontology by learning from this set of conceptually annotated videos. To encode this knowledge, we use MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved. We use the ontology-specified knowledge for recognizing concepts relevant to a video, and annotate fresh additions to the video database with relevant concepts from the ontology. These conceptual annotations are used to create hyperlinks in the video collection, providing an effective video browsing interface to the user.
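The abstract describes learning associations between high-level domain concepts and low-level MPEG-7 observables from annotated videos, then using them to annotate fresh additions. The paper itself extends full Bayesian network structure-and-parameter learning; the sketch below is a much simpler, self-contained stand-in (plain counting with a naive-Bayes factorization over discretized observables), meant only to illustrate the learn-then-annotate idea. The concept labels, observable names, and toy data are hypothetical.

```python
# Minimal sketch (not the paper's implementation): estimating the association
# between a high-level concept and discretized low-level media observables
# from annotated examples, then using Bayes' rule to score a new video.
# All variable names and the toy data are hypothetical.
from collections import Counter, defaultdict

# Each annotated training example: (concept_label, {observable: discrete value})
training = [
    ("Pallavi", {"dominant_color": "red",  "motion_activity": "high"}),
    ("Pallavi", {"dominant_color": "red",  "motion_activity": "medium"}),
    ("Moksha",  {"dominant_color": "blue", "motion_activity": "low"}),
    ("Moksha",  {"dominant_color": "blue", "motion_activity": "medium"}),
]

# Learn P(concept) and P(observable value | concept) by counting, with add-one smoothing.
concept_counts = Counter(label for label, _ in training)
cond_counts = defaultdict(Counter)   # (concept, observable) -> Counter over values
values_seen = defaultdict(set)       # observable -> set of discrete values
for label, obs in training:
    for name, value in obs.items():
        cond_counts[(label, name)][value] += 1
        values_seen[name].add(value)

def posterior(observations):
    """P(concept | observations) under a naive-Bayes factorization."""
    total = sum(concept_counts.values())
    scores = {}
    for concept, c_count in concept_counts.items():
        p = c_count / total
        for name, value in observations.items():
            counts = cond_counts[(concept, name)]
            p *= (counts[value] + 1) / (sum(counts.values()) + len(values_seen[name]))
        scores[concept] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Annotate a fresh video segment from its discretized MPEG-7-style observables.
print(posterior({"dominant_color": "red", "motion_activity": "high"}))
```

In the paper, the structure linking concepts and observables is itself learned rather than fixed as above, and the ontology supplies additional relations between the concepts.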
Citations
Journal ArticleDOI
TL;DR: The efficacy of the ontology-based approach is demonstrated by constructing an ontology for the cultural heritage domain of Indian classical dance, and a browsing application is developed for semantic access to the heritage collection of Indian dance videos.
Abstract: Preservation of intangible cultural heritage, such as music and dance, requires encoding of background knowledge together with digitized records of the performances. We present an ontology-based approach for designing a cultural heritage repository for that purpose. Since dance and music are recorded in multimedia format, we use Multimedia Web Ontology Language (MOWL) to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with a labeled set of training data and use of the ontology to automatically annotate new instances of digital heritage artifacts. The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have demonstrated the efficacy of our approach by constructing an ontology for the cultural heritage domain of Indian classical dance, and have developed a browsing application for semantic access to the heritage collection of Indian dance videos.
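MOWL encodes domain concepts together with their media properties and the uncertainty of observing them. As a rough, hypothetical illustration only (MOWL is an OWL extension with its own syntax, not reproduced here), the snippet below represents a few probabilistic concept-to-media-property associations of the kind such an ontology would carry.

```python
# Rough illustration (not MOWL syntax): domain concepts with probabilistic
# media-property associations, of the kind a MOWL ontology would encode.
# Concept names, property names, and probabilities are hypothetical.
from dataclasses import dataclass, field

@dataclass
class MediaAssociation:
    property_name: str   # e.g. an MPEG-7 descriptor or a trained detector
    probability: float   # P(property observed | concept present)

@dataclass
class Concept:
    name: str
    parents: list = field(default_factory=list)   # taxonomic relations
    media: list = field(default_factory=list)     # media observables

odissi = Concept(
    name="Odissi",
    parents=["IndianClassicalDance"],
    media=[
        MediaAssociation("tribhangi_posture_detector", 0.8),
        MediaAssociation("mardala_percussion_audio", 0.7),
    ],
)
print(odissi)
```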

66 citations

Book ChapterDOI
06 Sep 2014
TL;DR: This paper introduces Multi-Entity Bayesian Networks (MEBNs) as the means to combine first-order logic with probabilistic inference and facilitate the semantic analysis of Intangible Cultural Heritage content.
Abstract: In this paper we introduce Multi-Entity Bayesian Networks (MEBNs) as the means to combine first-order logic with probabilistic inference and facilitate the semantic analysis of Intangible Cultural Heritage (ICH) content. First, we mention the need to capture and maintain ICH manifestations for the safeguarding of cultural treasures. Second, we present the MEBN models and stress their key features that can be used as a powerful tool for the aforementioned cause. Third, we present the methodology followed to build a MEBN model for the analysis of a traditional dance. Finally, we compare the efficiency of our MEBN model with that of a simple Bayesian network and demonstrate its superiority in cases that demand situation-specific treatment.
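A fixed-structure Bayesian network has one node set decided in advance, whereas an MEBN instantiates parameterized fragments for whatever entities the situation contains. The sketch below is a conceptual illustration of that instantiation step only, not the authors' dance-analysis model; all names and numbers are hypothetical.

```python
# Conceptual sketch of the MEBN idea (not the authors' model): a parameterized
# network fragment is instantiated once per entity found in the scene, yielding
# a situation-specific Bayesian network whose size depends on the situation.
# Entity names, node names, and probabilities are hypothetical.

def instantiate_fragment(dancer_id):
    """One fragment per dancer: Posture(d) influences StepRecognized(d)."""
    return {
        f"Posture({dancer_id})": {"parents": [], "cpt": {(): 0.6}},
        f"StepRecognized({dancer_id})": {
            "parents": [f"Posture({dancer_id})"],
            "cpt": {(True,): 0.9, (False,): 0.2},
        },
    }

def situation_specific_network(dancers):
    network = {}
    for d in dancers:
        network.update(instantiate_fragment(d))
    return network

# A duet and a group dance produce networks of different sizes, which a single
# fixed-structure Bayesian network cannot express.
print(len(situation_specific_network(["d1", "d2"])))        # 4 nodes
print(len(situation_specific_network(["d1", "d2", "d3"])))  # 6 nodes
```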

9 citations


Cites methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...In another closely related work [9], a semi-automatic ontology construction methodology is proposed for combining Bayesian networks with probabilistic inference....

Book ChapterDOI
15 Dec 2009
TL;DR: A scheme based on an ontological framework to recognize concepts in multimedia data, providing effective content-based access to a closed, domain-specific multimedia collection and an effective video browsing interface to the user.
Abstract: In this paper, we propose a scheme based on an ontological framework, to recognize concepts in multimedia data, in order to provide effective content-based access to a closed, domain-specific multimedia collection. The ontology for the domain is constructed from high-level knowledge of the domain lying with the domain experts, and further fine-tuned and refined by learning from multimedia data annotated by them. MOWL, a multimedia extension to OWL, is used to encode the concept-to-media-feature associations in the ontology as well as the uncertainties linked with observation of the perceptual multimedia data. Media feature classifiers help recognize low-level concepts in the videos, but the novelty of our work lies in the discovery of high-level concepts in video content using the power of ontological relations between the concepts. This framework is used to provide rich, conceptual annotations to the video database, which can further be used to create hyperlinks in the video collection, to provide an effective video browsing interface to the user.
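The novelty claimed here is inferring high-level concepts from low-level detections through ontological relations. Below is a minimal sketch of that lifting step, assuming detector posteriors for low-level concepts and a noisy-OR combination; the paper uses MOWL-derived Bayesian inference, and the concept names and link strengths here are hypothetical.

```python
# Minimal sketch (not the paper's algorithm): lifting low-level concept
# detections to a high-level concept through ontology relations, using a
# noisy-OR combination. Concept names and probabilities are hypothetical.

# Strength of the ontological link from the high-level concept to each
# low-level, directly observable concept.
ontology_links = {
    "Bharatanatyam": {"AramandiPosture": 0.85, "CarnaticMusic": 0.7},
}

def high_level_belief(concept, detections, links=ontology_links, leak=0.05):
    """Noisy-OR: the concept is supported if any linked low-level concept is observed."""
    p_not = 1.0 - leak
    for low_level, strength in links[concept].items():
        p_not *= 1.0 - strength * detections.get(low_level, 0.0)
    return 1.0 - p_not

# Detector outputs (posterior probabilities) for a new video segment.
detections = {"AramandiPosture": 0.9, "CarnaticMusic": 0.4}
print(round(high_level_belief("Bharatanatyam", detections), 3))
```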

7 citations


Cites background or methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...(Sec. 3, Annotation Generation) The input to our concept-recognition scheme is an initial multimedia ontology of the domain constructed with the help of domain knowledge provided by a group of domain experts, and fine-tuned by learning from the training set of annotated videos [3]....

  • ...This approach to concept learning has been detailed in our earlier work [3]....

Proceedings ArticleDOI
15 Dec 2011
TL;DR: A novel dance-posture-based annotation model that combines features using Multiple Kernel Learning (MKL) is proposed, together with a novel feature representation that captures the local texture properties of the image.
Abstract: We present a novel dance posture based annotation model by combining features using Multiple Kernel Learning (MKL). We have proposed a novel feature representation which represents the local texture properties of the image. The annotation model is defined in a directed acyclic graph structure using the binary MKL algorithm. The bag-of-words model is applied for image representation. The experiments have been performed on an image collection belonging to two Indian classical dances (Bharatnatyam and Odissi). The annotation model has been tested using SIFT and the proposed feature individually, and by optimally combining both features. The experiments have shown promising results.
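Multiple Kernel Learning combines base kernels (here, one from SIFT bag-of-words histograms and one from the proposed texture feature) and learns their weights inside the SVM objective. The sketch below only shows the mechanics of combining two precomputed kernels with fixed weights and training a standard SVM on the result; it is not an MKL solver, and the data are random placeholders.

```python
# Simplified sketch: combining two bag-of-words kernels with *fixed* weights and
# a standard SVM. True MKL (as in the paper) learns the combination weights;
# this stand-in only illustrates the precomputed-kernel mechanics.
# Feature matrices and labels below are random placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_sift = rng.random((40, 200))      # bag-of-words histograms from SIFT
X_texture = rng.random((40, 64))    # histograms from a local texture feature
y = rng.integers(0, 2, size=40)     # binary posture label

def linear_kernel(A, B):
    return A @ B.T

# Weighted sum of the two base kernels (weights fixed here, learned in MKL).
w_sift, w_texture = 0.6, 0.4
K = w_sift * linear_kernel(X_sift, X_sift) + w_texture * linear_kernel(X_texture, X_texture)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))   # training accuracy on the toy data
```

In the paper, binary classifiers obtained this way are arranged in a directed acyclic graph to produce multi-class posture annotations.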

7 citations


Cites methods from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...images shown in figure I) as experimental dataset [6]....

Proceedings ArticleDOI
25 Oct 2010
TL;DR: This work presents an ontology-based approach to capture and preserve the knowledge associated with digital heritage artefacts, and proposes the use of Multimedia Web Ontology (MOWL), which supports probabilistic reasoning with media properties of domain concepts, to encode the domain knowledge.
Abstract: Cultural heritage is encoded in a variety of forms. The task of preserving heritage involves preserving the tangible and intangible resources that broadly define that heritage. A significant aspect of intangible heritage resources is the performing arts, which include classical dance and music. Digital heritage resources include heritage artefacts in digitized form as well as the background knowledge that puts them in perspective. We present an ontology-based approach to capture and preserve the knowledge associated with digital heritage artefacts. Since the artefacts are generally preserved in multimedia format, we propose the use of Multimedia Web Ontology (MOWL), which supports probabilistic reasoning with media properties of domain concepts, to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with a labelled set of training data and use of the ontology to automatically annotate new instances of digital heritage artefacts. The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have realized a proof of concept in the domain of Indian Classical Dance and present some results.

6 citations


Cites background from "Multimedia ontology learning for automatic annotation and video browsing"

  • ...Figure 1 (captioned "Architecture for Ontology based Navigation of an eHeritage Digital Collection") depicts the architecture of our ontological framework....

References
Proceedings ArticleDOI
04 Oct 1998
TL;DR: A framework in which scenes can be indexed at a semantic level, together with a probabilistic framework to encode the higher-level relationships between multijects, which enhance or reduce the probabilities of concurrent existence of various multijects.
Abstract: This paper proposes a novel scheme for bridging the gap between low-level media features and high-level semantics using a probabilistic framework. We propose a framework in which scenes can be indexed at a semantic level. The fundamental components of the framework are sites, objects and events. Detection of the presence of an instance of one of these influences the probability of the presence of instances of the other classes. Detection of instances is done using probabilistic multimedia objects: multijects. Indexing using multijects can handle queries posed at a semantic level. Multijects are built in a Markovian framework. Two ways of building the multijects from low-level features, fusing features from multiple modalities, are presented. A probabilistic framework is also envisioned to encode the higher-level relationships between multijects, which enhance or reduce the probabilities of concurrent existence of various multijects. An actual implementation is presented by developing multijects representing the higher-level concepts of "explosion" and "waterfall". The models are evaluated by using the multijects to detect explosions and waterfalls in movies. Results reveal that the multijects detect the aforementioned events with greater accuracy and are able to segment the video into scenes containing explosions and waterfalls.
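The multiject framework scores semantic concepts from multimodal low-level features and lets detected concepts influence each other's probabilities. Below is a toy sketch of the fusion step, assuming independent audio and visual likelihoods and a naive-Bayes combination; the paper builds multijects in a Markovian framework, and the numbers here are hypothetical.

```python
# Illustrative sketch (not the paper's multiject models): fusing audio and
# visual evidence for an "explosion" multiject with a naive-Bayes combination.
# Likelihood values and the prior are hypothetical.

prior_explosion = 0.05

# P(feature | explosion) and P(feature | no explosion) from per-modality models.
likelihoods = {
    "audio_burst":  (0.8, 0.1),    # loud, impulsive audio
    "visual_flash": (0.7, 0.05),   # sudden bright, high-motion frames
}

def fuse(observed):
    p_yes, p_no = prior_explosion, 1.0 - prior_explosion
    for feature in observed:
        l_yes, l_no = likelihoods[feature]
        p_yes *= l_yes
        p_no *= l_no
    return p_yes / (p_yes + p_no)

# Evidence from both modalities raises the posterior far above the prior,
# mirroring how co-occurring detections reinforce each other.
print(round(fuse(["audio_burst"]), 3))
print(round(fuse(["audio_burst", "visual_flash"]), 3))
```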

202 citations

Posted Content
TL;DR: In this paper, the authors examined the sample complexity of MDL based learning procedures for Bayesian networks and showed that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3) log(1/epsilon) log(1/delta) loglog(1/delta)).
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3)log(1/epsilon)log(1/delta)loglog (1/delta)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.

125 citations

Proceedings Article
01 Aug 1996
TL;DR: The sample complexity of MDL based learning procedures for Bayesian networks is examined and the number of samples needed to learn an ε-close approximation with confidence δ is shown, which means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound.
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an ε-close approximation (in terms of entropy distance) with confidence δ is O((1/ε)^(4/3) log(1/ε) log(1/δ) log log(1/δ)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.
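Written out with N the number of samples, ε the entropy-distance error, and δ the confidence parameter, the bound stated in the abstract is:

```latex
N(\epsilon,\delta) \;=\; O\!\left(\left(\frac{1}{\epsilon}\right)^{4/3}
  \log\frac{1}{\epsilon}\;\log\frac{1}{\delta}\;\log\log\frac{1}{\delta}\right)
```

The dependence on 1/ε is a low-order polynomial, while the dependence on 1/δ is only polylogarithmic, which is what the abstract means by "sub-linear in the confidence bound".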

113 citations

Proceedings ArticleDOI
25 Jun 2006
TL;DR: This paper presents a novel, efficient decision tree learning algorithm, which is also effective in the context of FBC learning and demonstrates better performance in both classification and ranking compared with other state-of-the-art learning algorithms.
Abstract: The structure of a Bayesian network (BN) encodes variable independence. Learning the structure of a BN, however, is typically of high computational complexity. In this paper, we explore and represent variable independence in learning conditional probability tables (CPTs), instead of in learning structure. A full Bayesian network is used as the structure and a decision tree is learned for each CPT. The resulting model is called a full Bayesian network classifier (FBC). In learning an FBC, learning the decision trees for the CPTs essentially captures both variable independence and context-specific independence. We present a novel, efficient decision tree learning algorithm, which is also effective in the context of FBC learning. In our experiments, the FBC learning algorithm demonstrates better performance in both classification and ranking compared with other state-of-the-art learning algorithms. In addition, its reduced effort on structure learning makes its time complexity quite low as well.
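The FBC idea is to keep a fixed, fully connected network structure and move the modeling effort into the conditional probability tables, each represented by a decision tree so that context-specific independence is captured. Below is a simplified stand-in for that representation using scikit-learn; it is not the paper's learning algorithm, and the variable ordering, tree depth, and toy data are arbitrary.

```python
# Simplified stand-in for the FBC idea (not the paper's algorithm): with a full
# network structure assumed, each variable's conditional probability table is
# replaced by a decision tree over its parents, which captures context-specific
# independence. Uses scikit-learn; the toy dataset is a random placeholder.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(200, 4))   # columns: X1, X2, X3, class C
C = data[:, 3]

# In a full BN over (C, X1, X2, X3) with the class first in the ordering,
# each Xi has C and the preceding Xj as parents; learn one small tree per CPT.
cpt_trees = {}
for i in range(3):
    parents = np.column_stack([C, data[:, :i]]) if i > 0 else C.reshape(-1, 1)
    cpt_trees[f"X{i+1}"] = DecisionTreeClassifier(max_depth=3).fit(parents, data[:, i])

print({name: tree.get_depth() for name, tree in cpt_trees.items()})
```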

89 citations

Proceedings ArticleDOI
06 Nov 2005
TL;DR: A novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework is proposed, which demonstrates over 14% improvement in IR performance over the best reported text-only baseline and ranks amongst the best results reported on this corpus.
Abstract: In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework. In the state-of-the-art systems, a late combination between two independent systems, one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing, is the norm. Such systems rarely exceed the performance of any single modality (i.e. text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in a significant improvement in performance over any single modality. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news videos. Our results demonstrate over 14% improvement in IR performance over the best reported text-only baseline and rank amongst the best results reported on this corpus.
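For contrast with the joint model the abstract describes, the sketch below shows the late-fusion baseline it refers to, in which two independently produced result lists are merged only at the score level; the paper's joint framework, which lets the visual component leverage knowledge from text processing, is not reproduced here. Document names, scores, and weights are hypothetical.

```python
# Toy illustration of the late-fusion baseline described in the abstract:
# two independent retrieval systems are combined only at the score level.
# Scores and the fusion weights below are hypothetical.
import numpy as np

docs = ["shot_01", "shot_02", "shot_03"]
text_scores = np.array([0.9, 0.2, 0.5])     # text-based retrieval (e.g. ASR transcripts)
visual_scores = np.array([0.3, 0.8, 0.6])   # visual-concept retrieval

def normalize(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

late_fused = 0.7 * normalize(text_scores) + 0.3 * normalize(visual_scores)
ranking = [docs[i] for i in np.argsort(-late_fused)]
print(ranking)
```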

50 citations