Open Access | Posted Content

Towards an All-Purpose Content-Based Multimedia Information Retrieval System

TLDR
The full vitrivr stack is unique in that it is the first multimedia retrieval system that seamlessly integrates support for four different types of media, and paves the way towards an all-purpose, content-based multimedia information retrieval system.
Abstract
The growth of multimedia collections - in terms of size, heterogeneity, and variety of media types - necessitates systems that are able to conjointly deal with several forms of media, especially when it comes to searching for particular objects. However, existing retrieval systems are organized in silos and treat different media types separately. As a consequence, retrieval across media types is either not supported at all or subject to major limitations. In this paper, we present vitrivr, a content-based multimedia information retrieval stack. As opposed to the keyword search approach implemented by most media management systems, vitrivr makes direct use of the object's content to facilitate different types of similarity search, such as Query-by-Example or Query-by-Sketch, for and, most importantly, across different media types - namely, images, audio, videos, and 3D models. Furthermore, we introduce a new web-based user interface that enables easy-to-use, multimodal retrieval from and browsing in mixed media collections. The effectiveness of vitrivr is shown on the basis of a user study that involves different query and media types. To the best of our knowledge, the full vitrivr stack is unique in that it is the first multimedia retrieval system that seamlessly integrates support for four different types of media. As such, it paves the way towards an all-purpose, content-based multimedia information retrieval system.

Citations
Proceedings Article

Retrieval of Structured and Unstructured Data with vitrivr

TL;DR: In this paper, the open-source multimedia retrieval stack vitrivr is extended with the capability to process Boolean query expressions alongside content-based query descriptions, in order to leverage the structural diversity inherent to lifelog data.
Proceedings Article

Multimodal Multimedia Retrieval with vitrivr

TL;DR: This paper presents vitrivr, a general-purpose content-based multimedia retrieval stack that seamlessly integrates support for four different types of media, namely images, audio, videos, and 3D models.

Melody extraction from polyphonic music signals using pitch contour characteristics

TL;DR: In this article, the authors present a novel system for the automatic extraction of the main melody from polyphonic music recordings. It is based on the creation and characterization of pitch contours: time-continuous sequences of pitch candidates grouped using auditory streaming cues.
Journal Article

A Multimodal End-to-End Deep Learning Architecture for Music Popularity Prediction

TL;DR: The SpotGenTrack Popularity Dataset (SPD) is presented as an alternative to existing datasets, intended to help researchers compare and promote their models. In addition, an innovative multimodal end-to-end deep learning architecture named HitMusicNet is presented for predicting the popularity of music recordings.
Proceedings Article

Multi-Stage Queries and Temporal Scoring in Vitrivr

TL;DR: This paper presents two extensions to the retrieval model of the open-source content-based multimedia retrieval stack vitrivr. They enable users to formulate more precise queries that can be evaluated in a staged manner, thereby improving result quality without sacrificing the system's overall flexibility.
References
Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings Article

Object recognition from local scale-invariant features

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Journal Article

Speeded-Up Robust Features (SURF)

TL;DR: This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Journal Article

On the use of windows for harmonic analysis with the discrete Fourier transform

F.J. Harris
TL;DR: A comprehensive catalog of data windows is included, along with their significant performance parameters, so that the different windows can be compared. An example demonstrates the use and value of windows to resolve closely spaced harmonic signals characterized by large differences in amplitude.
Journal Article

Content-based image retrieval at the end of the early years

TL;DR: The article discusses the working conditions of content-based retrieval (patterns of use, types of pictures, the role of semantics, and the sensory gap) as well as aspects of system engineering (databases, system architecture, and evaluation).