Bio: Thomas Sikora is an academic researcher at the Technical University of Berlin. The author has contributed to research on the topics of motion estimation and motion compensation. The author has an h-index of 40 and has co-authored 333 publications receiving 9941 citations. Previous affiliations of Thomas Sikora include Free University of Berlin and Ghent University.
01 Jun 2002
TL;DR: This book has been designed as a unique tutorial in the new MPEG 7 standard covering content creation, content distribution and content consumption, and presents a comprehensive overview of the principles and concepts involved in the complete range of Audio Visual material indexing, metadata description, information retrieval and browsing.
Abstract: From the Publisher: The MPEG standards are an evolving set of standards for video and audio compression. MPEG 7 technology covers the most recent developments in multimedia search and retrieval, designed to standardise the description of multimedia content supporting a wide range of applications including DVD, CD and HDTV. Multimedia content description, search and retrieval is a rapidly expanding research area due to the increasing amount of audiovisual (AV) data available. The wealth of practical applications available and currently under development (for example, large scale multimedia search engines and AV broadcast servers) has led to the development of processing tools to create the description of AV material or to support the identification or retrieval of AV documents. Written by experts in the field, this book has been designed as a unique tutorial in the new MPEG 7 standard covering content creation, content distribution and content consumption. At present there are no books documenting the available technologies in such a comprehensive way. Presents a comprehensive overview of the principles and concepts involved in the complete range of Audio Visual material indexing, metadata description, information retrieval and browsing. Details the major processing tools used for indexing and retrieval of images and video sequences. Individual chapters, written by experts who have contributed to the development of MPEG 7, provide clear explanations of the underlying tools and technologies contributing to the standard. Demonstration software offering step-by-step guidance to the multi-media system components and eXperimentation model (XM) MPEG reference software. Coincides with the release of the ISO standard in late 2001. A valuable reference resource for practising electronic and communications engineers designing and implementing MPEG 7 compliant systems, as well as for researchers and students working with multimedia database technology.
TL;DR: This work presents a high-level overview of the MPEG-7 standard, discussing the scope, basic terminology, and potential applications, and compares the relationship with other standards to highlight its capabilities.
Abstract: MPEG-7, formally known as the Multimedia Content Description Interface, includes standardized tools (descriptors, description schemes, and language) enabling structural, detailed descriptions of audio-visual information at different granularity levels (region, image, video segment, collection) and in different areas (content description, management, organization, navigation, and user interaction). It aims to support and facilitate a wide range of applications, such as media portals, content broadcasting, and ubiquitous multimedia. We present a high-level overview of the MPEG-7 standard. We first discuss the scope, basic terminology, and potential applications. Next, we discuss the constituent components. Then, we compare the relationship with other standards to highlight its capabilities.
TL;DR: The scope of the MPEG-4 video standard is described and the structure of the video verification model under development is outlined, to provide a fully defined core video coding algorithm platform for the development of the standard.
Abstract: The MPEG-4 standardization phase has the mandate to develop algorithms for audio-visual coding allowing for interactivity, high compression, and/or universal accessibility and portability of audio and video content. In addition to the conventional "frame"-based functionalities of the MPEG-1 and MPEG-2 standards, the MPEG-4 video coding algorithm will also support access and manipulation of "objects" within video scenes. The January 1996 MPEG Video Group meeting witnessed the definition of the first version of the MPEG-4 video verification model, a milestone in the development of the MPEG-4 standard. The primary intent of the video verification model is to provide a fully defined core video coding algorithm platform for the development of the standard. As such, the structure of the MPEG-4 video verification model already gives some indication about the tools and algorithms that will be provided by the final MPEG-4 standard. The paper describes the scope of the MPEG-4 video standard and outlines the structure of the MPEG-4 video verification model under development.
TL;DR: The aim, methodologies, and broad details of the MPEG-7 standard development for visual content description are outlined.
Abstract: The MPEG-7 visual standard under development specifies content-based descriptors that allow users or agents (or search engines) to measure similarity in images or video based on visual criteria, and can be used to efficiently identify, filter, or browse images or video based on visual content. More specifically, MPEG-7 specifies color, texture, object shape, global motion, or object motion features for this purpose. This paper outlines the aim, methodologies, and broad details of the MPEG-7 standard development for visual content description.
01 Aug 2017
TL;DR: This work presents a tracking-by-detection algorithm which can compete with more sophisticated approaches at a fraction of the computational cost and shows with thorough experiments its potential using a wide range of object detectors.
Abstract: Tracking-by-detection is a common approach to multi-object tracking. With ever increasing performances of object detectors, the basis for a tracker becomes much more reliable. In combination with commonly higher frame rates, this poses a shift in the challenges for a successful tracker. That shift enables the deployment of much simpler tracking algorithms which can compete with more sophisticated approaches at a fraction of the computational cost. We present such an algorithm and show with thorough experiments its potential using a wide range of object detectors. The proposed method can easily run at 100K fps while outperforming the state-of-the-art on the DETRAC vehicle tracking dataset.
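The abstract above does not say how detections are linked across frames. As a minimal sketch of what a very simple tracking-by-detection loop could look like, the following assumes greedy association by bounding-box overlap (intersection over union, IoU). The function names `iou` and `track`, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold are hypothetical choices for illustration, not the paper's published method.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(frames, min_iou=0.5):
    """Greedily link per-frame detections into tracks.

    frames: list (one entry per frame) of lists of boxes.
    Returns a list of tracks, each a list of (frame_index, box) pairs.
    min_iou is an assumed threshold, not a value from the source.
    """
    active, finished = [], []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for tr in active:
            last_box = tr[-1][1]
            # Pick the unmatched detection that overlaps most with the track's last box.
            best = max(unmatched, key=lambda d: iou(last_box, d), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tr.append((t, best))
                unmatched.remove(best)
                still_active.append(tr)
            else:
                # No sufficiently overlapping detection: the track ends here.
                finished.append(tr)
        # Every detection left over starts a new track.
        for d in unmatched:
            still_active.append([(t, d)])
        active = still_active
    return finished + active


frames = [
    [(0, 0, 10, 10), (100, 100, 110, 110)],
    [(1, 0, 11, 10)],
    [(2, 0, 12, 10)],
]
tracks = track(frames)
```

In this toy input the first box drifts slowly and is followed for three frames, while the second box disappears after the first frame and its track is closed. A loop like this only makes sense when detections are reliable and the frame rate is high enough that consecutive boxes overlap strongly, which is the setting the abstract describes.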
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.
01 Jan 1980
01 Jan 2008
TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.
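The abstract mentions decision fusion of per-modality classification results without stating the fusion rule. The sketch below illustrates one common form of late fusion, a weighted average of per-modality class probabilities. The function name `fuse_decisions`, the modality keys, the weights, and the probability values are all hypothetical and are not taken from the paper.

```python
def fuse_decisions(probabilities, weights=None, threshold=0.5):
    """Fuse per-modality probabilities of the 'high' class by weighted averaging.

    probabilities: dict mapping modality name -> probability in [0, 1].
    weights: optional dict mapping modality name -> non-negative weight;
             equal weights are assumed when omitted.
    Returns (fused_probability, label), where label is 1 for 'high' and 0 for 'low'.
    """
    if weights is None:
        weights = {name: 1.0 for name in probabilities}
    total = sum(weights[name] for name in probabilities)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(weights[name] * p for name, p in probabilities.items()) / total
    return fused, int(fused >= threshold)


# Hypothetical single-trial outputs for one rating dimension (e.g. arousal).
trial = {"eeg": 0.6, "peripheral": 0.4, "content": 0.8}
fused_equal, label_equal = fuse_decisions(trial)
fused_weighted, label_weighted = fuse_decisions(
    trial, weights={"eeg": 1.0, "peripheral": 3.0, "content": 0.0}
)
```

With equal weights the three modalities average to a 'high' decision, while weighting the peripheral signals heavily flips the fused label to 'low', which shows how the choice of weights can change the outcome.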
TL;DR: A detailed overview of current advances in vision-based human action recognition is provided, including a discussion of limitations of the state of the art and an outline of promising directions of research.
Abstract: Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human-computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.