Author

Meinard Müller

Other affiliations: Keio University, Max Planck Society, Fraunhofer Society
Bio: Meinard Müller is an academic researcher from the University of Erlangen-Nuremberg. His research focuses on music information retrieval and sound recording and reproduction. He has an h-index of 41 and has co-authored 257 publications receiving 8,924 citations. Previous affiliations of Meinard Müller include Keio University and the Max Planck Society.


Papers
Book
26 Sep 2007
TL;DR: A monograph on analysis and retrieval techniques for music and motion data, covering chroma-based audio features, dynamic time warping, music synchronization, and audio matching on the music side, and relational features, adaptive segmentation, and motion templates on the motion side.
Abstract (table of contents): Analysis and Retrieval Techniques for Music Data; Fundamentals on Music and Audio Data; Pitch- and Chroma-Based Audio Features; Dynamic Time Warping; Music Synchronization; Audio Matching; Audio Structure Analysis; SyncPlayer: An Advanced Audio Player; Analysis and Retrieval Techniques for Motion Data; Fundamentals on Motion Capture Data; DTW-Based Motion Comparison and Retrieval; Relational Features and Adaptive Segmentation; Index-Based Motion Retrieval; Motion Templates; MT-Based Motion Annotation and Retrieval.

1,576 citations
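
One of the book's central tools is dynamic time warping (DTW), which underlies its chapters on music synchronization, audio matching, and motion comparison. As a minimal sketch of the general technique (not the book's implementation; the Euclidean local cost and classic step sizes are textbook defaults), the accumulated DTW cost of aligning two feature sequences can be computed as follows:

import numpy as np

def dtw_cost(X, Y):
    """Minimal DTW: accumulated cost of optimally aligning two
    feature sequences X (m x d) and Y (n x d) under Euclidean cost."""
    m, n = len(X), len(Y)
    # Pairwise local cost matrix.
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with the classic step sizes (1,0), (0,1), (1,1).
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]

# Toy example: Y is a time-stretched variant of X, so the cost stays small.
X = np.array([[0.0], [1.0], [2.0], [1.0]])
Y = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]])
print(dtw_cost(X, Y))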

01 Jan 2008

544 citations

01 Jan 2007
TL;DR: The objective of the motion capture database HDM05 is to supply free motion capture data for research purposes and to provide several MATLAB tools comprising a parser for ASF/AMC and C3D as well as visualization, renaming and cutting tools, which are described in Sect.
Abstract: In the past two decades, motion capture (mocap) systems have been developed that allow human motions to be tracked and recorded at high spatial and temporal resolutions. The resulting motion capture data is used to analyze human motions in fields such as sports science and biometrics (person identification), and to synthesize realistic motion sequences in data-driven computer animation. Such applications require efficient methods and tools for the automatic analysis, synthesis, and classification of motion capture data, which constitutes an active research area with many as yet unsolved problems. Even though there is a rapidly growing corpus of motion capture data, the academic research community still lacks publicly available motion data, as supplied by [4], that can be freely used for systematic research on motion analysis, synthesis, and classification. Furthermore, a common dataset of annotated and well-documented motion capture data would be extremely valuable to the research community in view of an objective comparison and evaluation of research results. It is the objective of our motion capture database HDM05 to supply free motion capture data for research purposes. HDM05 contains more than three hours of systematically recorded and well-documented motion capture data in both the C3D and the ASF/AMC data formats. Furthermore, HDM05 contains, for each of roughly 70 motion classes, 10 to 50 realizations executed by various actors, amounting to roughly 1,500 motion clips. In this documentation, we give a detailed description of our mocap database HDM05. In Sect. 1, we provide some general information on motion capture data, including references to various application fields. A detailed description of the database structure of HDM05, as well as of the content of each mocap file, can be found in Sect. 2. We also provide several MATLAB tools, comprising a parser for ASF/AMC and C3D as well as visualization, renaming, and cutting tools, which are described in Sect. 3. Finally, Sect. 4 summarizes some facts on the mocap file formats ASF/AMC and C3D as used in our database. We appreciate any comments and suggestions for improvement. The motion capture data was recorded at the Hochschule der Medien (HDM) in 2005 under the supervision of Bernhard Eberhardt.

459 citations
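
HDM05 ships its clips in the C3D and ASF/AMC formats, with official MATLAB parsers provided by the authors. As a hedged illustration of how little is needed to get at the raw frame data, here is a minimal AMC reader in Python; it assumes the common textual AMC layout (header directives starting with ':' or '#', then a bare frame number followed by one "joint value ..." line per joint), and the file name in the usage lines is hypothetical:

def parse_amc(path):
    """Minimal AMC reader: returns a list of frames, each a dict
    mapping joint name -> list of float channel values.
    Assumes the common ASF/AMC text layout; error handling omitted."""
    frames, current = [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(('#', ':')):
                continue  # skip comments and header directives
            if line.isdigit():         # a bare integer starts a new frame
                current = {}
                frames.append(current)
            elif current is not None:  # "jointname v1 v2 ..." within a frame
                name, *values = line.split()
                current[name] = [float(v) for v in values]
    return frames

# Hypothetical clip name, following the database's naming scheme.
frames = parse_amc('HDM_bd_walkLeft_001.amc')
print(len(frames), 'frames;', frames[0].get('root'))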

Journal ArticleDOI
01 Jul 2005
TL;DR: This paper introduces various kinds of qualitative features describing geometric relations between specified body points of a pose and shows how these features induce a time segmentation of motion capture data streams, and adopts efficient indexing methods allowing for flexible and efficient content-based retrieval and browsing in huge motion capture databases.
Abstract: The reuse of human motion capture data to create new, realistic motions by applying morphing and blending techniques has become an important issue in computer animation. This requires the identification and extraction of logically related motions scattered within some data set. Such content-based retrieval of motion capture data, which is the topic of this paper, constitutes a difficult and time-consuming problem due to significant spatio-temporal variations between logically related motions. In our approach, we introduce various kinds of qualitative features describing geometric relations between specified body points of a pose and show how these features induce a time segmentation of motion capture data streams. By incorporating spatio-temporal invariance into the geometric features and adaptive segments, we are able to adopt efficient indexing methods allowing for flexible and efficient content-based retrieval and browsing in huge motion capture databases. Furthermore, we obtain an efficient preprocessing method substantially accelerating the cost-intensive classical dynamic time warping techniques for the time alignment of logically similar motion data streams. We present experimental results on a test data set of more than one million frames, corresponding to 180 minutes of motion. The linearity of our indexing algorithms guarantees the scalability of our results to much larger data sets.

406 citations
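
The paper's two key steps, boolean geometric features and the temporal segmentation they induce, can be sketched compactly. The specific plane-based feature and the toy stream below are illustrative assumptions, not the paper's exact feature set:

import numpy as np
from itertools import groupby

def in_front_of_plane(p, a, b, c):
    """Boolean geometric feature: is body point p in front of the
    oriented plane through a, b, c (normal by the right-hand rule)?"""
    normal = np.cross(b - a, c - a)
    return float(np.dot(normal, p - a)) > 0.0

def segment(feature_stream):
    """Adaptive segmentation: merge consecutive frames that share the
    same combination of boolean feature values into one segment."""
    segments, start = [], 0
    for value, run in groupby(feature_stream):
        length = len(list(run))
        segments.append((start, start + length, value))
        start += length
    return segments

# Plane spanning the x-y axes through the origin; the test point lies above it.
a, b, c = np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
print(in_front_of_plane(np.array([0.0, 0.0, 1.0]), a, b, c))  # True

# Toy stream of one boolean feature over 10 frames.
stream = [False, False, True, True, True, False, False, False, True, True]
print(segment(stream))  # [(0, 2, False), (2, 5, True), (5, 8, False), (8, 10, True)]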

Proceedings ArticleDOI
02 Sep 2006
TL;DR: New methods for automatic classification and retrieval of motion capture data facilitating the identification of logically related motions scattered in some database are presented, and the concept of motion templates (MTs) are introduced, by which the essence of an entire class of logicallyrelated motions can be captured in an explicit and semantically interpretable matrix representation.
Abstract: This paper presents new methods for automatic classification and retrieval of motion capture data facilitating the identification of logically related motions scattered in some database. As the main ingredient, we introduce the concept of motion templates (MTs), by which the essence of an entire class of logically related motions can be captured in an explicit and semantically interpretable matrix representation. The key property of MTs is that the variable aspects of a motion class can be automatically masked out in the comparison with unknown motion data. This facilitates robust and efficient motion retrieval even in the presence of large spatio-temporal variations. Furthermore, we describe how to learn an MT for a specific motion class from a given set of training motions. In our extensive experiments, which are based on several hours of motion data, MTs proved to be a powerful concept for motion annotation and retrieval, yielding accurate results even for highly variable motion classes such as cartwheels, lying down, or throwing motions.

361 citations
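
The essence of the MT construction, averaging time-aligned binary feature matrices of a class and masking entries that are not consistently 0 or 1, can be sketched as follows. The naive resampling here stands in for the paper's iterative DTW-based warping, and the 0.1/0.9 consistency thresholds are illustrative assumptions:

import numpy as np

def learn_motion_template(motions, length=50, lo=0.1, hi=0.9):
    """motions: list of binary feature matrices (features x frames).
    Returns (template, mask): the averaged class matrix and a boolean
    mask of entries that are consistent across the training motions."""
    resampled = []
    for M in motions:
        # Naive temporal alignment: resample each motion to `length`
        # frames by nearest-neighbor indexing (the paper warps with DTW).
        idx = np.linspace(0, M.shape[1] - 1, length).round().astype(int)
        resampled.append(M[:, idx])
    template = np.mean(resampled, axis=0)
    # Entries near 0 or 1 are class-consistent; intermediate values
    # reflect variable aspects and are masked out in later comparisons.
    mask = (template <= lo) | (template >= hi)
    return template, mask

# Toy class of three random binary feature matrices of different lengths.
rng = np.random.default_rng(0)
motions = [(rng.random((4, n)) > 0.5).astype(float) for n in (40, 55, 62)]
template, mask = learn_motion_template(motions)
print(template.shape, mask.mean())  # (4, 50) and the fraction of consistent entries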


Cited by
01 Jan 2016
An Introduction to the Event-Related Potential Technique

2,445 citations

Journal ArticleDOI
TL;DR: A new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, is introduced for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms.
Abstract: We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state-of-the-art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time of flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher capacity, more complex models with our large dataset, is substantially vaster and should stimulate future research. The dataset together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m .

2,209 citations

Journal ArticleDOI
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Abstract: Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced, and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.

1,945 citations

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
Abstract: This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information retrieval. In this document, a brief overview of the library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

1,793 citations
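
For orientation, a few of librosa's core calls, which cover exactly the feature types (chroma, beats) that recur in the music-retrieval papers above; the audio path is a placeholder:

import librosa

# Load an audio file (path is a placeholder); y is the waveform,
# sr the sampling rate (librosa resamples to 22050 Hz by default).
y, sr = librosa.load('example.wav')

# A chroma representation: the pitch-class features used throughout
# the music-retrieval work listed in this profile.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# Beat tracking: a global tempo estimate plus beat positions in frames.
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

print(chroma.shape, tempo)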

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes an end-to-end hierarchical RNN for skeleton based action recognition, and demonstrates that the model achieves the state-of-the-art performance with high computational efficiency.
Abstract: Human actions can be represented by the trajectories of skeleton joints. Traditional methods generally model the spatial structure and temporal dynamics of human skeleton with hand-crafted features and recognize human actions by well-designed classifiers. In this paper, considering that recurrent neural network (RNN) can model the long-term contextual information of temporal sequences well, we propose an end-to-end hierarchical RNN for skeleton based action recognition. Instead of taking the whole skeleton as the input, we divide the human skeleton into five parts according to human physical structure, and then separately feed them to five subnets. As the number of layers increases, the representations extracted by the subnets are hierarchically fused to be the inputs of higher layers. The final representations of the skeleton sequences are fed into a single-layer perceptron, and the temporally accumulated output of the perceptron is the final decision. We compare with five other deep RNN architectures derived from our model to verify the effectiveness of the proposed network, and also compare with several other methods on three publicly available datasets. Experimental results demonstrate that our model achieves the state-of-the-art performance with high computational efficiency.

1,642 citations
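
The part-based architecture described above can be sketched in PyTorch. This is a simplified reading with a single fusion stage rather than the paper's full hierarchy, and the layer sizes, per-part input dimensions, and choice of LSTM cells are illustrative assumptions rather than the authors' implementation:

import torch
import torch.nn as nn

class HierarchicalSkeletonRNN(nn.Module):
    """Sketch: one RNN per body part, one fusion RNN over the
    concatenated part representations, then a per-frame classifier
    whose temporally accumulated output is the final decision."""
    def __init__(self, part_dims, hidden=32, num_classes=10):
        super().__init__()
        # One subnet per body part (e.g. trunk, two arms, two legs).
        self.part_rnns = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in part_dims)
        self.fusion_rnn = nn.LSTM(hidden * len(part_dims), hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, parts):
        # parts: list of tensors, each (batch, frames, part_dim).
        outs = [rnn(p)[0] for rnn, p in zip(self.part_rnns, parts)]
        fused, _ = self.fusion_rnn(torch.cat(outs, dim=-1))
        logits = self.classifier(fused)  # (batch, frames, classes)
        return logits.mean(dim=1)        # accumulate the decision over time

# Toy input: 5 body parts with hypothetical joint-coordinate dimensions.
part_dims = [12, 9, 9, 9, 9]
model = HierarchicalSkeletonRNN(part_dims)
parts = [torch.randn(2, 30, d) for d in part_dims]
print(model(parts).shape)  # torch.Size([2, 10])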