Author

A.K. Noulas

Bio: A.K. Noulas is an academic researcher from the University of Amsterdam. The author has contributed to research in the topics of speaker diarisation and hidden Markov models, has an h-index of 6, and has co-authored 9 publications receiving 963 citations.

Papers
Proceedings ArticleDOI
21 Sep 2008
TL;DR: This paper presents an easy-to-install sensor network and an accurate but inexpensive annotation method, and shows how hidden Markov models and conditional random fields perform in recognizing activities.
Abstract: A sensor system capable of automatically recognizing activities would enable many potential ubiquitous applications. In this paper, we present an easy-to-install sensor network and an accurate but inexpensive annotation method. A recorded dataset consisting of 28 days of sensor data and its annotation is described and made available to the community. Through a number of experiments we show how the hidden Markov model and conditional random fields perform in recognizing activities. We achieve a timeslice accuracy of 95.6% and a class accuracy of 79.4%.
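The HMM side of such a pipeline can be illustrated with a toy decoder. The sketch below is not the paper's model: the two activities, the single binary door-contact sensor, and all probabilities are invented for illustration. It only shows how Viterbi decoding assigns the most likely activity label to each timeslice of a sensor stream.

```python
import math

# Hypothetical toy model: two activities inferred from one binary
# door-contact sensor (0 = closed, 1 = open). All numbers here are
# illustrative, not the paper's learned parameters.
states = ["sleeping", "leaving_house"]
start = {"sleeping": 0.7, "leaving_house": 0.3}
trans = {
    "sleeping":      {"sleeping": 0.9, "leaving_house": 0.1},
    "leaving_house": {"sleeping": 0.2, "leaving_house": 0.8},
}
emit = {
    "sleeping":      {0: 0.95, 1: 0.05},
    "leaving_house": {0: 0.2,  1: 0.8},
}

def viterbi(obs):
    """Most likely activity sequence for a list of sensor readings."""
    # log-probability of the best path ending in each state
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        nv, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[p] + math.log(trans[p][s]))
            nv[s] = v[prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
            ptr[s] = prev
        v, back = nv, back + [ptr]
    # backtrack from the best final state
    path = [max(states, key=v.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi([0, 0, 1, 1, 0]))
# -> ['sleeping', 'sleeping', 'leaving_house', 'leaving_house', 'sleeping']
```

Note how the transition probabilities smooth the labeling: a single noisy reading is less likely to flip the decoded activity than in per-timeslice classification.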

873 citations

Journal ArticleDOI
TL;DR: A novel probabilistic framework that fuses information from the audio and video modalities to perform speaker diarization. It is a Dynamic Bayesian Network (DBN) extending a factorial Hidden Markov Model (fHMM), and it models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space.
Abstract: We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The results acquired in speaker diarization are in favor of the proposed multimodal framework, which outperforms the single modality analysis results and improves over the state-of-the-art audio-based speaker diarization.
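The unsupervised EM step can be illustrated on a much simpler model than the paper's fHMM. The sketch below fits a two-component 1-D Gaussian mixture with EM on invented data; it has no temporal structure and is only a stand-in showing how the E-step (responsibilities) and M-step (parameter re-estimation) alternate without any labeled training data.

```python
import math, random

def em_gmm(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture: a minimal stand-in
    for unsupervised parameter estimation (the real fHMM also models
    temporal structure; this sketch does not)."""
    mu = [min(data), max(data)]          # crude initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            z = p[0] + p[1]
            resp.append([p[0] / z, p[1] / z])
        # M-step: re-estimate weights, means, and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return mu, var, pi

random.seed(0)
data = [random.gauss(-2, 0.5) for _ in range(200)] + \
       [random.gauss(3, 0.5) for _ in range(200)]
mu, var, pi = em_gmm(data)
print(sorted(mu))   # means recovered near -2 and 3
```

The same alternation drives the diarization framework: responsibilities play the role of soft speaker assignments, and the M-step refits each person's audio and video models.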

70 citations

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A novel framework that utilizes multi-modal information to achieve speaker diarization using dynamic Bayesian networks and an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model is presented.
Abstract: This paper presents a novel framework that utilizes multi-modal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multi-modal one as more data becomes available. We present an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model. We present the results achieved in various real-world situations, including videos from web cameras, human-computer interaction, and video conferences.

32 citations

Proceedings ArticleDOI
02 Nov 2006
TL;DR: For content analysis of clips containing people speaking, a Bayesian network model is proposed that utilizes the extracted feature characteristics, their relations, and their temporal patterns to efficiently assign informative cues to the person who created them.
Abstract: Content analysis of clips containing people speaking involves processing informative cues coming from different modalities. These cues are usually the words extracted from the audio modality, and the identity of the persons appearing in the video modality of the clip. To achieve efficient assignment of these cues to the person that created them, we propose a Bayesian network model that utilizes the extracted feature characteristics, their relations, and their temporal patterns. We use the EM algorithm, in which the E-step estimates the expectation of the complete-data log-likelihood with respect to the hidden variables, that is, the identity of the speakers and the visible persons. In the M-step, the person models that maximize this expectation are computed. This framework produces excellent results, exhibiting exceptional robustness when dealing with low-quality data.

24 citations

01 Jan 2008
TL;DR: This work presents the basic advantages of this model over generative models, argues for its suitability in the domain of activity recognition from sensor networks, and presents experimental results on a real-world dataset that support this argument.

Abstract: Conditional Random Fields are a discriminative probabilistic model that recently gained popularity in applications that require modeling non-independent observation sequences. In this work, we present the basic advantages of this model over generative models and argue for its suitability in the domain of activity recognition from sensor networks. We present experimental results on a real-world dataset that support this argument.
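The discriminative nature of a CRF can be made concrete with a toy linear-chain model. Everything below is hypothetical (the labels, features, and weights are invented, and the brute-force normalization stands in for forward-backward); the point it illustrates is that a CRF scores whole label sequences given the observations and normalizes only over label sequences, never modeling P(x) the way a generative model such as an HMM does.

```python
import math
from itertools import product

# Hypothetical toy linear-chain CRF for activity recognition: labels
# y_t conditioned on binary sensor readings x_t. Weights are
# illustrative, not learned from any real dataset.
LABELS = ["idle", "cooking"]
w_emit = {("idle", 0): 1.0, ("idle", 1): -1.0,
          ("cooking", 0): -1.0, ("cooking", 1): 1.0}
w_trans = {(a, b): (0.5 if a == b else -0.5)
           for a in LABELS for b in LABELS}

def score(y, x):
    """Unnormalised log-score of a label sequence given observations."""
    s = sum(w_emit[(yt, xt)] for yt, xt in zip(y, x))
    s += sum(w_trans[(y[t - 1], y[t])] for t in range(1, len(y)))
    return s

def p_conditional(y, x):
    """P(y | x): normalise over all label sequences (brute force; real
    CRFs use forward-backward). Note there is no P(x) anywhere -- the
    model is discriminative."""
    z = sum(math.exp(score(yp, x)) for yp in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z

x = (1, 1, 0)
best = max(product(LABELS, repeat=3), key=lambda y: score(y, x))
print(best, round(p_conditional(best, x), 3))
```

Because nothing in the model spends capacity on explaining the observations themselves, arbitrary overlapping features of x can be added to `w_emit` without violating any independence assumption, which is the advantage the paper argues for.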

15 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Abstract: Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced, and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.

1,945 citations

Journal ArticleDOI
TL;DR: This is the first demonstration of a low-cost accurate video-based method for contact-free heart rate measurements that is automated, motion-tolerant and capable of performing concomitant measurements on more than one person at a time.
Abstract: Remote measurements of the cardiac pulse can provide comfortable physiological assessment without electrodes. However, attempts so far are non-automated, susceptible to motion artifacts and typically expensive. In this paper, we introduce a new methodology that overcomes these problems. This novel approach can be applied to color video recordings of the human face and is based on automatic face tracking along with blind source separation of the color channels into independent components. Using Bland-Altman and correlation analysis, we compared the cardiac pulse rate extracted from videos recorded by a basic webcam to an FDA-approved finger blood volume pulse (BVP) sensor and achieved high accuracy and correlation even in the presence of movement artifacts. Furthermore, we applied this technique to perform heart rate measurements from three participants simultaneously. This is the first demonstration of a low-cost accurate video-based method for contact-free heart rate measurements that is automated, motion-tolerant and capable of performing concomitant measurements on more than one person at a time.
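The frequency-analysis step of such a pipeline can be sketched as follows. This is not the paper's method: it skips face tracking and the ICA source separation entirely and simply picks the dominant frequency of a synthetic single-channel trace inside a plausible cardiac band; the sampling rate, band limits, and signal are all invented for illustration.

```python
import math

def estimate_hr(trace, fs, lo=0.75, hi=4.0, step=0.01):
    """Pick the dominant frequency in the cardiac band (45-240 bpm) by
    scanning a DFT over candidate frequencies. Real pipelines first
    separate the colour channels (e.g. with ICA) and then do this."""
    n = len(trace)
    mean = sum(trace) / n
    xs = [v - mean for v in trace]          # remove the DC component
    best_f, best_p = lo, -1.0
    f = lo
    while f <= hi:
        re = sum(x * math.cos(2 * math.pi * f * t / fs)
                 for t, x in enumerate(xs))
        im = sum(x * math.sin(2 * math.pi * f * t / fs)
                 for t, x in enumerate(xs))
        p = re * re + im * im
        if p > best_p:
            best_f, best_p = f, p
        f += step
    return best_f * 60.0                    # Hz -> beats per minute

# Synthetic 10 s "green channel" trace at 30 fps with a 1.2 Hz pulse
# plus a slow drift far below the cardiac band.
fs = 30.0
trace = [0.5 + 0.05 * math.sin(2 * math.pi * 1.2 * t / fs)
         + 0.01 * math.sin(2 * math.pi * 0.3 * t / fs)
         for t in range(300)]
print(round(estimate_hr(trace, fs)))   # -> 72 (bpm)
```

Restricting the search to the cardiac band is what makes the estimate robust to slow illumination drift, which is one reason band-limited analysis works at all on webcam footage.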

1,491 citations

Journal ArticleDOI
TL;DR: A simple, low-cost method for measuring multiple physiological parameters using a basic webcam, by applying independent component analysis on the color channels in video recordings, which extracted the blood volume pulse from the facial regions.
Abstract: We present a simple, low-cost method for measuring multiple physiological parameters using a basic webcam. By applying independent component analysis on the color channels in video recordings, we extracted the blood volume pulse from the facial regions. Heart rate (HR), respiratory rate, and HR variability (HRV, an index for cardiac autonomic activity) were subsequently quantified and compared to corresponding measurements using Food and Drug Administration-approved sensors. High degrees of agreement were achieved between the measurements across all physiological parameters. This technology has significant potential for advancing personal health care and telemedicine.

1,269 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.
Abstract: The last 20 years have seen ever-increasing research activity in the field of human activity recognition. With activity recognition having considerably matured, so has the number of challenges in designing, implementing, and evaluating activity recognition systems. This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition. It specifically focuses on activity recognition using on-body inertial sensors. We first discuss the key research challenges that human activity recognition shares with general pattern recognition and identify those challenges that are specific to human activity recognition. We then describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems. We detail each component of the framework, provide references to related research, and introduce the best practice methods developed by the activity recognition research community. We conclude with the educational example problem of recognizing different hand gestures from inertial sensors attached to the upper and lower arm. We illustrate how each component of this framework can be implemented for this specific activity recognition problem and demonstrate how different implementations compare and how they impact overall recognition performance.
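A drastically simplified ARC can be sketched in a few lines. The window size, mean/variance features, nearest-centroid classifier, toy accelerometer signal, and centroids below are all placeholder choices for illustration, not the tutorial's recommendations; only the segment, extract features, classify structure mirrors the chain the tutorial describes.

```python
# Minimal Activity Recognition Chain sketch: segment a 1-D sensor
# stream, extract per-window features, classify each window.

def windows(signal, size, step):
    """Sliding-window segmentation of a 1-D sensor stream."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def features(win):
    """Mean and variance of a window -- two classic inertial features."""
    m = sum(win) / len(win)
    return (m, sum((x - m) ** 2 for x in win) / len(win))

def classify(feat, centroids):
    """Nearest-centroid decision in feature space (placeholder for a
    trained classifier)."""
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(feat, centroids[lab])))

# Toy accelerometer magnitudes: still (low variance), then shaking.
stream = [1.0, 1.02, 0.98, 1.01, 0.99, 1.0, 1.8, 0.2, 1.9, 0.1, 1.7, 0.3]
centroids = {"rest": (1.0, 0.001), "gesture": (1.0, 0.6)}
labels = [classify(features(w), centroids) for w in windows(stream, 6, 6)]
print(labels)   # -> ['rest', 'gesture']
```

Each stage is replaceable independently (other window lengths, richer features, a trained classifier), which is exactly the modularity the ARC framework is meant to expose.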

1,214 citations