Author

Dinesh Babu Jayagopi

Bio: Dinesh Babu Jayagopi is an academic researcher at the International Institute of Information Technology, Bangalore. He has contributed to research topics including computer science and handwriting recognition, has an h-index of 16, and has co-authored 67 publications that have received 1,008 citations. His previous affiliations include École Polytechnique Fédérale de Lausanne and Accenture.


Papers
Journal ArticleDOI
TL;DR: This paper presents a systematic study on dominance modeling in group meetings from fully automatic nonverbal activity cues, in a multi-camera, multi-microphone setting, and investigates efficient audio and visual activity cues for the characterization of dominant behavior, analyzing single and joint modalities.
Abstract: Dominance - a behavioral expression of power - is a fundamental mechanism of social interaction, expressed and perceived in conversations through spoken words and audiovisual nonverbal cues. The automatic modeling of dominance patterns from sensor data represents a relevant problem in social computing. In this paper, we present a systematic study on dominance modeling in group meetings from fully automatic nonverbal activity cues, in a multi-camera, multi-microphone setting. We investigate efficient audio and visual activity cues for the characterization of dominant behavior, analyzing single and joint modalities. Unsupervised and supervised approaches for dominance modeling are also investigated. Activity cues and models are objectively evaluated on a set of dominance-related classification tasks, derived from an analysis of the variability of human judgment of perceived dominance in group discussions. Our investigation highlights the power of relatively simple yet efficient approaches and the challenges of audiovisual integration. This constitutes the most detailed study on automatic dominance modeling in meetings to date.
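Among the "relatively simple yet efficient approaches" the paper highlights is unsupervised ranking of participants by accumulated speaking activity. A minimal sketch in Python; the segment format and function names below are illustrative assumptions, not the paper's actual pipeline:

    # Unsupervised dominance ranking from speaking activity alone.
    # Assumes each participant's speech is a list of (start, end)
    # segments in seconds, e.g. output by a voice activity detector.

    def total_speaking_time(segments):
        """Accumulated speaking length over all (start, end) segments."""
        return sum(end - start for start, end in segments)

    def rank_by_dominance(meeting):
        """Rank participants by total speaking time, most dominant first."""
        return sorted(meeting,
                      key=lambda p: total_speaking_time(meeting[p]),
                      reverse=True)

    meeting = {
        "A": [(0.0, 12.5), (30.0, 55.0)],
        "B": [(12.5, 30.0)],
        "C": [(55.0, 60.0)],
    }
    print(rank_by_dominance(meeting))  # ['A', 'B', 'C']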

227 citations

Proceedings ArticleDOI
29 Sep 2007
TL;DR: A framework for detecting dominance in group meetings using different audio and video cues is provided, and it is shown that a simple model for dominance estimation can obtain promising results.
Abstract: The automated extraction of semantically meaningful information from multi-modal data is becoming increasingly necessary due to the escalation of captured data for archival. A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting. In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. We show that by using a simple model for dominance estimation we can obtain promising results.
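One way to read "a simple model for dominance estimation" over audio and video cues is a weighted combination of normalized per-participant activity scores. A sketch under that assumption; the cue names and fusion weight are hypothetical, not the paper's:

    # Fuse per-participant audio and visual activity cues into one
    # dominance score via a weighted sum of min-max-normalized values.

    def fuse_cues(audio, visual, w=0.7):
        def normalize(cues):
            lo, hi = min(cues.values()), max(cues.values())
            span = (hi - lo) or 1.0
            return {p: (v - lo) / span for p, v in cues.items()}
        a, v = normalize(audio), normalize(visual)
        return {p: w * a[p] + (1 - w) * v[p] for p in audio}

    speaking_time = {"A": 37.5, "B": 17.5, "C": 5.0}   # audio cue
    motion_energy = {"A": 0.42, "B": 0.80, "C": 0.10}  # visual cue
    scores = fuse_cues(speaking_time, motion_energy)
    print(max(scores, key=scores.get))  # 'A' under this weighting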

84 citations

Journal ArticleDOI
TL;DR: It is indicated that emergent leadership is related, but not equivalent, to dominance, and while multimodal features bring a moderate degree of effectiveness in inferring the leader, much simpler features extracted from the audio channel are found to give better performance.
Abstract: In this paper we present a multimodal analysis of emergent leadership in small groups using audio-visual features, and discuss our experience in designing and collecting a data corpus for this purpose. The ELEA Audio-Visual Synchronized corpus (ELEA AVS) was collected using a light portable setup and contains recordings of small group meetings. The participants in each group performed the winter survival task and filled in questionnaires related to personality and several social concepts such as leadership and dominance. In addition, the corpus includes annotations of participants' performance in the survival task, as well as annotations of social concepts from external viewers. Based on this corpus, we demonstrate the feasibility of predicting the emergent leader in small groups using automatically extracted audio and visual features based on speaking turns and visual attention, focusing specifically on multimodal features that use the 'looking at participants while speaking' and 'looking at while not speaking' measures. Our findings indicate that emergent leadership is related, but not equivalent, to dominance, and that while multimodal features bring a moderate degree of effectiveness in inferring the leader, much simpler features extracted from the audio channel give better performance.
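The gaze-and-speech measures named above can be made concrete with a small sketch. Assuming, hypothetically, per-frame gaze targets and a parallel speaking flag for each participant, the 'looking at participants while speaking' measure is the fraction of a participant's speaking frames spent gazing at another person rather than at objects such as the table or slide screen:

    PARTICIPANTS = {"A", "B", "C"}

    def look_at_people_while_speaking(gaze, speaking):
        """gaze: per-frame gaze targets ('B', 'table', 'slide', ...).
        speaking: parallel list of booleans (True = speech frame)."""
        targets = [t for t, s in zip(gaze, speaking) if s]
        if not targets:
            return 0.0
        return sum(t in PARTICIPANTS for t in targets) / len(targets)

    gaze_A     = ["B", "B", "table", "C", "slide", "B"]
    speaking_A = [True, True, True, False, False, True]
    print(look_at_people_while_speaking(gaze_A, speaking_A))  # 0.75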

75 citations

Proceedings ArticleDOI
13 Nov 2017
TL;DR: This work analyzes the engagement or attention level of students from their facial expressions, head pose and eye gaze using computer vision techniques, with the final decision made by machine learning algorithms.
Abstract: Student engagement is the key to successful classroom learning. Measuring or analyzing the engagement of students is very important for improving learning as well as teaching. In this work, we analyze the engagement or attention level of students from their facial expressions, head pose and eye gaze using computer vision techniques, and a decision is made using machine learning algorithms. Since human observers can readily distinguish attention levels from a student's facial expressions, head pose and eye gaze, we assume that a machine will also be able to learn this behavior automatically. The engagement level is analyzed on 10-second video clips. The performance of the algorithm exceeds the baseline; our best accuracy results are 10% better than the baseline. The paper also gives a detailed review of work related to the analysis of student engagement in a classroom using vision-based techniques.
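At the clip level, the pipeline described above amounts to summarizing each 10-second clip as a feature vector (expression, head pose and gaze statistics) and training a classifier on labeled clips. A hedged sketch with scikit-learn; the feature dimensions, classifier choice and random stand-in data are assumptions, not the paper's dataset or method:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))     # 200 clips x 8 clip-level features
    y = rng.integers(0, 3, size=200)  # engagement level: 0, 1 or 2

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())  # ~chance on noise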

66 citations

Proceedings ArticleDOI
20 Oct 2008
TL;DR: It is suggested that fully automated versions of these measures can effectively estimate the most dominant person in a meeting, matching the dominance estimation performance obtained with manual labels of visual attention.
Abstract: We study the automation of the visual dominance ratio (VDR), a classic measure of displayed dominance in the social psychology literature, which combines both gaze and speaking activity cues. The VDR is modified to estimate dominance in multi-party group discussions where natural verbal exchanges are possible and other visual targets such as a table and slide screen are present. Our findings suggest that fully automated versions of these measures can effectively estimate the most dominant person in a meeting and can match the dominance estimation performance obtained when manual labels of visual attention are used.
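The classic VDR is the proportion of speaking time spent looking at others divided by the proportion of listening time spent looking at them; the automated version replaces manual gaze labels with estimated visual attention. A worked sketch, with the input format as an assumption:

    def vdr(look_speak_s, speak_s, look_listen_s, listen_s):
        """Visual dominance ratio from per-participant durations in
        seconds; higher values suggest a dominant gaze pattern."""
        look_speak = look_speak_s / speak_s if speak_s else 0.0
        look_listen = look_listen_s / listen_s if listen_s else 0.0
        return look_speak / look_listen if look_listen else float("inf")

    # Looking at others 80% of the time while speaking but only 40%
    # while listening gives VDR = 0.8 / 0.4 = 2.0.
    print(vdr(24.0, 30.0, 8.0, 20.0))  # 2.0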

57 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
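The mail-filtering example in the fourth category is easy to make concrete: learn a per-user filter from the messages that user keeps or rejects. A toy sketch with scikit-learn; the four example messages are invented stand-ins for a real mailbox:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    kept     = ["meeting moved to 3pm", "draft of the report attached"]
    rejected = ["win a free prize now", "cheap loans click here"]

    mail_filter = make_pipeline(CountVectorizer(), MultinomialNB())
    mail_filter.fit(kept + rejected, ["keep"] * 2 + ["reject"] * 2)
    print(mail_filter.predict(["win a cheap prize"]))  # ['reject']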

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions and linear models for regression and classification, along with neural networks, kernel methods, graphical models, approximate inference, sampling methods, and a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
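As a worked instance of the "Linear Models for Regression" chapter, regularized least squares has the closed-form solution w = (X^T X + lambda*I)^{-1} X^T y. A minimal sketch on synthetic data; the dataset and regularization strength are arbitrary illustrations:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])  # bias, x
    true_w = np.array([0.5, -2.0])
    y = X @ true_w + rng.normal(scale=0.1, size=50)

    lam = 1e-3  # regularization strength (lambda)
    w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    print(w)  # close to [0.5, -2.0]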

10,141 citations

01 Jan 1979
TL;DR: This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, and at addressing interesting real-world computer vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain plenty of training data while many classes contain only a small amount. How to use frequent classes to help learn rare classes, for which it is harder to collect training data, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision and multimedia analysis. Different levels of components can be shared during the concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters and training examples. Regarding specific methods, multi-task learning, transfer learning and deep learning can be seen as different strategies for sharing information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.
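Among the sharing strategies listed above, hard parameter sharing is the simplest to sketch: a trunk shared across tasks with per-task heads. A minimal PyTorch illustration; the layer sizes and task count are arbitrary assumptions:

    import torch
    import torch.nn as nn

    class SharedTrunkModel(nn.Module):
        """Multi-task model: shared trunk, one linear head per task."""
        def __init__(self, in_dim=128, hidden=64, n_tasks=3, n_classes=10):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.heads = nn.ModuleList(
                [nn.Linear(hidden, n_classes) for _ in range(n_tasks)])

        def forward(self, x, task):
            return self.heads[task](self.trunk(x))  # shared features

    model = SharedTrunkModel()
    x = torch.randn(4, 128)
    print(model(x, task=0).shape)  # torch.Size([4, 10])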

1,758 citations

Book
01 May 2017
TL;DR: It is argued that next-generation computing needs to include the essence of social intelligence - the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement - in order to become more effective and more efficient.
Abstract: The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence - the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement - in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for social signal processing (SSP) are rather difficult. This paper surveys the past efforts in solving these problems by a computer, it summarizes the relevant findings in social psychology, and it proposes a set of recommendations for enabling the development of the next generation of socially aware computing.

988 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: It is shown how speakers and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.
Abstract: Today's mobile phones represent a rich and powerful computing platform, given their sensing, processing and communication capabilities. Phones are also part of the everyday life of billions of people, and therefore represent an exceptionally suitable tool for conducting social and psychological experiments in an unobtrusive way. This paper presents a mobile sensing system whose key features include the ability to sense individual emotions as well as activities, verbal and proximity interactions among members of social groups. Moreover, the system is programmable by means of a declarative language that can be used to express adaptive rules to improve power saving. We evaluate a system prototype on Nokia Symbian phones by means of several small-scale experiments aimed at testing performance in terms of accuracy and power consumption. Finally, we present the results of a real deployment where we study participants' emotions and interactions. We cross-validate our measurements against the results obtained through questionnaires filled in by the users, and against the results reported in social psychology studies using traditional methods. In particular, we show how speakers' and participants' emotions can be automatically detected by means of classifiers running locally on off-the-shelf mobile phones, and how speaking and interactions can be correlated with activity and location measures.
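The on-device emotion classification idea can be sketched as one generative model per emotion over frame-level audio features, classifying by maximum likelihood. A hedged sketch using scikit-learn Gaussian mixtures; the random features are stand-ins for the acoustic descriptors a phone would actually extract:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    train = {"happy": rng.normal(0.0, 1.0, (100, 4)),
             "sad":   rng.normal(2.0, 1.0, (100, 4))}

    models = {emo: GaussianMixture(n_components=2, random_state=0).fit(f)
              for emo, f in train.items()}

    frame = rng.normal(2.0, 1.0, (1, 4))  # unseen feature vector
    best = max(models, key=lambda emo: models[emo].score(frame))
    print(best)  # most likely 'sad' for this sample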

504 citations