
Book ChapterDOI

Intelligent presentation skills trainer analyses body movement

10 Jun 2015 - pp. 320-332

TL;DR: This paper presents an intelligent tutoring system that can capture the bodily characteristics of presenters via a depth camera, interpret this information to assess the quality of the presentation, and then give feedback to users.
Abstract: Public speaking is a non-trivial task since it is affected by how nonverbal behaviors are expressed. Practicing to deliver the appropriate expressions is difficult because they are mostly given subconsciously. This paper presents our empirical study on the nonverbal behaviors of presenters. This information was used as the ground truth to develop an intelligent tutoring system. The system can capture the bodily characteristics of presenters via a depth camera, interpret this information to assess the quality of the presentation, and then give feedback to users. Feedback is delivered immediately through a virtual conference room, in which the reactions of the simulated avatars are controlled based on the performance of the presenters.

Summary (1 min read)

1 Introduction

  • Public speaking is the art of persuasion.
  • The system in [10] was built solely on vocal cues, by analyzing the physical characteristics of voice such as pitch or tempo.
  • First, an empirical study was performed to investigate the nonverbal cues that impact a presentation, serving as the ground truth.
  • A multi-class support vector machine was used to classify the quality of presentations on a four-degree scale, with a recognition rate of 73.9% on a training/test database of 76 presentations.
  • In the next section, the authors will explain their empirical results from the recorded presentations.

2 Nonverbal Behaviour of Presenters

  • In order to gather the ground truth as guidance for their system, an observation was performed.
  • After each presentation, the audience gave feedback and suggestions on how the presentation should be improved in terms of nonverbal expressions.
  • In parallel, a Microsoft Kinect was used to capture the whole-body movement for their further signal processing, as well as behavioral studies.
  • Behaviors were categorized as State events if their duration needed to be studied, or Point events otherwise.
  • The observed behaviors can be separated based on the nonverbal channels through which they were generated: (1) Posture (the static configuration of the body), (2) Voice (concerning the paralinguistic characteristics), (3) Eye contact, (4) Facial expression, (5) Global body movement, (6) Hand gesture.

3 Automatic Feedback System

  • In order to support presenters with an effective solution that helps them self-practice even at home, the authors aimed to implement a system with the following functions: (1) automatically analyzes the presenter's performance; (2) provides immediate feedback during the presentation; (3) provides an overall analysis of the whole presentation; (4) lets users review their performance together with the analyzed results, thus allowing them to keep track of their practicing progress.
  • Users can also review their presentations, together with an in-depth analysis of the nonverbal cues, at the end.

3.1 Recognition of Nonverbal Cues

  • In order to implement a system that mostly takes visual nonverbal channels into account, together with conclusions from their observation, the authors currently focus on four aspects: (1) eye contact; (2) posture; (3) gesture; (4) whole-body movement.
  • Eye contact has a significant impact on creating and maintaining the connection between a presenter and the audience.
  • The degree of leaning is determined by comparing the distance between the Hip Center (HC) and the Foot Left and Foot Right joints (Equation 2); a rough sketch of this idea is given after this list.
  • Once the separate behaviors are recognized, the authors aim to produce a final assessment for the whole presentation.
  • Similarly, the feature vectors for movement, posture and eye contact have 4, 5 and 1 dimensions, respectively.
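
Equation 2 itself is not reproduced in this excerpt, so the following is only a minimal sketch of how such a leaning measure could be computed from a Kinect skeleton frame. The joint names follow the Kinect convention; the depth-based comparison and the 0.10 m threshold are illustrative assumptions, not the paper's actual formula.

    import numpy as np

    def leaning_degree(joints):
        """Rough forward/backward leaning measure from one Kinect skeleton frame.

        `joints` maps joint names to (x, y, z) positions in meters (camera space).
        Sketch of the idea behind Equation 2: compare the depth of the Hip Center
        with the depth of the feet. The exact formula in the paper may differ.
        """
        hip = np.asarray(joints["HipCenter"])
        feet_mid = (np.asarray(joints["FootLeft"]) + np.asarray(joints["FootRight"])) / 2.0
        # Negative value: hips closer to the camera than the feet -> leaning forward.
        return hip[2] - feet_mid[2]

    def classify_lean(degree, threshold=0.10):
        # The 0.10 m threshold is a hypothetical value for illustration.
        if degree < -threshold:
            return "lean forward"
        if degree > threshold:
            return "lean backward"
        return "upright"

    # Example with invented joint positions (meters, camera coordinates).
    frame = {"HipCenter": (0.0, 0.9, 2.45),
             "FootLeft": (-0.15, 0.05, 2.60),
             "FootRight": (0.15, 0.05, 2.60)}
    print(classify_lean(leaning_degree(frame)))  # -> "lean forward"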



Intelligent Presentation Skills Trainer Analyses Body Movement

Anh-Tuan Nguyen, Wei Chen, and Matthias Rauterberg
Department of Industrial Design, Eindhoven University of Technology,
Postbus 513, 5600MB Eindhoven, The Netherlands
{a.nguyen,w.chen,g.w.m.rauterberg}@tue.nl

In: I. Rojas et al. (Eds.): IWANN 2015, Part II, LNCS 9095, pp. 320–332.
© Springer International Publishing Switzerland 2015. DOI: 10.1007/978-3-319-19222-2_27
Abstract. Public speaking is a non-trivial task since it is affected by how nonverbal
behaviors are expressed. Practicing to deliver the appropriate expressions is difficult
because they are mostly given subconsciously. This paper presents our empirical study
on the nonverbal behaviors of presenters. This information was used as the ground
truth to develop an intelligent tutoring system. The system can capture the bodily
characteristics of presenters via a depth camera, interpret this information to assess
the quality of the presentation, and then give feedback to users. Feedback is delivered
immediately through a virtual conference room, in which the reactions of the simulated
avatars are controlled based on the performance of the presenters.
Keywords: Body motion analysis · Depth vision · Nonverbal behavior · Social signal processing
1 Introduction
Public speaking is the art of persuasion. It has a tremendous impact on everyone's
success [1, p.102]. Unfortunately, delivering an oral presentation is not as simple as
computer data transmission. Instead, the audience simultaneously perceives the
messages via various non-spoken channels, which are known as nonverbal behaviors.
On the one hand, the content of a presentation must be clear, vivid and appropriate [2].
On the other hand, a significant component of a presentation lies in nonverbal cues,
which have the power to change the meaning assigned to the spoken words [1, p.241].
Nonverbal behaviors of public speakers are expressed via several channels
such as voice, gesture and facial expression. They have been shown to have
greater influence than verbal cues. For example, a study in [3] showed that
nonverbal messages are twelve to thirteen times more powerful than verbal ones.
Similarly, according to [4], the audience receives more than half of the information
from body language. The same result was found in the study of [1], in which
most people unconsciously trust nonverbal more than verbal communication.
Practicing to express effective nonverbal behaviors is difficult because
they are mostly expressed subconsciously. Thus, in order to achieve
positive learning results, learners must be provided with appropriate feedback
from skilled experts, which in most cases might be expensive to obtain. In
parallel, the role of nonverbal behaviors in computing is becoming increasingly
recognized through the development of emerging fields such as social signal
processing [5] and affective computing [6]. Therefore, computers have been equipped
with the ability to decode the complexity of humans' non-spoken channels.
In the literature, there are several approaches toward the automatic recognition
of nonverbal cues from presenters, such as [7-10]. These approaches analyze
some vocal and visual channels of presenters and thus can provide them with
information about their performance. For example, the system in [10] was built
solely on vocal cues, by analyzing the physical characteristics of voice such as
pitch or tempo. It was similar to the approach of [9], which originated
from a vocal emotion detection module. The authors applied a support vector
machine [11] to analyze a presentation based on a set of 6 qualities and
achieved an accuracy of 81%. The approach introduced in [7] might be the
simplest. Relying on the importance of pitch variance in oral presentations, the
system measured the changes in vocal pitch and then gave visual feedback to
promote pitch variation.
On the other hand, there are three systems that include visual cues in the
analysis. In [8], the authors added face position and orientation as an approximation
of eye contact, together with utterance, pitch, filled pauses and speaking
rate. In contrast, [12] introduced a method based only on visual information.
Similar to [8], face orientation was used as an indication of eye contact.
The authors tracked the trajectories of global body movement and head position.
This information helped their system rank the performance of the whole
presentation using the RankBoost algorithm [13], achieving promising results.
However, they did not consider the complexity of body parts. To the best of our
knowledge, the system presented in [14] is the only one that included
the configurations of single body parts.
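
Both [8] and [12] approximate eye contact from face orientation rather than from gaze itself. A minimal sketch of that idea, assuming head yaw and pitch angles (relative to facing the audience straight on) are already provided by a face or skeleton tracker; the angle thresholds are illustrative and are not taken from any of the cited systems.

    def looking_at_audience(yaw_deg, pitch_deg, yaw_limit=30.0, pitch_limit=20.0):
        """Approximate eye contact from head orientation (thresholds are illustrative)."""
        return abs(yaw_deg) <= yaw_limit and abs(pitch_deg) <= pitch_limit

    def eye_contact_ratio(head_poses):
        """Fraction of frames in which the presenter roughly faces the audience."""
        if not head_poses:
            return 0.0
        hits = sum(looking_at_audience(yaw, pitch) for yaw, pitch in head_poses)
        return hits / len(head_poses)

    # Invented example: three frames facing the audience, one looking down at notes.
    print(eye_contact_ratio([(5, -3), (12, 4), (-8, 1), (2, -35)]))  # -> 0.75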
The common drawback of most existing systems is that they were not implemented
based on empirical research on nonverbal behaviors. Moreover, although
most of them provide mechanisms to deliver feedback to the presenters (except
[9]), the forms of feedback are rather simple: text/images [8], sound
[10] or lighting [7]. These methods provide users solely with assessment
information, without considering the entertaining aspect of the system,
which might be valuable for educational purposes.
This paper presents our progress in developing a tutoring system for public
speaking, which assesses presentations based solely on the visual behaviors of
presenters. First, an empirical study was performed to investigate the nonverbal
cues that impact a presentation, serving as the ground truth. Next, a Microsoft
Kinect was used to capture skeletal representations of the presenters' bodies as
input data for the analysis. The recognition process detects in real time whether
the behaviors appeared. A multi-class support vector machine was used to classify
the quality of presentations on a four-degree scale, with a recognition rate of
73.9% on a training/test database of 76 presentations. For feedback, the system
allows presenters to review their presentation together with the analysis results.
In parallel, we developed a simulated conference room as the real-time feedback
mechanism.
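
As a rough illustration of the classification step described above, the sketch below trains a multi-class SVM on per-presentation feature vectors and scores it on a held-out split. It uses scikit-learn's SVC, which handles multiple classes via one-vs-one voting; the random placeholder features and the simple train/test split are assumptions for illustration and do not reproduce the paper's actual features or evaluation protocol (described in Sect. 3.1).

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Placeholder data: 76 presentations, each reduced to a feature vector built from
    # the recognized nonverbal cues, with a quality label on a four-degree scale (0..3).
    X = rng.normal(size=(76, 10))
    y = rng.integers(0, 4, size=76)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # RBF-kernel SVM with feature standardization.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

With the real feature vectors and proper cross-validation over the 76 recorded presentations, a pipeline of this shape is the kind of setup that could produce the reported four-class recognition rate.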
In the next section, we explain our empirical results from the recorded
presentations. The current development status is introduced afterward. The
last section presents conclusions and future work.
2 Nonverbal Behaviour of Presenters
In order to gather the ground truth as guidance for our system, an observation
was performed. We collected data from a training class on public speaking
skills for postgraduate students. Learners were asked to give short presentations
(about one minute) in front of an audience of about ten other learners and one
or two coaches. The content of the presentations was freely chosen by the
presenters. In fact, all presenters chose to talk about their own research, in a
way that could be understood by an audience coming from different fields. After
each presentation, the audience gave feedback and suggestions on how the
presentation should be improved in terms of nonverbal expressions. We set up a
regular camera to record the presentations. In parallel, a Microsoft Kinect was
used to capture the whole-body movement for our further signal processing, as
well as behavioral studies (Figure 1). Data from the Kinect were stored as *.ONI
files using the OpenNI SDK (http://www.openni.org/). Finally, after removing
unsatisfactory videos (e.g. the presenter moved out of the camera range), 39
presentations by 11 presenters (four females, seven males) were collected.
Regular videos were used for behavioral analysis. This task was done in
collaboration with an expert in public speaking. The role of the expert was to
review the recorded videos and specify the nonverbal cues that affected the
performance of the speakers, together with the durations in which they appeared.
Thus, for each video, a set of behaviors was created. We collected the nonverbal
cues and then annotated their appearance using the commercial software Noldus
Observer XT [15]. Behaviors were categorized as State events if their duration
needed to be studied, or Point events otherwise. The software provided us with
statistical analysis of the appearance of these behaviors, including the number
of presentations that contain each behavior, the rate at which they appeared
(point events) and the percentage of time that they accounted for (state events)
(Table 1).
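
In principle, the per-behavior statistics reported in Table 1 can be recomputed from such an annotation timeline. A minimal sketch, assuming each annotation is a (behavior, start, end) tuple in seconds, with start equal to end for point events; the example data are invented and the Observer XT export format is not modeled here.

    from collections import defaultdict

    def behavior_statistics(annotations, presentation_length_s):
        """Per-behavior statistics for one presentation.

        annotations: list of (behavior, start_s, end_s); point events have start == end.
        Returns {behavior: {"rate_per_min": ..., "percent_of_time": ...}}.
        """
        counts = defaultdict(int)
        durations = defaultdict(float)
        for behavior, start, end in annotations:
            counts[behavior] += 1
            durations[behavior] += max(0.0, end - start)
        minutes = presentation_length_s / 60.0
        return {
            b: {"rate_per_min": counts[b] / minutes,
                "percent_of_time": 100.0 * durations[b] / presentation_length_s}
            for b in counts
        }

    # Invented one-minute presentation with one state event and two point events.
    example = [("Make eye contact", 0.0, 48.0),
               ("Vocal emphasis", 12.0, 12.0),
               ("Vocal emphasis", 40.0, 40.0)]
    print(behavior_statistics(example, presentation_length_s=60.0))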
The observed behaviors can be separated based on the nonverbal channels
through which they were generated: (1) Posture (the static configuration of the
body), (2) Voice (concerning the paralinguistic characteristics), (3) Eye contact,
(4) Facial expression, (5) Global body movement, (6) Hand gesture. This method
of categorization is similar to the literature on public speaking skills [2]. Due to
the limited amount of space, we cannot describe all of the observed behaviors in
detail in this section; only the behaviors that support our current development
will be further explained in the next section.

Table 1. The list of observed nonverbal cues (Type: S = state event, P = point event; No. = number of presentations in which the behavior was observed; M, SD and Range give the rate of occurrences in times/minute for point events, or the percentage of the occurrences during observation for state events).

#    | Behavior                        | Type | No. | M     | SD    | Range
Postural behaviors
1    | (-) Shoulders too tight         | S    | 19  | 60.94 | 23.80 | 12.67 - 98.50
2    | (-) Legs closed                 | S    | 12  | 73.02 | 36.44 | 5.15 - 100
3    | (-) Legs too stretched          | S    | 3   | 61.42 | 11.33 | 19.18 - 100
4    | (-) Weight on one foot          | S    | 20  | 65.42 | 28.69 | 5.20 - 100
5    | (-) Chin too high               | S    | 14  | 64.94 | 23.80 | 12.67 - 98.50
6    | (-) Hands in pockets            | S    | 3   | 11.85 | 4.89  | 12.76 - 96.20
7    | (+) Lean forward                | S    | 19  | 32.50 | 28.66 | 3.70 - 82.78
8    | (-) Lean backward               | S    | 17  | 62.80 | 28.07 | 12.73 - 96.20
Vocal behaviors
9    | (-) Speak too fast              | S    | 19  | 45.88 | 36.55 | 7.32 - 100
10   | (-) Start too fast              | P    | 18  |       |       |
11   | (-) Energy decreases at the end | P    | 23  | 2.88  | 1.77  | 0.53 - 6.31
12   | (+) Vocal emphasis              | P    | 33  | 5.51  | 4.51  | 0.59 - 17.50
13   | (+) Suitable pause              | P    | 33  | 4.63  | 3.16  | 0.53 - 12.5
14   | (-) Unsuitable pause            | P    | 20  | 1.73  | 1.14  | 0.53 - 5.19
15   | (-) Monotone                    | S    | 20  | 92.49 | 13.08 | 56.29 - 100
16   | (-) Fillers                     | P    | 34  | 5.17  | 4.22  | 1.44 - 19.03
17   | (-) Stuttering                  | P    | 12  | 1.72  | 0.83  | 0.53 - 3.42
Behaviors of eye contact
18   | (+) Make eye contact            | S    | 39  | 93.81 | 8.24  | 75.00 - 100
19   | (-) Contact avoidance           | S    | 28  | 9.98  | 8.47  | 1.12 - 25.00
19.1 | (-) Look up to ceiling          | S    | 14  | 4.23  | 2.95  | 1.12 - 9.61
19.2 | (-) Look down to floor          | S    | 19  | 7.67  | 4.67  | 2.84 - 14.17
19.3 | (-) Look at hands               | S    | 11  | 10.24 | 3.15  | 4.40 - 13.15
Behaviors related to facial expression
20   | (+) Facial mimicry              | S    | 30  | 39.31 | 25.97 | 4.50 - 91.81
21   | (+) Smile                       | S    | 22  | 13.62 | 11.54 | 3.54 - 41.08
22   | (-) Flat face                   | S    | 8   | 80.61 | 24.16 | 40.41 - 100
Behaviors related to whole body movement
23   | (-) Too much movement           | P    | 11  | 42.21 | 25.87 | 4.68 - 89.32
24   | (-) Too little movement         | P    | 23  | 50.62 | 29.21 | 10.05 - 100
25   | (-) Step backward               | P    | 31  | 1.83  | 1.27  | 0.36 - 4.36
26   | (+) Step forward                | P    | 34  | 2.06  | 1.04  | 0.59 - 4.61
Behaviors related to hand gesture: amount of hand gesture
27   | Hand gesture occurs             | P    | 38  | 16.83 | 7.15  | 0.93 - 28.42
28   | (-) Too little gesture          | S    | 20  | 69.55 | 34.64 | 17.21 - 100
29   | (-) Too much gesture            | S    | 10  | 61.49 | 31.82 | 27.34 - 96.10
Behaviors related to hand gesture: quality of hand gesture
30   | (-) Bounded gestures            | P    | 30  | 6.75  | 5.33  | 1.00 - 19.77
31   | (+) Relaxed gestures            | P    | 29  | 7.41  | 4.95  | 1.15 - 15.79
32   | (-) Casual gestures             | P    | 10  | 5.16  | 3.14  | 1.56 - 10.28
33   | (-) Uncompleted gestures        | P    | 27  | 3.23  | 2.78  | 0.93 - 10.27
34   | (+) Gestural emphasis           | P    | 20  | 4.43  | 4.05  | 0.36 - 11.99
35   | (-) Repeated gestures           | P    | 31  | 6.57  | 2.49  | 1.09 - 12.31

Fig. 1. Two samples from the database, with the color images (top row) and the skeletal
representations of presenters' bodies, which were extracted and stored using Microsoft
Kinect and the OpenNI SDK (bottom row)

On the other hand, although we aimed to observe all of the available nonverbal
cues, the contribution of each individual cue to the success of a presentation is
unequal. From our observation, as well as advice from the expert, the following
aspects are the most important:
Eye Contact: As in social interaction, maintaining good eye contact is the
first thing presenters must keep in mind. It initiates and strengthens the
connection between them and the audience (#18, 19 in Table 1). It might
have the first and foremost influence on the performance of a presentation,
as well as on regular communication [16].
Amount of energy: This aspect concerns the dynamic characteristics of a
presentation and thus can reflect the internal state of the presenter. It has
an impact on most of the behaviors that we found (except posture, which is
a static channel): for example, the amount of whole-body movement (#23, 24),
the amount of hand gesture (#28, 29), vocal behaviors (partly via tempo and
emphasis) and most features of hand gesture.
Variety: Presentations with strong variation significantly increase the attention
of the audience. Lacking variation results in a monotone voice (#15), a flat
face (#22), and repeated hand gestures (#35). In fact, variety can be separated
out as a single measurement to analyze a presentation. It takes the role ...

Citations

Journal ArticleDOI
TL;DR: A clustering analysis of data gathered from 45 student presentations indicate that presentations on similar topics with also similar complexity levels can be successfully discriminated.
Abstract: Learning Analytics is the intelligent use of data generated from students with the objective of understanding and improving the teaching and learning process. Currently, there is a lack of tools to measure the development of complex skills in real classroom environments that are flexible enough to add and process data from different sensors and oriented towards a massive public. Based on this finding, we developed a free software system that permits to capture and to visualize a set of 10 body postures using the Microsoft Kinect sensor, along with the ability to track custom body postures and data from other sensors. The developed tool was validated by means of precision and usability tests. Furthermore, with the goal of demonstrating the potential of incorporating this type of software into the classroom, the software was used as a tool to give feedback to the teacher and to the students at the moment of giving and evaluating oral presentations. Also, a clustering analysis of data gathered from 45 student presentations indicate that presentations on similar topics with also similar complexity levels can be successfully discriminated.

15 citations




Journal ArticleDOI
Abstract: Multimodal learning analytics, which is collection, analysis, and report of diverse learning traces to better understand and improve the learning process, has been producing a series of interesting prototypes to analyze learning activities that were previously hard to objectively evaluate. However, none of these prototypes have been taken out of the laboratory and integrated into real learning settings. This article is the first to propose, execute, and evaluate a process to scale and deploy one of these applications, an automated oral presentation feedback system, into an institution-wide setting. Technological, logistical, and pedagogical challenges and adaptations are discussed. An evaluation of the use and effectiveness of the deployment shows both successful adoption and moderate learning gains, especially for low-performing students. In addition, the recording and summarizing of the perception of both instructors and students point to a generally positive experience in spite of the common problems of a first-generation deployment of a complex learning technology.

References

Proceedings Article
24 Jul 1998-
Abstract: We study the problem of learning to accurately rank a set of objects by combining a given collection of ranking or preference functions. This problem of combining preferences arises in several applications, such as that of combining the results of different search engines, or the "collaborative-filtering" problem of ranking movies for a user based on the movie rankings provided by other users. In this work, we begin by presenting a formal framework for this general problem. We then describe and analyze an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning. We give theoretical results describing the algorithm's behavior both on the training data, and on new test data not seen during training. We also describe an efficient implementation of the algorithm for a particular restricted but common case. We next discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different web search strategies, each of which is a query expansion for a given domain. The second experiment is a collaborative-filtering task for making movie recommendations.

1,888 citations


Journal ArticleDOI
TL;DR: This work describes and analyzes an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning, and gives theoretical results describing the algorithm's behavior both on the training data and on new test data not seen during training.

1,821 citations


Book ChapterDOI
Robert J. K. Jacob, Keith S. Karn
01 Jan 2003-
TL;DR: This chapter discusses the application of eye movements to user interfaces, both for analyzing interfaces (measuring usability) and as an actual control medium within a human–computer dialogue.
Abstract: This chapter discusses the application of eye movements to user interfaces, both for analyzing interfaces (measuring usability) and as an actual control medium within a human–computer dialogue. For usability analysis, the user's eye movements are recorded during system use and later analyzed retrospectively; however, the eye movements do not affect the interface in real time. As a direct control medium, the eye movements are obtained and used in real time as an input to the user–computer dialogue. The eye movements might be the sole input, typically for disabled users or hands-busy applications, or might be used as one of several inputs, combining with mouse, keyboard, sensors, or other devices. From the perspective of mainstream eye-movement research, human–computer interaction, together with related work in the broader field of communications and media research, appears as a new and very promising area of applied work. Both basic and applied work can profit from integration within a unified field of eye-movement research. Application of eye tracking in human–computer interaction remains a very promising approach; its technological and market barriers are finally being reduced.

1,299 citations


Journal ArticleDOI
TL;DR: Research on gaze and eye contact was organized within the framework of Patterson's (1982) sequential functional model of nonverbal exchange to show how gaze functions to provide information, regulate interaction, express intimacy, and exercise social control.
Abstract: Research on gaze and eye contact was organized within the framework of Patterson's (1982) sequential functional model of nonverbal exchange. Studies were reviewed showing how gaze functions to (a) provide information, (b) regulate interaction, (c) express intimacy, (d) exercise social control, and ...

1,192 citations


Book
01 May 2017-
TL;DR: It is argued that next-generation computing needs to include the essence of social intelligence - the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement - in order to become more effective and more efficient.
Abstract: The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence - the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement - in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for social signal processing (SSP) are rather difficult. This paper surveys the past efforts in solving these problems by a computer, it summarizes the relevant findings in social psychology, and it proposes a set of recommendations for enabling the development of the next generation of socially aware computing.

934 citations



Performance Metrics
Citations received by the paper in previous years: 2018: 1; 2019: 1; 2021: 1