Proceedings Article

Overlapping speech, utterance duration and affective content in HHI and HCI — An comparison

Q: What have the authors stated for future works in "Overlapping speech, utterance duration and affective content in hhi and hci – an comparison" ?

In their further research activities, the authors will develop a robust automatic identification of the different types of overlap. Together with the recognition of the user's affective state, this brings the authors a step closer to future Cognitive Infocommunication systems acting as companions to human users [13], [14].

Q: What test was used to test the significance of the difference between the two utterance lengths?

The authors used the non-parametric Mann-Whitney U test to test the significance of the difference between the utterance lengths.

Q: What is the effect of overlapping speech?

For this investigation, the authors showed that overlapping speech goes along with changes in the affective states of dominance and valence in certain situations.

Q: Which annotation levels of the SmartKom corpus are used?

The SmartKom corpus has several annotation levels, of which the turn segmentation and an affective annotation based on the acoustic channel are used for their investigation [22].

Q: What is the possible application of their investigations in HHI and HCI?

A possible application of their investigations in HHI and HCI is the identification of parts where the affective state changes, based on the knowledge of overlapping speech and the dialog course.

Q: How many overlapping speech segments do the authors have?

In the SmartKom corpus, when only the user utterances are taken into account, the authors have 6,347 user utterances, of which 817 contain overlapping speech.

Ingo Siegert¹, Ronald Böck¹, Andreas Wendemuth¹, Bogdan Vlasenko², Kerstin Ohnemus

Otto-von-Guericke University Magdeburg¹, Idiap Research Institute²

01 Oct 2015-pp 83-88

TL;DR: The davero corpus, a large naturalistic spoken corpus of real call-center telephone conversations, is investigated; the findings allow the prediction of a forthcoming threat of overlapping speech, and hence preventive measures, especially in professional environments like call centers with human or automatic agents.


Abstract: In human conversation, turn-taking is a critical issue. Especially if only the speech channel is available (e.g. telephone), correct timing as well as affective and verbal signals are required. In cases of failure, overlapping speech may occur, which is the focus of this paper. We investigate the davero corpus, a large naturalistic spoken corpus of real call-center telephone conversations, and compare our findings to results on the well-known SmartKom corpus consisting of human-computer interaction. We first show that overlapping speech occurs in different types of situational settings (extending the well-known categories of cooperative and competitive overlaps), all of which are frequent enough to be analyzed. Furthermore, we present connections between the occurrence of overlapping speech and the length of the previous utterance, and show that overlapping speech occurs at dialog instances where certain affective states are changing. Our results allow the prediction of a forthcoming threat of overlapping speech, and hence preventive measures, especially in professional environments like call centers with human or automatic agents.




Figures (9)

Figure 2. Prototypes of the two different types of overlapping speech between system (–) and user (–) in HCI.

Table I. Mapping of SmartKom’s emotional categories to dominance and valence.

Figure 8. Valence change at the point where overlapping speech occurs in the SmartKom Corpus. The stars indicate a significant affective state change.

Figure 7. Dominance change at the point where overlapping speech occurs in the SmartKom Corpus. The stars indicate a significant affective state change.

Figure 1. Prototypes of the four different situations of two speakers (denoted as – and –) for overlapping speech in HHI.

Figure 4. Difference of the utterance length for the two types of overlapping and the average utterance length in SmartKom. The stars indicate a significant difference.

Figure 5. Dominance change at the point where overlapping speech occurs, stars indicate a significant affective state change

Figure 3. Difference of the utterance length for the four defined overlapping speech situations and the average utterance length in the Davero corpus, stars indicate a significant difference.

Figure 6. Valence change at the point where overlapping speech occurs in the Davero corpus.


Overlapping Speech, Utterance Duration and Affective Content in HHI and HCI – an Comparison

Ingo Siegert, Ronald Böck, Andreas Wendemuth

Cognitive Systems Group

Otto von Guericke University Magdeburg, Germany

{firstname.lastname}@ovgu.de

Bogdan Vlasenko

Idiap Research Institute

Martigny, Switzerland

bogdan.vlasenko@idiap.ch

Kerstin Ohnemus

davero Dialog Gruppe

91058 Erlangen, Germany

kerstin.ohnemus@davero.de

Abstract: In human conversation, turn-taking is a critical issue. Especially if only the speech channel is available (e.g. telephone), correct timing as well as affective and verbal signals are required. In cases of failure, overlapping speech may occur, which is the focus of this paper. We investigate the davero corpus, a large naturalistic spoken corpus of real call-center telephone conversations, and compare our findings to results on the well-known SmartKom corpus consisting of human-computer interaction. We first show that overlapping speech occurs in different types of situational settings (extending the well-known categories of cooperative and competitive overlaps), all of which are frequent enough to be analyzed. Furthermore, we present connections between the occurrence of overlapping speech and the length of the previous utterance, and show that overlapping speech occurs at dialog instances where certain affective states are changing. Our results allow the prediction of a forthcoming threat of overlapping speech, and hence preventive measures, especially in professional environments like call centers with human or automatic agents.

I. INTRODUCTION

Human communication consists of several information layers, of which the factual layer is the most important. But beyond the pure textual information, other relevant information such as affective state, self-revelation, and appeal is transmitted [1]. These different pieces of information are provided to support human conversations and to increase the likelihood of a fluent conversation.

One important requirement for a fluent and successful conversation is efficient turn-taking, which has to be organized by specific “underlying mechanisms”, such as intonation, semantic cues, facial expressions, eye contact, breathing, and gestures [2], [3], [4]. Overlapping speech plays a major role in the organization of turn-taking and in the evaluation of the conversation. Based on the turn-taking model by Sacks et al. [3], conversational partners aim to minimize overlaps and gaps in their conversations. According to this model, overlaps occur at places of possible turn ends, either as “terminal overlaps” or as “simultaneous starts”. Thus, overlapping speech is explained as a result of turn-taking principles. This explanation is extended to different situational settings, e.g. by [5], [6], where short feedback signals confirming the statement of the current speaker are seen as “response token” overlaps. Furthermore, several studies also analyze competitive overlaps, in which the conversational partners compete for the turn [7], [8].

Many recent studies analyzed the phonetic structure of overlapping speech and found that fundamental frequency, intensity, speech rate and rhythm are important features characterizing the overlaps as either cooperative or competitive [9], [8], [10]. Most of these studies concentrate on local analyses investigating the acoustic characteristics next to or directly at the overlap. Only a few studies incorporate, for example, information on the duration of turns [11]. But the relation between the length of utterances and the situational type of overlap is not analyzed.

Former studies on overlapping speech concentrate on explaining how overlapping speech works. They do not analyze which conditions lead to an overlap or which consequences the overlap has for the progress of the interaction. In particular, the analyses disregard the length of the utterances where an overlap occurs (a condition leading to an overlap) and the influence the affective state could have. Notably, [12] emphasizes that affective states influence the turn-taking behavior. Thus, problems in turn-taking can also be traced back to changes in the affective state.

Furthermore, these studies are conducted only on human-human interaction (HHI). Investigations on human-computer interaction (HCI) have so far not considered overlapping speech as an informative signal. But to reach the target of a more naturalistic interaction, future systems have to be adaptable to the users’ individual skills, preferences, and current emotional state [13], [14]. A lot of progress has been made in the area of affect detection from speech, facial expression and gesture [15]. For a fully naturalistic HCI, it is necessary to capture as many human abilities as possible. Thus, linguistic features also gain considerable importance [16]. In an earlier study, we could show that discourse particles, exchanged among the interaction partners and used to signal the progress of the dialogue, are also used in naturalistic HCI, although the system was not able to react to them properly [17]. The usage of these cues is influenced by the user’s age and gender [18].

In this paper we extend our analysis of linguistic cues to overlapping speech and analyze the meaningfulness of overlapping speech regarding the utterance length and the user’s affective state change. We conducted a contrasting study using human-human interaction (HHI) as well as HCI. This is a first step towards an automatic evaluation of overlapping speech and could help future “Cognitive Infocommunication” systems to understand the human better [14]. Technical systems that use this extended recognition of linguistic cues adapt to their users and thus become their attendant and ultimately their companion [13], [19].

Based on these considerations, we investigate the following three research questions in this paper.

Q1 Is overlapping speech occurring frequently enough in our material to be analyzed in a dyadic conversation?

Q2 Is there any connection between overlapping speech and the length of the previous utterance?

Q3 Is overlapping speech occurring at points where the affective state is changing?

The remainder of the paper is structured as follows: In Section II the utilized datasets are briefly described and specific differences are emphasized. Afterwards, in Section III, we describe the preparation of the data in terms of types of overlap and affective annotation. In Section IV the results are presented and discussed. Finally, in Section V, a conclusion of our investigations and an outlook for further research are given.

II. UTILIZED DATASETS

A. Davero Corpus of Telephone-based Conversations

The dataset is described in detail in [20]. It was created within a research project aiming to develop a technical system that supports call-center employees in responding appropriately to the current affective state of the caller, and it was recorded in a call center collecting real and authentic phone calls in German. The calls cover various topics, such as information talks, data change notifications, and complaints.

To allow a complete analysis of the conversation, both agent and caller were recorded acoustically. To obtain realistic and high-quality recordings and to avoid disturbing background noise, a separate recording place was set up. In total, 49 days × 7 hours have been recorded. Since the recorded phone conversations are real customer interactions, they first had to be anonymized by blanking out all personal information. Furthermore, the start and end times of each dialog and of the overlapping speech segments were marked, and each utterance was assigned to its corresponding speaker (agent or caller). To date, this dataset contains 1,600 dialogs with 27,000 individual utterances. The dialogs have an average length of about 5 minutes with a standard deviation of 2 minutes.

B. SmartKom multi-modal Corpus

The SmartKom multi-modal corpus contains naturalistic affects within an HCI [21]. The system responses were generated by a Wizard-of-Oz (WOZ) setup. For our evaluations we use German dialogs concerning a technical scenario, recorded in a public environment. The database contains multiple audio channels and two video channels (face and body in profile posture). The primary aim of this corpus was the empirical study of HCI in a number of different tasks. It is structured into several sessions. Each session contains one conversation and is approximately 4.5 minutes long.

This corpus has several annotation levels, of which the turn segmentation and an affective annotation based on the acoustic channel are used for our investigation [22]. The considered set of the SmartKom corpus contains 438 emotionally labeled dialogs with 12,076 utterances in total and 6,079 user utterances. The utterances are labeled with seven broader affective states: neutral, joy, anger, helplessness, pondering, surprise and unidentifiable episodes. Unfortunately, the turn segmentation is not time-aligned with the affective annotation.

III. PREPARATION OF DATASETS

A. Analysis of Overlapping Speech

We analyze overlapping speech as an additional pattern of an interaction, as we assume that it provides a valuable contribution to the assessment of interactions. Overlapping speech refers to the case in which both speakers are talking simultaneously.

1) Davero Corpus: By listening to examples of the Davero corpus, four different situations (S) can be identified where overlapping speech occurs:

S1 Short feedback, no interruption of the speaker

S2 Premature turn-taking at the end of the speaker’s turn

S3 Simultaneous starting after longer silence

S4 Barge-in, aiming to take the turn over

These situations are based on the descriptions of [11], distinguishing response tokens (S1), terminal overlaps (S2), simultaneous starts (S3) and competitive overlaps (S4). A prototypical illustration is given in Figure 1.

In the first situation (S1), the listener just wants to give feedback. Lacking other feedback methods (head nodding, eye gaze), the listener has to give the feedback acoustically. Thus, no real turn-taking occurs. The second situation (S2) can be seen as functional turn-taking. The listener knows that the speaker’s turn is due to end, but because of the missing visual feedback the listener starts his turn a bit too early. In this case, only the alignment of the turn-taking is incomplete. S3 is similar: both speakers start talking coincidentally after a longer silence due to missing cues. S4 shows a disturbed turn-taking. It describes the case where one speaker barges in while the other is still speaking in order to deliberately take the turn from that other speaker.

Figure 1. Prototypes of the four different situations of two speakers (denoted as – and –) for overlapping speech in HHI. [Illustration not reproduced; horizontal axis: time.]

To evaluate the overlapping speech according to these descriptions, we employed two labelers with a psychological background for the assessment. They could choose between the four situations or describe a situation not covered by the definitions. Of the currently available 27,000 utterances in 1,600 dialogs, 5,100 utterances (18.9%, in 830 dialogs) are marked as containing overlapping speech.

The final assessment is as follows: S1 has a share of 61.6%, S2 a share of 11.2%, S3 a share of 10.7%, and S4 a share of 16.6%. Furthermore, no additional situation was selected by the annotators. As inter-rater reliability of the crosstalk labelling we calculated a Krippendorff’s alpha of 0.63, a substantial reliability according to [23].
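To make the reliability figure more tangible, the following is a minimal sketch of Krippendorff's alpha for nominal labels such as S1 to S4. The paper does not state which distance metric was used, so the nominal metric, the function name and the input layout are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal distance metric.

    units: list of lists; each inner list holds the labels that the raters
    assigned to one item (missing ratings are simply left out).
    """
    coincidences = Counter()  # coincidence matrix entries o_ck
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        # permutations() pairs positions, so identical labels also form pairs
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()  # marginal totals n_c per category
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # alpha is undefined if only one category occurs
    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy data: two labelers, four overlap segments
toy = [["S1", "S1"], ["S4", "S4"], ["S1", "S2"], ["S3", "S3"]]
alpha = krippendorff_alpha_nominal(toy)
```

Perfect agreement yields 1, while systematic disagreement yields negative values.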

2) SmartKom Corpus: The SmartKom Corpus does not have explicitly marked overlapping speech segments. By using the segmentation annotation we could identify two types (T) of overlapping speech (Figure 2):

T1 User interrupts the system.

T2 System interrupts the user.

Figure 2. Prototypes of the two different types of overlapping speech between system (–) and user (–) in HCI.

In HCI the system is not seen as an equivalent dialog partner [24], [25]. Therefore, the variety of types is not as large as in HHI. Among the 12,640 dialog acts within the SmartKom corpus, we have 817 overlapping speech samples for T1 and 672 samples for T2.
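How overlaps could be derived from a time-stamped segmentation can be sketched as follows. This is only an illustration: the `Segment` structure and the rule that the party starting second counts as the interrupter are assumptions, not the procedure documented for SmartKom.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    speaker: str  # "user" or "system"
    start: float  # seconds
    end: float    # seconds


def find_overlaps(segments: List[Segment]) -> List[Tuple[str, Segment, Segment]]:
    """Return (type, interrupted, interrupter) for each cross-speaker overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    found = []
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.start >= earlier.end:
                break  # sorted by start: no later segment can overlap `earlier`
            if later.speaker == earlier.speaker:
                continue
            # Assumption: whoever starts speaking second is the interrupter.
            kind = "T1" if later.speaker == "user" else "T2"
            found.append((kind, earlier, later))
    return found


# Hypothetical example dialog
dialog = [
    Segment("system", 0.0, 4.0),
    Segment("user", 3.0, 5.0),     # user interrupts the system (T1)
    Segment("system", 6.0, 8.0),
    Segment("user", 9.0, 12.0),
    Segment("system", 11.0, 13.0), # system interrupts the user (T2)
]
kinds = [kind for kind, _interrupted, _interrupter in find_overlaps(dialog)]
```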

B. Evaluation of the Affective States

We analyzed the affective states in both corpora based on the Geneva Emotion Wheel by K. Scherer [26]. This is an empirically tested instrument for the assessment of affective states including 16 “emotional families”, which are arranged on a circle along the axes dominance and valence.

1) Davero Corpus: To conduct the affective assessment, we first employed a few annotators to manually segment the recordings into single dialogs including the speaker turns. We then asked four annotators, all of them with a psychological background, to assess the affective content of the single utterances. We conducted several training rounds to familiarize them with the annotation scheme and the affective assessment of acoustic data [27]. To support the annotation process, the program ikannotate was used [28], [29]. This tool supports the annotators by employing a three-step annotation process:

1. The annotator decides if the dominance is high or low.

2. The annotator decides for positive or negative valence.

3. The resulting quadrant of the wheel is displayed with the emotion families it contains. The annotator selects one family among them to indicate the perceived emotion.

For the present investigation, we only consider the labels from step 1 and step 2, as we are only interested in a general affective change. For the inter-rater reliability of the affective annotation we calculated a Krippendorff’s alpha of 0.20 for dominance and of 0.35 for valence. Although these numbers seem quite low compared to other reliability values known from content analysis, they are in line with results from other research groups on affective analyses [27]. Considering the annotation results, we observe a nearly balanced distribution among the utterances. High dominance has a share of 59.0% and low dominance of 41.0%; positive valence has a share of 53.5% and negative valence of 46.5%.

2) SmartKom Corpus: The SmartKom Corpus already has an affective annotation [30]. Unfortunately, this annotation is not on the same time-scale as the dialog act segments. Thus, we have to align both annotation levels using their individual timing information. Moreover, the corpus authors only measured the annotation correctness by comparing the results of different annotation rounds rather than calculating an inter-rater agreement measure like Krippendorff’s alpha or Fleiss’ kappa. Their calculated correctness is 45.52% [31].
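The paper does not spell out how the two annotation levels are aligned. One plausible approach, shown here purely as an assumption, assigns each dialog act the affective label whose segment shares the largest time span with it; the function name and tuple layouts are hypothetical.

```python
def align_labels(dialog_acts, affect_segments):
    """Assign each dialog act the affective label with the largest temporal overlap.

    dialog_acts: list of (start, end) in seconds.
    affect_segments: list of (start, end, label) in seconds.
    Returns one label per dialog act, or None if no affective segment overlaps.
    """
    aligned = []
    for da_start, da_end in dialog_acts:
        best_label, best_overlap = None, 0.0
        for seg_start, seg_end, label in affect_segments:
            # Length of the intersection of both intervals (negative if disjoint)
            overlap = min(da_end, seg_end) - max(da_start, seg_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        aligned.append(best_label)
    return aligned


# Hypothetical timings
acts = [(0.0, 2.0), (2.0, 5.0), (10.0, 11.0)]
affects = [(0.0, 1.5, "neutral"), (1.5, 6.0, "anger")]
labels = align_labels(acts, affects)
```

Dialog acts without any overlapping affective segment receive `None`, which matches the later convention of treating unlabeled utterances as 0.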

Furthermore, in contrast to the Davero corpus, the affective annotation in SmartKom is based on emotional categories. Thus, we first have to map the categories used in SmartKom to our dimensional categories of dominance and valence. To conduct this mapping, we rely on the Geneva Emotion Wheel [26], as it is also used for the annotation of the Davero corpus. In analogy to similar mappings [32], the assignment of the emotional categories to the valence-dominance space is given in Table I. In contrast to the Davero corpus, neutral is used as a category in SmartKom.

Table I. Mapping of SmartKom’s emotional categories to dominance and valence.

category        dominance  valence
neutral              0         0
joy                 +1        +1
anger               +1        -1
helplessness        -1        -1
pondering           -1        +1
surprise            -1        +1
unidentifiable       0         0
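Table I can be written directly as a lookup table. The sketch below copies the values from the table; the names `SMARTKOM_TO_DIMENSIONS` and `to_dimensions` are illustrative, and mapping missing labels to (0, 0) follows the convention used later for utterances without an affective label.

```python
# (dominance, valence) per SmartKom category, values taken from Table I
SMARTKOM_TO_DIMENSIONS = {
    "neutral": (0, 0),
    "joy": (+1, +1),
    "anger": (+1, -1),
    "helplessness": (-1, -1),
    "pondering": (-1, +1),
    "surprise": (-1, +1),
    "unidentifiable": (0, 0),
}


def to_dimensions(category):
    """Return (dominance, valence); unknown or missing labels map to (0, 0)."""
    return SMARTKOM_TO_DIMENSIONS.get(category, (0, 0))


dominance, valence = to_dimensions("anger")
```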

In total we have 14,298 affective segments. Most of these segments (59.7%) are neutral. The distribution on the dominance dimension is 30.7% low and 69.3% high dominance. Positive valence has a share of 39.0% and negative valence a share of 61%. Thus, the emotional content within this corpus is shifted towards negative valence and high dominance.

IV. RESULTS

A. Q1: Occurrence of Overlapping Speech

To answer the first question, we calculated the ratio of overlapping speech segments to the number of utterances. For the Davero corpus we have 27,000 utterances, of which 5,100 contain overlapping speech. Thus, overlapping speech segments have a share of 18.9%. At the dialog level, we have 1,600 dialogs in total, of which 830 contain overlapping speech segments. This results in a share of 51.9%.

The German part of the SmartKom Corpus has a total of 12,076 utterances with 1,489 occurrences of overlapping speech. This results in a share of 12.3% overlapping speech utterances for this HCI. Taking into account only the user utterances, we have 6,347 user utterances, of which 817 contain overlapping speech.

Thus, we can conclude that overlapping speech occurs frequently, and the first question, “Is overlapping speech occurring frequently enough to be analyzed in a dyadic conversation?”, can be answered in the affirmative.
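The reported shares can be reproduced from the counts given in the text. The helper `share` below is introduced only for this check.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Davero corpus (HHI)
davero_utterance_share = share(5_100, 27_000)  # utterances with overlap
davero_dialog_share = share(830, 1_600)        # dialogs with overlap

# SmartKom corpus (HCI): T1 and T2 samples together
smartkom_overlaps = 817 + 672
smartkom_share = share(smartkom_overlaps, 12_076)

print(davero_utterance_share, davero_dialog_share, smartkom_share)
```

The three values correspond to the 18.9%, 51.9% and 12.3% stated above, and the T1 and T2 counts add up to the 1,489 overlaps.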

B. Q2: Overlapping Speech and Utterance Lengths

This investigation is triggered by the assumption that overlapping speech occurs because the current speaker is talking too long and the listener wants to get the turn. To investigate this assumption, we calculated the mean length of the utterances in which overlapping speech occurs ($utt_{overlap}$) in relation to all other utterances ($utt_{remain}$) of this speaker within a dialog. Afterwards, we averaged over all dialogs and calculated the difference between both averaged mean lengths:

$$\Delta len = len_{utt_{overlap}} - len_{utt_{remain}} \qquad (1)$$

This calculation is performed separately for each of the previously identified situations and averaged afterwards ($\Delta len$). Additionally, we used the non-parametric Mann-Whitney U test to test the significance of the difference between the utterance lengths. The stars denote the significance level: ** p < 0.001.
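The computation behind Eq. (1) can be sketched as follows. The input layout (one list of `(duration, has_overlap)` tuples per dialog) and the decision to skip dialogs in which one of the two groups is empty are assumptions for illustration; the paper does not describe these details.

```python
from statistics import mean


def delta_len(dialogs):
    """Averaged length difference between overlap and remaining utterances (Eq. 1).

    dialogs: list of dialogs; each dialog is a list of (duration_s, has_overlap)
    tuples for the utterances of one speaker.
    """
    overlap_means, remain_means = [], []
    for utterances in dialogs:
        overlap = [d for d, has_overlap in utterances if has_overlap]
        remain = [d for d, has_overlap in utterances if not has_overlap]
        if not overlap or not remain:
            continue  # assumption: skip dialogs where one group is empty
        overlap_means.append(mean(overlap))
        remain_means.append(mean(remain))
    # Difference of both mean lengths after averaging over all dialogs
    return mean(overlap_means) - mean(remain_means)


# Hypothetical durations in seconds
toy_dialogs = [
    [(6.0, True), (2.0, False), (4.0, False)],
    [(5.0, True), (3.0, False)],
]
result = delta_len(toy_dialogs)
```

A positive value means that utterances containing an overlap are on average longer than the speaker's other utterances. Note that `statistics.mean` raises an error if no dialog contributes both groups.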

1) Davero Corpus: From Figure 3 it can be seen that in two situations the length of the utterance with overlapping speech differs from that of other utterances of the same speaker. For S1, $len_{utt_{overlap}}$ is significantly longer than for other utterances. In this situation one speaker gives statements that are just confirmed by the listener without interrupting the speaker. Thus, the speaker can continue his turn. The presence of this type of overlapping speech does not indicate a change in the progress of the dialog. The same statement can be made for S2, where $len_{utt_{overlap}}$ is significantly shorter than for other utterances. The overlapping speech in both situations occurs only because just the acoustic channel can be used to negotiate the turn-taking. Thus, the length of an utterance together with the information about an occurring speech overlap cannot be used as an indicator of a dysfunctional conversation.

Figure 3. Difference of the utterance length for the four defined overlapping speech situations and the average utterance length in the Davero corpus; stars indicate a significant difference. [Plot not reproduced; x-axis: overlapping speech situation (S1 to S4); y-axis: ∆len in s.]

2) SmartKom Corpus: From Figure 4 it can be seen that for type T1 the system utterance with the overlapping speech segment ($len_{utt_{overlap}}$) is significantly longer than the other system utterances. Thus, it can be assumed that for a positive interaction outcome and a fluent conversation the system prompts should not be too long.

For the second type, where the system interrupts the user, the length of the users’ utterances containing overlapping speech is not significantly different from that of other user utterances. Therefore, we assume that these interruptions by the system are caused by operator errors of the WOZ system.

Regarding our second research question, we can state that there is a significant correlation between overlapping speech and the length of the previous utterance in both HHI and HCI.
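The Mann-Whitney U test used for the utterance lengths can be sketched in plain Python. This is an illustrative implementation with midranks for ties and a normal approximation of the two-sided p-value (no continuity correction), not necessarily the exact variant the authors used; for small samples an established statistics library is preferable.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (midranks, normal approximation).

    Returns (U statistic of sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    rank_sum_x, tie_term = 0.0, 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i                    # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return u_x, p


# Hypothetical utterance lengths in seconds
u, p = mann_whitney_u([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```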

Figure 4. Difference of the utterance length for the two types of overlapping and the average utterance length in SmartKom. The stars indicate a significant difference. [Plot not reproduced; x-axis: type of overlapping speech (T1, T2); y-axis: ∆len in s.]

C. Q3: Overlapping Speech and Affective Changes

To investigate the affective change at the point where overlapping speech occurs, we take into account the observed affective states in the two preceding utterances and compare them to the observed affective states in the two succeeding utterances. We distinguish between high (+1) and low (-1) dominance and positive (+1) and negative (-1) valence. Utterances that do not have an affective label or are labeled as neutral are assigned a 0. Thus, we can calculate the difference between the affective states of the preceding utterances and the succeeding utterances for an overlapping speech segment:

$$\Delta Affect = Affect_{\mathrm{before\ overlap}} - Affect_{\mathrm{after\ overlap}} \qquad (2)$$

Afterwards, we average over all segments ($\Delta Affect$). The significance of the affective change is tested using the Mann-Whitney U test. The stars denote the significance level: * p < 0.01 and ** p < 0.001.
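A minimal sketch of Eq. (2) for a single overlap is given below. How the two utterances in each window are combined is not stated in the paper; taking their mean, excluding the overlapping utterance itself, and returning `None` at dialog borders are assumptions made for this illustration.

```python
from statistics import mean


def delta_affect(scores, overlap_index, window=2):
    """Affect before minus affect after an overlap (Eq. 2) for one dimension.

    scores: per-utterance values in {-1, 0, +1} (dominance or valence).
    overlap_index: position of the utterance that contains the overlap.
    Returns None if there is no utterance before or after the overlap.
    """
    before = scores[max(0, overlap_index - window):overlap_index]
    after = scores[overlap_index + 1:overlap_index + 1 + window]
    if not before or not after:
        return None
    # Assumption: the states inside each window are combined by their mean.
    return mean(before) - mean(after)


# Hypothetical dominance sequence; the overlap occurs in the third utterance
change = delta_affect([0, -1, 0, +1, +1], overlap_index=2)
```

With this sign convention, a negative value means that the affective dimension is higher after the overlap than before it.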

Figure 5. Dominance change at the point where overlapping speech occurs; stars indicate a significant affective state change. [Plot not reproduced; x-axis: overlapping speech situation (S1 to S4); y-axis: ∆Dom.]

1) Davero Corpus: Analyzing the change of affective states in connection with overlapping speech, a significant change in the affective state can be observed only in S3 (simultaneous starting after longer silence), see Figure 5. A possible interpretation of this observation is that the dominance level is dropping. A deeper analysis of the data shows that the dominance level of the interrupter is rising, while the dominance of the speaker whose turn is interrupted is slightly decreasing. In this case, the overlapping speech event could be a good marker for identifying changes in dominance. For all other situations of overlapping speech, the dominance of the two speakers is not influenced by overlapping speech.

Figure 6. Valence change at the point where overlapping speech occurs in the Davero corpus. [Plot not reproduced; x-axis: overlapping speech situation (S1 to S4); y-axis: ∆Val.]

For the change of the speaker’s valence, we can state that there is no significant connection with the occurrence of overlapping speech, cf. Figure 6. This could be expected, as overlapping speech is related to the turn-taking behavior of the speakers and the dominance of a speaker is seen as the underlying mechanism to regulate the turn-taking [12].

Figure 7. Dominance change at the point where overlapping speech occurs in the SmartKom Corpus. The stars indicate a significant affective state change. [Plot not reproduced; x-axis: type of overlapping speech (T1, T2); y-axis: ∆Dom.]

2) SmartKom Corpus: Figure 7 shows that the dominance for both types is significantly higher after the overlapping speech segment than before. This is quite obvious for the case where the user actively interrupts the system, but when the system interrupts the user it seems rather unintuitive and can only be explained in connection with the valence change. Regarding the valence change (cf. Figure 8), we can state that after the overlapping speech the user’s valence is shifted significantly towards negative values. This finding, in connection with a higher dominance, suggests that the user is angrier after overlapping speech, either because he wants to speak and interrupts the system, or because he is annoyed that the system interrupts him.

Figure 8. Valence change at the point where overlapping speech occurs in the SmartKom Corpus. The stars indicate a significant affective state change. [Plot not reproduced; x-axis: type of overlapping speech (T1, T2); y-axis: ∆Val.]

Regarding our third research question: “Is overlapping

speech occurring at points where the affective state is chang-

ing?”, we can conclude that in HHI only changes in the

dominance are related to overlapping speech, whereas in HCI

signiﬁcant changes in both affective dimensions, dominance

and valence, can be observed.
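
The before/after comparison summarized in Figures 6 to 8 can be sketched in a few lines. This is a minimal illustration, not the authors' documented procedure: the `LabeledUtterance` record, its field names, and the choice to subtract the label of the utterance directly before the overlap from the label of the one directly after it are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class LabeledUtterance:
    # Hypothetical record: one utterance of a single speaker with
    # numeric affect labels and an overlap flag (names are illustrative).
    dominance: float
    valence: float
    has_overlap: bool


def affective_deltas(utterances: List[LabeledUtterance]) -> List[Tuple[float, float]]:
    """Return (delta_dominance, delta_valence) for every overlap event.

    `utterances` is one speaker's utterances in chronological order.
    Each delta is the label after the overlapping utterance minus the
    label before it; overlaps at the very start or end are skipped
    because one neighbour is missing.
    """
    deltas = []
    for i in range(1, len(utterances) - 1):
        if utterances[i].has_overlap:
            before, after = utterances[i - 1], utterances[i + 1]
            deltas.append((after.dominance - before.dominance,
                           after.valence - before.valence))
    return deltas


def mean_deltas(deltas: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the dominance and valence changes over all overlap events."""
    return (mean(d for d, _ in deltas), mean(v for _, v in deltas))


# Made-up labels, for illustration only (not corpus data).
speaker = [
    LabeledUtterance(0.0, 0.1, False),
    LabeledUtterance(0.2, 0.0, True),
    LabeledUtterance(0.5, -0.2, False),
]
deltas = affective_deltas(speaker)
```

A significance test over the resulting deltas would still be needed before drawing conclusions of the kind marked by the stars in the figures.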

V. CONCLUSION

In this paper, we present a ﬁrst study investigating over-

lapping speech effects in both HHI and HCI. The analyses are

conducted on a dataset of realistic HHI containing telephone

based conversations and the well-known SmartKom Corpus

of naturalistic HCI. We could show that in both datasets

overlapping speech occurs frequently enough to be analyzed

with a share of 18.9% for HHI and 12.3% for HCI. For the

investigated HHI, this share is in-line with the results of other

research groups [11], [33]. The amount of overlap in HCI is

a bit lower but still sufficient for analysis. To the best of our knowledge, no

comparable figures for HCI have been reported by other researchers.

Based on the description of situational settings, we ﬁrst an-

alyzed the correlation between the length of overlap-preceding

utterances and the occurrence of overlap. As a result of our

ﬁrst analysis, we could expose signiﬁcant relations to the

length of the spoken utterances and changes in the affective

state of the conversational partners. In HHI we could ﬁnd a

signiﬁcant correlation between overlapping speech as feedback

and premature turn-taking. Also in HCI a signiﬁcant corre-

lation is found between overlapping speech and the length

of system utterances. The users' utterance lengths did not

show significant correlations with the occurrence of overlaps;

we assume that these overlaps are simply caused by operator

malfunctions, as the SmartKom data were recorded in a Wizard-of-Oz (WOZ)

scenario and no pre-defined design rules were given to the

wizards on how to use overlap [21].
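
The utterance-length comparisons rely on a non-parametric Mann-Whitney U test. The sketch below shows what such a test computes, using only the Python standard library and the normal approximation for the p-value. The function name and the sample durations are illustrative assumptions; for a real analysis, an established implementation such as `scipy.stats.mannwhitneyu` would be the safer choice.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value).

    Tied values receive average ranks and the variance includes the
    usual tie correction. The p-value uses the normal approximation
    without continuity correction, so it is only rough for small samples.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of equal values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u_x, p


# Made-up utterance lengths in seconds, for illustration only.
before_overlap = [1.0, 2.0, 3.0]
no_overlap = [4.0, 5.0, 6.0]
u, p = mann_whitney_u(before_overlap, no_overlap)
```

With samples this small an exact test would normally be preferred; the approximation is used here only to keep the example short.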

Secondly, we analyzed the correlation of affective changes

in the vicinity of the overlap in both types of interactions.

For this investigation, we showed that overlapping

speech goes along with changes in the affective states of

dominance and valence in certain situations. In HHI only

the situation where both speakers start simultaneously after a

longer pause effects a signiﬁcant change of dominance. For the

valence dimension no signiﬁcant correlation could be found. In

the investigated HCI both affective dimensions, dominance and

valence, show a signiﬁcant correlation to overlapping speech

in both situation types.

From these results, we are able to derive some rules for

the organization of interactions: In telephone based HHI the

utterances should not be too long and the listener should be

encouraged to give feedback. This avoids competitive barge-in

overlaps. For HCI, the system should not talk too long, because

all overlapping speech segments are accompanied by an affective change toward

higher dominance and negative valence of the speaker, and

this kind of affective change should be avoided.

To evaluate these statements for their generality, a broader

investigation including additional corpora has to be conducted.

A possible application of our investigations in HHI and

HCI is the identiﬁcation of parts where the affective state

changes based on the knowledge of overlapping speech and the

dialog course. For example, since situation S3, where both speakers start

simultaneously, can be easily identified by duration analysis,

this knowledge can be used to find affective material for further

emotional analyses.
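
A duration-based detection of the simultaneous-start situation could look like the following sketch. It assumes that the turn segmentation is available as speaker, start time and end time in seconds. The `Turn` type and the two thresholds, `min_pause` and `max_start_gap`, are hypothetical placeholders, since no numeric values are given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the beginning of the dialog
    end: float


def simultaneous_starts(
    turns: List[Turn],
    min_pause: float = 1.0,      # hypothetical: minimum silence before the pair
    max_start_gap: float = 0.2,  # hypothetical: maximum offset between the starts
) -> List[Tuple[Turn, Turn]]:
    """Find turn pairs of different speakers that start almost together
    after a pause and overlap in time."""
    ordered = sorted(turns, key=lambda t: t.start)
    hits: List[Tuple[Turn, Turn]] = []
    prev_end: Optional[float] = None  # latest end time among earlier turns
    for i in range(len(ordered) - 1):
        a, b = ordered[i], ordered[i + 1]
        after_pause = prev_end is not None and a.start - prev_end >= min_pause
        near_simultaneous = b.start - a.start <= max_start_gap
        overlapping = b.start < a.end
        if a.speaker != b.speaker and after_pause and near_simultaneous and overlapping:
            hits.append((a, b))
        prev_end = a.end if prev_end is None else max(prev_end, a.end)
    return hits


# Made-up segmentation, for illustration only.
dialog = [
    Turn("A", 0.0, 2.0),
    Turn("A", 4.0, 6.0),
    Turn("B", 4.1, 5.0),
    Turn("B", 6.5, 7.0),
]
candidates = simultaneous_starts(dialog)
```

The returned pairs mark dialog positions that could then be inspected for affective changes. Turns at the very beginning of a dialog are not reported, because no preceding pause can be measured for them.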

In our further research activities, we will develop a robust

automatic identiﬁcation of the different types of overlap.

Together with the recognition of the user’s affective state,

this brings us a step closer to future Cognitive Infocommunication

systems acting as a companion towards human users [13], [14].

VI. ACKNOWLEDGEMENTS

The work presented in this paper was done within the

Transregional Collaborative Research Centre SFB/TRR 62

‘Companion-Technology for Cognitive Technical Systems’

funded by the German Research Foundation (DFG).

REFERENCES

[1] F. Schulz von Thun, Miteinander reden 1 - Störungen und Klärungen.

Reinbek, Germany: Rowohlt, 1981.

[2] R. Ishii, K. Otsuka, S. Kumano, and J. Yamato, “Analysis of respiration

for prediction of ‘who will be next speaker and when?’ in multi-party

meetings,” in Proc. of the 16th International Conference on Multimodal

Interaction, ser. ICMI ’14, Istanbul, Turkey, 2014, pp. 18–25.

[3] H. Sacks, E. A. Schegloff, and G. Jefferson, “A simplest systematics

for the organization of turn taking for conversation,” Language, vol. 50,

pp. 696–735, 1974.


Frequently Asked Questions (11)

Q1. What are the contributions mentioned in the paper "Overlapping speech, utterance duration and affective content in hhi and hci – an comparison" ?

In cases of failure, overlapping speech may occur, which is the focus of this paper. The authors investigate the Davero corpus, a large naturalistic spoken corpus of real call-center telephone conversations, and compare their findings to results on the well-known SmartKom corpus consisting of human-computer interaction. The authors first show that overlapping speech occurs in different types of situational settings, extending the well-known categories of cooperative and competitive overlaps, all of which are frequent enough to be analyzed. Furthermore, the authors present connections between the occurrence of overlapping speech and the length of the previous utterance, and show that overlapping speech occurs at dialog instances where certain affective states are changing.

Q2. What have the authors stated for future works in "Overlapping speech, utterance duration and affective content in hhi and hci – an comparison" ?

In their further research activities, the authors will develop a robust automatic identification of the different types of overlap. Together with the recognition of the user's affective state, the authors are a step further to future Cognitive Infocommunication systems acting as a companion towards human users [13], [14].

Q3. What test was used to test the significance of the difference between the two utterance lengths?

The authors used the non-parametric Mann-Whitney U test to test the significance of the difference between the utterance lengths.

Q4. How many utterances are in the corpus?

The considered set of the SmartKom corpus contains 438 emotionally labeled dialogs with 12,076 utterances in total and 6,079 user utterances.

Q5. How many utterances are marked to contain overlapping speech?

Of the currently available 27,000 utterances in 1,600 dialogs, 5,100 utterances (18.9%, 830 dialogs) are marked to contain overlapping speech.

Q6. What is the effect of overlapping speech?

For this investigation, the authors showed that overlapping speech goes along with changes in the affective states of dominance and valence in certain situations.

Q7. What annotation level is used for the Davero corpus?

This corpus has several annotation levels, of which for their investigation the turn segmentation and an affective annotation based on the acoustic channel is used [22].

Q8. What is the possible application of their investigations in HHI and HCI?

A possible application of their investigations in HHI and HCI is the identification of parts where the affective state changes based on the knowledge of overlapping speech and the dialog course:

Q9. How did the corpus authors measure the annotation correctness?

The corpus authors only measured the annotation correctness by comparing the results of different annotation rounds rather than calculating an inter-rater agreement measure like Krippendorff's alpha or Fleiss' kappa.

Q10. How many overlapping speech segments do the authors have?

As the authors only take into account the overlapping speech, the authors have 6,347 user utterances and 817 utterances contain overlapping speech.

Q11. How many annotators did the authors employ to conduct the affective assessment?

To conduct the affective assessment, the authors first employed a few annotators to manually segment the recordings into single dialogs including the speaker turns.
