Q2. What have the authors stated for future work in "Pauses, gaps and overlaps in conversations"?
Furthermore, as more than 40% of all between-speaker intervals are long enough for the next speaker to react to information immediately before the silence, given minimal response times for spoken utterances, the authors also conclude that reaction is a plausible explanation in a significant proportion of all speaker changes.
Q3. What is the common between-speaker interval in all three examined corpora?
The most common between-speaker interval in all three examined corpora, as indicated by the modes of the distribution functions, is a gap of about 200 ms.
Q4. What is the plausible goal for between-speaker intervals?
Assuming instead that we, as highly trained speakers, succeed more often than we fail at turn-taking, slight gaps are a more plausible goal for between-speaker intervals.
Q5. How can the authors quantify the proportion of speaker changes where the gap is long enough for the next speaker?
By relating distributions of between-speaker intervals to minimal response times for spoken utterances, the authors can quantify the proportion of speaker changes where the gap is long enough for the next speaker to react to the offset of speech, to silence or to some prosodic information immediately before the silence.
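The comparison described above amounts to counting how many between-speaker intervals are at least as long as a minimal response time. The following is a minimal sketch of that calculation, not the authors' code: the function name `proportion_at_least`, the sample values, and the threshold `MIN_RESPONSE_MS` are all hypothetical placeholders chosen for illustration.

```python
def proportion_at_least(intervals_ms, threshold_ms):
    """Return the fraction of intervals that are >= threshold_ms."""
    if not intervals_ms:
        raise ValueError("need at least one interval")
    long_enough = sum(1 for value in intervals_ms if value >= threshold_ms)
    return long_enough / len(intervals_ms)


# Hypothetical between-speaker intervals in milliseconds.
# Assumption: negative values stand for overlaps, positive values for gaps.
between_speaker_ms = [-300, -80, 120, 200, 450, 700, 50, 250, 900, 10]

# Placeholder threshold; the actual minimal response time must come from the paper.
MIN_RESPONSE_MS = 200

share = proportion_at_least(between_speaker_ms, MIN_RESPONSE_MS)
print(f"{share:.0%} of intervals are long enough to allow a reaction")
```

With real corpus data, `between_speaker_ms` would hold the measured intervals and the threshold would be set from the response-time literature the paper cites.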
Q6. What is the general recommendation for the analysis of gap and overlap durations?
As a general recommendation, the authors suggest that whenever gap as well as overlap durations are available, they should be treated as one distribution, and that no transformation should be applied.
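One way to read this recommendation is to place gaps and overlaps on a single signed time axis and summarise the raw values directly. The sketch below assumes a sign convention in which gaps are positive and overlaps are negative; that convention, the helper name `signed_intervals`, and the sample durations are illustrative assumptions rather than details taken from the answer above.

```python
from statistics import mean, median


def signed_intervals(gaps_ms, overlaps_ms):
    """Merge gap and overlap durations into one sorted, signed distribution.

    Assumption: gaps keep a positive sign, overlaps are negated.
    """
    return sorted(list(gaps_ms) + [-duration for duration in overlaps_ms])


# Hypothetical durations in milliseconds.
gaps_ms = [200, 450, 120]
overlaps_ms = [300, 80]

combined = signed_intervals(gaps_ms, overlaps_ms)

# Summary statistics are computed on the raw values, with no transformation
# (for example, no logarithm) applied beforehand.
print(combined, mean(combined), median(combined))
```

Keeping the values untransformed also keeps the merged distribution usable, since a logarithm is undefined for the negative (overlap) side.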
Q7. How did the authors determine the duration of the pauses, gaps and overlaps?
Once the pauses, gaps and overlaps were identified and classified, their durations were extracted by subtracting the time of the onset of an interval from the time of its offset.
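The subtraction described above can be sketched in a few lines. The `Interval` class, its field names, and the sample timestamps are hypothetical; times are stored as integer milliseconds here simply to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    kind: str        # "pause", "gap", or "overlap"
    onset_ms: int    # time at which the interval starts
    offset_ms: int   # time at which the interval ends

    def duration_ms(self) -> int:
        # Duration is the offset time minus the onset time.
        return self.offset_ms - self.onset_ms


# Hypothetical, already classified intervals.
intervals = [
    Interval("pause", 1200, 1550),
    Interval("gap", 3000, 3250),
    Interval("overlap", 5100, 5400),
]

durations = [(interval.kind, interval.duration_ms()) for interval in intervals]
print(durations)
```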
Q8. How many pauses and gaps were detected in the Swedish Map Task Corpus?
An examination of the proportion of pauses and gaps with durations of more than 500 ms, a common silence threshold in end-of-utterance detectors, showed that such a threshold captured 51.1% and 47.5% of all gaps, but also 59.6% and 56.0% of all pauses in the Swedish Map Task Corpus and the HCRC Map Task Corpus, respectively.
Q9. Why did the authors choose not to subdivide the dataset?
While the dataset allows for analyses of differences between, for example, eye contact vs. no eye contact conditions or gender differences, the authors chose not to subdivide the dataset to make such comparisons.
Q10. What was the definition of speaker changes?
There is also the possibility of speaker changes involving overlaps or no-gap–no-overlaps, which were the terms used by Sacks et al. (1974).
Q11. How many states can be augmented to model other subclassifications?
The number of states in such an interaction FSA may be augmented to model other subclassifications, or to model sojourn times, without loss of generality; here, the authors limit themselves to an FSA of 10 states, and specifically to the 4 phenomena mentioned, as it is most directly relevant to their ongoing work in conversational spoken dialogue systems.