Inferring colocation and conversation networks from privacy-sensitive audio with implications for computational social science
Summary (4 min read)
1. INTRODUCTION
- The automated recording of real-world speech is crucial because, despite the rise in on-line interactions, face-to-face communication is still people’s primary mode of social interaction [Baym et al. 2004].
- This requirement gives rise to two problems.
- Ideally, a privacy-sensitive recording technique will process incoming audio in order to discard any information deemed too invasive while still preserving data useful for sociological inquiry.
2. PRIVACY-SENSITIVE CONVERSATION MODELING
- When collecting situated conversation data it is necessary to protect the privacy of not just people who willingly consent to wear a recording device, but also of those who may come within range of the microphones.
- For this purpose, destructive processing of the audio should yield a feature set that prevents us from reconstructing intelligible speech or inferring the identities of anyone not wearing a device.
- At the same time, the features must still contain enough information to allow conversations to be found and meaningful inferences made about those conversations. (Source: ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 1, Article 7, Pub. date: January 2011.)
- Energy can reveal a person or group’s interest in the conversation [Gatica-Perez et al. 2005].
- The method proposed by Corman and Scott [1994] computes normalized cross-correlation between raw audio signals and concludes that two people are in a conversation if their correlation coefficients are above a threshold estimated from labeled data.
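The correlation test described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the cited method's implementation: the function names, the small lag search, and the default `threshold=0.5` are assumptions made here (the source only says the threshold is estimated from labeled data).

```python
import math


def normalized_correlation(x, y):
    """Normalized (Pearson-style) correlation of two equal-length signals."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_y) ** 2 for b in y))
    # A constant signal has no variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def max_lagged_correlation(x, y, max_lag):
    """Largest normalized correlation over integer lags in [-max_lag, max_lag]."""
    if len(x) != len(y):
        raise ValueError("signals must be of equal length")
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        if len(a) > 1:
            best = max(best, normalized_correlation(a, b))
    return best


def in_conversation(x, y, threshold=0.5, max_lag=2):
    """Hypothetical decision rule: correlated above a threshold means conversing."""
    return max_lagged_correlation(x, y, max_lag) > threshold
```

Note that this approach needs the raw audio from both devices, which is exactly what a privacy-sensitive feature set is meant to avoid storing.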
2.1 Privacy-Sensitive Features
- Following Basu [2002], the authors' approach to extracting non-linguistic speech information builds on methods for detecting voiced human speech.
- In a spectrogram, time runs along the x-axis and frequencies increase along the y-axis; color indicates energy at a given frequency.
- The harmonicity in the spectrogram shows that voiced speech has a low spectral entropy, compared to non-voiced regions.
- Narrow spectrum noise can also create strong autocorrelation peaks.
- The precise procedure for computing features is as follows:
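The feature-extraction procedure itself is not reproduced in this summary. As a rough, non-authoritative illustration of the two cues mentioned in the bullets above (low spectral entropy and strong autocorrelation peaks in voiced frames), the Python sketch below computes both for a single short frame. All function names are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative DFT bin (naive O(n^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(frame):
    """Entropy in bits of the normalized power spectrum; low for harmonic frames."""
    p = power_spectrum(frame)
    total = sum(p)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in p if v > 0)


def strongest_autocorr_peak(frame):
    """Return (value, lag) of the largest local maximum of the normalized
    autocorrelation at a non-zero lag, or (0.0, None) if there is none."""
    n = len(frame)
    energy = sum(v * v for v in frame)
    if energy == 0:
        return 0.0, None
    max_lag = n // 2
    r = [sum(frame[t] * frame[t + lag] for t in range(n - lag)) / energy
         for lag in range(max_lag + 1)]
    best_value, best_lag = 0.0, None
    for lag in range(1, max_lag):
        is_peak = r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]
        if is_peak and r[lag] > best_value:
            best_value, best_lag = r[lag], lag
    return best_value, best_lag
```

A pure tone gives near-zero spectral entropy and a clear peak at its period, while broadband noise gives high entropy and weak peaks. As the text notes, narrow-spectrum noise can also produce strong peaks, which is why a single cue is not enough on its own.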
2.2 Extracting Conversation Data
- To gather data about face-to-face conversations, the authors ask multiple people to wear recording devices, each of which saves a separate stream of the privacy-sensitive features described above.
- Finally, once colocated groups and speakers have been identified, the authors can conclude that people who are colocated and speaking are in conversation together and then extract further features of their conversation (Section 2.3).
- For each recorded stream, the authors use the forward-backward algorithm [Rabiner 1989] to infer $p(V_t^a \mid x^a)$: the posterior probability of voiced speech in each frame, given the entire recorded stream.
- Successful colocation detection requires clustering together segments of data from miked individuals when they are in a conversation.
- There is a sharp peak of high mutual information values [...]
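The forward-backward step mentioned above can be illustrated with a small, generic sketch. It assumes a discrete-state hidden Markov model whose per-frame emission likelihoods are already available; the two-state setup (index 0 for unvoiced, index 1 for voiced), the parameter values in the usage note, and the function name are assumptions for illustration, not the authors' model.

```python
def forward_backward(likelihoods, transition, initial):
    """Posterior state probabilities for a discrete-state HMM.

    likelihoods[t][s] is p(x_t | state s), transition[i][j] is
    p(state j at t+1 | state i at t), and initial[s] is p(state s at t=0).
    Returns posteriors[t][s] = p(state s at t | all frames).
    """
    n_frames = len(likelihoods)
    n_states = len(initial)

    # Forward pass, normalized per frame to avoid numerical underflow.
    alpha = []
    prev = None
    for t in range(n_frames):
        if t == 0:
            row = [initial[s] * likelihoods[0][s] for s in range(n_states)]
        else:
            row = [likelihoods[t][s]
                   * sum(prev[i] * transition[i][s] for i in range(n_states))
                   for s in range(n_states)]
        norm = sum(row)
        prev = [v / norm for v in row]
        alpha.append(prev)

    # Backward pass, also normalized per frame.
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        row = [sum(transition[s][j] * likelihoods[t + 1][j] * beta[t + 1][j]
                   for j in range(n_states))
               for s in range(n_states)]
        norm = sum(row)
        beta[t] = [v / norm for v in row]

    # Combine the two passes and renormalize each frame.
    posteriors = []
    for t in range(n_frames):
        row = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        norm = sum(row)
        posteriors.append([v / norm for v in row])
    return posteriors
```

For example, with a sticky transition matrix such as `[[0.9, 0.1], [0.1, 0.9]]`, an ambiguous frame between a run of unvoiced-looking frames and a run of voiced-looking frames is smoothed using evidence from both sides, which is the benefit of conditioning on the entire stream rather than on past frames only.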
2.3 Conversation Data
- The steps described so far provide ways of determining who is physically colocated with whom and who is speaking when, but they do not provide a method for determining who is in conversation with whom.
- Thus, an evaluation comparing the resulting inferred conversations to the “in conversation with” ground truth label yields exactly the same results as comparing the voicing-based colocation inference to the “in conversation with” ground truth.
- The table of those results is identical to Table II, and thus the authors omit it for space.
3. THE SPOKEN NETWORKS CORPUS
- Using the conversation detection methods from the previous section, the authors collected a corpus of real-world face-to-face conversations among 24 research subjects over an extended period of time.
- This section first contrasts their effort with earlier data collection projects (Section 3.1), then explains the procedure used to gather the data (Section 3.2), provides summary statistics about the data (Section 3.3), and shows novel measures of social behavior that can be easily extracted from the data (Section 3.4).
3.2 Data Collection Method
- The data collection effort presented in this work descends from the original sociometer study, but differs in the research context and design.
- The subjects recorded data everywhere they went, indoors and outdoors: class, lunch, study groups, meetings, spontaneous social gatherings, etc. Data was saved to a 2-GB Secure Digital (SD) flash memory card on the PDA.
- Research subjects completed questionnaires before beginning the school year, at the end of each data collection week, and following the end of the school year.
- All conversation data discussed here was collected using the same platform: an HP iPAQ hx4700 PDA with an attached multi-sensor board containing eight different sensors.
- Because all of the PDA’s software and settings are stored in volatile RAM and are completely lost if the battery fully discharges, many Monday mornings of recording time were lost while PDAs were reconfigured.
3.3 Collected Data
- Figure 8 shows beanplots [Kampstra 2008] of the average number of hours collected per day for each collection week.
- The first three weeks (i.e., representing the first three months of the academic year) show an increase in the amount of data collected as the subjects initially learned how to use the devices and the authors resolved battery and software problems as previously described.
- Recording hours diminished slightly in the later weeks, also due partly to technical problems with the cables and perhaps because the participants became fatigued or the study became less novel to them.
- While there is no moment when all subjects are recording (the maximum number of simultaneous recordings is 21), there is enough overlap in the data for it to contain many interactions among the subjects.
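A statistic such as the maximum number of simultaneous recordings can be computed with a simple sweep over interval endpoints. The sketch below is only an illustration of that idea: the function name and the representation of each recording as a half-open `(start, end)` pair are assumptions, and the source does not describe how the authors computed the figure.

```python
def max_simultaneous(intervals):
    """Maximum number of half-open [start, end) intervals active at once."""
    events = []
    for start, end in intervals:
        if end <= start:
            raise ValueError("each interval needs end > start")
        events.append((start, 1))   # a recording begins
        events.append((end, -1))    # a recording ends
    # At equal times, process ends (-1) before starts (+1) so that
    # back-to-back recordings are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best
```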
3.4 Basic Behavioral Inferences
- Data processing follows the three steps described in Section 2: colocation detection, speaker segmentation, and conversation extraction.
- During this academic term, most subjects attended a class that met from 10:30 am to 12:00 pm on these days, so many students arrived at school and began recording before that class.
- Figure 11 shows the inferences for colocation using both energy and voicing mutual information, as well as the conversation grouping.
- Because of that, when few people are recording, even a small group of interacting subjects will appear as a larger proportion in the plot.
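The mutual information used for the colocation inferences above can be illustrated with a plain empirical estimate between two discretized feature streams (for instance, binary voiced/unvoiced decisions per frame). This is a generic sketch under that assumption; the function names and the non-overlapping windowing are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Empirical mutual information (bits) of two equal-length discrete sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


def windowed_mutual_information(a, b, window):
    """Mutual information over consecutive non-overlapping windows."""
    return [mutual_information(a[s:s + window], b[s:s + window])
            for s in range(0, len(a) - window + 1, window)]
```

Two devices that hear the same speech should produce streams that rise and fall together, giving high values, while unrelated streams give values near zero. Any threshold separating the two cases would have to be chosen from data.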
3.5 Basic Network Analyses
- Constructing networks from survey data is usually simple: they are often just the union of self-reported ties for each actor in the network.
- As with the edge value distributions, the values for the colocation degrees are much higher than those for conversation degrees, and the two kinds of networks seem to be very different with regard to degree.
- As with the clustering coefficient, the triangle count does not generalize as easily to weighted networks as degree and density, but, following Saramäki et al. [2007], the authors can define a weighted triangle value as $T_{ijk} = (Y_{ij} Y_{ik} Y_{jk})^{1/3}$ (19).
- That difference may explain the bimodal colocation degree distributions of weeks 4 through 7, where there seems to be a distinction between pairs who spend much time together and pairs that only come together in passing.
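The weighted triangle value in Eq. (19) is the geometric mean of the three edge weights among nodes i, j, and k. A minimal sketch, assuming the network is stored as a symmetric matrix `Y` of non-negative weights (the storage format and function names are assumptions made here):

```python
from itertools import combinations


def triangle_value(Y, i, j, k):
    """Geometric mean of the three edge weights among nodes i, j, k (Eq. 19).

    Assumes non-negative weights; a missing edge (weight 0) gives a value of 0.
    """
    return (Y[i][j] * Y[i][k] * Y[j][k]) ** (1.0 / 3.0)


def triangle_values(Y):
    """Map every unordered node triple (i, j, k) to its weighted triangle value."""
    n = len(Y)
    return {(i, j, k): triangle_value(Y, i, j, k)
            for i, j, k in combinations(range(n), 3)}
```

Because the three weights are multiplied, a triple contributes only when all three edges are present, and one very strong tie cannot fully compensate for two weak ones.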
4. CONCLUSION
- The authors have outlined a set of privacy-sensitive features that can be computed from incoming audio data in real-time.
- The authors have shown how to use those features to determine who was physically colocated with whom, both at the granularity of a room in a building and at the more elastic “acoustic proximity” needed to have a face-to-face conversation.
- The authors have used the privacy sensitive features to infer who was speaking when, and combined those inferences with colocation inference to determine who was in conversation with whom.
- This conversation detection can handle conversations with any number of participants, extending beyond previous methods that were limited to dyadic conversations only.
- The authors constructed weighted networks of social behavior and examined basic descriptive statistics in order to compare social networks defined by colocation events to networks defined by conversation events.