Open Access Journal Article

Cost-effective solution to synchronised audio-visual data capture using multiple sensors

TLDR
This work centralises the synchronisation task by recording all trigger- or timestamp signals with a multi-channel audio interface, and shows that a consumer PC can currently capture 8-bit video data with 1024x1024 spatial- and 59.1Hz temporal resolution, from at least 14 cameras, together with 8 channels of 24-bit audio at 96kHz.
About
This article was published in Image and Vision Computing on 2011-09-01 and is currently open access. It has received 32 citations to date. The article focuses on the topics Timestamp and Sensor fusion.


Citations
Journal Article

A Multimodal Database for Affect Recognition and Implicit Tagging

TL;DR: Results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol; single-modality and modality-fusion results are reported for both emotion recognition and implicit tagging experiments.
Proceedings Article

Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions

TL;DR: A new multimodal corpus of spontaneous collaborative and affective interactions in French, RECOLA, is presented and is being made available to the research community; self-report measures of users were collected during task completion.
Journal Article

The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent

TL;DR: A large audiovisual database is created as a part of an iterative approach to building Sensitive Artificial Listener agents that can engage a person in a sustained, emotionally colored conversation.
Journal Article

A Review of Human Activity Recognition Methods

TL;DR: This work proposes a categorization of human activity methodologies and divides human activity classification methods into two large categories according to whether they use data from different modalities or not, and examines the requirements for an ideal human activity recognition dataset.
Proceedings Article

The SEMAINE corpus of emotionally coloured character interactions

TL;DR: A new corpus of emotionally coloured conversations, recorded while users held conversations with an operator who adopts in sequence four roles designed to evoke emotional reactions, is made available to the scientific community through a web-accessible database.
References
Journal Article

High-quality video view interpolation using a layered representation

TL;DR: This paper shows how high-quality video-based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image-based modeling and rendering algorithms, and develops a novel temporal two-layer compressed representation that handles matting.
Journal Article

Optical properties of human skin, subcutaneous and mucous tissues in the wavelength range from 400 to 2000 nm

TL;DR: In this article, the optical properties of human skin, subcutaneous adipose tissue and human mucosa were measured in the wavelength range 400-2000 nm using a commercially available spectrophotometer with an integrating sphere.
Journal Article

High performance imaging using large camera arrays

TL;DR: A unique array of 100 custom-built video cameras is described, and the authors' experiences using this array in a range of imaging applications are summarized.
Frequently Asked Questions (22)
Q1. What have the authors contributed in "Cost-effective solution to synchronised audio-visual data capture using multiple sensors"?

Furthermore, the authors show that a consumer PC can currently capture 8-bit video data with 1024x1024 spatial and 59.1Hz temporal resolution, from at least 14 cameras, together with 8 channels of 24-bit audio at 96kHz. The authors thus improve the quality/cost ratio of multi-sensor data capture systems.
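To put these figures in perspective, a back-of-the-envelope calculation of the sustained data rate is sketched below. It is not taken from the paper; it simply assumes uncompressed 8-bit monochrome frames and the resolutions and channel counts quoted above.

```python
# Rough estimate of the sustained data rate for the capture setup quoted above
# (illustrative only; assumes uncompressed 8-bit monochrome video).

def video_rate_bytes_per_s(width, height, bytes_per_pixel, fps, n_cameras):
    """Raw video bandwidth for n_cameras identical streams."""
    return width * height * bytes_per_pixel * fps * n_cameras

def audio_rate_bytes_per_s(sample_rate, bytes_per_sample, n_channels):
    """Raw audio bandwidth for a multi-channel interface."""
    return sample_rate * bytes_per_sample * n_channels

video = video_rate_bytes_per_s(1024, 1024, 1, 59.1, 14)   # 14 cameras, 8-bit pixels
audio = audio_rate_bytes_per_s(96_000, 3, 8)              # 8 channels of 24-bit samples

print(f"video: {video / 1e6:.0f} MB/s")   # roughly 868 MB/s across all cameras
print(f"audio: {audio / 1e6:.1f} MB/s")   # roughly 2.3 MB/s
```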

If a monochrome camera is used, a monochromatic light source can improve image sharpness with low-cost lenses by preventing chromatic aberration.

Since many audio processing methods are vulnerable to noise, the microphone setup is an important factor for accurate multimodal data capture. 

Any type of sensor can be synchronised with the audio data, as long as it produces a measurable signal at the data capture moment, and its output data include reliable sample counts or timestamps relative to the first sample. 

For computer vision applications involving moving objects, such as human beings or parts of the human body, progressive scan global shutter sensors are the primary choice. 

The timestamp signals from multiple PCs can be recorded as separate channels in a multi-channel audio interface, making use of the hardware-synchronisation between the different audio channels. 

When the trigger output of the master camera is used as the input to the slave cameras, the resulting delay of the slave cameras is approximately 30µs.

Due to irregularities in sensor production, or the influence of radiation, some sensor locations have a defect that causes their pixel read-out values to be significantly higher (hot) or lower (cold) than the correct measurements. 

Capture software running on different PCs can be synchronised by letting each PC transmit its CPU cycle count as a timestamp signal output via the serial port.

In their recordings, the authors used the MOTU 8pre at a 48kHz sampling rate and configured the serial port to transmit at 9600 bits per second (bps).
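A minimal sketch of this idea is given below, assuming the pyserial package and an illustrative port name; the paper itself does not provide code, and the cycle-count source shown here (a monotonic nanosecond counter) is a stand-in.

```python
# Minimal sketch (not the authors' code): periodically transmit a local
# high-resolution timestamp over the serial port at 9600 bps, so that the
# electrical signal can be recorded in a spare channel of the audio interface.
import struct
import time

import serial  # pyserial

PORT = "/dev/ttyS0"  # illustrative; use the port wired to the audio interface

with serial.Serial(PORT, baudrate=9600, bytesize=8, parity="N", stopbits=1) as ser:
    for _ in range(10):
        # Stand-in for the CPU cycle count used in the paper: a monotonic
        # nanosecond counter packed as an 8-byte unsigned integer.
        stamp = time.perf_counter_ns()
        ser.write(struct.pack("<Q", stamp))
        ser.flush()
        time.sleep(1.0)  # one timestamp per second is ample for a linear fit
```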

To prevent communication with the PCI graphics card from reducing the storage WTR, the authors had to disable the live video display.

Because of the shortcomings and high costs of commercially available video capture systems, many researchers have already sought custom solutions that meet their own requirements. 

The problem of the high cost of custom solutions and specialised professional hardware is that it keeps accurately synchronised multi-sensor data capture out of reach for most computer vision and pattern recognition researchers. 

They used a tree of trigger connections between the processing boards (that each control one camera) to synchronise the cameras with a difference of 200 nanoseconds between subsequent levels of the tree. 

Using a photo diode that is sensitive to IR, the authors could record these flashes as a sensor trigger signal in one of the audio channels and estimate the accuracy of synchronisation of the gaze data. 
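A minimal numpy sketch of that edge localisation is shown below; the threshold, sampling rate, and synthetic signal are assumptions for illustration, not the authors' processing pipeline.

```python
# Sketch of locating rising edges of a flash/trigger signal that was recorded
# in one audio channel (e.g. a photodiode picking up IR flashes).
import numpy as np

def rising_edges(channel, threshold=0.5):
    """Return sample indices where the signal crosses the threshold upwards."""
    above = channel > threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1

# Example with a synthetic 48 kHz channel containing two short pulses.
fs = 48_000
sig = np.zeros(fs)
sig[10_000:10_050] = 1.0
sig[30_000:30_050] = 1.0

edges = rising_edges(sig)
print(edges / fs)  # edge times in seconds: [0.2083..., 0.625]
```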

To overcome this, the authors propose solutions and present findings regarding the two most important difficulties in using low-cost Commercial Off-The-Shelf (COTS) components: reaching the required bandwidth for data capture and achieving accurate multi-sensor synchronisation. 

The authors could find a linear mapping between audio sample number and the time of the external system by applying a linear fit to all two-dimensional time synchronisation points (timestamps with corresponding audio time) received during a recording.
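The sketch below illustrates such a fit with numpy.polyfit on synthetic synchronisation points; the numbers are made up and only serve to show the mapping from audio sample number to external-system time.

```python
# Fit a line through (audio sample number, external timestamp) pairs and use
# it to convert any audio sample number to external-system time.
import numpy as np

audio_samples = np.array([48_000, 96_000, 144_000, 192_000])  # audio time axis
external_time = np.array([12.001, 13.000, 14.002, 15.001])    # seconds on the other system

slope, offset = np.polyfit(audio_samples, external_time, deg=1)

def external_from_sample(n):
    """Map an audio sample number to external-system time via the linear fit."""
    return slope * n + offset

print(external_from_sample(120_000))  # about 13.5 s
```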

These signals can be recorded in a parallel audio channel as well, and can even be used as a common time base to synchronise multiple asynchronous audio interfaces. 

The maximum number of cameras that can be connected to one FireWire bus is typically limited to 4 or 8 (DMA channels), depending on the bus hardware. 

Assuming that the processes of transmission and reception are symmetric, the transmission latency can be found as half of the time needed for transmitting and receiving the timestamp signal, compensated by the duration of the signal.
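One plausible reading of that compensation is sketched below; the exact bookkeeping in the paper may differ, and the example values are made up.

```python
# One-way serial latency under the symmetry assumption described above:
# half of the measured round-trip time, after subtracting the time the
# timestamp signal itself occupies on the wire.

def one_way_latency(round_trip_s, signal_duration_s):
    """Estimate one-way latency assuming symmetric transmit/receive paths."""
    return (round_trip_s - signal_duration_s) / 2.0

# An 8-byte timestamp at 9600 bps, with 10 bits per byte on the wire
# (start bit, 8 data bits, stop bit), occupies about 8.3 ms.
signal_duration = 8 * 10 / 9600
print(one_way_latency(round_trip_s=0.00838, signal_duration_s=signal_duration))
```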

The experiments discussed above show that synchronisation by transmitting timestamp signals through the serial port can be done with an accuracy of approximately 20µs.

This means that, with an audio sampling rate of 48kHz, the uncertainty of localising the rising camera trigger edge is around 20µs.
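That figure is essentially one audio sampling period, as the small check below shows.

```python
# The ~20 µs uncertainty quoted above corresponds to one audio sampling
# period: a rising edge can only be localised to within about one sample.
fs = 48_000                    # audio sampling rate in Hz
period_us = 1e6 / fs           # sampling period in microseconds
print(f"{period_us:.1f} us")   # prints 20.8 us
```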