The acoustics of eye contact - Detecting visual attention from conversational audio cues

Publication from Digital

Eyben, F. and Weninger, F. and Schuller, B. and Paletta, L.

Proc. 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction (GAZE-IN 2013), held in conjunction with the ACM ICMI 2013, Sydney, Australia, 13 December 2013, 1/2013


An important aspect of short dialogues is attention, as manifested by eye contact between subjects. In this study we provide a first analysis of whether such visual attention is evident in the acoustic properties of a speaker's voice. We thereby introduce the multi-modal GRAS2 corpus, which was recorded for analyzing attention in short daily-life human-to-human interactions with strangers in public places in Graz, Austria. The corpus contains recordings of four test subjects equipped with eye tracking glasses, three audio recording devices, and motion sensors. We describe how we robustly identify speech segments from the subjects and other people in an unsupervised manner from multi-channel recordings. We then discuss correlations between the acoustics of the voice in these segments and the point of visual attention of the subjects. A significant relation between the acoustic features and the distance between the point of view and the eye region of the dialogue partner is found. Further, we show that automatic classification of the binary decision eye contact vs. no eye contact from acoustic features alone is feasible with an Unweighted Average Recall of up to 70%.
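
For readers unfamiliar with the metric, Unweighted Average Recall (UAR) is the mean of the per-class recalls, so each class counts equally regardless of how many segments it contains. The following is a minimal sketch of that computation only, not the authors' evaluation code; the function name, the label strings, and the toy data are assumptions made up for illustration and are not taken from the GRAS2 corpus.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    hits = defaultdict(int)    # correctly predicted items per true class
    totals = defaultdict(int)  # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not from the paper).
y_true = ["contact"] * 2 + ["no_contact"] * 8
y_pred = ["contact", "no_contact"] + ["no_contact"] * 6 + ["contact"] * 2

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recalls are 1/2 and 6/8, so UAR differs from plain accuracy (7/10). For a two-class task such as eye contact vs. no eye contact, chance-level UAR is 50% whatever the class distribution.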