The Reliability of Non-verbal Cues for Situated Reference Resolution and their Interplay with Language - Implications for Human Robot Interaction

2017

Conference: 19th ACM International Conference on Multimodal Interaction

Stephanie Gross and Brigitte Krenn and Matthias Scheutz

When uttering referring expressions in situated task descriptions, humans naturally use verbal and non-verbal channels to transmit information to their interlocutor. To develop mechanisms for robot architectures capable of resolving object references in such interaction contexts, we need to better understand the multi-modality of human situated task descriptions. Current computational models, if they include non-verbal cues at all, mainly cover pointing gestures, eye gaze, and objects in the visual field. We analyse reference resolution to objects in an object manipulation task and find that only up to 50% of all referring expressions to objects can be resolved using language, eye gaze, and pointing gestures. We therefore extract additional non-verbal cues necessary for reference resolution to objects, investigate the reliability of the different verbal and non-verbal cues, and formulate lessons for the design of a robot's natural language understanding capabilities.
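As a rough illustration of the kind of multimodal cue fusion the abstract describes, the Python sketch below scores candidate objects by weighting language, gaze, and pointing evidence and picks the best-scoring referent. All object names, scores, and weights are hypothetical illustrations, not the authors' model.

# A minimal sketch (not the paper's method) of weighted multimodal cue
# fusion for reference resolution. All names, cue scores, and weights
# below are hypothetical.

from dataclasses import dataclass

@dataclass
class Candidate:
    # An object in the scene with per-cue evidence scores in [0, 1].
    name: str
    language: float   # fit between the referring expression and the object
    gaze: float       # how closely the speaker's gaze falls on the object
    pointing: float   # alignment of the pointing vector with the object

# Hypothetical cue weights; the paper suggests such reliabilities should
# be measured empirically per cue rather than assumed.
WEIGHTS = {"language": 0.5, "gaze": 0.3, "pointing": 0.2}

def resolve_reference(candidates):
    # Return the candidate with the highest weighted cue score.
    def score(c):
        return (WEIGHTS["language"] * c.language
                + WEIGHTS["gaze"] * c.gaze
                + WEIGHTS["pointing"] * c.pointing)
    return max(candidates, key=score)

if __name__ == "__main__":
    scene = [
        Candidate("red cube", language=0.9, gaze=0.2, pointing=0.1),
        Candidate("blue cube", language=0.4, gaze=0.8, pointing=0.9),
    ]
    print(resolve_reference(scene).name)  # -> "blue cube"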

@inproceedings{grossetal17icmi,
  title={The Reliability of Non-verbal Cues for Situated Reference Resolution and their Interplay with Language - Implications for Human Robot Interaction},
  author={Stephanie Gross and Brigitte Krenn and Matthias Scheutz},
  year={2017},
  booktitle={19th ACM International Conference on Multimodal Interaction},
  url={https://hrilab.tufts.edu/publications/grossetal17icmi.pdf}
}