The Reliability of Non-verbal Cues for Situated Reference Resolution and their Interplay with Language - Implications for Human Robot Interaction

2017

Conference: 19th ACM International Conference on Multimodal Interaction

Stephanie Gross and Brigitte Krenn and Matthias Scheutz

When uttering referring expressions in situated task descriptions, humans naturally use verbal and non-verbal channels to transmit information to their interlocutor. To develop mechanisms for robot architectures capable of resolving object references in such interaction contexts, we need to better understand the multi-modality of human situated task descriptions. Current computational models, if they include non-verbal cues at all, mainly cover pointing gestures, eye gaze, and objects in the visual field. We analyse reference resolution to objects in an object manipulation task and find that only up to 50% of all referring expressions to objects can be resolved using language, eye gaze, and pointing gestures. We therefore extract additional non-verbal cues necessary for reference resolution to objects, investigate the reliability of the different verbal and non-verbal cues, and formulate lessons for the design of a robot's natural language understanding capabilities.
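As a rough illustration of the kind of multimodal cue fusion the abstract describes, the Python sketch below scores candidate objects by weighting language, gaze, and pointing evidence and picks the best-scoring referent. All object names, scores, and weights are hypothetical illustrations, not the authors' model.

# A minimal sketch (not the paper's method) of weighted multimodal cue
# fusion for reference resolution. All names, cue scores, and weights
# below are hypothetical.

from dataclasses import dataclass

@dataclass
class Candidate:
    # An object in the scene with per-cue evidence scores in [0, 1].
    name: str
    language: float   # fit between the referring expression and the object
    gaze: float       # how closely the speaker's gaze falls on the object
    pointing: float   # alignment of the pointing vector with the object

# Hypothetical cue weights; the paper suggests such reliabilities should
# be measured empirically per cue rather than assumed.
WEIGHTS = {"language": 0.5, "gaze": 0.3, "pointing": 0.2}

def resolve_reference(candidates):
    # Return the candidate with the highest weighted cue score.
    def score(c):
        return (WEIGHTS["language"] * c.language
                + WEIGHTS["gaze"] * c.gaze
                + WEIGHTS["pointing"] * c.pointing)
    return max(candidates, key=score)

if __name__ == "__main__":
    scene = [
        Candidate("red cube", language=0.9, gaze=0.2, pointing=0.1),
        Candidate("blue cube", language=0.4, gaze=0.8, pointing=0.9),
    ]
    print(resolve_reference(scene).name)  # -> "blue cube"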

@inproceedings{grossetal17icmi,
  title={The Reliability of Non-verbal Cues for Situated Reference Resolution and their Interplay with Language - Implications for Human Robot Interaction},
  author={Stephanie Gross and Brigitte Krenn and Matthias Scheutz},
  year={2017},
  booktitle={19th ACM International Conference on Multimodal Interaction},
  url={https://hrilab.tufts.edu/publications/grossetal17icmi.pdf}
}