Large language models (LLMs) are increasingly being integrated into embodied robotic systems. We present a suite of evaluation metrics, together with data augmentation techniques, for evaluating robotic architectures that incorporate LLMs, drawing on concepts from the cognitive science and human communication literature. Together with a characterization of different LLM integration approaches, the proposed metrics offer the promise of systematically evaluating LLMs as natural language interfaces to robotic systems, and of addressing the important tradeoff between explainability, verifiability, and interpretability on the one hand and robustness to noisy input and broad language understanding on the other in an open-world embodied setting.
@article{sarathyetal25acmtist, title={On Evaluating LLM Integration into Robotic Architectures}, author={Vasanth Sarathy and Marlow Fawn and Matthew McWilliams and Bradley Oosterveld and Matthias Scheutz}, year={2025}, journal={ACM Transactions on Intelligent Systems and Technology}, url={https://hrilab.tufts.edu/publications/sarathyetal25acmtist.pdf} }