Traditional AI systems often operate under the closed-world assumption, restricting their ability to adapt in dynamic environments. We propose a cognitive architecture (CA) that expands its perceptual capabilities by generating object prototypes from user-provided natural language descriptions. Each prototype is constructed from superellipsoid primitives, enabling structured and interpretable shape representations. The CA employs these prototypes to train a convolutional parametric shape encoder, using the rendering parameterizations as automatically generated ground-truth supervision. Once trained, the encoder lets the CA infer superellipsoid-based representations from real-world object observations. A bidirectional mapping between superellipsoid parameters and natural language terms allows the CA to translate inferred geometric features into human-understandable descriptions. We detail the design of the prototype representations, the synthetically supervised training pipeline, and the language–geometry mapping process. Experimental results demonstrate that the CA enhances its perceptual repertoire through our structured, interpretable object representations.
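For readers unfamiliar with the primitive, the sketch below illustrates the standard Barr superellipsoid surface parameterization from which such shape prototypes can be built: two exponents control how boxy or rounded the shape is, and three semi-axes control its extent. This is a minimal, self-contained illustration of the geometric representation only; the function name, parameter values, and sampling density are our own assumptions and do not reflect the paper's actual pipeline.

import numpy as np

def superellipsoid_surface(a1, a2, a3, e1, e2, n=50):
    """Sample surface points of a superellipsoid (Barr parameterization).

    a1, a2, a3 : semi-axis lengths along x, y, z
    e1, e2     : shape exponents (values near 0 -> boxy, near 1 -> ellipsoidal)
    """
    def fexp(w, m):
        # signed power: preserves the sign of the trigonometric term
        return np.sign(w) * np.abs(w) ** m

    eta = np.linspace(-np.pi / 2, np.pi / 2, n)   # latitude-like angle
    omega = np.linspace(-np.pi, np.pi, n)         # longitude-like angle
    eta, omega = np.meshgrid(eta, omega)

    x = a1 * fexp(np.cos(eta), e1) * fexp(np.cos(omega), e2)
    y = a2 * fexp(np.cos(eta), e1) * fexp(np.sin(omega), e2)
    z = a3 * fexp(np.sin(eta), e1)
    return np.stack([x, y, z], axis=-1)

# Illustrative only: a flat, roughly box-like prototype
# (low exponents flatten the sides and sharpen the edges).
points = superellipsoid_surface(a1=1.0, a2=0.5, a3=0.3, e1=0.1, e2=0.1)

Because each parameter has a direct geometric meaning, a mapping of the kind the abstract describes could, for instance, associate low exponent values with terms like "box-like" and values near one with "rounded"; the specific vocabulary and mapping used in the paper are given in the full text.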
@inproceedings{schneideretal25acs,
title={Instruction-Based Self-Supervised Online Training of the Perceptual Subsystem of a Cognitive Robotic Architecture},
author={Sarah Schneider and Evan Krause and Daniel Soukup and Matthias Scheutz},
year={2025},
booktitle={The Twelfth Annual Conference on Advances in Cognitive Systems (ACS 2025)},
url={https://hrilab.tufts.edu/publications/schneideretal25acs.pdf}
}