A Metacognitive Architecture for ToM Revision in AI Agents

Abstract

This paper presents a metacognitive architecture for revising an AI agent’s Theory of Mind (ToM) to address misinterpretations in human–AI interaction. The ability to revise an agent’s interpretations of users’ mental states and characteristics is critical for maintaining trust and positive perceptions, especially in AI-mediated social interactions. To enable ToM revision, we introduce a two-level metacognitive architecture that integrates knowledge-based AI (KBAI) with large language models (LLMs). The architecture comprises a cognitive layer that performs the agent’s core tasks and a metacognitive layer that introspects on the cognitive layer using a Task–Method–Knowledge (TMK) model of the agent. The metacognitive layer (1) revises the agent’s interpretation of the user in response to user feedback and (2) communicates the revision process to the user.
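To make the two-level structure concrete, the following is a minimal illustrative sketch in Python, not the implementation described in the paper. All class, method, and field names (TMKModel, CognitiveLayer, MetacognitiveLayer, infer_interest, revise) are hypothetical, and a simple keyword rule stands in for the LLM call so that the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """A TMK task: a goal the agent pursues, achieved by one named method."""
    name: str
    goal: str
    method: str


@dataclass
class Method:
    """A TMK method: how a task is carried out and which knowledge it uses."""
    name: str
    description: str
    knowledge: list[str]


@dataclass
class TMKModel:
    """Hypothetical, minimal self-model of the agent (assumed structure)."""
    tasks: dict[str, Task] = field(default_factory=dict)
    methods: dict[str, Method] = field(default_factory=dict)

    def explain(self, task_name: str) -> str:
        # Look up the task and its method to describe the agent's reasoning.
        task = self.tasks[task_name]
        method = self.methods[task.method]
        sources = ", ".join(method.knowledge)
        return f"To {task.goal}, I {method.description} using {sources}."


class CognitiveLayer:
    """Performs the core task: here, inferring one user attribute."""

    def __init__(self) -> None:
        self.user_model: dict[str, str] = {}

    def infer_interest(self, message: str) -> str:
        # Assumption: a keyword rule replaces the LLM to keep this runnable.
        interest = "hiking" if "trail" in message.lower() else "unknown"
        self.user_model["interest"] = interest
        return interest


class MetacognitiveLayer:
    """Introspects on the cognitive layer via the TMK model."""

    def __init__(self, cognitive: CognitiveLayer, tmk: TMKModel) -> None:
        self.cognitive = cognitive
        self.tmk = tmk

    def revise(self, attribute: str, corrected: str, task_name: str) -> str:
        # (1) Revise the interpretation in response to user feedback.
        previous = self.cognitive.user_model.get(attribute, "unknown")
        self.cognitive.user_model[attribute] = corrected
        # (2) Communicate the revision process to the user.
        how = self.tmk.explain(task_name)
        return (f"I had inferred your {attribute} as '{previous}'. {how} "
                f"Based on your feedback, I now record it as '{corrected}'.")


if __name__ == "__main__":
    tmk = TMKModel(
        tasks={"infer_interest": Task("infer_interest",
                                      "infer your interests",
                                      "keyword_match")},
        methods={"keyword_match": Method("keyword_match",
                                         "matched keywords",
                                         ["your messages"])},
    )
    cognitive = CognitiveLayer()
    meta = MetacognitiveLayer(cognitive, tmk)
    cognitive.infer_interest("I took photos of the trail yesterday")
    print(meta.revise("interest", "photography", "infer_interest"))
```

In this sketch the metacognitive layer never re-runs the cognitive task. It only edits the stored interpretation and uses the TMK self-model to explain how the earlier inference was made, which mirrors the two functions listed in the abstract under the stated assumptions.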
