Mistake detection in procedural tasks is essential for developing intelligent assistive agents that enhance learning and task execution. Existing methods predominantly analyze how an action is performed while overlooking what it produces, i.e., the Action Effect. However, execution mistakes often manifest not in the action itself but in its outcome, such as an unintended object state or an incorrect spatial arrangement.
To bridge this gap, we introduce Action Effect Modeling, a novel framework that detects mistakes by evaluating deviations in action outcomes. Our method captures fine-grained object states and spatial relationships using an egocentric scene graph, enabling a more comprehensive understanding of procedural correctness. By explicitly modeling expected action effects, it surfaces subtle execution errors that traditional action-centric approaches fail to identify.
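To make the idea concrete, the following is a minimal illustrative sketch (not the paper's implementation; all names and structures are hypothetical) of representing an action effect as a scene graph of object states and spatial relations, and flagging a mistake as a deviation between the expected and observed graphs:

```python
# Hypothetical sketch: an action effect as a scene graph, with mistakes
# detected as deviations between expected and observed graphs.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relation:
    subject: str    # e.g. "lid"
    predicate: str  # e.g. "on-top-of"
    obj: str        # e.g. "jar"

@dataclass
class SceneGraph:
    # object -> state label, e.g. {"jar": "closed"}
    states: dict = field(default_factory=dict)
    # pairwise spatial relations between objects
    relations: set = field(default_factory=set)

def effect_deviations(expected: SceneGraph, observed: SceneGraph):
    """Return state mismatches and missing relations; any deviation
    is a candidate execution mistake."""
    wrong_states = {
        obj: (exp_state, observed.states.get(obj))
        for obj, exp_state in expected.states.items()
        if observed.states.get(obj) != exp_state
    }
    missing_relations = expected.relations - observed.relations
    return wrong_states, missing_relations

# Example: the step "close the jar" should leave the jar closed
# with the lid on top; the executor forgot to replace the lid.
expected = SceneGraph(
    states={"jar": "closed"},
    relations={Relation("lid", "on-top-of", "jar")},
)
observed = SceneGraph(states={"jar": "open"}, relations=set())

states, rels = effect_deviations(expected, observed)
print(states)  # {'jar': ('closed', 'open')}
```

In practice, the expected graph would come from a model of correct executions and the observed graph from egocentric perception; this sketch only shows the comparison step.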
We validate our framework on the challenging EgoPER dataset in a One-Class Classification (OCC) setting, demonstrating its effectiveness in identifying mistakes beyond conventional action-centric methods. Our findings highlight the significance of action effect reasoning in mistake detection and open new avenues for enhancing assistive intelligence in procedural activities.