Robots are entering hospitals, airports, classrooms and homes, yet a persistent barrier to effective human–robot interaction is often not capability itself but whether people can make sense of robot behaviour quickly enough to coordinate with it. Prevailing work in HRI and explainable AI has largely treated this as an information-disclosure problem, assuming that revealing more of a system's internal reasoning will improve understanding. We argue that this framing overlooks how people interpret behaviour under real-time constraints. Drawing on cognitive schema theory, we define robot understandability as the user's ability to form a timely, workable interpretation of what the robot is doing, sufficient to anticipate what it will do next and respond appropriately, without access to its internal decision process. We propose a schema-alignment framework organised around four interdependent schemas that structure social sensemaking: context, role, procedure and strategy. Across case studies in embodied social robotics and large language model (LLM) interaction, we show that coordination breakdowns (pauses, errors and interactional repairs) arise when system cues fail to support the schemas users rely on. LLM interaction serves as a useful comparison because it removes the demands of embodiment, helping to isolate breakdowns that may reflect more general processes of sensemaking. The comparison has clear limits, however: embodied robots introduce additional demands, including coordination in shared physical space and the interpretation of nonverbal cues such as gaze and gesture. Together, these arguments reframe understandability as a problem of interactional cueing rather than information access, and provide a psychologically grounded basis for robot design.