As AI systems are increasingly deployed in assistive technologies and autonomous settings, they must be able to recognize when perceptual input is insufficient and to help users acquire the missing information. This research investigates self-aware perception through two core capabilities: recognizing when input is incomplete or ambiguous, and identifying ways to obtain the missing information.

First, we explore assistive visual interfaces for blind and low-vision (BLV) users. We introduce the Directional Guidance task, which enables Vision-Language Models (VLMs) to detect when a question about an image cannot be answered due to framing issues and to suggest spatial camera adjustments. To address the lack of labeled training data, we design an automated perturbation-based data augmentation pipeline. Empirical results show that fine-tuned models outperform zero-shot baselines on a carefully constructed benchmark.
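As a rough, hypothetical sketch of what one such perturbation could look like (the crop_perturbation helper, its direction labels, and the crop ratio below are illustrative assumptions, not the pipeline's actual implementation): a well-framed image is cropped so that the referenced content falls outside the frame, and the camera move that would undo the crop becomes the training label.

```python
from PIL import Image

# Hypothetical perturbation: crop away one side of a well-framed image so the
# referenced content falls outside the frame, and record the camera adjustment
# that would undo the crop as the guidance label.
def crop_perturbation(image: Image.Image, direction: str, ratio: float = 0.4):
    w, h = image.size
    boxes = {
        "left":  (int(w * ratio), 0, w, h),          # left content removed
        "right": (0, 0, int(w * (1 - ratio)), h),    # right content removed
        "up":    (0, int(h * ratio), w, h),          # top content removed
        "down":  (0, 0, w, int(h * (1 - ratio))),    # bottom content removed
    }
    cropped = image.crop(boxes[direction])
    guidance_label = f"move the camera {direction}"  # e.g. "move the camera left"
    return cropped, guidance_label

# Illustrative usage with a synthetic image; a real pipeline would start from VQA images.
image = Image.new("RGB", (640, 480), color="gray")
perturbed, label = crop_perturbation(image, "left")
print(perturbed.size, label)  # (384, 480) "move the camera left"
```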

Second, we study structured representations of traffic scenes for autonomous driving. Using NuScenes data, we develop a neuro-symbolic pipeline based on Frame Theory that converts sensor data into interpretable summaries of agent motion and scene dynamics. These representations are designed to support introspection and may serve as inputs to symbolic reasoning in future work.
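As an illustrative sketch only (the slot names and values here are hypothetical and are not drawn from NuScenes annotations or the pipeline's actual frame inventory), a frame-style record for a single agent might look like the following:

```python
from dataclasses import dataclass

# Hypothetical frame-style slots for one agent in a driving scene.
@dataclass
class AgentMotionFrame:
    agent_id: str
    category: str       # e.g. "car", "pedestrian"
    action: str         # e.g. "decelerating", "turning left"
    speed_mps: float    # speed in meters per second
    heading_deg: float  # heading in degrees

    def summarize(self) -> str:
        # Render the frame as a short, interpretable summary of agent motion.
        return (f"{self.category} {self.agent_id} is {self.action} at "
                f"{self.speed_mps:.1f} m/s, heading {self.heading_deg:.0f} deg")

# Illustrative values only, not actual NuScenes data.
frame = AgentMotionFrame("a42", "car", "decelerating", 5.2, 270.0)
print(frame.summarize())
```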

Together, these studies work toward intelligent systems that handle uncertainty and support interaction by recognizing the limits of what they currently perceive.


Event Host: Li Liu, Ph.D. Student, Computer Science and Engineering

Advisor: Leilani Gilpin
