People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people. In this work, we introduce SceneScout, a multimodal large language model (MLLM)-driven AI agent that enables accessible interactions with street view imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling free movement within street view imagery. Our user study (N=10) demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. A technical evaluation reveals that most descriptions are accurate (72%) and describe stable visual elements (95%) even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of using street view imagery to enhance navigation experiences.
- † Work done while at Apple
- ‡ Columbia University