With advances in generative AI, there is increasing work towards creating autonomous agents that can manage daily tasks by operating user interfaces (UIs). While prior research has studied the mechanics of how AI agents might navigate UIs and understand UI structure, the effects of agents and their autonomous actions, particularly those that may be risky or irreversible, remain under-explored. In this work, we investigate the real-world impacts and consequences of mobile UI actions taken by AI agents. We began by developing a taxonomy of the impacts of mobile UI actions through a series of workshops with domain experts. Following this, we conducted a data synthesis study to gather realistic mobile UI screen traces and action data that users perceive as impactful. We then used our impact categories to annotate our collected data and data repurposed from existing mobile UI navigation datasets. Our quantitative evaluations of different large language models (LLMs) and variants demonstrate how well different LLMs can understand the impacts of mobile UI actions that might be taken by an agent. We show that our taxonomy enhances the reasoning capabilities of these LLMs for understanding the impacts of mobile UI actions, but our findings also reveal significant gaps in their ability to reliably classify more nuanced or complex categories of impact.
- * Work done while at Apple
- † University of Washington