Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but
many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for generating paired text-image
training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM). Unlike
prior art, our method requires no human-provided annotations, and it can be applied to any dataset of UI screenshots. We generate a
dataset of 335K conversational examples paired with UIs that cover Q&A, UI descriptions, and planning, and use it to fine-tune a
conversational VLM for UI tasks. To assess the performance of our model, we benchmark it on UI element detection tasks, evaluate
response quality, and showcase its applicability to multi-step UI navigation and planning.
- ** Work done while at Apple
- † Aalto University
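To make the data-generation recipe above concrete, the following is a minimal sketch of pairing a pixel-based UI element detector with an LLM to produce conversational training examples. The function names `detect_ui_elements` and `llm_complete` are hypothetical placeholders, not interfaces from the paper; the prompt wording is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    label: str                        # e.g. "button", "text field"
    text: str                         # visible text, if any
    bbox: tuple[int, int, int, int]   # (x, y, width, height)


def detect_ui_elements(screenshot_path: str) -> list[UIElement]:
    """Placeholder for any pixel-based UI element detector."""
    raise NotImplementedError


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a Large Language Model."""
    raise NotImplementedError


def generate_conversation(screenshot_path: str) -> dict:
    """Pair a screenshot with LLM-generated conversational text.

    Detected elements are serialized into a textual scene description,
    and the LLM is asked to produce Q&A, a UI description, and a plan
    grounded in that description; no human annotation is involved.
    """
    elements = detect_ui_elements(screenshot_path)
    scene = "\n".join(
        f"- {e.label} '{e.text}' at {e.bbox}" for e in elements
    )
    prompt = (
        "You are shown a UI described by its detected elements:\n"
        f"{scene}\n"
        "Generate a short conversation about this UI covering a question "
        "and answer, a description of the screen, and a multi-step plan "
        "for a plausible user goal."
    )
    return {"image": screenshot_path, "conversation": llm_complete(prompt)}
```

Applied over an arbitrary corpus of UI screenshots, a loop over `generate_conversation` would yield image-conversation pairs of the kind used here for fine-tuning, without any human-provided annotations.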