    OpenCUA’s open source computer-use agents rival proprietary models from OpenAI and Anthropic

    By Sophia Ahmed Wilson | August 23, 2025 | 7 Mins Read

    A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for building robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes for scaling the development of computer-use agents (CUAs).

    Models trained using this framework perform strongly on CUA benchmarks, outperforming existing open source models and competing closely with closed agents from leading AI labs like OpenAI and Anthropic.

    The challenge of building computer-use agents

    Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.

    “As the lack of transparency limits technical advancements and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks,” the researchers state in their paper.


    At the same time, open source efforts face their own set of hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open source datasets for graphical user interfaces (GUIs) contain limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to replicate their work.

    According to the paper, “These limitations collectively hinder advances in general-purpose CUAs and restrict a meaningful exploration of their scalability, generalizability, and potential learning approaches.”

    Introducing OpenCUA

    OpenCUA framework. Source: XLANG Lab at HKU

    OpenCUA is an open source framework designed to address these challenges by scaling both the data collection and the models themselves. At its core is the AgentNet Tool for recording human demonstrations of computer tasks on different operating systems.

    The tool streamlines data collection by running in the background on an annotator’s personal computer, capturing screen videos, mouse and keyboard inputs, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then processed into “state-action trajectories,” pairing a screenshot of the computer (the state) with the user’s corresponding action (a click, key press, etc.). Annotators can then review, edit, and submit these demonstrations.
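
    As a rough illustration of the state-action pairing described above, the sketch below models a trajectory as a list of screenshot and action pairs. The names (`Action`, `Step`, `Trajectory`, `pair_events_with_frames`) are hypothetical and not taken from the AgentNet Tool, and matching each input event to the latest frame captured at or before it is an assumed alignment rule, not the researchers' documented method.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One user input event; `kind` might be "click" or "key_press"."""
    timestamp: float
    kind: str
    payload: dict


@dataclass(frozen=True)
class Step:
    """A state-action pair: the screenshot seen and the action taken."""
    screenshot: str  # path or ID of the captured frame (the state)
    action: Action


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)


def pair_events_with_frames(task, frames, actions):
    """Pair each action with the latest frame captured at or before it.

    `frames` is a list of (timestamp, screenshot) tuples sorted by time.
    Actions that occur before the first frame are skipped.
    """
    frame_times = [t for t, _ in frames]
    trajectory = Trajectory(task=task)
    for action in sorted(actions, key=lambda a: a.timestamp):
        idx = bisect_right(frame_times, action.timestamp) - 1
        if idx < 0:
            continue  # no state had been recorded yet (assumed policy)
        trajectory.steps.append(Step(frames[idx][1], action))
    return trajectory


# Hypothetical recording: three frames and two input events.
frames = [(0.0, "frame_000.png"), (1.0, "frame_001.png"), (2.0, "frame_002.png")]
actions = [
    Action(1.4, "click", {"x": 120, "y": 48}),
    Action(2.0, "key_press", {"key": "Enter"}),
]
demo = pair_events_with_frames("Open the settings page", frames, actions)
```

    Keeping the raw events separate from the paired steps would let an annotator review or edit a demonstration before it is submitted, in line with the review step the article describes.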

    AgentNet Tool. Source: XLANG Lab at HKU

    Using this tool, the researchers collected the AgentNet dataset, which contains over 22,600 task demonstrations across Windows, macOS, and Ubuntu, spanning more than 200 applications and websites. “This dataset authentically captures the complexity of human behaviors and environmental dynamics from users’ personal computing environments,” the paper notes.

    Recognizing that screen-recording tools raise significant data privacy concerns for enterprises, the researchers designed the AgentNet Tool with security in mind. Xinyuan Wang, co-author of the paper and PhD student at HKU, explained that they implemented a multi-layer privacy protection framework. “First, annotators themselves can fully observe the data they generate… before deciding whether to submit it,” he told VentureBeat. The data then undergoes manual verification for privacy issues and automated scanning by a large model to detect any remaining sensitive content before release. “This layered process ensures enterprise-grade robustness for environments handling sensitive customer or financial data,” Wang added.

    To accelerate research, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent’s performance.
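
    To make the idea of multiple correct actions per step concrete, here is a minimal scoring sketch. It assumes each action can be reduced to a comparable tuple such as `("click", "Save")` and that a step counts as correct when the prediction matches any accepted action. The function names and the exact-match rule are illustrative assumptions, not AgentNetBench's actual scoring logic.

```python
def step_is_correct(predicted, accepted):
    """A step counts as correct if the prediction matches any accepted action."""
    return predicted in accepted


def offline_step_accuracy(predictions, references):
    """Fraction of steps whose predicted action is among the accepted ones.

    `predictions` is a list of actions, one per step.
    `references` is a list of sets, each holding every accepted action
    for the corresponding step.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align step by step")
    if not predictions:
        return 0.0
    hits = sum(step_is_correct(p, r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical four-step task: several steps accept more than one action.
references = [
    {("click", "File menu"), ("hotkey", "alt+f")},
    {("click", "Save As"), ("hotkey", "ctrl+shift+s")},
    {("type", "report.pdf")},
    {("click", "Save"), ("key_press", "Enter")},
]
predictions = [
    ("hotkey", "alt+f"),
    ("click", "Save As"),
    ("type", "report.docx"),  # wrong file name, so this step is a miss
    ("key_press", "Enter"),
]
accuracy = offline_step_accuracy(predictions, references)
```

    Because scoring only compares predictions against stored references, no live operating system needs to run during evaluation, which is presumably where the efficiency gain comes from.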

    A new recipe for training agents

    The OpenCUA framework introduces a novel pipeline for processing data and training computer-use agents. The first step converts the raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.

    OpenCUA chain-of-thought pipeline. Source: XLANG Lab at HKU

    The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed “inner monologue” for each action, which includes planning, memory, and reflection. This structured reasoning is organized into three levels: a high-level observation of the screen, reflective thoughts that analyze the situation and plan the next steps, and finally, the concise, executable action. This approach helps the agent develop a deeper understanding of the tasks.
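
    The three levels of reasoning can be pictured as a small record attached to every step. The sketch below is only an assumption about how such a record might be serialized into a text training target: the `ReasonedStep` class, the `Observation:`/`Thought:`/`Action:` labels, and the action string syntax are hypothetical and do not reflect OpenCUA's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasonedStep:
    observation: str  # high-level description of what is on screen
    thought: str      # reflection on the situation and plan for next steps
    action: str       # concise, executable action


def format_training_target(step):
    """Serialize the three levels into one text target, reasoning first."""
    return (
        f"Observation: {step.observation}\n"
        f"Thought: {step.thought}\n"
        f"Action: {step.action}"
    )


# Hypothetical step from a spreadsheet task.
step = ReasonedStep(
    observation="A spreadsheet is open and cell B2 is selected.",
    thought="The total belongs in B10, so that cell must be selected first.",
    action="click(cell='B10')",
)
target = format_training_target(step)
```

    Placing the action last means a model trained on such targets would produce its observation and reasoning before committing to an action, which matches the ordering described in the article.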

    “We find natural language reasoning crucial for generalizable computer-use foundation models, helping CUAs internalize cognitive capabilities,” the researchers write.

    This data synthesis pipeline is a general framework that companies can adapt to train agents on their own unique internal tools. According to Wang, an enterprise can record demonstrations of its proprietary workflows and use the same “reflector” and “generator” pipeline to create the necessary training data. “This allows them to bootstrap a high-performing agent tailored to their internal tools without needing to handcraft reasoning traces manually,” he explained.

    Putting OpenCUA to the test

    The researchers applied the OpenCUA framework to train a range of open source VLMs, including variants of Qwen and Kimi-VL, with parameter sizes from 3 billion to 32 billion. The models were evaluated on a suite of online and offline benchmarks that test their ability to perform tasks and understand GUIs.

    The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open source models on the OSWorld-Verified benchmark. It also surpassed OpenAI’s GPT-4o-based CUA and significantly closed the performance gap with Anthropic’s leading proprietary models.

    OpenCUA shows a large improvement over base models (left) while competing with leading CUA models (right). Source: XLANG Lab at HKU

    For enterprise developers and product leaders, the research offers several key findings. The OpenCUA method is broadly applicable, improving performance on models with different architectures (both dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.

    According to Wang, the framework is particularly suited for automating repetitive, labor-intensive enterprise workflows. “For example, in the AgentNet dataset, we already capture a few demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk,” he told VentureBeat. “These tasks involve many sequential steps but follow repeatable patterns.”

    However, Wang noted that bridging the gap to live deployment requires addressing key challenges around safety and reliability. “The biggest challenge in real deployment is safety and reliability: the agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task,” he said.

    The researchers have released the code, dataset, and weights for their models.

    As open source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future where proficiency in complex software becomes less important than the ability to clearly articulate goals to an AI agent.

    He described two primary modes of work: “offline automation, where the agent leverages its broader software knowledge to pursue a task end-to-end,” and “online collaboration, where the agent responds in real-time and works side by side with the human, much like a colleague.” Essentially, humans will provide the strategic “what,” while increasingly sophisticated AI agents handle the operational “how.”
