UK Tech Insider

From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

By Sophia Ahmed Wilson, June 29, 2025


Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: Build a model that could look at a photo of a laptop and identify any physical damage, such as cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

Along the way, we ran into issues with hallucinations, unreliable outputs and images that weren’t even laptops. To solve these, we ended up applying an agentic framework in an atypical way: not for task automation, but to improve the model’s performance.

In this post, we will walk through what we tried, what didn’t work and how a combination of approaches eventually helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
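For illustration, here is a minimal Python sketch of a monolithic prompt: one large prompt, one image, one call, one parsed answer. It is not the authors’ implementation. The `call_vision_llm` callable, the prompt wording and the JSON reply format are all assumptions; in practice the callable would wrap whichever image-capable LLM API is in use.

```python
import json
from typing import Callable

# Hypothetical signature: (prompt, image_bytes) -> raw text reply from an image-capable LLM.
VisionLLM = Callable[[str, bytes], str]

# Assumed prompt wording; the article does not publish its actual prompt.
MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. List every visible physical damage "
    "(for example cracked screen, missing keys, broken hinge). "
    'Reply only with JSON: {"damage": [{"type": "...", "location": "..."}]}'
)


def detect_damage_monolithic(image: bytes, call_vision_llm: VisionLLM) -> list[dict]:
    """Send one large prompt plus the image in a single call and parse the JSON reply."""
    reply = call_vision_llm(MONOLITHIC_PROMPT, image)
    try:
        return json.loads(reply).get("damage", [])
    except (json.JSONDecodeError, AttributeError):
        # Non-JSON text, or JSON that is not an object, yields an empty report.
        return []
```

Because everything rides on that single call, nothing in this design checks whether the photo even shows a laptop or whether a reported crack is real.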

We ran into three major issues early on:

• Hallucinations: The model would sometimes invent damage that didn’t exist or mislabel what it was seeing.
• Junk image detection: It had no reliable way to flag images that weren’t even laptops. Pictures of desks, walls or people occasionally slipped through and received nonsensical damage reports.
• Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model’s output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to research highlighting how image resolution affects deep learning models.

We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.
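The article does not say how the lower-resolution images were obtained. One common option, sketched below under that assumption, is to degrade sharp images synthetically: average the pixels in blocks to throw away detail, then write each block’s mean back into every pixel so the image keeps its original shape. The grayscale list-of-lists representation, the `degrade_resolution` and `mix_resolutions` names, the 50% mix and the downsampling factors are illustrative choices; a real pipeline would use an image library’s resize functions instead.

```python
import random

Image = list[list[float]]  # grayscale pixels: rows of equal length


def degrade_resolution(img: Image, factor: int) -> Image:
    """Simulate a low-resolution photo: average each factor x factor block, then
    write the mean back into every pixel of the block (a nearest-neighbour upsample),
    so the output keeps the original height and width."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            block = [img[y][x] for y in ys for x in xs]
            mean = sum(block) / len(block)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def mix_resolutions(images: list[Image], low_res_fraction: float = 0.5,
                    factors: tuple[int, ...] = (2, 4, 8), seed: int = 0) -> list[Image]:
    """Return a dataset in which roughly low_res_fraction of the images are degraded
    by a randomly chosen factor; the rest are kept at full resolution."""
    rng = random.Random(seed)
    mixed = []
    for img in images:
        if rng.random() < low_res_fraction:
            mixed.append(degrade_resolution(img, rng.choice(factors)))
        else:
            mixed.append(img)
    return mixed
```

The same mixing can be applied to the held-out test split, so that evaluation also covers a spread of photo quality rather than only clean images.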

The multimodal detour: Text-only LLM goes multimodal

Next, we tried an idea inspired by recent experiments in combining image captioning with text-only LLMs, such as the technique covered in The Batch: captions are generated from images and then interpreted by a language model.

Here’s how it works:

• The LLM begins by generating multiple possible captions for an image.
• Another model, called a multimodal embedding model, checks how well each caption matches the image. In this case, we used SigLIP to score the similarity between the image and the text.
• The system keeps the top few captions based on these scores.
• The LLM uses these top captions to write new ones, trying to get closer to what the image actually shows.
• It repeats this process until the captions stop improving, or it hits a set limit.
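The loop above can be sketched as follows. This is a hedged illustration of the generate, score, keep-top-k, regenerate pattern, not the exact system described in The Batch or the one the authors ran. `propose` stands in for the text-only LLM and `score` for a SigLIP-style image-text similarity model; both are hypothetical callables, and the parameter defaults are arbitrary.

```python
from typing import Callable

# Hypothetical stand-ins for the real models (names and signatures are assumptions):
#   propose(seed_captions, n) -> n new candidate captions from a text-only LLM,
#                                conditioned on the best captions so far
#   score(caption)            -> image-text similarity for the fixed input image,
#                                e.g. from a SigLIP-style embedding model
Proposer = Callable[[list[str], int], list[str]]
Scorer = Callable[[str], float]


def refine_captions(propose: Proposer, score: Scorer, n_candidates: int = 8,
                    keep_top: int = 3, max_rounds: int = 5,
                    min_gain: float = 1e-3) -> list[tuple[str, float]]:
    """Generate captions, score them, keep the top few and regenerate until scores plateau."""
    best: list[tuple[str, float]] = []
    best_score = float("-inf")
    for _ in range(max_rounds):  # hard limit on refinement rounds
        candidates = propose([caption for caption, _ in best], n_candidates)
        pool = dict(best)  # previous winners compete with the new candidates
        for caption in candidates:
            if caption not in pool:
                pool[caption] = score(caption)
        if not pool:
            break  # nothing to rank
        best = sorted(pool.items(), key=lambda kv: kv[1], reverse=True)[:keep_top]
        if best[0][1] - best_score < min_gain:
            break  # top score stopped improving
        best_score = best[0][1]
    return best
```

Note that the loop optimizes only for image-text similarity. A caption describing plausible but nonexistent damage can survive as long as it scores well, which is consistent with the persistent hallucinations the authors report for this approach.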

While clever in theory, this approach introduced new problems for our use case:

• Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
• Incomplete coverage: Even with multiple captions, some issues were missed entirely.
• Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered whether breaking down the image interpretation task into smaller, specialized agents might help.

We built an agentic framework structured like this:

• Orchestrator agent: It inspected the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
• Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
• Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.

This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent’s task was simple and focused enough to keep its output quality under control.
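A rough sketch of how such a split might be wired together follows. Each agent is modeled as a plain callable wrapping one narrow model call. The names, the signatures and the choice to run the junk check before the orchestrator are assumptions for illustration, not details from the article.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical agent signatures; each would wrap one focused model call.
JunkAgent = Callable[[bytes], bool]           # True if the image shows a laptop
Orchestrator = Callable[[bytes], list[str]]   # names of visible components
ComponentAgent = Callable[[bytes], list[str]] # damage types found on one component


@dataclass
class Report:
    is_laptop: bool
    visible_components: list[str] = field(default_factory=list)
    findings: dict[str, list[str]] = field(default_factory=dict)


def run_agents(image: bytes, is_laptop: JunkAgent, orchestrator: Orchestrator,
               component_agents: dict[str, ComponentAgent]) -> Report:
    """Gate on the junk check, then route each visible component to its specialist."""
    if not is_laptop(image):
        return Report(is_laptop=False)  # no damage report at all for non-laptop images
    visible = orchestrator(image)
    findings: dict[str, list[str]] = {}
    for component in visible:
        agent = component_agents.get(component)
        if agent is not None:  # components without a specialist are silently skipped
            findings[component] = agent(image)
    return Report(is_laptop=True, visible_components=visible, findings=findings)
```

Any component without a registered specialist is simply skipped, which is the kind of coverage gap the authors describe among the trade-offs of this approach.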

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

• Increased latency: Running multiple sequential agents added to the total inference time.
• Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system:

1. The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the agents to the most essential ones to reduce latency.
2. Then, a monolithic image LLM prompt scanned the image for anything else the agents might have missed.
3. Finally, we fine-tuned the model using a curated set of images for high-priority use cases, such as frequently reported damage scenarios, to further improve accuracy and reliability.

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.
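One way to combine the first two stages is sketched below, assuming hypothetical `agent_stage` and `monolithic_stage` callables that each return a list of damage labels. Agent findings are kept as the primary result, the monolithic sweep only contributes labels the agents did not already report, and each label is tagged with its source to preserve explainability. The merge rule is an assumption, and the fine-tuning step, which happens offline, is not shown.

```python
from typing import Callable

# Hypothetical stage signature: image bytes -> list of damage labels.
Stage = Callable[[bytes], list[str]]


def hybrid_inspect(image: bytes, is_laptop: Callable[[bytes], bool],
                   agent_stage: Stage, monolithic_stage: Stage) -> dict:
    """Junk gate, then agents (known damage types), then a broad monolithic sweep."""
    if not is_laptop(image):
        return {"status": "rejected_not_a_laptop", "damage": [], "source": {}}
    from_agents = agent_stage(image)
    from_sweep = monolithic_stage(image)
    # Keep agent findings as-is; add only sweep findings the agents did not report.
    extra = [d for d in from_sweep if d not in from_agents]
    return {
        "status": "ok",
        "damage": from_agents + extra,
        "source": {d: "agent" for d in from_agents} | {d: "monolithic" for d in extra},
    }
```

Exact string matching is a simplification; a production version would likely need to normalize labels (for example, treating "cracked display" and "cracked screen" as the same finding) before merging.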

What we learned

A few things became clear by the time we wrapped up this project:

• Agentic frameworks are more versatile than they get credit for: While they’re usually associated with workflow management, we found they could meaningfully boost model performance when applied in a structured, modular way.
• Blending different approaches beats relying on just one: Combining precise, agent-based detection with the broad coverage of LLMs, plus a bit of fine-tuning where it mattered most, gave us far more reliable results than any single method on its own.
• Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that aren’t there. It takes thoughtful system design to keep these errors in check.
• Image quality variety makes a difference: Training and testing with both clean, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.
• You need a way to catch junk images: A dedicated check for junk or unrelated pictures was one of the simplest changes we made, and it had an outsized impact on overall system reliability.
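The article does not describe how its junk detection agent worked internally. One plausible implementation, sketched here purely as an assumption, is a zero-shot check with an image-text similarity model such as SigLIP: score the photo against a few label prompts and accept it only if the laptop label wins by a clear margin. The label text, the `margin` value and the `is_probably_laptop` name are hypothetical, and the scores are passed in rather than computed.

```python
# Hypothetical label prompt; scores for this and competing labels (e.g. "a photo of a desk")
# would come from an image-text similarity model and are supplied by the caller.
LAPTOP_LABEL = "a photo of a laptop"


def is_probably_laptop(label_scores: dict[str, float], margin: float = 0.1) -> bool:
    """Accept only if the laptop label beats every other label by at least `margin`.
    With no competing labels, the check passes trivially."""
    laptop = label_scores[LAPTOP_LABEL]
    others = [s for label, s in label_scores.items() if label != LAPTOP_LABEL]
    return all(laptop - s >= margin for s in others)
```

Requiring a margin, rather than a simple win, makes borderline photos fail closed, which suits a gate whose job is to keep junk out of the damage pipeline.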

Final thoughts

What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we realized that some of the most useful tools were ones not originally designed for this type of work.

Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

Shruti Tiwari is an AI product manager at Dell Technologies.

Vadiraj Kulkarni is a data scientist at Dell Technologies.
