TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization

By Hannah O’Sullivan | April 20, 2025 | Updated: April 29, 2025

Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the differences in importance between tokens, which may reduce optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions.
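To make the idea of token-level weighting concrete, here is a minimal Python sketch rather than the authors' implementation. It assumes that per-token log-probabilities are already available as plain lists, that a token's weight is an exponential of the clamped log-probability difference between the two contrastive LLMs, and that the loss is the usual DPO logistic loss with each token's policy/reference log-ratio scaled by its weight. The function names and the parameters `mu`, `lo`, `hi` and `beta` are hypothetical, and the published objective may contain terms that this simplification omits.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def token_weights(logp_pos, logp_neg, mu=1.0, lo=-1.0, hi=1.0, winning=True):
    """Estimate per-token importance weights (assumed functional form).

    logp_pos / logp_neg are per-token log-probabilities of the same response
    under the "positive" and "negative" contrastive LLM. Their difference
    serves as a proxy for the token reward. Clamping to [lo, hi] and
    exponentiating are illustrative assumptions, not taken from the text.
    """
    sign = 1.0 if winning else -1.0  # high-reward tokens count less in losing responses
    weights = []
    for lp, ln in zip(logp_pos, logp_neg):
        diff = max(lo, min(hi, lp - ln))
        weights.append(math.exp(sign * mu * diff))
    return weights


def weighted_log_ratio(logp_policy, logp_ref, weights):
    """Sum of per-token policy/reference log-ratios, scaled by token weight."""
    return sum(w * (p - r) for w, p, r in zip(weights, logp_policy, logp_ref))


def tis_dpo_style_loss(win, lose, beta=0.1):
    """Token-weighted DPO-style loss for one preference pair.

    win and lose are (logp_policy, logp_ref, weights) tuples. With all
    weights equal to 1 this reduces to the standard DPO loss.
    """
    margin = beta * (weighted_log_ratio(*win) - weighted_log_ratio(*lose))
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Toy usage with made-up log-probabilities for two-token responses.
w_win = token_weights([-0.2, -1.5], [-1.0, -1.4], winning=True)
w_lose = token_weights([-2.0, -0.9], [-0.5, -1.0], winning=False)
loss = tis_dpo_style_loss(
    win=([-0.4, -1.2], [-0.6, -1.3], w_win),
    lose=([-1.1, -0.8], [-0.9, -0.7], w_lose),
)
print(w_win, w_lose, loss)
```

In this sketch a token that the positive model prefers much more than the negative model gets a weight above 1 in a winning response and below 1 in a losing one, so the sequence-level margin is dominated by the tokens that the contrastive pair identifies as most relevant to the preference.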

†Work done during an internship at Apple.
‡Tsinghua University
§University of Illinois at Chicago
