TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization

By Hannah O’Sullivan, April 20, 2025


Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions.
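The abstract does not spell out the formulas, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes that per-token log-probabilities are already available as plain lists, that a token's weight is an exponential of the clamped log-probability difference between the two contrastive LLMs, and that the weights simply rescale each token's contribution to a DPO-style margin. The function names and the parameters `scale`, `lo`, `hi`, and `beta` are hypothetical, and the paper's objective may contain additional terms.

```python
import math


def token_weights(logp_pos, logp_neg, sign=1.0, scale=1.0, lo=-1.0, hi=1.0):
    """Estimate per-token importance weights from two contrastive models.

    logp_pos, logp_neg: per-token log-probabilities of the same response
    under the "positive" and "negative" contrastive LLMs (assumed inputs).
    sign: +1.0 for a winning response, -1.0 for a losing one (assumption).
    scale, lo, hi: hypothetical scaling and clamping hyperparameters.
    """
    weights = []
    for lp, ln in zip(logp_pos, logp_neg):
        # The clamped log-probability gap acts as a proxy for token reward.
        diff = max(lo, min(hi, lp - ln))
        weights.append(math.exp(sign * scale * diff))
    return weights


def weighted_dpo_loss(policy_w, ref_w, weights_w,
                      policy_l, ref_l, weights_l, beta=0.1):
    """Token-weighted DPO-style loss for one preference pair.

    policy_* and ref_* are per-token log-probabilities under the policy
    being trained and the frozen reference model; *_w is the winning
    response and *_l the losing one. With all weights equal to 1 this
    reduces to the standard sequence-level DPO loss.
    """
    margin_w = sum(w * (p - r) for w, p, r in zip(weights_w, policy_w, ref_w))
    margin_l = sum(w * (p - r) for w, p, r in zip(weights_l, policy_l, ref_l))
    logits = beta * (margin_w - margin_l)
    # Numerically stable -log(sigmoid(logits)).
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

In this sketch, tokens that the positive model favors much more than the negative model receive larger weights in winning responses and smaller weights in losing ones, so the gradient concentrates on the positions that most distinguish the two responses. Clamping keeps a single outlier token from dominating the margin.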

†Work done during an internship at Apple.
‡Tsinghua University
§University of Illinois at Chicago
