    UK Tech Insider

    TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization

    By Hannah O’Sullivan, April 20, 2025

    Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the differences in importance between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the optimal dataset is unavailable in practice, we propose using the original dataset for importance sampling to achieve unbiased optimization. Accordingly, we propose a token-level importance sampling DPO objective named TIS-DPO that assigns importance weights to each token based on its reward. Inspired by previous works, we estimate the token importance weights using the difference in prediction probabilities from a pair of contrastive LLMs. We explore three methods to construct these contrastive LLMs: (1) guiding the original LLM with contrastive prompts, (2) training two separate LLMs using winning and losing responses, and (3) performing forward and reverse DPO training with winning and losing responses. Experiments show that TIS-DPO significantly outperforms various baseline methods on harmlessness and helpfulness alignment and summarization tasks. We also visualize the estimated weights, demonstrating their ability to identify key token positions.
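
The idea of weighting each token by the gap between two contrastive models can be illustrated with a small sketch. It is not the paper's exact objective: the weight form (an exponential of a clipped log-probability gap), the parameters `mu`, `clip`, and `beta`, and all function names are assumptions chosen for illustration, and any additional terms the full TIS-DPO objective may contain are omitted. Per-token log-probabilities are passed in as plain lists, so no model is needed to run it.

```python
import math


def token_weights(logp_pos, logp_neg, mu=1.0, clip=(-1.0, 1.0)):
    """Hypothetical per-token weights from a pair of contrastive models.

    logp_pos / logp_neg: log-probabilities that the "positive" and
    "negative" contrastive models assign to each token of one response.
    The gap is clipped for stability and exponentiated; a negative mu
    flips the direction (e.g. for losing responses).
    """
    lo, hi = clip
    return [
        math.exp(mu * min(max(p - n, lo), hi))
        for p, n in zip(logp_pos, logp_neg)
    ]


def weighted_logratio(logp_policy, logp_ref, weights):
    """Sum of weighted per-token log-ratios between policy and reference."""
    return sum(w * (p - r) for w, p, r in zip(weights, logp_policy, logp_ref))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_dpo_loss(win, lose, beta=0.1):
    """DPO-style loss with token weights (simplified, assumed form).

    win / lose: tuples of (policy log-probs, reference log-probs, weights).
    Returns -log(sigmoid(margin)), computed as softplus(-margin).
    """
    margin = beta * (weighted_logratio(*win) - weighted_logratio(*lose))
    return softplus(-margin)


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    w_win = token_weights([-1.0, -0.5], [-1.0, -3.0], mu=1.0)
    w_lose = token_weights([-2.0, -0.2], [-0.5, -0.2], mu=-1.0)
    loss = weighted_dpo_loss(
        ([-0.8, -0.4], [-1.0, -0.6], w_win),
        ([-1.5, -0.3], [-1.2, -0.3], w_lose),
    )
    print(w_win, w_lose, loss)
```

With all weights set to 1, `weighted_dpo_loss` reduces to the standard sequence-level DPO loss, which makes the role of the weights easy to isolate: tokens on which the two contrastive models disagree most contribute more to the margin.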

    †Work done during an internship at Apple.
    ‡Tsinghua University
    §University of Illinois at Chicago
