UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

We present UniGen-1.5, a unified multimodal large language model (MLLM) for advanced image understanding, generation and editing. Building upon UniGen, we comprehensively enhance the model architecture and training pipeline to strengthen the image understanding and generation capabilities while unlocking strong image editing ability. In particular, we propose a unified Reinforcement Learning (RL) strategy that improves both image generation and image editing jointly via shared reward models. To further enhance image editing performance, we propose a lightweight Edit Instruction Alignment stage that significantly improves the editing instruction comprehension that is essential for the success of the RL training. Experimental results show that UniGen-1.5 demonstrates competitive understanding and generation performance. Specifically, UniGen-1.5 achieves overall scores of 0.89 on GenEval and 4.31 on ImgEdit, surpassing state-of-the-art models such as BAGEL and reaching performance comparable to proprietary models such as GPT-Image-1.
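To make the idea of a shared reward concrete, here is a minimal sketch, not the UniGen-1.5 implementation. It assumes that both generation and editing samples can be reduced to a pair of (text describing the desired image, model output) and scored by one function. The names `shared_reward`, `group_advantages`, and `toy_scorer` are hypothetical, the word-overlap scorer is a stand-in for a learned reward model, and the group-wise normalization is an assumption that the abstract does not state.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical sample record. "task" is "generation" or "editing";
# "target_text" describes the image the output should depict (assumption).
Sample = Dict[str, str]


def toy_scorer(target_text: str, output_caption: str) -> float:
    """Stand-in reward: fraction of target words found in the output caption.

    A real system would use a learned reward model on the image itself;
    this toy version only keeps the example runnable.
    """
    words = target_text.lower().split()
    caption_words = set(output_caption.lower().split())
    return sum(1 for w in words if w in caption_words) / len(words)


def shared_reward(sample: Sample, scorer: Callable[[str, str], float]) -> float:
    """Score a sample with the same scorer regardless of its task tag."""
    return scorer(sample["target_text"], sample["output"])


def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within one group of samples (assumed scheme)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so none is preferred.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    group = [
        {"task": "generation", "target_text": "red cube", "output": "a red cube"},
        {"task": "editing", "target_text": "red cube", "output": "a blue cube"},
    ]
    rewards = [shared_reward(s, toy_scorer) for s in group]
    print(rewards, group_advantages(rewards))
```

The point of the sketch is only that the task tag never enters the reward computation: a generation sample and an editing sample pass through the same scorer, so one reward signal can drive both tasks.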

† Institute of Trustworthy Embodied AI, Fudan University
‡ Project lead
§ Corresponding authors
