Dense image captioning is crucial for cross-modal alignment in vision-language pretraining and text-to-image generation, but scaling expert-quality annotations is prohibitively costly. While synthetic captioning with strong vision-language models (VLMs) is a practical alternative, supervised distillation typically yields limited output diversity and weak generalization. Reinforcement learning (RL) could overcome these limitations, but its successes have so far been concentrated in verifiable domains that rely on deterministic checkers, a luxury unavailable in open-ended captioning. We address this bottleneck with RubiCap, a novel RL framework that derives fine-grained, sample-specific reward signals from LLM-written rubrics. RubiCap first assembles a diverse committee of candidate captions, then employs an LLM rubric writer to extract consensus strengths and diagnose deficiencies in the current policy. These insights are converted into explicit evaluation criteria, enabling an LLM judge to decompose holistic quality assessment and replace coarse scalar rewards with structured, multi-faceted evaluations. Across extensive benchmarks, RubiCap achieves the highest win rates on CapArena, outperforming supervised distillation, prior RL methods, human-expert annotations, and GPT-4V-augmented outputs. On CaptionQA, it demonstrates superior word efficiency: our 7B model matches Qwen2.5-VL-32B-Instruct, and our 3B model surpasses its 7B counterpart. Remarkably, using the compact RubiCap-3B as a captioner produces stronger pretrained VLMs than those trained on captions from proprietary models.
- † University of Wisconsin–Madison
- ** Work done while at Apple
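
The abstract describes a three-stage reward pipeline: a committee of candidate captions, an LLM rubric writer, and an LLM judge whose per-criterion scores replace a coarse scalar reward. The sketch below illustrates that flow under stated assumptions; `write_rubric`, `judge`, the fixed criteria, and the toy scoring heuristic are hypothetical placeholders standing in for LLM calls, since the abstract does not specify prompts, models, or aggregation weights.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One explicit, sample-specific evaluation criterion from the rubric."""
    description: str
    weight: float = 1.0


def write_rubric(committee: list[str]) -> list[Criterion]:
    # Hypothetical stand-in for the LLM rubric writer: in RubiCap this step
    # extracts consensus strengths and diagnosed deficiencies from the
    # committee of candidate captions. Stubbed with fixed criteria here so
    # the sketch runs end to end.
    return [
        Criterion("Mentions every salient object in the scene", weight=2.0),
        Criterion("Describes spatial relations between objects"),
        Criterion("Avoids hallucinated attributes"),
    ]


def judge(caption: str, criterion: Criterion) -> float:
    # Hypothetical stand-in for the LLM judge: score one caption against one
    # criterion in [0, 1]. A toy length heuristic substitutes for the real
    # model call.
    return min(1.0, len(caption.split()) / 50.0)


def rubric_reward(caption: str, committee: list[str]) -> float:
    """Structured, multi-faceted reward: judge the caption against each
    rubric criterion, then aggregate the weighted scores into the single
    scalar consumed by the RL update."""
    criteria = write_rubric(committee)
    total_weight = sum(c.weight for c in criteria)
    score = sum(c.weight * judge(caption, c) for c in criteria)
    return score / total_weight


if __name__ == "__main__":
    committee = ["a dog on a couch", "a brown dog sleeping on a gray sofa"]
    print(rubric_reward("a brown dog naps on a gray sofa by a window", committee))
```

The per-criterion decomposition is the point of the design: instead of asking a judge for one holistic score, each caption is graded against explicit criteria derived from the committee, which yields a denser and more interpretable reward signal for policy optimization.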

