Improve Vision Language Model Chain-of-thought Reasoning

Chain-of-thought (CoT) reasoning in vision language
models (VLMs) is crucial for improving
interpretability and trustworthiness. However,
current training recipes often rely on
datasets dominated by short annotations with
minimal rationales. In this work, we show that
training a VLM on short answers leads to poor
generalization on reasoning tasks that require
more detailed explanations. To address this limitation,
we propose a two-stage post-training
strategy that extends the use of short-answer
data for enhanced CoT reasoning. First, we
augment short answers with CoT reasoning
generated by GPT-4o, enhancing the VLM's
CoT capabilities through fine-tuning. Second,
we leverage short answers as outcome rewards
for reinforcement learning. Specifically, short
answers are used as correctness indicators to
construct positive (correct) and negative (incorrect)
pairs from model-generated reasoning
chains. These pairs are then used to calibrate
the model's reasoning via Direct Preference Optimization (DPO).
Our experiments show significant
improvements in CoT reasoning on benchmark
datasets, along with enhanced generalization to
direct answer prediction. This work provides
a critical data resource for VLM CoT training
and demonstrates the effectiveness of outcome
rewards for post-training multimodal models.
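
The second stage can be illustrated with a minimal sketch. It assumes each sampled reasoning chain ends with an "Answer:" marker, that correctness is decided by exact string match against the short answer, and that sequence log-probabilities are already available; the helper names (`extract_answer`, `build_preference_pairs`, `dpo_loss`), the `max_pairs` cap, and `beta=0.1` are hypothetical choices for illustration, not details taken from the work itself. The loss is the standard DPO objective for a single pair.

```python
import math
from itertools import product


def extract_answer(chain: str) -> str:
    """Hypothetical parser: return the text after the last 'Answer:' marker."""
    marker = "Answer:"
    idx = chain.rfind(marker)
    if idx == -1:
        return ""  # no parsable answer; treated as incorrect below
    return chain[idx + len(marker):].strip().lower().rstrip(".")


def build_preference_pairs(chains, short_answer, max_pairs=4):
    """Use the short answer as a correctness signal to pair sampled chains.

    Chains whose final answer matches the short answer become 'chosen';
    all others become 'rejected'. Exact match is an assumption here.
    """
    gold = short_answer.strip().lower().rstrip(".")
    correct = [c for c in chains if extract_answer(c) == gold]
    incorrect = [c for c in chains if extract_answer(c) != gold]
    pairs = [{"chosen": c, "rejected": r} for c, r in product(correct, incorrect)]
    return pairs[:max_pairs]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    sampled = [
        "The chart shows 3 bars. Answer: 3",
        "I count 4 bars. Answer: 4",
        "Three bars are visible. Answer: 3.",
    ]
    for pair in build_preference_pairs(sampled, "3"):
        print(pair["chosen"], "| preferred over |", pair["rejected"])
```

If no sampled chain is correct, or none is incorrect, no pair is produced for that question, so in this sketch only questions with mixed outcomes contribute to preference training.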

† Work done while at Apple
‡ Carnegie Mellon University
