Tokenization in video models, typically via patchification, generates an excessive and redundant number of tokens, which severely limits video efficiency and scalability. While recent trajectory-based tokenizers offer a promising solution by decoupling video length from token count, they rely on complex external segmentation and tracking pipelines that are slow and task-agnostic. We propose TrajTok, an end-to-end video tokenizer module that is fully integrated and co-trained with video models for a downstream objective, dynamically adapting its token granularity to semantic complexity, independent of video length. TrajTok incorporates a unified segmenter that performs implicit clustering over pixels in both space and time to directly produce object trajectories in a single forward pass. By prioritizing downstream adaptability over pixel-perfect segmentation fidelity, TrajTok is lightweight and efficient, yet empirically improves video understanding performance. With TrajTok, we implement a video CLIP model trained from scratch (TrajViT2). It achieves the best accuracy at scale across both classification and retrieval benchmarks, while maintaining efficiency comparable to the best token-merging methods. TrajTok also proves to be a versatile component beyond its role as a tokenizer. We show that it can be seamlessly integrated as either a probing head for pretrained visual features (TrajAdapter) or an alignment connector in vision-language models (TrajVLM), with especially strong performance in long-video reasoning.
- † University of Washington
- ‡ Allen Institute for Artificial Intelligence (AI2)
- § Woven by Toyota, Inc.

