Recent years have witnessed the rapid development of open-world image segmentation, including open-vocabulary segmentation and in-context segmentation. However, existing methods are limited to a single prompt modality, which lacks the flexibility and accuracy needed for complex object-aware prompting. In this work, we present COSINE, a unified open-world segmentation model that Consolidates Open-vocabulary Segmentation and IN-context sEgmentation. By framing the open-vocabulary task and the in-context segmentation task as promptable segmentation tasks, COSINE supports various input modalities, such as images and text. Comprising a model pool and a segdecoder, COSINE makes full use of the representation capability of foundation models and can accurately segment specific concepts based on inputs of various modalities, offering powerful open-world perception capabilities. Experiments on various segmentation tasks demonstrate the effectiveness of the proposed method.
- † Zhejiang University
- ‡ Hangzhou Dianzi University
- § Zhejiang University of Technology

