Wednesday Jul 13, 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

In this work, we propose to address the gap between object-level and image-level representations in open-vocabulary detection (OVD) by performing object-centric alignment of the language embeddings from the CLIP model. Furthermore, we visually ground the objects with only image-level supervision using a pseudo-labeling process that provides high-quality object proposals and helps expand the vocabulary during training. We establish a bridge between the above two object-alignment strategies via a novel weight transfer function that aggregates their complementary strengths. In essence, the proposed model seeks to minimize the gap between object and image-centric representations in the OVD setting.

2022: Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, F. Khan
https://arxiv.org/pdf/2207.03482v1.pdf