Papers Read on AI

May 12, 2022  

Language Models Can See: Plugging Visual Controls in Text Generation

In this work, we propose a training-free framework, called MAGIC (iMAge-Guided text generation with CLIP), for plugging in visual controls in the generation process and enabling LMs to perform multimodal tasks (e.g., image captioning) in a zero-shot manner. MAGIC is a simple yet efficient plug-and-play framework, which directly combines an off-the-shelf LM (i.e., GPT-2) and an image-text matching model (i.e., CLIP) for image-grounded text generation.
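
To make the idea of combining a language model with an image-text matching model more concrete, here is a minimal Python sketch of a single decoding step. It is an illustration under stated assumptions, not the paper's exact scoring rule: the function names, the weight `beta`, and all of the numbers are hypothetical, and the hard-coded lists stand in for what GPT-2 and CLIP would actually produce.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(candidates, lm_probs, image_text_sims, beta=2.0):
    """Pick the candidate token with the best combined score.

    candidates      -- top-k tokens proposed by the language model
    lm_probs        -- language-model probability of each candidate
    image_text_sims -- hypothetical image-text matching scores for
                       (image, text-so-far + candidate)
    beta            -- assumed weight on the visual term
    """
    # Normalize the matching scores over the candidate set only.
    visual = softmax(image_text_sims)
    combined = [p + beta * v for p, v in zip(lm_probs, visual)]
    best = max(range(len(candidates)), key=lambda i: combined[i])
    return candidates[best]


# Toy values: the language model alone prefers "car",
# but the image matches "dog" much better.
candidates = ["dog", "car", "tree"]
lm_probs = [0.30, 0.45, 0.25]
image_text_sims = [3.0, 0.5, 1.0]

print(pick_next_token(candidates, lm_probs, image_text_sims))
```

With `beta=0.0` the visual term is switched off and the choice falls back to the language model's own favorite. Neither model is trained or fine-tuned in this sketch; the visual signal only re-ranks candidates at generation time, which is the sense in which such a setup is plug-and-play.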

2022: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, N. Collier

https://arxiv.org/pdf/2205.02655v1.pdf