We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or masked image modeling, to convolutional networks (convnets). We validate our approach on both classical (ResNet) and modern (ConvNeXt) models. Improvements on object detection and instance segmentation are substantial (up to +3.5%), verifying the strong transferability of the learned features. We also observe favorable scaling behavior: larger models see larger gains. All this evidence points to a promising future for generative pre-training on convnets.
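To make the objective concrete, below is a minimal sketch of BERT-style masked image modeling in NumPy: an image is split into non-overlapping patches, a random subset is masked out, and the reconstruction loss is computed only on the masked patches. This is an illustrative toy, not the paper's actual method (which addresses convnet-specific obstacles such as sparse masking and hierarchy); the patch size, mask ratio, and function names here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(img, patch=4, ratio=0.6, rng=rng):
    """Split a square image into non-overlapping patches and zero out
    a random subset; the model must later reconstruct them.
    (Toy sketch: patch size and mask ratio are illustrative choices.)"""
    h, w = img.shape
    ph, pw = h // patch, w // patch
    n = ph * pw
    n_mask = int(n * ratio)
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_mask]] = True
    masked = img.copy()
    for k in np.flatnonzero(mask):
        r, c = divmod(k, pw)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, mask

def masked_mse(pred, target, mask, patch=4):
    """Reconstruction MSE averaged over masked patches only,
    mirroring the BERT-style objective of predicting masked content."""
    h, w = target.shape
    pw = w // patch
    errs = []
    for k in np.flatnonzero(mask):
        r, c = divmod(k, pw)
        sl = np.s_[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        errs.append(np.mean((pred[sl] - target[sl]) ** 2))
    return float(np.mean(errs))

img = rng.standard_normal((16, 16))
masked, mask = mask_patches(img)           # 60% of the 16 patches hidden
loss = masked_mse(masked, img, mask)       # toy "prediction" = masked input
```

In a real pre-training loop, `masked` would be fed to an encoder-decoder network and `loss` backpropagated; here the masked input itself stands in for the prediction just to show where the loss is evaluated.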
Keyu Tian, Yi Jiang, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan (2023)
Ranked #1 on Instance Segmentation on COCO 2017 val