
8.2K Downloads · 243 Episodes
Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. Selecting papers by comparative results, citations, and influence, we educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
Episodes

4 hours ago
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pre-trained in two stages.
2023: Junnan Li, Dongxu Li, S. Savarese, Steven Hoi
Ranked #1 on Image Retrieval on COCO
https://arxiv.org/pdf/2301.12597v1.pdf
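For a concrete picture of the bridging idea, here is a minimal PyTorch sketch of a Q-Former-style module: a small set of learned query embeddings cross-attends to frozen image features, and the resulting query outputs are projected into the LLM's input embedding space. All dimensions and module choices below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TinyQFormer(nn.Module):
    """Illustrative stand-in for BLIP-2's Querying Transformer: learned
    queries extract a fixed number of visual tokens from frozen image
    features, which are then projected into the frozen LLM's space."""
    def __init__(self, num_queries=32, dim=768, llm_dim=2048, num_layers=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(num_layers)
        ])
        self.to_llm = nn.Linear(dim, llm_dim)  # maps into the LLM's embedding space

    def forward(self, image_feats):            # (B, N_patches, dim), from a frozen ViT
        q = self.queries.expand(image_feats.size(0), -1, -1)
        for layer in self.layers:
            q = layer(q, image_feats)           # queries cross-attend to image features
        return self.to_llm(q)                   # (B, num_queries, llm_dim) soft prompts

frozen_vit_out = torch.randn(2, 257, 768)       # e.g. ViT-L patch features (frozen)
soft_prompts = TinyQFormer()(frozen_vit_out)
print(soft_prompts.shape)                       # torch.Size([2, 32, 2048])
```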

6 days ago
InstructPix2Pix: Learning to Follow Image Editing Instructions
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models—a language model (GPT-3) and a text-to-image model (Stable Diffusion)—to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per-example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.
2022: Tim Brooks, Aleksander Holynski, Alexei A. Efros
https://arxiv.org/pdf/2211.09800v2.pdf
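The released model runs as a standard diffusers pipeline; here is a minimal usage sketch, assuming the diffusers library and the publicly released timbrooks/instruct-pix2pix checkpoint (file names are placeholders):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the released checkpoint (assumed available on the Hugging Face Hub).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")

# A single conditional forward pass: the written instruction conditions the
# diffusion model directly, so no per-image fine-tuning or inversion is needed.
edited = pipe(
    "make it look like a watercolor painting",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.jpg")
```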

7 days ago
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance to 1) improve the mapping from degraded inputs to desired outputs, or 2) complement high-quality details lost in the inputs. In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.
2022: Shangchen Zhou, Kelvin C. K. Chan, Chongyi Li, Chen Change Loy
Ranked #1 on Blind Face Restoration on WIDER
https://arxiv.org/pdf/2206.11253v2.pdf
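A simplified sketch of the code-prediction idea (illustrative dimensions and modules, not the actual CodeFormer architecture): a transformer maps features of the degraded face to indices into a fixed codebook of high-quality visual atoms, and the quantized features would then be fed to a fixed decoder.

```python
import torch
import torch.nn as nn

class CodePredictor(nn.Module):
    """Simplified sketch of codebook-based restoration: instead of
    regressing pixels, a transformer predicts indices into a fixed
    VQ-style codebook of high-quality face features learned beforehand."""
    def __init__(self, codebook_size=1024, dim=256, num_layers=4):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)  # frozen HQ prior
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        blk = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(blk, num_layers=num_layers)
        self.logits = nn.Linear(dim, codebook_size)       # code prediction head

    def forward(self, degraded_feats):                    # (B, seq_len, dim)
        h = self.transformer(self.encoder(degraded_feats))
        idx = self.logits(h).argmax(-1)                   # discrete code indices
        return self.codebook(idx)                         # quantized HQ features

feats = torch.randn(1, 256, 256)        # stand-in for degraded-face features
restored = CodePredictor()(feats)       # would go to a fixed HQ decoder
print(restored.shape)                   # torch.Size([1, 256, 256])
```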

Monday Jan 30, 2023
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Large pretrained language models have shown surprising In-Context Learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen input without additional parameter updates. Despite the great success in performance, the working mechanism of ICL remains an open problem. To better understand how ICL works, this paper explains language models as meta-optimizers and understands ICL as a kind of implicit finetuning.
2022: Damai Dai, Yutao Sun, Li Dong, Y. Hao, Zhifang Sui, Furu Wei
https://arxiv.org/pdf/2212.10559v2.pdf
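The argument rests on a dual form of simplified, softmax-free linear attention: the demonstration tokens contribute a term that acts like an implicit weight update applied on top of the zero-shot prediction. A small NumPy check of that decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_demo = 8, 5

# Simplified linear attention: out = V @ K.T @ q (no softmax), as in the analysis.
V_demo = rng.normal(size=(d, n_demo))   # values from the demonstration tokens
K_demo = rng.normal(size=(d, n_demo))   # keys from the demonstration tokens
W_zsl  = rng.normal(size=(d, d))        # "zero-shot" part from the query's own context
q      = rng.normal(size=(d,))

# Full in-context output ...
out_icl = W_zsl @ q + V_demo @ K_demo.T @ q

# ... equals applying an implicit update dW = sum_i v_i k_i^T on top of W_zsl,
# i.e. the demonstrations act like a gradient-descent step on a linear layer.
dW = sum(np.outer(V_demo[:, i], K_demo[:, i]) for i in range(n_demo))
out_dual = (W_zsl + dW) @ q

print(np.allclose(out_icl, out_dual))   # True
```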

Monday Jan 23, 2023
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions spanning open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, revealing many interesting results.
2023: Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
https://arxiv.org/pdf/2301.07597v1.pdf
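Since HC3 pairs human and ChatGPT answers to the same questions, a natural baseline detector is a bag-of-words classifier. A minimal scikit-learn sketch on toy stand-in sentences (in practice you would fill the two lists from HC3's paired answer fields; the Hugging Face dataset id is assumed to be Hello-SimpleAI/HC3):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins: replace with HC3's human_answers / chatgpt_answers fields.
human_texts = [
    "It depends on the context, honestly.",
    "No idea, but I'd check the docs first.",
]
chatgpt_texts = [
    "As an AI language model, I can provide a general overview of this topic.",
    "There are several factors to consider when answering this question.",
]

X = human_texts + chatgpt_texts
y = [0] * len(human_texts) + [1] * len(chatgpt_texts)  # 0 = human, 1 = ChatGPT

# TF-IDF features + logistic regression: a simple linguistic-style detector.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(X, y)
print(detector.predict(["I'm sorry, but as an AI language model I cannot do that."]))
```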

Thursday Jan 19, 2023
Why do Nearest Neighbor Language Models Work?
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on.
2023: Frank F. Xu, Uri Alon, Graham Neubig
https://arxiv.org/pdf/2301.02828v1.pdf
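For reference, a kNN-LM forms its next-word distribution by interpolating the parametric LM with a distribution induced by the retrieved neighbors, p(w|c) = λ p_kNN(w|c) + (1 − λ) p_LM(w|c). A minimal sketch with an illustrative random datastore:

```python
import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=4, lam=0.25, temp=1.0):
    """p(w|c) = lam * p_kNN(w|c) + (1 - lam) * p_LM(w|c).
    keys: stored context vectors; values[i]: the word that followed keys[i]."""
    dists = np.linalg.norm(keys - query, axis=1)   # L2 distance to each stored context
    nearest = np.argsort(dists)[:k]                # indices of the k nearest contexts
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()                       # softmax over negative distance
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p_knn[values[idx]] += w                    # put mass on the retrieved next-words
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
vocab, d = 10, 16
keys = rng.normal(size=(100, d))                   # datastore of training contexts
values = rng.integers(0, vocab, size=100)          # their observed next words
p_lm = np.full(vocab, 1 / vocab)                   # flat parametric LM for the demo
print(knn_lm_probs(p_lm, rng.normal(size=d), keys, values, vocab))
```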

Tuesday Jan 17, 2023
Text2Poster: Laying Out Stylized Texts on Retrieved Images
Poster generation is a significant task for a wide range of applications and is often time-consuming, requiring lots of manual editing and artistic experience. In this paper, we propose a novel data-driven framework, called Text2Poster, to automatically generate visually effective posters from textual information. Imitating the process of manual poster editing, our framework leverages a large-scale pretrained visual-textual model to retrieve background images from given texts, lays out the texts on the images iteratively by cascaded autoencoders, and finally stylizes the texts by a matching-based method. We learn the modules of the framework with weakly- and self-supervised learning strategies, mitigating the demand for labeled data. Both objective and subjective experiments demonstrate that our Text2Poster outperforms state-of-the-art methods, including academic research and commercial software, on the quality of generated posters.
2022: Chuhao Jin, H. Xu, Ruihua Song, Zhiwu Lu
https://arxiv.org/pdf/2301.02363v1.pdf
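The retrieval stage can be approximated with any pretrained visual-textual model; here is a sketch using CLIP via the transformers library as a stand-in (the paper trains its own retrieval model, so this is illustrative only, and the image file names are placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate background images for the poster (placeholder paths).
candidates = [Image.open(p).convert("RGB") for p in ["bg1.jpg", "bg2.jpg", "bg3.jpg"]]
inputs = processor(text=["grand opening of a jazz cafe"], images=candidates,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

# logits_per_text holds text-image similarities; pick the best background,
# which would then go through layout (cascaded autoencoders) and stylization.
best = out.logits_per_text.argmax(dim=-1).item()
print("retrieved background:", best)
```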

Monday Jan 16, 2023
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or masked image modeling, to convolutional networks (convnets). We validate it on both classical (ResNet) and modern (ConvNeXt) models. Improvements on object detection and instance segmentation are more substantial (up to +3.5%), verifying the strong transferability of the learned features. We also find favorable scaling behavior, observing more gains on larger models. All this evidence reveals a promising future for generative pre-training on convnets.
2023: Keyu Tian, Yi Jiang, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan
Ranked #1 on Instance Segmentation on COCO 2017 val
https://arxiv.org/pdf/2301.03580v2.pdf
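The basic recipe, masking patches of the input image and training the convnet to reconstruct them, can be sketched as below. This is a dense simplification: the paper's key contributions (sparse convolution over visible patches and a hierarchical decoder) are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_modeling_step(encoder, decoder, images, patch=32, mask_ratio=0.6):
    """One simplified BERT-style pre-training step for a convnet: zero out
    random patches, encode, reconstruct, and score only the masked regions."""
    B, C, H, W = images.shape
    gh, gw = H // patch, W // patch
    keep = (torch.rand(B, 1, gh, gw) > mask_ratio).float()
    mask = F.interpolate(keep, scale_factor=patch, mode="nearest")  # 1 = visible
    recon = decoder(encoder(images * mask))                          # predict pixels
    # L2 loss on masked pixels only, as in masked image modeling.
    loss = (((recon - images) ** 2) * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)
    return loss

# Tiny stand-in encoder/decoder (the paper uses ResNet/ConvNeXt backbones).
encoder = nn.Sequential(nn.Conv2d(3, 64, 7, 2, 3), nn.GELU(),
                        nn.Conv2d(64, 64, 3, 2, 1), nn.GELU())
decoder = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, 2), nn.GELU(),
                        nn.ConvTranspose2d(32, 3, 2, 2))
print(masked_modeling_step(encoder, decoder, torch.randn(2, 3, 128, 128)))
```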

Wednesday Jan 04, 2023
Reversible Column Networks
We propose a new neural network design paradigm, the Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of a subnetwork, called columns, between which multi-level reversible connections are employed.
2022: Y. Cai, Yi Zhou, Qi Han, Jia-Ying Sun, Xiangwen Kong, Jun Yu Li, Xiangyu Zhang
https://arxiv.org/pdf/2212.11696v1.pdf
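Reversibility means a column's inputs can be recovered exactly from its outputs, so intermediate activations need not be stored for backpropagation. A toy demo of one reversible connection between adjacent columns (simplified from the paper's multi-level design; gamma and the block are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Simplified reversible connection between columns t-1 and t at one level:
#   x_t = gamma * x_prev + F(lower_t)
# which can be inverted exactly:
#   x_prev = (x_t - F(lower_t)) / gamma
F_block = nn.Sequential(nn.Linear(16, 16), nn.GELU(), nn.Linear(16, 16))
gamma = 0.5

lower_t = torch.randn(4, 16)   # level i-1 feature in column t
x_prev  = torch.randn(4, 16)   # level i feature in column t-1

x_t = gamma * x_prev + F_block(lower_t)               # forward
x_prev_recovered = (x_t - F_block(lower_t)) / gamma   # exact inverse

print(torch.allclose(x_prev, x_prev_recovered, atol=1e-6))  # True
```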

Tuesday Jan 03, 2023
The Forward-Forward Algorithm: Some Preliminary Investigations
The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation with two forward passes: one with positive (real) data and one with negative data.
2022: G. Hinton
https://arxiv.org/pdf/2212.13345v1.pdf
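Each layer is trained locally: its "goodness" (the sum of squared activities) should be high on positive data and low on negative data, so no backward pass through the rest of the network is needed. A minimal one-layer sketch on random stand-in data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(784, 500)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0  # goodness threshold

def goodness(x):
    return layer(x).relu().pow(2).sum(dim=1)  # sum of squared activities

for step in range(100):
    x_pos = torch.randn(32, 784)        # stand-in for real (positive) data
    x_neg = torch.randn(32, 784) * 3.0  # stand-in for negative data
    # Push goodness above theta on positives and below theta on negatives:
    # loss = -log sigmoid(g_pos - theta) - log sigmoid(theta - g_neg),
    # written here via softplus. The objective is local to this layer.
    loss = F.softplus(torch.cat([theta - goodness(x_pos),
                                 goodness(x_neg) - theta])).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(goodness(x_pos).mean().item(), goodness(x_neg).mean().item())
```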