
8.2K Downloads · 243 Episodes
Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. Selecting papers by comparative results, citations, and influence, we educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
Episodes

Thursday Dec 01, 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can benefit from increasing parameters and training data in the way ViTs do. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so that the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information. As a result, InternImage relaxes the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns from massive data at large parameter scales, like ViTs. The effectiveness of the model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. Notably, InternImage-H achieves a new record of 65.4 mAP on COCO test-dev.
2022: Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiao-hua Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Y. Qiao
Ranked #1 on Object Detection on COCO test-dev (using extra training data)
https://arxiv.org/pdf/2211.05778v2.pdf
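The core operator here is a modulated deformable convolution, where the sampling offsets and aggregation weights are predicted from the input itself. As a rough, hedged illustration of that operator family, the sketch below uses torchvision's DCNv2-style deform_conv2d; it is not the paper's DCNv3 implementation, and the block structure is an assumption made only for illustration.

```python
# Minimal sketch of a deformable-convolution block (DCNv2-style via torchvision),
# illustrating the operator family InternImage builds on; NOT the paper's DCNv3.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        k = kernel_size
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        # Offsets (2 per sampling point) and modulation masks are predicted from the
        # input, which is what gives "adaptive spatial aggregation conditioned on input".
        self.offset_mask = nn.Conv2d(channels, 3 * k * k, kernel_size=k, padding=k // 2)
        self.k = k

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, : 2 * self.k * self.k]
        mask = torch.sigmoid(om[:, 2 * self.k * self.k :])
        return deform_conv2d(x, offset, self.weight, padding=self.k // 2, mask=mask)

x = torch.randn(1, 64, 56, 56)
y = DeformableConvBlock(64)(x)   # same spatial size: (1, 64, 56, 56)
```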

Tuesday Nov 15, 2022
OneFormer: One Transformer to Rule Universal Image Segmentation
Universal image segmentation is not a new concept. Attempts to unify image segmentation over the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation, because they need to be trained individually on semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. 2022: Jitesh Jain, Jiacheng Li, M. Chiu, Ali Hassani, Nikita Orlov, H. Shi Ranked #1 on Instance Segmentation on ADE20K val https://arxiv.org/pdf/2211.06220v1.pdf
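The "train once" recipe amounts to jointly training on all three tasks while conditioning the model on which task the current sample belongs to. Below is a hedged, toy-scale sketch of that recipe, using a simple learned task embedding in place of OneFormer's text-derived task token; the model, loss, and data are placeholders, not the paper's architecture.

```python
# Toy sketch of multi-task "train once" conditioning: one shared model sees a task
# identifier sampled per iteration (semantic / instance / panoptic). Illustrative only.
import random
import torch
import torch.nn as nn

TASKS = ["semantic", "instance", "panoptic"]

class TaskConditionedModel(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.task_embed = nn.Embedding(len(TASKS), dim)  # stands in for the text-derived task token
        self.backbone = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 10)

    def forward(self, x, task_id):
        h = self.backbone(x) + self.task_embed(task_id)  # condition features on the task
        return self.head(h)

model = TaskConditionedModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(3):
    task_id = torch.tensor([random.randrange(len(TASKS))])  # uniform task sampling per iteration
    x = torch.randn(1, 32)                 # dummy batch; real inputs would be images + GT masks
    loss = model(x, task_id).square().mean()  # placeholder loss; real losses are task-specific
    opt.zero_grad()
    loss.backward()
    opt.step()
```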

Friday Nov 11, 2022
Large Language Models Are Human-Level Prompt Engineers
By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the “program,” optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. 2022: Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba https://arxiv.org/pdf/2211.01910v1.pdf
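The search loop is straightforward to sketch: one LLM proposes candidate instructions from a few input-output demonstrations, a second LLM is scored zero-shot while following each candidate, and the highest-scoring instruction is kept. The helpers `propose_instructions` and `llm_answer` below are hypothetical stand-ins for whatever LLM API you use, and the prompt template is an assumption, not the paper's exact wording.

```python
# Hedged sketch of an APE-style instruction search. `propose_instructions` and
# `llm_answer` are hypothetical helpers wrapping LLM API calls.
def automatic_prompt_search(task_examples, n_candidates=20):
    demos = "\n".join(f"Input: {x}  Output: {y}" for x, y in task_examples[:5])
    candidates = propose_instructions(          # hypothetical: ask an LLM for instructions
        f"I gave a friend an instruction. Based on these input-output pairs,\n"
        f"{demos}\nthe instruction was:",
        n=n_candidates,
    )

    def score(instruction):
        # Zero-shot evaluation: does a second LLM following this instruction
        # reproduce the held-out outputs?
        held_out = task_examples[5:]
        correct = sum(
            llm_answer(f"{instruction}\nInput: {x}\nOutput:").strip() == str(y)
            for x, y in held_out
        )
        return correct / max(len(held_out), 1)

    return max(candidates, key=score)           # the best instruction is the "program"
```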

Thursday Nov 10, 2022
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image. This motivates us to cache and reuse the feature maps of the original image. 2022: Muyang Li, Ji Lin, Chenlin Meng, S. Ermon, Song Han, Junyan Zhu https://arxiv.org/pdf/2211.02048v1.pdf
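A hedged sketch of the caching idea: compare the edited image to the original, recompute a convolution only on the tiles that actually changed (plus a small halo for context), and reuse the cached features everywhere else. This is a conceptual illustration of spatially sparse recomputation, not the paper's optimized kernels.

```python
# Conceptual sketch of spatially sparse recomputation for a single 3x3 conv layer.
import torch

def sparse_update(original, edited, cached_features, conv, tile=16, tol=1e-4):
    """original/edited: (1, C, H, W); cached_features: conv(original), precomputed."""
    diff = (edited - original).abs().amax(dim=1, keepdim=True)   # (1, 1, H, W) change map
    out = cached_features.clone()
    _, _, H, W = original.shape
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            if diff[..., y:y + tile, x:x + tile].max() > tol:    # this tile was edited
                # Recompute with a 1-pixel halo so the 3x3 conv sees correct context.
                y0, x0 = max(y - 1, 0), max(x - 1, 0)
                y1, x1 = min(y + tile + 1, H), min(x + tile + 1, W)
                patch = conv(edited[..., y0:y1, x0:x1])
                out[..., y:y + tile, x:x + tile] = patch[..., y - y0:y - y0 + tile,
                                                         x - x0:x - x0 + tile]
    return out

conv = torch.nn.Conv2d(3, 8, 3, padding=1)
orig = torch.randn(1, 3, 64, 64)
edit = orig.clone()
edit[..., 20:28, 20:28] += 0.5                                   # a small local edit
with torch.no_grad():
    features = sparse_update(orig, edit, conv(orig), conv)       # only one tile recomputed
```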

Friday Nov 04, 2022
On the Versatile Uses of Partial Distance Correlation in Deep Learning
Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more) networks during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been used sparingly so far. In this paper, we revisit a (less widely known) idea from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. 2022: Xingjian Zhen, Zihang Meng, Rudrasis Chakraborty, Vikas Singh https://arxiv.org/pdf/2207.09684v2.pdf
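Distance correlation itself is simple to compute from double-centered pairwise-distance matrices, which is what makes it attractive for comparing feature spaces of different widths. A minimal sketch of the unconditional version follows; the paper's partial variant additionally removes the contribution of a third feature space.

```python
# Sample distance correlation between two feature matrices of different dimensionality.
import torch

def distance_correlation(X, Y):
    """X: (n, p), Y: (n, q) feature matrices for the same n inputs."""
    def centered_dist(Z):
        D = torch.cdist(Z, Z)                       # pairwise Euclidean distances
        return D - D.mean(0, keepdim=True) - D.mean(1, keepdim=True) + D.mean()

    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = (A * B).mean()                          # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean() # squared distance variances
    dcor2 = dcov2 / (dvar_x * dvar_y).sqrt().clamp_min(1e-12)
    return dcor2.clamp_min(0).sqrt()

feats_a = torch.randn(128, 768)   # e.g. ViT features for 128 images
feats_b = torch.randn(128, 512)   # e.g. CNN features for the same images
print(distance_correlation(feats_a, feats_b))
```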

Thursday Nov 03, 2022
SeaPearl: A Constraint Programming Solver guided by Reinforcement Learning
The design of efficient and generic algorithms for solving combinatorial optimization problems has been an active field of research for many years. Standard exact solving approaches are based on a clever and complete enumeration of the solution set. A critical and non-trivial design choice with such methods is the branching strategy, directing how the search is performed. This paper presents the proof of concept for SeaPearl, a new CP solver implemented in Julia, that supports machine learning routines in order to learn branching decisions using reinforcement learning. 2021: Félix Chalumeau, Ilan Coulon, Quentin Cappart, Louis-Martin Rousseau https://arxiv.org/pdf/2102.09193v2.pdf
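To make the role of the branching strategy concrete, here is a hedged, toy-scale sketch of a depth-first constraint-programming search in which a policy scores the unassigned variables and the highest-scoring one is branched on first. In SeaPearl the policy is a graph neural network trained with reinforcement learning inside a full Julia CP solver; the random `policy` and tiny all-different problem below are placeholders.

```python
# Toy backtracking search with a pluggable variable-selection (branching) policy.
import random

def solve(domains, constraints, policy, assignment=None):
    """domains: {var: [values]}; constraints: callables over a partial assignment."""
    assignment = assignment or {}
    if not all(check(assignment) for check in constraints):
        return None                                             # prune: a constraint is violated
    unassigned = [v for v in domains if v not in assignment]
    if not unassigned:
        return assignment                                       # every variable assigned -> solution
    var = max(unassigned, key=lambda v: policy(v, assignment))  # the learned branching decision
    for value in domains[var]:
        result = solve(domains, constraints, policy, {**assignment, var: value})
        if result is not None:
            return result
    return None

# Random scores stand in for a trained policy; a real policy would rank variables so
# that good branches are explored first and the search tree stays small.
policy = lambda var, assignment: random.random()
domains = {"x": [0, 1, 2], "y": [0, 1, 2], "z": [0, 1, 2]}
constraints = [lambda a: len(set(a.values())) == len(a)]        # all-different over assigned vars
print(solve(domains, constraints, policy))                      # e.g. {'x': 0, 'y': 1, 'z': 2}
```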

Wednesday Nov 02, 2022
What Makes Convolutional Models Great on Long Sequence Modeling?
Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making them unable to handle long-range dependencies efficiently. Attention overcomes this problem by aggregating global information based on pairwise attention scores, but it also makes the computational complexity quadratic in the sequence length. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. With the Fast Fourier Transform, S4 can model much longer sequences than Transformers and achieve significant gains over SoTA on several long-range tasks. Despite its empirical success, S4 is involved: it requires sophisticated parameterization and initialization schemes that combine the wisdom of several prior works. As a result, S4 is less intuitive and hard to use for researchers with limited prior knowledge. Here we aim to demystify S4 and extract basic principles that contribute to its success as a global convolutional model.
2022: Yuhong Li, Tianle Cai, Yi Zhang, De-huai Chen, Debadeepta Dey https://arxiv.org/pdf/2210.09298v1.pdf
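The key mechanism is the global convolution itself: a kernel as long as the sequence, applied in O(L log L) with the FFT. Below is a minimal sketch of that trick only; how the kernel is parameterized (S4's structured state-space construction versus simpler alternatives) is what the paper goes on to dissect.

```python
# FFT-based global (causal) convolution with a kernel as long as the sequence.
import torch

def global_conv1d(x, kernel):
    """x: (batch, L) sequence, kernel: (L,) -> causal convolution output (batch, L)."""
    L = x.shape[-1]
    n = 2 * L                                     # zero-pad to avoid circular wrap-around
    X = torch.fft.rfft(x, n=n)
    K = torch.fft.rfft(kernel, n=n)
    return torch.fft.irfft(X * K, n=n)[..., :L]   # keep the first L (causal) outputs

L = 4096
x = torch.randn(8, L)
kernel = torch.randn(L) * 0.01                    # in S4 this kernel comes from a state-space model
y = global_conv1d(x, kernel)                      # shape: (8, 4096), cost O(L log L)
```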

Tuesday Nov 01, 2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
We present Amos, a stochastic gradient-based optimizer designed for training deep neural networks. It can be viewed as an Adam optimizer with theoretically supported, adaptive learning-rate decay and weight decay. A key insight behind Amos is that it leverages model-specific information to determine the initial learning rate and decaying schedules. When used for pre-training BERT variants and T5, Amos consistently converges faster than the state-of-the-art settings of AdamW, achieving better validation loss within ≤ 70% of the training steps and time, while requiring ≤ 51% memory for slot variables. Our code is open-sourced at: https://github.com/google-research/jestimator. 2022: Ran Tian, Ankur P. Parikh https://arxiv.org/pdf/2210.11693v1.pdf

Monday Oct 31, 2022
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning, and is competitive with state-of-the-art classification methods. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. 2022: Noah Hollmann, Samuel Muller, Katharina Eggensperger, F. Hutter https://arxiv.org/pdf/2207.01848v3.pdf
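Because "training" amounts to feeding the (training set, test set) pair through one forward pass, usage looks like an ordinary scikit-learn classifier. A hedged sketch assuming the open-source `tabpfn` package (pip install tabpfn); the class name and default constructor are assumptions about that package and may differ between versions.

```python
# Hedged usage sketch of TabPFN on a small tabular dataset via the `tabpfn` package.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier        # assumption: the open-source tabpfn package

X, y = load_breast_cancer(return_X_y=True)  # a small tabular dataset (569 rows, 30 features)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()                    # no hyperparameter tuning
clf.fit(X_train, y_train)                   # "fitting" essentially stores the training set
pred = clf.predict(X_test)                  # one forward pass predicts the whole test set
print(accuracy_score(y_test, pred))
```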

Wednesday Oct 26, 2022
Long Range Graph Benchmark
Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func, and Peptides-struct, which arguably require long-range interaction (LRI) reasoning to achieve strong performance on a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI. 2022: Vijay Prakash Dwivedi, Ladislav Rampášek, Mikhail Galkin, Alipanah Parviz, Guy Wolf, A. Luu, D. Beaini Ranked #1 on Node Classification on PascalVOC-SP https://arxiv.org/pdf/2206.08164v1.pdf
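For experimenting with these datasets, a hedged loading sketch is below, assuming a recent PyTorch Geometric release that ships an LRGBDataset wrapper; the dataset and split names come from the paper, but the exact class and arguments may differ across PyG versions.

```python
# Hedged sketch: load one LRGB dataset and iterate mini-batches of graphs.
from torch_geometric.datasets import LRGBDataset   # assumption: available in recent PyG releases
from torch_geometric.loader import DataLoader

train_set = LRGBDataset(root="data/LRGB", name="Peptides-func", split="train")
loader = DataLoader(train_set, batch_size=32, shuffle=True)

for batch in loader:
    # A long-range model must relate nodes that are many hops apart in batch.edge_index.
    print(batch.num_graphs, batch.x.shape, batch.edge_index.shape)
    break
```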