Papers Read on AI

July 18, 2022  

Matryoshka Representations for Adaptive Deployment

Our contribution is Matryoshka Representation Learning (MRL), which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations.

2022: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, S. Kakade, Prateek Jain, Ali Farhadi

https://arxiv.org/pdf/2205.13147v2.pdf
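To make the nested-granularity idea concrete, here is a minimal PyTorch sketch of Matryoshka-style training, not the authors' released code: one embedding is supervised at several prefix lengths simultaneously, so each prefix must stand alone as a usable representation. The nesting dimensions and the separate linear head per granularity are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaHead(nn.Module):
    """Supervise nested prefixes of one embedding so that each prefix
    (the first 8, 16, ... dims) is a usable representation on its own."""
    def __init__(self, num_classes=10, nesting=(8, 16, 32, 64)):
        super().__init__()
        self.nesting = nesting
        # One linear classifier per granularity; the paper also studies a
        # weight-tied variant, but separate heads keep the sketch simple.
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in nesting)

    def forward(self, z, target):
        # Average the cross-entropy losses over all prefix granularities.
        losses = [
            F.cross_entropy(head(z[:, :d]), target)
            for d, head in zip(self.nesting, self.heads)
        ]
        return torch.stack(losses).mean()

z = torch.randn(4, 64)            # embeddings from any backbone
y = torch.randint(0, 10, (4,))
loss = MatryoshkaHead()(z, y)     # one loss trains every granularity
print(loss.item())
```

At deployment, a downstream task simply truncates the embedding to whichever prefix fits its compute budget.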

July 15, 2022  

Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs

We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31 × 31, in contrast to the commonly used 3 × 3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to the Swin Transformer on ImageNet and a few typical downstream tasks, with lower latency. RepLKNet also shows nice scalability to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is very competitive among state-of-the-art models of similar size. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and a higher shape bias rather than texture bias.

2022: Xiaohan Ding, X. Zhang, Yi Zhou, Jungong Han, Guiguang Ding, Jian Sun

https://arxiv.org/pdf/2203.06717v4.pdf
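As a rough illustration of the large-kernel recipe, the sketch below pairs a 31 × 31 depthwise convolution with a parallel small-kernel branch and folds the two together for inference, in the spirit of the paper's structural re-parameterization; it is a simplified stand-in for the released RepLKNet blocks, and the channel count is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeKernelDW(nn.Module):
    def __init__(self, channels=64, big=31, small=5):
        super().__init__()
        # Depthwise convs keep a 31x31 kernel affordable: cost scales with
        # the channel count rather than its square.
        self.big = nn.Conv2d(channels, channels, big, padding=big // 2,
                             groups=channels)
        self.small = nn.Conv2d(channels, channels, small, padding=small // 2,
                               groups=channels)

    def forward(self, x):
        out = self.big(x)
        if self.small is not None:       # parallel small branch (training)
            out = out + self.small(x)
        return out

    @torch.no_grad()
    def reparameterize(self):
        # Zero-pad the small kernel to 31x31 and fold it into the big one,
        # leaving a single conv for inference (structural re-param).
        pad = (self.big.kernel_size[0] - self.small.kernel_size[0]) // 2
        self.big.weight += F.pad(self.small.weight, [pad] * 4)
        self.big.bias += self.small.bias
        self.small = None

block = LargeKernelDW()
x = torch.randn(1, 64, 32, 32)
y = block(x)
block.reparameterize()
print(torch.allclose(y, block(x), atol=1e-4))  # True: same function, one conv
```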

July 14, 2022  

More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity

Transformers have quickly shone in the computer vision world since the emergence of Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) seems to be challenged by increasingly effective transformer-based models. Very recently, a couple of advanced convolutional models have struck back with large kernels, motivated by the local but large attention mechanism, showing appealing performance and efficiency. We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51 × 51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers and modern ConvNet architectures like ConvNeXt and RepLKNet on ImageNet classification as well as typical downstream tasks.

2022: S. Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Q. Xiao, Boqian Wu, Mykola Pechenizkiy, D. Mocanu, Zhangyang Wang

https://arxiv.org/pdf/2207.03620v1.pdf
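The sketch below illustrates the kernel-decomposition idea behind SLaK, approximating one huge square depthwise kernel with two rectangular kernels (51 × 5 and 5 × 51) plus a small square branch; the dynamic-sparsity training the paper adds on top is omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class SLaKBlockSketch(nn.Module):
    def __init__(self, channels=64, big=51, thin=5):
        super().__init__()
        # Two rectangular depthwise kernels cover a big x big receptive
        # field at a fraction of the dense kernel's parameter count.
        self.tall = nn.Conv2d(channels, channels, (big, thin),
                              padding=(big // 2, thin // 2), groups=channels)
        self.wide = nn.Conv2d(channels, channels, (thin, big),
                              padding=(thin // 2, big // 2), groups=channels)
        self.small = nn.Conv2d(channels, channels, thin,
                               padding=thin // 2, groups=channels)

    def forward(self, x):
        # The three branches are summed, as in the paper's decomposition.
        return self.tall(x) + self.wide(x) + self.small(x)

x = torch.randn(1, 64, 56, 56)
print(SLaKBlockSketch()(x).shape)  # torch.Size([1, 64, 56, 56])
```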

July 14, 2022  

The Web Is Your Oyster - Knowledge-Intensive NLP against a Very Large Web Corpus

We propose a new setup for evaluating existing knowledge-intensive tasks in which we generalize the background corpus to a universal web snapshot. We investigate a slate of NLP tasks which rely on knowledge, either factual or common sense, and ask systems to use a subset of CCNet, the Sphere corpus, as a knowledge source. In contrast to Wikipedia, otherwise a common background corpus in KI-NLP, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the web.

2021: Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Ouguz, Edouard Grave, Wen-tau Yih, Sebastian Riedel

https://arxiv.org/pdf/2112.09924v2.pdf
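A toy sketch of the retrieve-then-read setup this benchmark evaluates: score web passages against a query and hand the best hits to a reader. The hashing-based encoder below is a deliberately crude stand-in for the real sparse or dense retrievers (e.g., BM25 or DPR) studied in the paper.

```python
import numpy as np

def embed(text, dim=64):
    # Stand-in encoder: bag of hashed tokens, L2-normalized.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

corpus = [
    "the eiffel tower is in paris",
    "pandas are native to china",
    "the moon orbits the earth",
]
index = np.stack([embed(p) for p in corpus])   # (num_passages, dim)

query = "where is the eiffel tower"
scores = index @ embed(query)                  # inner-product search
top = np.argsort(-scores)[:2]
print([corpus[i] for i in top])                # evidence passed to a reader
```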

July 13, 2022  

GhostNet: More Features From Cheap Operations

Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to limited memory and computation resources. The redundancy in feature maps is an important characteristic of successful CNNs, but it has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of cheap linear transformations to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features.

2019: Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu

https://arxiv.org/pdf/1911.11907v2.pdf
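A minimal sketch of the Ghost module idea, not the released GhostNet code: an ordinary convolution produces a few intrinsic feature maps, a cheap depthwise convolution derives "ghost" maps from them, and the two sets are concatenated. The ratio and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    def __init__(self, in_ch=16, out_ch=32, ratio=2, cheap_kernel=3):
        super().__init__()
        intrinsic = out_ch // ratio
        # Ordinary convolution for the intrinsic maps.
        self.primary = nn.Conv2d(in_ch, intrinsic, 1)
        # Cheap linear transformation: a depthwise conv over intrinsic maps.
        self.cheap = nn.Conv2d(intrinsic, out_ch - intrinsic, cheap_kernel,
                               padding=cheap_kernel // 2, groups=intrinsic)

    def forward(self, x):
        intrinsic = self.primary(x)
        ghosts = self.cheap(intrinsic)            # ghost maps from cheap ops
        return torch.cat([intrinsic, ghosts], dim=1)

x = torch.randn(1, 16, 32, 32)
print(GhostModuleSketch()(x).shape)  # torch.Size([1, 32, 32, 32])
```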

July 13, 2022  

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

In this work, we propose to address this problem by performing object-centric alignment of the language embeddings from the CLIP model. Furthermore, we visually ground the objects with only image-level supervision, using a pseudo-labeling process that provides high-quality object proposals and helps expand the vocabulary during training. We establish a bridge between the above two object-alignment strategies via a novel weight transfer function that aggregates their complementary strengths. In essence, the proposed model seeks to minimize the gap between object- and image-centric representations in the OVD setting.

2022: Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, F. Khan

https://arxiv.org/pdf/2207.03482v1.pdf
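A toy sketch of the open-vocabulary classification step that such alignment enables, not the paper's full pipeline: region embeddings are scored against CLIP text embeddings of class prompts, so new classes can be added by extending the text side alone. The shapes and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

num_regions, num_classes, dim = 5, 3, 512
region_emb = F.normalize(torch.randn(num_regions, dim), dim=-1)  # detector side
text_emb = F.normalize(torch.randn(num_classes, dim), dim=-1)    # CLIP text side

temperature = 0.01
logits = region_emb @ text_emb.t() / temperature  # cosine-similarity scores
probs = logits.softmax(dim=-1)
print(probs.argmax(dim=-1))  # open-vocabulary label per region proposal
```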

July 4, 2022  

Transformer in Transformer

The Transformer is a new kind of neural architecture that encodes input data as powerful features via the attention mechanism. Visual transformers first divide the input image into several local patches and then compute both their representations and their relationships. Since natural images are highly complex, with abundant detail and color information, the granularity of the patch division is not fine enough to excavate features of objects at different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for building high-performance visual transformers, and we explore a new architecture, namely Transformer iN Transformer (TNT).

2021: Kai Han, An Xiao, E. Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

Ranked #6 on Fine-Grained Image Classification on Oxford-IIIT Pets

https://arxiv.org/pdf/2103.00112v3.pdf
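A minimal TNT-style sketch, not the paper's configuration: an inner transformer attends over sub-patch tokens within each patch, its output is folded back into the patch embedding, and an outer transformer attends across patches. All dimensions below are illustrative.

```python
import torch
import torch.nn as nn

B, patches, sub, inner_dim, outer_dim = 2, 16, 4, 24, 96

inner = nn.TransformerEncoderLayer(inner_dim, nhead=4, batch_first=True)
outer = nn.TransformerEncoderLayer(outer_dim, nhead=4, batch_first=True)
fold = nn.Linear(sub * inner_dim, outer_dim)  # project sub-patch tokens up

sub_tokens = torch.randn(B * patches, sub, inner_dim)  # tokens inside patches
patch_tokens = torch.randn(B, patches, outer_dim)      # patch-level tokens

sub_tokens = inner(sub_tokens)                         # attention inside patches
patch_tokens = patch_tokens + fold(
    sub_tokens.reshape(B, patches, sub * inner_dim))   # inject local detail
patch_tokens = outer(patch_tokens)                     # attention across patches
print(patch_tokens.shape)  # torch.Size([2, 16, 96])
```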

July 2, 2022  

Vision GNN: An Image is Worth Graph of Nodes

Network architecture plays a key role in deep learning-based computer vision systems. The widely used convolutional neural network and transformer treat the image as a grid or sequence structure, which is not flexible enough to capture irregular and complex objects. In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.

2022: Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, E. Wu

https://arxiv.org/pdf/2206.00272v1.pdf
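A toy sketch of the ViG idea: treat patch features as graph nodes, connect each node to its k nearest neighbors in feature space, and update nodes with max-relative aggregation (one of the graph convolutions the paper uses). The projection and sizes are illustrative.

```python
import torch
import torch.nn as nn

B, nodes, dim, k = 2, 196, 64, 9

x = torch.randn(B, nodes, dim)                 # one node per image patch
update = nn.Linear(2 * dim, dim)

dist = torch.cdist(x, x)                       # pairwise feature distances
knn = dist.topk(k, largest=False).indices      # k neighbors (incl. self-loop)

neighbors = torch.gather(
    x.unsqueeze(1).expand(B, nodes, nodes, dim), 2,
    knn.unsqueeze(-1).expand(B, nodes, k, dim))  # gather neighbor features
rel = (neighbors - x.unsqueeze(2)).amax(dim=2)   # max-relative aggregation
x = x + update(torch.cat([x, rel], dim=-1))      # graph-conv node update
print(x.shape)  # torch.Size([2, 196, 64])
```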

June 30, 2022  

TorchGeo: deep learning with geospatial data

Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research, among others. Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available.

2021: A. Stewart, Caleb Robinson, Isaac A. Corley, Anthony Ortiz, J. Ferres, Arindam Banerjee

https://arxiv.org/pdf/2111.08872v3.pdf

June 29, 2022  

OmniXAI: A Library for Explainable AI

We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library for explainable AI (XAI) which offers omni-way explainability and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) models in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers, and practitioners who need explanations for various types of data, models, and explanation methods at different stages of the ML process.

2022: Wenzhuo Yang, Hung Le, S. Savarese, S. Hoi

https://arxiv.org/pdf/2206.01612v3.pdf
