Papers Read on AI

May 25, 2022  

Ivy: Templated Deep Learning for Inter-Framework Portability

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks such that their core functions all exhibit consistent call signatures, syntax and input-output behaviour. Ivy allows high-level framework-agnostic functions to be implemented through the use of framework templates.

2021: Daniel Lenton, Fabio Pardo, Fabian Falck, Stephen James, R. Clark

https://arxiv.org/pdf/2102.02886v3.pdf

May 25, 2022  

Self-attention Does Not Need O(n^2) Memory

We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention. We also demonstrate how to differentiate the function while remaining memory-efficient.

2021: Markus N. Rabe, Charles Staats

https://arxiv.org/pdf/2112.05682v2.pdf
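
To make the abstract above more concrete, here is a minimal pure-Python sketch of the general idea of computing attention in chunks while tracking a running maximum for numerical stability. It handles a single query vector, and the names `naive_attention`, `chunked_attention`, and `chunk_size` are illustrative assumptions; this is not the authors' accelerator implementation, and it does not reproduce their O(√n) scheme or their differentiation strategy.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: materialises one score per key before normalising."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def chunked_attention(q, keys, values, chunk_size=2):
    """Streams over key/value chunks, keeping only a running maximum,
    a running normaliser, and a running weighted sum of the values.
    (Hypothetical helper for illustration; assumes at least one key.)"""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the previous maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

In this sketch the extra memory per query depends on `chunk_size` and the value dimension rather than on the number of keys, and subtracting the running maximum keeps the exponentials from overflowing when scores are large.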

May 24, 2022  

Vision Transformer Adapter for Dense Predictions

This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike recent visual transformers that introduce vision-specific inductive biases into their architectures, ViT achieves inferior performance on dense prediction tasks because it lacks prior information about images. To solve this issue, we propose a Vision Transformer Adapter (ViT-Adapter), which can remedy the defects of ViT and achieve comparable performance to vision-specific models by introducing inductive biases via an additional architecture.

2022: Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Y. Qiao

Ranked #1 on Semantic Segmentation on ADE20K val

https://arxiv.org/pdf/2205.08534v2.pdf

May 20, 2022  

DeepNet: Scaling Transformers to 1,000 Layers

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DEEPNORM) to modify the residual connection in Transformer, together with a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e., good performance of Post-LN and stable training of Pre-LN, making DEEPNORM a preferred alternative. We successfully scale Transformers up to 1,000 layers (i.e., 2,500 attention and feed-forward network sublayers) without difficulty, which is one order of magnitude deeper than previous deep Transformers. Remarkably, on a multilingual benchmark with 7,482 translation directions, our 200-layer model with 3.2B parameters significantly outperforms the 48-layer state-of-the-art model with 12B parameters by 5 BLEU points, which indicates a promising scaling direction.

2022: Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

https://arxiv.org/pdf/2203.00555v1.pdf
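
As a rough illustration of what "modifying the residual connection" can look like, the sketch below contrasts a standard Post-LN residual step with one that up-weights the residual branch by a constant before layer normalization. The form LN(alpha * x + f(x)), the parameter name `alpha`, and the helper names are assumptions made for illustration; the specific value of the constant and the matching initialization are derived in the paper and are not reproduced here.

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain layer normalization without learned gain or bias."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def post_ln_residual(x, sublayer):
    """Standard Post-LN step: normalize the sum of input and sublayer output."""
    return layer_norm([xi + fi for xi, fi in zip(x, sublayer(x))])


def deepnorm_style_residual(x, sublayer, alpha):
    """Assumed DeepNorm-style step: scale the residual branch by a constant
    `alpha` (hypothetical parameter) before adding the sublayer output."""
    return layer_norm([alpha * xi + fi for xi, fi in zip(x, sublayer(x))])
```

With `alpha` set to 1.0 the second function reduces to the ordinary Post-LN step; larger values make each layer's output depend more on its input than on the sublayer update.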

May 19, 2022  

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.

2022: Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

Ranked #1 on Few-Shot Text Classification on RAFT

https://arxiv.org/pdf/2205.05638v1.pdf

May 15, 2022  

CLIPort: What and Where Pathways for Robotic Manipulation

We present CLIPORT, a language-conditioned imitation learning agent that combines the broad semantic understanding of CLIP [1] with the spatial precision of Transporter. Our end-to-end framework is capable of solving a variety of language-specified tabletop tasks from packing unseen objects to folding cloths, all without any explicit representations of object poses, instance segmentations, history, symbolic states, or syntactic structures.

2021: Mohit Shridhar, Lucas Manuelli, D. Fox

https://arxiv.org/pdf/2109.12098v1.pdf

May 14, 2022  

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

We developed an open-source Python package, Gradio, which allows researchers to rapidly generate a visual interface for their ML models. Gradio makes accessing any ML model as easy as sharing a URL. Our development of Gradio is informed by interviews with a number of machine learning researchers who participate in interdisciplinary collaborations. Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.

2019: Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, James Y. Zou

Machine learning, Accessibility, IPython, Subject-matter expert, Open-source software, Usability, As-Easy-As, Communication endpoint, Python

https://arxiv.org/pdf/1906.02569v1.pdf

May 13, 2022  

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, little attention has been paid to this task, which needs to take many factors (e.g., fonts, linguistics, topics, etc.) into consideration. In this paper, we propose a content-aware layout generation network which takes glyph images and their corresponding text as input and synthesizes aesthetic layouts for them automatically. Specifically, we develop a dual-discriminator module, including a sequence discriminator and an image discriminator, to evaluate both the character placing trajectories and rendered shapes of synthesized text logos, respectively. Furthermore, we fuse the information of linguistics from texts and visual semantics from glyphs to guide layout prediction, which both play important roles in professional layout design. To train and evaluate our approach, we construct a dataset named TextLogo3K, consisting of about 3,500 text logo images and their pixel-level annotations. Experimental studies on this dataset demonstrate the effectiveness of our approach for synthesizing visually pleasing text logos and verify its superiority against the state of the art.

2022: Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian

https://arxiv.org/pdf/2204.02701v1.pdf

May 12, 2022  

Language Models Can See: Plugging Visual Controls in Text Generation

In this work, we propose a training-free framework, called MAGIC (iMAge-Guided text generation with CLIP), for plugging in visual controls in the generation process and enabling LMs to perform multimodal tasks (e.g., image captioning) in a zero-shot manner. MAGIC is a simple yet efficient plug-and-play framework, which directly combines an off-the-shelf LM (i.e., GPT-2) and an image-text matching model (i.e., CLIP) for image-grounded text generation.

2022: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, N. Collier

https://arxiv.org/pdf/2205.02655v1.pdf

May 11, 2022  

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort needed for reproducible research and for deployment in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities.

2022: Zhen Wang, Weirui Kuang, Yuexiang Xie, Liuyi Yao, Yaliang Li, Bolin Ding, Jingren Zhou

https://arxiv.org/pdf/2204.05562v3.pdf
