Papers Read on AI

June 29, 2022  

OmniXAI: A Library for Explainable AI

We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library for eXplainable AI (XAI), which offers omni-way explainability and a range of interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) models in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers, and practitioners who need explanations for various types of data, models, and explanation methods at different stages of the ML process.

2022: Wenzhuo Yang, Hung Le, S. Savarese, S. Hoi

https://arxiv.org/pdf/2206.01612v3.pdf
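To give a flavor of the kind of model-agnostic technique such a toolkit bundles, here is a minimal permutation-feature-importance sketch in plain NumPy. The function and setup are illustrative only and are not OmniXAI's actual API.

```python
import numpy as np

def permutation_importance(model, X, y, rng=None):
    # Importance of feature j = drop in accuracy after shuffling column j,
    # which breaks that feature's relationship to the target.
    rng = rng or np.random.default_rng(0)
    base = np.mean(model(X) == y)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])                           # permute one column
        scores.append(base - np.mean(model(Xp) == y))   # accuracy drop
    return np.array(scores)

# Toy model that predicts from feature 0 only, so only feature 0 should matter.
X = np.random.default_rng(1).normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)
model = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(model, X, y)
print(imp)  # large drop for feature 0; features 1-2 near zero
```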

June 29, 2022  

Demystifying MMD GANs

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramér GAN critic.

2018: Mikolaj Binkowski, Danica J. Sutherland, M. Arbel, A. Gretton


https://arxiv.org/pdf/1801.01401v5.pdf
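The MMD critic compares two samples through a kernel. Below is a minimal NumPy sketch of the standard unbiased squared-MMD estimator with a Gaussian RBF kernel; the bandwidth `sigma` and the toy distributions are illustrative choices, not the paper's learned critic kernel.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), all pairs.
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimator of squared MMD: drop the diagonal terms of the
    # within-sample kernel matrices.
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
print(same, diff)  # near zero for matching distributions, larger for shifted ones
```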

June 28, 2022  

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.

2021: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harrison Edwards, Yura Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, F. Such, D. Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, I. Babuschkin, S. Balaji, Shantanu Jain, A. Carr, J. Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, M. Knight, Miles Brundage, Mira Murati, Katie Mayer, P. Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

https://arxiv.org/pdf/2107.03374v2.pdf
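The pass@k numbers behind the repeated-sampling result can be computed with the paper's unbiased estimator: given n samples per problem of which c pass the unit tests, the probability that at least one of k drawn samples is correct is 1 - C(n-c, k)/C(n, k). A small sketch (the example counts are illustrative, not the paper's data):

```python
from math import comb

def pass_at_k(n, c, k):
    # Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k), i.e. one minus
    # the probability that all k drawn samples are incorrect.
    if n - c < k:
        return 1.0  # fewer incorrect samples than draws: a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(100, 20, 1))   # 0.2: single-sample success rate
print(pass_at_k(100, 20, 10))  # repeated sampling raises the chance of a hit
```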

June 14, 2022  

SNUG: Self-Supervised Neural Dynamic Garments

We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. State-of-the-art data-driven approaches to model 3D garment deformations are trained using supervised strategies that require large datasets, usually obtained by expensive physics-based simulation methods or professional multi-camera capture setups.

2022: I. Santesteban, M. Otaduy, D. Casas

https://arxiv.org/pdf/2204.02219v1.pdf
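The self-supervised idea is that a physics-based energy, rather than ground-truth deformations, supervises training. As a toy illustration only, the sketch below minimizes a stretch-plus-gravity energy for a hanging chain of point masses by gradient descent; this is not the paper's garment model, network, or loss terms.

```python
import numpy as np

# Physics "supervision": no dataset, just an energy to minimize.
rest_len, k_stretch, g = 1.0, 100.0, 9.8  # illustrative constants

def energy(free):
    pts = np.vstack([[0.0, 0.0], free.reshape(3, 2)])  # vertex 0 is pinned
    seg = np.diff(pts, axis=0)
    stretch = 0.5 * k_stretch * ((np.linalg.norm(seg, axis=1) - rest_len) ** 2).sum()
    gravity = g * pts[1:, 1].sum()  # unit masses: potential m * g * height
    return stretch + gravity

def num_grad(f, x, h=1e-6):
    # Central-difference gradient; fine for this tiny toy problem.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

free = np.array([0.0, -1.0, 0.0, -2.0, 0.0, -3.0])  # initial guess
for _ in range(4000):                                # "training" loop
    free -= 1e-3 * num_grad(energy, free)

pts = np.vstack([[0.0, 0.0], free.reshape(3, 2)])
lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
print(lengths)  # every segment stretches past rest length; the top one the most
```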

June 10, 2022  

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model with publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training, and evaluate its performance. We open-source the training and evaluation code, as well as the model weights.

2022: Sid Black, Stella Rose Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, M. Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, J. Tow, Ben Wang, Samuel Weinbach

Ranked #7 on Multi-task Language Understanding on MMLU

https://arxiv.org/pdf/2204.06745v1.pdf

June 9, 2022  

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation

In this paper, we introduce BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multimodal features in the shared bird’s-eye view (BEV) representation space, which nicely preserves both geometric and semantic information.

2022: Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, D. Rus, Song Han

Ranked #1 on 3D Object Detection on nuScenes

https://arxiv.org/pdf/2205.13542v1.pdf

June 8, 2022  

There is No Data Like More Data - Current Status of Machine Learning Datasets in Remote Sensing

Annotated datasets have become one of the most crucial preconditions for the development and evaluation of machine learning-based methods designed for the automated interpretation of remote sensing data. In this paper, we review the historic development of such datasets, discuss their features based on a few selected examples, and address open issues for future developments.

2021: M. Schmitt, S. A. Ahmadi, R. Hänsch

https://arxiv.org/pdf/2105.11726v2.pdf

June 8, 2022  

Hopular: Modern Hopfield Networks for Tabular Data

We suggest “Hopular”, a novel Deep Learning architecture for small- and medium-sized datasets, where each layer is equipped with continuous modern Hopfield networks. The modern Hopfield networks use stored data to identify feature-feature, feature-target, and sample-sample dependencies. Hopular’s novelty is that every layer can directly access the original input as well as the whole training set via stored data in the Hopfield networks. Hopular outperforms XGBoost, CatBoost, LightGBM and a state-of-the-art Deep Learning method designed for tabular data. Thus, Hopular is a strong alternative to these methods on tabular data.

2022: Bernhard Schafl, Lukas Gruber, Angela Bitto-Nemling, S. Hochreiter

Ranked #1 on General Classification on Shrutime

https://arxiv.org/pdf/2206.00664v1.pdf
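The retrieval step of a continuous modern Hopfield network is an attention-like update: new_state = softmax(beta * Q S^T) S over the stored patterns S. A minimal NumPy sketch (the inverse temperature `beta` and the patterns are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hopfield_retrieve(queries, stored, beta=8.0):
    # One update of a continuous modern Hopfield network:
    # softmax(beta * Q S^T) S. High beta gives sharp retrieval of the
    # stored pattern most similar to each query.
    attn = softmax(beta * queries @ stored.T)
    return attn @ stored

stored = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
noisy = np.array([[0.9, 0.1, 0.0]])  # corrupted version of the first pattern
out = hopfield_retrieve(noisy, stored)
print(out)  # close to the first stored pattern [1, 0, 0]
```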

June 7, 2022  

Pretraining is All You Need for Image-to-Image Translation

We propose to use pretraining to boost general image-to-image translation. Prior image-to-image translation methods usually need dedicated architectural design and train individual translation models from scratch, and they struggle to generate complex scenes at high quality, especially when paired training data are scarce. In this paper, we regard each image-to-image translation problem as a downstream task and introduce a simple and generic framework that adapts a pretrained diffusion model to accommodate various kinds of image-to-image translation. We also propose adversarial training to enhance the texture synthesis in the diffusion model training, in conjunction with normalized guidance sampling to improve the generation quality.

2022: Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen

https://arxiv.org/pdf/2205.12952v1.pdf
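As background for the guided sampling: classifier-free guidance combines conditional and unconditional denoiser outputs, and a common "normalized" variant rescales the guided estimate back to the conditional norm so large guidance scales do not blow up sample statistics. The sketch below is a simplified stand-in for this idea, not the paper's exact normalized guidance formulation.

```python
import numpy as np

def guided_prediction(eps_uncond, eps_cond, scale=3.0, normalize=True):
    # Classifier-free guidance: push the denoiser output toward the
    # conditional prediction by `scale`. With normalize=True, rescale
    # the guided estimate back to the conditional norm.
    guided = eps_uncond + scale * (eps_cond - eps_uncond)
    if normalize:
        guided = guided * np.linalg.norm(eps_cond) / (np.linalg.norm(guided) + 1e-8)
    return guided

rng = np.random.default_rng(0)
eu, ec = rng.normal(size=16), rng.normal(size=16)
g = guided_prediction(eu, ec)
print(np.linalg.norm(g), np.linalg.norm(ec))  # norms match after rescaling
```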

June 1, 2022  

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.

2022: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. S. Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, D. Fleet, Mohammad Norouzi

Ranked #1 on Text-to-Image Generation on COCO (using extra training data)

https://arxiv.org/pdf/2205.11487v1.pdf
