Papers Read on AI

April 7, 2022  

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

We demonstrate instant training of neural graphics primitives on a single GPU for multiple tasks. In the gigapixel image task, we represent a gigapixel image by a neural network. The SDF task learns a signed distance function in 3D space whose zero level-set represents a 2D surface. Neural radiance caching (NRC) [Müller et al. 2021] employs a neural network that is trained in real time to cache costly lighting calculations. Lastly, NeRF uses 2D images and their camera poses to reconstruct a volumetric radiance-and-density field that is visualized using ray marching. In all tasks, our encoding and its efficient implementation provide clear benefits: rapid training, high quality, and simplicity. Our encoding is task-agnostic: we use the same implementation and hyperparameters across all tasks and vary only the hash table size, which trades off quality and performance.

2022: T. Müller, Alex Evans, Christoph Schied, A. Keller

https://arxiv.org/pdf/2201.05989v1.pdf
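The hash table at the core of the encoding maps integer grid-cell corners to feature vectors at each resolution level. A minimal Python sketch of that spatial hash, using the prime constants from the paper (the table size and the 2D corner lookup are toy assumptions for illustration):

```python
T = 2**14  # hash table size, the paper's quality/performance knob
PRIMES = (1, 2654435761, 805459861)  # per-dimension primes from the paper

def spatial_hash(coords):
    """Hash integer grid coordinates into a table index via XOR of scaled primes."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % T

def corner_indices(x, y, resolution):
    """Table indices for the four grid corners enclosing a 2D point at one level."""
    x0, y0 = int(x * resolution), int(y * resolution)
    return [spatial_hash((x0 + dx, y0 + dy, 0)) for dy in (0, 1) for dx in (0, 1)]
```

Colliding cells share a table entry; the method relies on the subsequent neural network to disambiguate collisions, which is why no explicit collision handling appears here.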

April 6, 2022  

SimSwap: An Efficient Framework For High Fidelity Face Swapping

We propose an efficient framework, called Simple Swap (SimSwap), aiming for generalized and high fidelity face swapping. In contrast to previous approaches that either lack the ability to generalize to arbitrary identity or fail to preserve attributes like facial expression and gaze direction, our framework is capable of transferring the identity of an arbitrary source face into an arbitrary target face while preserving the attributes of the target face. We overcome the above defects in the following two ways. First, we present the ID Injection Module (IIM) which transfers the identity information of the source face into the target face at feature level. By using this module, we extend the architecture of an identity-specific face swapping algorithm to a framework for arbitrary face swapping. Second, we propose the Weak Feature Matching Loss which efficiently helps our framework to preserve the facial attributes in an implicit way. Extensive experiments on wild faces demonstrate that our SimSwap is able to achieve competitive identity performance while preserving attributes better than previous state-of-the-art methods.

2020: Renwang Chen, Xuanhong Chen, Bingbing Ni, Yanhao Ge

Ranked #2 on Face Swapping on FaceForensics++

https://arxiv.org/pdf/2106.06340v1.pdf

April 5, 2022  

Sparse Instance Activation for Real-Time Instance Segmentation

In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. Previously, most instance segmentation methods heavily rely on object detection and perform mask prediction based on bounding boxes or dense centers.

2022: Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Q. Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu

Ranked #1 on Real-time Instance Segmentation on MSCOCO

https://arxiv.org/pdf/2203.12827v1.pdf

April 5, 2022  

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks.

2022: Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi

Ranked #1 on Image Captioning on nocaps-val-out-domain

https://arxiv.org/pdf/2201.12086v2.pdf

April 4, 2022  

On-chip QNN: Towards Efficient On-Chip Training of Quantum Neural Networks

Quantum Neural Network (QNN) is drawing increasing research interest thanks to its potential to achieve quantum advantage on near-term Noisy Intermediate Scale Quantum (NISQ) hardware. In order to achieve scalable QNN learning, the training process needs to be offloaded to real quantum machines instead of using exponential cost classical simulators. One common approach to obtain QNN gradients is parameter shift whose cost scales linearly with the number of qubits. We present On-chip QNN, the first experimental demonstration of practical on-chip QNN training with parameter shift.

2022: Hanrui Wang, Zi-Chen Li, Jiaqi Gu, Yongshan Ding, D. Pan, Song Han

https://arxiv.org/pdf/2202.13239v1.pdf
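The parameter-shift rule mentioned above obtains gradients from two extra circuit evaluations per parameter, which is why its cost scales linearly with the number of parameters. A self-contained sketch, with a toy expectation function standing in for a measured quantum circuit:

```python
import math

def expectation(theta):
    """Toy stand-in for a measured expectation value of a one-parameter circuit."""
    return math.cos(theta)

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Gradient from two evaluations: (f(theta + s) - f(theta - s)) / 2."""
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.3
grad = parameter_shift_grad(expectation, theta)
# The analytic gradient of cos is -sin, so the two agree for this toy circuit.
assert abs(grad - (-math.sin(theta))) < 1e-9
```

On hardware, `expectation` would be estimated from repeated measurements, so each gradient entry costs two circuit executions rather than one backward pass.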

April 3, 2022  

FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, getting hands-on experience is attractive to beginners. However, training a practical DRL trading agent that decides where to trade, at what price, and in what quantity involves error-prone and arduous development and debugging. In this paper, we introduce FinRL, a DRL library that helps beginners get exposure to quantitative finance and develop their own stock trading strategies. Along with easily reproducible tutorials, the FinRL library allows users to streamline their own developments and to compare them with existing schemes easily. Within FinRL, virtual environments are configured with stock market datasets, trading agents are trained with neural networks, and extensive backtesting is analyzed via trading performance. Moreover, it incorporates important trading constraints such as transaction cost, market liquidity, and the investor's degree of risk aversion. FinRL features completeness, hands-on tutorials, and reproducibility that favor beginners: (i) at multiple levels of time granularity, FinRL simulates trading environments across various stock markets, including NASDAQ-100, DJIA, and S&P 500; (ii) organized in a layered architecture with a modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN, DDPG, PPO, SAC, A2C, TD3, etc.), commonly used reward functions, and standard evaluation baselines to alleviate debugging workloads and promote reproducibility; and (iii) being highly extendable, FinRL reserves a complete set of user-import interfaces. Furthermore, we incorporate three application demonstrations, namely single stock trading, multiple stock trading, and portfolio allocation. The FinRL library is available on GitHub.

2020: Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Chris Wang

https://arxiv.org/pdf/2011.09607v2.pdf
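The environment/agent/backtest split described above can be made concrete with a deliberately tiny gym-style trading environment. This is a hypothetical toy for illustration, not FinRL's actual API; the name `ToyStockEnv` and its interface are invented here:

```python
import random

class ToyStockEnv:
    """Minimal gym-style single-stock environment with a transaction-cost constraint."""
    def __init__(self, prices, cash=1000.0, cost=0.001):
        self.prices, self.cash0, self.cost = prices, cash, cost  # cost = fee per trade

    def reset(self):
        self.t, self.cash, self.shares = 0, self.cash0, 0
        return (self.prices[0], self.cash, self.shares)

    def step(self, action):
        """action in {-1: sell one share, 0: hold, 1: buy one share}."""
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:
            self.cash -= price * (1 + self.cost); self.shares += 1
        elif action == -1 and self.shares > 0:
            self.cash += price * (1 - self.cost); self.shares -= 1
        self.t += 1
        done = self.t == len(self.prices) - 1
        value = self.cash + self.shares * self.prices[self.t]  # portfolio value
        return (self.prices[self.t], self.cash, self.shares), value, done

# A random policy standing in for a trained DRL agent; backtesting would
# compare the final portfolio value against a buy-and-hold baseline.
env = ToyStockEnv([10, 11, 12, 11, 13])
state, done = env.reset(), False
while not done:
    state, value, done = env.step(random.choice([-1, 0, 1]))
```

FinRL's real environments add market data feeds, multi-stock state spaces, and risk-aversion terms in the reward, but the reset/step loop is the same shape.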

April 2, 2022  

Reference-based Video Super-Resolution Using Multi-Camera Video Triplets

We propose the first reference-based video super-resolution (RefVSR) approach that utilizes reference videos for high-fidelity results. We focus on RefVSR in a triple-camera setting, where we aim at super-resolving a low-resolution ultra-wide video utilizing wide-angle and telephoto videos. We introduce the first RefVSR network that recurrently aligns and propagates temporal reference features fused with features extracted from low-resolution frames.

2022: Junyong Lee, Myeong-Chun Lee, Sunghyun Cho, Seungyong Lee

Ranked #1 on Reference-based Video Super-Resolution on RealMCVSR Dataset

https://arxiv.org/pdf/2203.14537v1.pdf

April 1, 2022  

Global Tracking Transformers

We present a novel transformer-based architecture for global multi-object tracking. Our network takes a short sequence of frames as input and produces global trajectories for all objects. The core component is a global tracking transformer that operates on objects from all frames in the sequence. The transformer encodes object features from all frames, and uses trajectory queries to group them into trajectories. The trajectory queries are object features from a single frame and naturally produce unique trajectories.

2022: Xingyi Zhou, Tianwei Yin, V. Koltun, Phillip Krahenbuhl

Ranked #4 on Multi-Object Tracking on MOT17

https://arxiv.org/pdf/2203.13250v1.pdf

April 1, 2022  

A Dual Weighting Label Assignment Scheme for Object Detection

Label assignment (LA), which aims to assign each training sample a positive (pos) and a negative (neg) loss weight, plays an important role in object detection. Existing LA methods mostly focus on the design of pos weighting function, while the neg weight is directly derived from the pos weight. Such a mechanism limits the learning capacity of detectors. In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately. We first identify the key influential factors of pos/neg weights by analyzing the evaluation metrics in object detection, and then design the pos and neg weighting functions based on them.

2022: Shuai Li, Chen-Hang He, Ruihuang Li, Lei Zhang

https://arxiv.org/pdf/2203.09730v1.pdf

March 31, 2022  

Sionna: An Open-Source Library for Next-Generation Physical Layer Research

Sionna™ is a GPU-accelerated open-source library for link-level simulations based on TensorFlow. It enables the rapid prototyping of complex communication system architectures and provides native support for the integration of neural networks. Sionna implements a wide breadth of carefully tested state-of-the-art algorithms that can be used for benchmarking and end-to-end performance evaluation. This allows researchers to focus on their research, making it more impactful and reproducible, while saving time implementing components outside their area of expertise. This white paper provides a brief introduction to Sionna, explains its design principles and features, as well as future extensions, such as integrated ray tracing and custom CUDA kernels. We believe that Sionna is a valuable tool for research on next-generation communication systems, such as 6G, and we welcome contributions from our community.

2022: J. Hoydis, Sebastian Cammerer, Fayçal Ait Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, Alexander Keller

https://arxiv.org/pdf/2203.11854v1.pdf
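To make "link-level simulation" concrete, here is a plain-Python bit-error-rate measurement for uncoded BPSK over an AWGN channel, the kind of mapper/channel/demapper pipeline that Sionna implements in accelerated, differentiable form. This sketch does not use Sionna's API:

```python
import random

def simulate_ber(ebn0_db, n_bits=100_000, seed=0):
    """Monte Carlo BER of uncoded BPSK over AWGN at a given Eb/N0 in dB."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = (1 / (2 * ebn0)) ** 0.5  # noise std for unit-energy BPSK symbols
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        symbol = 1.0 if bit else -1.0            # mapper
        received = symbol + rng.gauss(0, sigma)  # channel
        errors += (received > 0) != bool(bit)    # hard-decision demapper
    return errors / n_bits

# e.g. simulate_ber(4.0) estimates the uncoded BPSK BER at 4 dB Eb/N0
```

A real link-level study would add coding, modulation, channel models, and batched GPU execution, which is exactly the boilerplate Sionna is meant to take off a researcher's hands.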
