Papers Read on AI

Papers Read on AI header image 1
September 21, 2022  

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Addi-tionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17%, reducing training times by up to 99% compared to the baselines. One use case is in for Industry-4.0

2022: S. Tuli, G. Casale, N. Jennings

https://arxiv.org/pdf/2201.07284v6.pdf

September 20, 2022  

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Adaptive gradient algorithms [1–4] borrow the moving average idea of heavy ball acceleration to estimate accurate first- and second-order moments of gradient for accelerating convergence. However, Nesterov acceleration which converges faster than heavy ball acceleration in theory [5] and also in many empirical cases [6] is much less investigated under the adaptive gradient setting. In this work, we propose the ADAptive Nesterov momentum algorithm, Adan for short, to effec-tively speedup the training of deep neural networks. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra computation and memory overhead of computing gradient at the extrapolation point. Then Adan adopts NME to estimate the first- and second-order moments of the gradient in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an (cid:15) -approximate first-order stationary point within O (cid:0) (cid:15) − 3 . 5 (cid:1) stochastic gradient complexity on the nonconvex stochastic problems ( e.g. deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan surpasses the corresponding SoTA optimizers on both CNNs and transformers, and sets new SoTAs for many popular networks and frameworks, e.g. ResNet [7], ConvNext [8], ViT [9], Swin [10], MAE [11], LSTM [12], TransformerXL [13] and BERT [14]. More surprisingly, Adan can use half of the training cost (epochs) of SoTA optimizers to achieve higher or comparable

2022: Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan

https://arxiv.org/pdf/2208.06677v2.pdf

September 19, 2022  

YOLOX-PAI: An Improved YOLOX, Stronger and Faster than YOLOv6

We develop an all-in-one computer vision tool-box named EasyCV to facilitate the use of various SOTA computer vision methods. Re-cently, we add YOLOX-PAI, an improved version of YOLOX, into EasyCV. We conduct ablation studies to investigate the influence of some detection methods on YOLOX. We also provide an easy use for PAI-Blade which is used to accelerate the inference process based on BladeDISC and TensorRT. Finally, we receive 42.8 mAP on COCO dateset within 1.0 ms on a single NVIDIA V100 GPU, which is a bit faster than YOLOv6. A simple but efficient predictor api is also designed in EasyCV to conduct end2end object detection. Codes and models are now available at: https://github. com/alibaba/EasyCV.

2022: Xinyi Zou, Ziheng Wu, Wenmeng Zhou, Jun Huang

https://arxiv.org/pdf/2208.13040v2.pdf

September 17, 2022  

Transformers are Sample Efficient World Models

Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS , a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero.

2022: Vincent Micheli, Eloi Alonso, Franccois Fleuret

https://arxiv.org/pdf/2209.00588v1.pdf

September 16, 2022  

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3 - 5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks. Our code, data and new words will be available at: https://textual-inversion.

2022: Rinon Gal, Yuval Alaluf, Y. Atzmon, Or Patashnik, A. Bermano, G. Chechik, D. Cohen-Or

https://arxiv.org/pdf/2208.01618v1.pdf

September 14, 2022  

StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

Computer graphics has experienced a recent surge of data-centric approaches for photorealistic and controllable content creation. StyleGAN in particular sets new standards for generative modeling regarding image quality and controllability. However, StyleGAN’s performance severely degrades on large unstructured datasets such as ImageNet. StyleGAN was designed for controllability; hence, prior works suspect its restrictive design to be unsuitable for diverse datasets. In contrast, we find the main limiting factor to be the current training strategy. Following the recently introduced Projected GAN paradigm, we leverage powerful neural network priors and a progressive growing strategy to successfully train the latest StyleGAN3 generator on ImageNet. Our final model, StyleGAN-XL, sets a new state-of-the-art on large-scale image synthesis and is the first to generate images at a resolution of 10242 at such a dataset scale. We demonstrate that this model can invert and edit images beyond the narrow domain of portraits or specific object classes.

2022: Axel Sauer, Katja Schwarz, Andreas Geiger

Ranked #1 on Image Generation on CIFAR-10

https://arxiv.org/pdf/2202.00273v1.pdf

September 6, 2022  

Explaining Explanations: Axiomatic Feature Interactions for Deep Networks

Recent work has shown great promise in explaining neural network behavior. In particular, feature attribution methods explain which features were most important to a model's prediction on a given input. However, for many tasks, simply knowing which features were important to a model's prediction may not provide enough insight to understand model behavior. The interactions between features within the model may better help us understand not only the model, but also why certain features are more important than others. In this work, we present Integrated Hessians, an extension of Integrated Gradients that explains pairwise feature interactions in neural networks. Integrated Hessians overcomes several theoretical limitations of previous methods to explain interactions, and unlike such previous methods is not limited to a specific architecture or class of neural network. Additionally, we find that our method is faster than existing methods when the number of features is large, and outperforms previous methods on existing quantitative benchmarks. Code available at this https URL

2020: J. Janizek, Pascal Sturmfels, Su-In Lee

Interaction, Artificial neural network, Integrated circuit, Color gradient

https://arxiv.org/pdf/2002.04138v3.pdf

September 5, 2022  

Musika! Fast Infinite Waveform Music Generation

Fast and user-controllable music generation could enable novel ways of composing or performing music. However, state-of-the-art music generation systems require large amounts of data and computational resources for training, and are slow at inference. This makes them impractical for real-time interactive use. In this work, we introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU. We perform quantitative evaluations to assess the quality of the generated samples and showcase options for user control in piano and techno music generation. We release the source code and pretrained autoencoder weights at github.com/marcoppasini/musika, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.

2022: Marco Pasini, Jan Schlüter

https://arxiv.org/pdf/2208.08706v1.pdf

September 3, 2022  

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Addi-tionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17%, reducing training times by up to 99% compared to the baselines. One use case is in for Industry-4.0

2022: S. Tuli, G. Casale, N. Jennings

https://arxiv.org/pdf/2201.07284v6.pdf

September 2, 2022  

Representation Learning for the Automatic Indexing of Sound Effects Libraries

Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy up-dates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established representations such as OpenL3. Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.

2022: Alison B. Ma, Alexander Lerch

https://arxiv.org/pdf/2208.09096v1.pdf

Podbean App

Play this podcast on Podbean App