Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network that consumes the immediately preceding context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on.
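The retrieval mechanism described above can be sketched as follows. This is a minimal, illustrative Python sketch, not the paper's implementation: it assumes a toy datastore of (context vector, next-word id) pairs, exact L2 nearest-neighbor search, and hypothetical parameters `k`, `lam` (interpolation weight), and `temp` (softmax temperature).

```python
import numpy as np

def knn_lm_prob(context_vec, lm_probs, datastore_keys, datastore_next_words,
                vocab_size, k=4, lam=0.25, temp=1.0):
    """Interpolate a parametric LM's next-word distribution with a
    k-nearest-neighbor distribution built from a datastore of
    (context representation, next word) pairs."""
    # L2 distances from the current context to every stored context
    dists = np.linalg.norm(datastore_keys - context_vec, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors
    # Softmax over negative distances gives the neighbor weights
    weights = np.exp(-dists[nn] / temp)
    weights /= weights.sum()
    # Aggregate neighbor weights by the next word each neighbor stored
    knn_probs = np.zeros(vocab_size)
    for w, word_id in zip(weights, datastore_next_words[nn]):
        knn_probs[word_id] += w
    # Final distribution: interpolate the kNN and parametric estimates
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

In the full-scale setting, the datastore holds one entry per training token and the search uses an approximate nearest-neighbor index rather than the brute-force scan shown here.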
2023: Frank F. Xu, Uri Alon, Graham Neubig