Why do Nearest Neighbor Language Models Work?

Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations with a neural network that consumes the immediately preceding context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard parametric next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on.

2023: Frank F. Xu, Uri Alon, Graham Neubig
https://arxiv.org/pdf/2301.02828v1.pdf
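
As a rough illustration of the retrieval-augmented prediction the abstract describes, the sketch below follows the commonly used kNN-LM formulation: a distribution built from the nearest datastore entries is interpolated with the parametric LM's next-word distribution. The abstract gives no formula, so the function names, the squared Euclidean distance, the `temperature` parameter, and the interpolation weight `lam` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def knn_distribution(query, datastore, k, vocab_size, temperature=1.0):
    """Build a next-token distribution from the k nearest datastore entries.

    `datastore` is a non-empty list of (key_vector, next_token_id) pairs,
    where each key is a context representation (hypothetical toy format).
    """
    # Squared Euclidean distance from the query to every stored key.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over negative distances; shifting by the smallest distance
    # keeps the exponentials numerically stable.
    d_min = nearest[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in nearest]
    total = sum(weights)

    # Neighbors that share a next token pool their probability mass.
    probs = [0.0] * vocab_size
    for weight, (_, token) in zip(weights, nearest):
        probs[token] += weight / total
    return probs


def interpolate(p_lm, p_knn, lam):
    """Mix the parametric and retrieval distributions (lam is assumed)."""
    return [lam * pk + (1.0 - lam) * pl for pl, pk in zip(p_lm, p_knn)]


# Toy usage: 2-dimensional context vectors and a 3-word vocabulary.
datastore = [([0.0, 0.0], 1), ([1.0, 0.0], 2), ([5.0, 5.0], 0)]
p_knn = knn_distribution([0.0, 0.0], datastore, k=2, vocab_size=3)
p_final = interpolate([0.5, 0.25, 0.25], p_knn, lam=0.25)
```

In practice the datastore is far too large for an exhaustive scan like this one, so real systems use an approximate nearest-neighbor index; the linear scan here only keeps the sketch self-contained.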
