Language model (LM) pretraining can acquire a wide range of knowledge from text corpora, which benefits downstream tasks. However, existing methods such as BERT model a single document at a time and do not capture dependencies or knowledge that span multiple documents. In this work, we propose LinkBERT, an LM pretraining method that leverages links between documents, e.g., hyperlinks.
2022: Michihiro Yasunaga, J. Leskovec, Percy Liang
Ranked #1 on Text Classification on BLURB