Happy to share our new paper “Language model acceptability judgements are not always robust to context”! https://arxiv.org/abs/2212.08979 We prepend several kinds of context to minimal linguistic #acceptability test pairs and find #LMs (#OPT, #GPT2) can still achieve strong performance on #BLiMP & #SyntaxGym, except in some interesting cases. 🧵 [1/7]
Joint work with @jon, @kanishka, @amuuueller, @keren fuentes, @roger_p_levy, @Adinawilliams