Ine Gevers (U Antwerp) at #CIDAS Colloquium: Thursday, Feb 5th, 14:15
Playing with Knowledge: Evaluating Common Sense in LLMs Through Language Games
Large Language Models (#LLMs) today achieve strikingly high scores on benchmarks designed to test math, language understanding, and coding skills. Yet these same models can exhibit surprising failures of common sense, such as suggesting you use glue to keep the cheese from sliding off your pizza. These mismatches highlight an open question: what do LLMs really understand about the world? Evaluating common sense knowledge in AI systems has been a longstanding challenge in NLP, spanning a wide range of topics such as implicit language understanding, social and cultural norms, and everyday reasoning strategies.
1/2
