UKP Lab

The Ubiquitous Knowledge Processing Lab researches Natural Language Processing (#NLProc) with a strong emphasis on Large Language Models (#LLMs), Conversational AI and Question Answering · Department of Computer Science · @TU Darmstadt

2025-05-30

Are LMs more than their behavior? 🤔

Join our Conference on Language Modeling (COLM) workshop and explore the interplay between what LMs answer and what happens internally ✨

See you in Montréal 🍁

CfP: shorturl.at/sBomu
Page: shorturl.at/FT3fX
Reviewer Nomination: shorturl.at/Jg1BP

#nlproc #interpretability

Call for Papers, Interplay Workshop at COLM: June 23rd - submissions due. July 24th - acceptance notification. October 10th - workshop day.
2025-05-20

His work was supervised by Prof. Dr. Iryna Gurevych and Dr. Nils Reimers. The examination committee included Prof. Dr. Carsten Binnig (chair), Prof. Dr. Dan Roth (University of Pennsylvania, co-reviewer) and Prof. Dr. Kristian Kersting.

Kexin has contributed significantly to the UKP Lab's research and we are very excited to follow his next steps in academia or industry.

Congratulations again, Dr. Wang!

📚 Google Scholar Profile
scholar.google.com/citations?u

#UKPLab #PhDDefense #NLProc #TUDA

2025-05-20

🎓 Congratulations to Dr. Kexin Wang!

We are delighted to share that Kexin Wang has successfully defended his PhD thesis today at the Department of Computer Science, TU Darmstadt.

In his dissertation, titled "Improving Dense Retrieval on Domain Adaptation and Decontextualization", Kexin explored innovative methods to enhance dense retrieval systems, especially in settings where domain shift and decontextualization pose major challenges.
(1/🧵)

2025-05-19

And consider following the authors Jingcheng Niu, Subhabrata Dutta, Ahmed Elshabrawy, @harish (University of Bath) & Iryna Gurevych if you are interested in more information or an exchange of ideas.

2025-05-19

4️⃣ Finally, by investigating the internals of the LLMs 👩‍🔧 we find that

👉 the development of ICL on downstream tasks during pre-training correlates with the model learning to use a common subspace across tasks 📐 (see the sketch after this post)

👉 and that both developments follow a predictable pattern 📈✨

5/🧵
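The post does not spell out how the "common subspace" is measured, so here is a minimal, illustrative sketch of one way such a probe could look: collect hidden states for examples from two tasks and compare their top principal directions. The model (gpt2), the layer, the mean pooling, and the overlap metric below are assumptions for illustration, not necessarily the paper's actual analysis.

```python
# Illustrative sketch only: model choice, layer, pooling and the overlap metric
# are assumptions, not the paper's method (see arxiv.org/abs/2505.11004).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def hidden_states(sentences, layer=-1):
    # Mean-pooled hidden states for each sentence at the chosen layer.
    feats = []
    for s in sentences:
        ids = tokenizer(s, return_tensors="pt").input_ids
        with torch.no_grad():
            h = model(ids, output_hidden_states=True).hidden_states[layer][0]
        feats.append(h.mean(dim=0).numpy())
    return np.stack(feats)

def subspace_overlap(X, Y, k=2):
    # Top-k principal directions per task, then the average squared cosine
    # between the two subspaces (1 = identical subspace, 0 = orthogonal).
    Ux = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][:k]
    Uy = np.linalg.svd(Y - Y.mean(0), full_matrices=False)[2][:k]
    return float(np.linalg.norm(Ux @ Uy.T) ** 2 / k)

task_a = ["The movie was great.", "I loved this film.",
          "A wonderful story.", "Terrible acting, sadly."]
task_b = ["2 + 2 = 4", "The capital of France is Paris.",
          "Water boils at 100 C.", "Triangles have three sides."]
print(subspace_overlap(hidden_states(task_a), hidden_states(task_b)))
```

A model that increasingly routes different tasks through shared directions would show this overlap rising across pre-training checkpoints.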

2025-05-19

3️⃣ ICL performance depends on how difficult a task is!

More distractors (more complex) → Lower Performance 😶

4/🧵

2025-05-19

2️⃣ ICL performance is NOT emergent 🛑

We observe that ICL development is both
➡️ GRADUAL and
➡️ PREDICTABLE
in complex tasks

3/🧵

2025-05-19

1️⃣ Olsson et al.'s (2022) work on ICL and Induction Heads (arxiv.org/abs/2209.11895) suggested that LLMs can follow patterns presented in-context using random token orderings:

[A][B] ... [A]→[B]

⛔️ BUT

💡 We show that frequency matters!

More frequent → Better ICL 🤨 (toy probe sketched below)

2/🧵
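To make the [A][B] ... [A]→[B] claim concrete, here is a toy probe: repeat a word pair in context and check whether the model completes the final [A] with the first token of [B], once with common words and once with rarer ones. The model (gpt2) and the hand-picked word lists are illustrative assumptions; the paper's evaluation is more careful than this sketch.

```python
# Toy probe only: gpt2 and the hand-picked word lists are assumptions for
# illustration, not the paper's experimental setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def induction_accuracy(pairs, repeats=3):
    # Build " A B A B ... A" and check whether the greedy next-token
    # prediction after the final A matches the first token of " B".
    hits = 0
    for a, b in pairs:
        prompt = (" " + a + " " + b) * repeats + " " + a
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            next_id = model(ids).logits[0, -1].argmax().item()
        hits += int(next_id == tokenizer(" " + b).input_ids[0])
    return hits / len(pairs)

frequent_pairs = [("house", "water"), ("day", "time"), ("people", "world")]
rare_pairs = [("obelisk", "fjord"), ("zygote", "quandary"), ("sieve", "gnarled")]
print("frequent tokens:", induction_accuracy(frequent_pairs))
print("rare tokens:", induction_accuracy(rare_pairs))
```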

2025-05-19

Is In-Context Learning (ICL) in LLMs Memorisation? Emergence? Some Algorithmic Capability? 🤔

📢 New work exploring ICL in LLMs: arxiv.org/abs/2505.11004

💡 Key Finding:
ICL capabilities are linked to token frequency 🤨

Strap in for the unexpected 🤯

A 🧵👇 #NLProc

2025-05-15

Read the full guest article on page 3 (in German):
👉 www.tu-darmstadt.de/media/daa_responsives_design/01_die_universitaet_medien/aktuelles_6/publikationen_km/hoch3/pdf/hoch3_2025_2.pdf

(2/2)

#UKPLab #LLMs #Reasoning #DeepSeek #AIResearch #TUDarmstadt

2025-05-15

🧠 How good are today's LLMs at reasoning – really?

In the latest issue of hoch³, the university magazine of TU Darmstadt, Prof. Iryna Gurevych and Irina Bigoulaeva from the UKP Lab examine the performance of DeepSeek-R1 and R1-Zero.

Their takeaway? Even cutting-edge models struggle with atypical question formats that deviate from standard training data – highlighting ongoing challenges in robustness and generalization for generative AI.
(1/🧵)

2025-05-13

His time at UKP Lab played an important role in shaping his interdisciplinary approach and his enduring interest in the more playful and puzzling sides of language.

Stay tuned as Tristan shares insights into his current projects and what he took with him from his time at UKP Lab.
(3/3)

2025-05-13

Today, Tristan leads the Computational Linguistics at Manitoba (CLAM) Lab at the University of Manitoba's Department of Computer Science. He is best known for his work on the computational analysis of puns, jokes, and other forms of linguistic creativity.
(2/🧵)

2025-05-13

We launched our Spotlight on #UKPLab Alumni series and are happy to continue it with @Logological!

Tristan was part of UKP Lab from 2011 to 2019, working on lexical semantics, especially figurative language and wordplay. This laid the groundwork for a career at the intersection of linguistics and humor.
(1/🧵)

2025-04-29

And consider following the authors Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych and Alham Fikri Aji if you are interested in more information or an exchange of ideas. (6/6)

See you in Albuquerque 🌵! #NAACL2025

2025-04-29

Statement-Tuning delivers powerful, data-efficient NLU – ideal for low-resource settings. Dive into our paper and code for all the details!

📄 Paper: arxiv.org/abs/2404.12897
💻 Code: github.com/afz225/statement-tu

(5/🧵)

2025-04-29

Remarkably, using as few as 1K statements per training task (just 16K examples in total) yields 96% of optimal performance, which is both efficient and robust. For generalization, increasing training task diversity is more effective than increasing data size per task.
(4/🧵)

2025-04-29

Our experiments show that multi-task, statement-tuned encoders rival SOTA LLMs (with up to 200× fewer parameters!) across 7 zero-shot NLU tasks.

We also outperform previous lightweight methods on accuracy, robustness to spurious patterns, and the range of supported tasks! 🚀 (3/🧵)

2025-04-29

Any discriminative task with a finite set of targets can be verbalized into statements. We fine-tune RoBERTa to classify statements as True or False. Just like LLM instruction-tuning, multi-task statement-tuning improves 0-shot generalization to unseen tasks. (2/🧵)
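A minimal sketch of what statement-based zero-shot classification looks like in code: verbalize each candidate label into a statement and keep the one the binary classifier finds most plausible. The sentiment template, label set and checkpoint are placeholders (roberta-base stands in for a statement-tuned model); see the paper and repo linked earlier in the thread for the actual setup.

```python
# Minimal sketch: template, labels and checkpoint are placeholders, not the
# exact Statement-Tuning setup (see arxiv.org/abs/2404.12897 and the repo).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A statement-tuned checkpoint would be loaded here; roberta-base is a stand-in.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2).eval()

def verbalize(text, label):
    # Turn a discriminative example (here: sentiment) into a natural-language statement.
    return f"{text} The sentiment of this review is {label}."

def truth_score(statement):
    # Probability that the statement is True under the binary classifier.
    inputs = tokenizer(statement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def zero_shot_predict(text, candidate_labels):
    # Score one statement per candidate label and return the most plausible one.
    return max(candidate_labels, key=lambda lab: truth_score(verbalize(text, lab)))

print(zero_shot_predict("The movie was a delight from start to finish.",
                        ["positive", "negative"]))
```

During multi-task statement-tuning, many such verbalized tasks are mixed, so the True/False decision can transfer zero-shot to unseen tasks.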

2025-04-29

🚀 SOTA 0-shot models growing? Don't break the compute-bank! ⚡💡

Discover Statement-Tuning in our #NAACL2025 paper: we transform NLU tasks into natural language statements, letting small models like RoBERTa shine ✨ in zero & few-shot settings at a fraction of the cost. 🔥
(1/🧵)

#NLProc
