#LanguageModels

2025-05-17

In our #ISE2025 lecture last Wednesday, we learned how n-gram language models, via the Markov assumption and maximum likelihood estimation, let us predict the probability of a word occurring in a specific context (i.e. given the n previous words in the sequence).

#NLP #languagemodels #lecture @fizise @tabea @enorouzi @sourisnumerique @fiz_karlsruhe @KIT_Karlsruhe

Slide from the Information Service Engineering 2025 lecture, 03 Natural Language Processing 02, 2.9, Language Models:
Title: N-Gram Language Model
The probability of a sequence of words can be computed via conditional probability and the Bayes rule (including the chain rule for n words). Approximation is performed via the Markov assumption (dependency only on the n last words) and maximum likelihood estimation (approximating the probabilities of a sequence of words by counting and normalising occurrences in large text corpora).
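Spelled out in standard textbook notation (a sketch of the usual formulation, in which an n-gram model conditions on the preceding n-1 words; the maximum likelihood estimate is shown for the bigram case, not copied verbatim from the slide):

```latex
% Chain rule: decompose the probability of a word sequence w_1 ... w_m
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})

% Markov assumption: condition only on a fixed window of preceding words
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

% Maximum likelihood estimation from corpus counts (bigram case)
P(w_i \mid w_{i-1}) \approx \frac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}
```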
2025-05-13

🧠 🤖 Researchers from the Natural Language Processing Laboratory and NeuroAI Laboratory have discovered key ‘units’ in large AI models that seem to be important for language, mirroring the brain’s language system. When these specific units were turned off, the models got much worse at language tasks.

#LanguageModels #ArtificialIntelligence #AIResearch

Read more: go.epfl.ch/LJx-en

Nick Byrd, Ph.D. (@ByrdNick@nerdculture.de)
2025-05-12

How can #LanguageModels optimize tradeoffs between performance and inference costs?

Meta-reasoner's "contextual multi-armed bandits" made the best trades on #math and #logic tasks by iteratively checking opportunities to redirect, correct, and optimize.

doi.org/10.48550/arXiv.2502.19

Attached: pages 1–8 of the paper with highlights; the post's result is highlighted and annotated on pages 7 and 8.
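As a rough illustration of the contextual multi-armed bandit idea (a minimal sketch, not the Meta-reasoner implementation; the strategy names, context buckets, and reward function below are hypothetical):

```python
import random
from collections import defaultdict

# Hypothetical reasoning strategies a meta-reasoner could pick between each round.
STRATEGIES = ["continue", "backtrack", "restart", "simplify"]

class EpsilonGreedyContextualBandit:
    """Minimal contextual bandit: one epsilon-greedy value table per context bucket."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = defaultdict(lambda: defaultdict(int))    # context -> arm -> pulls
        self.values = defaultdict(lambda: defaultdict(float))  # context -> arm -> mean reward

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)           # explore
        vals = self.values[context]
        return max(STRATEGIES, key=lambda a: vals[a])  # exploit best arm so far

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        # Incremental mean update of the observed reward for this (context, arm) pair.
        self.values[context][arm] += (reward - self.values[context][arm]) / n

# Toy loop: the "context" is a coarse progress signal, and the reward trades off
# task progress against inference cost, so the bandit learns when to redirect.
bandit = EpsilonGreedyContextualBandit()
for _ in range(100):
    context = random.choice(["stuck", "progressing"])
    strategy = bandit.select(context)
    progress = 1.0 if (context == "progressing" and strategy == "continue") else 0.3
    cost = 0.1 if strategy == "continue" else 0.3
    bandit.update(context, strategy, progress - cost)
```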
2025-05-09

Ever wondered what really makes those powerful AI language models tick? 🤔 Andrej Karpathy offers a clear explanation, revealing the secrets behind their training and architecture. Discover how they're evolving and the key security hurdles we need to overcome. A must-read for anyone curious about the behind-the-scenes aspects of AI! alanbonnici.com/2025/05/demyst #ArtificialIntelligence #NLP #LanguageModels #AISecurity #TechInsights #FutureofAI #TTMO

2025-05-09

Beginning in the 1990s, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.

F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA

#NLP #LanguageModels #HistoryOfAI #TextProcessing #AI #historyofscience #ISE2025 @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

Slide from Information Service Engineering 2025, Lecture 02, Natural Language Processing 01, A Brief History of NLP, NLP timeline. The timeline is located in the middle of the slide from top to bottom. The pointer on the timeline indicates the 1990s. On the left, the formula for the conditional probability of a word following a given series of words is shown. Below it, an AI-generated portrait of William Shakespeare is displayed with 4 speech bubbles, representing artificially generated text based on 1-grams, 2-grams, 3-grams and 4-grams. The 4-gram text example looks a lot like original Shakespeare text. On the right side the following text is displayed:
N-grams for statistical language modeling were introduced and popularised by Frederick Jelinek and Stanley F. Chen from IBM Thomas J. Watson Research Center, who developed efficient algorithms and techniques for estimating n-gram probabilities from large text corpora for speech recognition and machine translation.

Bibliographical reference:
F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA.
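The counting-and-normalising step and the Shakespeare-style generation shown in the slide's speech bubbles can be sketched in a few lines of Python (a toy bigram model; the mini corpus here is only a stand-in for a large text collection):

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Estimate bigram probabilities by counting and normalising (maximum likelihood)."""
    counts = defaultdict(Counter)
    for prev, curr in zip(tokens, tokens[1:]):
        counts[prev][curr] += 1
    return {prev: {w: c / sum(foll.values()) for w, c in foll.items()}
            for prev, foll in counts.items()}

def generate(model, start, length=10):
    """Sample a word sequence from the estimated bigram distribution."""
    words = [start]
    for _ in range(length):
        followers = model.get(words[-1])
        if not followers:
            break
        words.append(random.choices(list(followers), weights=list(followers.values()))[0])
    return " ".join(words)

# Toy corpus standing in for, e.g., the collected works of Shakespeare.
corpus = "to be or not to be that is the question".split()
model = train_bigram_model(corpus)
print(generate(model, "to"))
```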
Hacker News (@h4ckernews)
2025-05-08

The project "Me, Myself, and AI" examines the situational understanding of advanced language models by applying a detailed array of over 13,000 questions. This initiative tests their capacity to identify themselves and comply with directives.

Discover more at situational-awareness-dataset..

#AIResearch #LanguageModels #SituationalAwareness

Erika Varis Doggett (@erikavaris@mas.to)
2025-04-30

At #NAACL this week, and I’m delighted to see the name change to “Nations of the Americas” as well as the special theme for this year of multi- and cross-culturalism in #NLP.

#NLProc #AI #LLMs #LanguageModels #CompLing #ComputationalLinguistics

2025-04-30

Today, the 2nd lecture of #ISE2025 took place with an introduction to Natural Language Processing, which will be the subject of our lecture for the next 4 weeks.

#AI #nlp #informationextraction #ocr #ner #linguistics #computationallinguistics #morphology #pos #ambiguity #language @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #AIart #generativeAI #machinetranslation #languagemodels #llm

Cover slide of the slide deck presentation for the ISE 2025 lecture. It states: Information Service Engineering, Lecture 2: Natural Language Processing 01, Prof. Dr. Harald Sack, FIZ Karlsruhe, AIFB, KIT Karlsruhe, Summer Semester 2025. It shows the two logos of FIZ Karlsruhe and KIT. In the background there is an AI-generated image of a (female) bald head connected to many wires forming a kind of graph network.
N-gated Hacker News (@ngate)
2025-04-28

✨🤖 Ah, behold the latest in buzzword salad: "DeepSeek-R2", where the AI language models promise to lead us into a brave new world of vague "solutions" and existential dread. Because who wouldn't want a computer to understand the futility of human existence better than we do? 🧐💡
deepseek.ai/blog/deepseek-r2-a

N-gated Hacker News (@ngate)
2025-04-28

🚀Somebody decided that tuning the knobs on large language models wasn't enough, so they invented "Inference-Aware Fine-Tuning for Best-of-N Sampling"—because that's what the world needed, more jargon. 🙄 Meanwhile, our brains are staggering under the weight of acronyms, wondering if the Simons Foundation can fund a cure for their strain.💡
arxiv.org/abs/2412.15287

Nick Byrd, Ph.D. (@ByrdNick@nerdculture.de)
2025-04-21

Can popular, generalist #LLMs answer questions as specialists?

Adopting each step of #diagnosis into a #ChainOfThought prompt made small and large #languageModels outperform both zero-shot prompting and the fine-tuned OLAPH method on the #MedLFQA benchmark.

doi.org/10.48550/arXiv.2503.03 #AI

Attached: excerpts from “Structured Outputs Enable General-Purpose LLMs to be Medical Experts” (pages 1–2, 2 and 8, 14 and 15, 12 and 13).
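A rough sketch of what folding explicit diagnostic steps into a chain-of-thought prompt can look like (the step wording and example question below are illustrative only, not the paper's template):

```python
# Hypothetical diagnostic steps folded into a chain-of-thought style prompt.
DIAGNOSTIC_STEPS = [
    "Summarise the patient's key symptoms and history.",
    "List plausible differential diagnoses.",
    "Weigh the evidence for and against each candidate.",
    "State the most likely diagnosis and the recommended next steps.",
]

def build_cot_prompt(question: str) -> str:
    """Assemble a prompt that asks the model to reason through each step in order."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(DIAGNOSTIC_STEPS, start=1))
    return ("Answer the medical question below by reasoning through each step in order, "
            "then give a concise final answer.\n\n"
            f"Steps:\n{steps}\n\nQuestion: {question}\nReasoning:")

print(build_cot_prompt("A 54-year-old presents with chest pain on exertion. What is the likely cause?"))
```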
Marco Siccardi (@msicc@msicc.social)
2025-04-15

💻 Apple is taking #privacy seriously, also for training their #AI #languageModels.

In this new blog post, the #ML team outlines how they plan to get synthetic training #data without compromising privacy.

machinelearning.apple.com/research/…

N-gated Hacker News (@ngate)
2025-04-14

🚀 Oh, look! Yet another 'groundbreaking' platform trying to democratize AI by letting anyone and everyone play with large language models... as long as they're willing to pretend Python isn't a thing. 🤦‍♂️ Blessed by the almighty, because nothing screams innovation like clunky open-source projects with dreams of world domination. 🌍✨
transformerlab.ai/

Hacker News (@h4ckernews)
2025-04-07

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

arxiv.org/abs/2504.01157
