Lmst

Back in the lecture hall again after two exciting weeks of #ESWC2025 and #ISWS2025. This morning, we introduced our students to RDF, RDFS, RDF Inferencing, and RDF Reification.

#ise2025 #semanticweb #semweb #knowledgegraphs #rdf #reasoning #reification #lecture @fiz_karlsruhe @fizise @KIT_Karlsruhe @sourisnumerique @tabea @enorouzi

Slide from the ISE 2025 lecture on Resource Descriptiuon Framework (RDF) as simple data model. The slide is showing a small knowledge graph, indicating that Climate Change was explained by Eunice Newton Foot in 1856 as well as by John Tyndall in 1859. To represent this n-ary (multi-valued) relations, we are using so-called blank nodes, representing an "explanation" each, which bundles the discoverer and the discovery date. This is done via "dereferencable" blank nodes here.

Last week, we continued our #ISE2025 lecture on distributional semantics with the introduction of neural language models (NLMs) and compared them to traditional statistical n-gram models.
Benefits of NLMs:
- Capturing Long-Range Dependencies
- Computational and Statistical Tractability
- Improved Generalisation
- Higher Accuracy

@fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi #llms #nlp #AI #lecture

The image illustrates the architecture of a Neural Language Model, specifically focusing on Word Vectors II - Neural Language Models. It is part of a presentation on Natural Language Processing, created by the Karlsruhe Institute of Technology (KIT) and FIZ Karlsruhe, as indicated by their logos in the top right corner.

The diagram shows a neural network processing an input word embedding, represented by the phrase "to be or not to." The input is transformed into a d-sized vector representation of the context "to be or not to" by a neural network. This vector is then passed through a linear layer, which maps the vector to a size equal to the vocabulary size (|V|), represented by a series of circles labeled "tokens." The output of the linear layer is fed into a softmax function, which generates a probability distribution over the next token, denoted as P(* | "to be or not to"). The diagram also includes annotations explaining the process, such as "get probability distribution of next token" and "context processing (previous tokens)." The overall layout is clear and educational, with arrows indicating the flow of information through the model.

Provided by @altbot, generated privately and locally using Ovis2-8B

sauron

#lordoftherings #sauron #sauroneye

#ise2025 #integratedsystemseurope

#light #lighting #artificiallight

#lamplovers #lamp #lampdesign
#lightingsolutions #lightingdesign

#sonya7ii #sonyalpha

#sigmalens #sigmaart #sigma2470art #sigma2470

#photography #darktableedit

In the #ISE2025 lecture today we were introducing our students to the concept of distributional semantics as the foundation of modern large language models. Historically, Wittgenstein was one of the important figures in the Philosophy of Language stating thet "The meaning of a word is its use in the language."

https://static1.squarespace.com/static/54889e73e4b0a2c1f9891289/t/564b61a4e4b04eca59c4d232/1447780772744/Ludwig.Wittgenstein.-.Philosophical.Investigations.pdf

#philosophy #wittgenstein #nlp #AI #llm #languagemodel #language #lecture @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #AIart

An AI-generated image of Ludwig Wittgenstein as a comic strip character. A speech bubble show his famous quote "The meaning of a word is its use in the language."
Bibliographical Reference: Wittgenstein, Ludwig. Philosophical Investigations, Blackwell Publishing (1953).
Ludwig Wittgenstein (1889–1951)

Generating Shakespeare-like text with an n-gram language model is straight forward and quite simple. But, don't expect to much of it. It will not be able to recreate a lost Shakespear play for you ;-) It's merely a parrot, making up well sounding sentences out of fragments of original Shakespeare texts...

#ise2025 #lecture #nlp #llm #languagemodel @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #shakespeare #generativeAI #statistics

Slide from the Information Service Engineering lecture 04, Natural Language Procerssing 03, 2.9 Language Models, N-Gram Shakespeare Generation.
The background of the slide shows an AI-generated portrait of William Shakespeare as an ink drawing. There are 4 speech-bubbles around Shakespeare's head, representing artificially generated text based on 1-grams, 2-grams, 3-grams and 4-grams: '
1-gram: To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have Hill he late speaks; or! a more to leg less first you enter.
2-gram: Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow. What means, sir. I confess she? then all sorts, he is trim, captain
3-gram: Fly, and will rid me these news of price. Therefore the sadness of parting, as they say,’tis done. This shall forbid it should be branded, if renown made it empty.
4-gram: King Henry. What! I will go seek the traitor Gloucester. Exeunt some of the watch. A great banquet serv’d in; It cannot be but so.

The magic happens somehow at 4-grams, basically because it IS Shakesperare ;-)

In our #ISE2025 lecture last Wednesday, we learned how in n-gram language models via Markov assumption and maximum likelihood estimation we can predict the probability of the occurrence of a word given a specific context (i.e. n words previous in the sequence of words).

#NLP #languagemodels #lecture @fizise @tabea @enorouzi @sourisnumerique @fiz_karlsruhe @KIT_Karlsruhe

Slide from the Information Service Engineering 2025 lecture, 03 Natural Language Processing 02, 2.9, Language MOdels:
Title: N-Gram Language Model
The probability of a sequence of words can be computed via contitional probability and the Bayes Rule (including the chain rule for n words). Approximation is performed via Markov assumption (dependency only on the n last words), and the Maximum Likelihood estimation (approximating the probabilities of a sequence of words by counting and normalising occurrences in large text corpora).

This week, we were discussing the central question Can we "predict" a word? as the basis for statistical language models in our #ISE2025 lecture. Of course, I wasx trying Shakespeare quotes to motivate the (international) students to complement the quotes with "predicted" missing words ;-)

"All the world's a stage, and all the men and women merely...."

#nlp #llms #languagemodel #Shakespeare #AIart lecture @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #brushUpYourShakespeare

Slide from the Information Service Engineering 2025 lecture, Natural Language Processing 03, 2.10 Language Models. The Slide shows a graphical portrait of William Shakespeare (created by midjourney AI) as an ink sketch with yellow accents. The text states "Can we "predict" a word?"

Last week, our students learned how to conduct a proper evaluation for an NLP experiment. To this end, we introduced a small textcorpus with sentences about Joseph Fourier, who counts as one of the discoverers of the greenhouse effect, responsible for global warming.

https://github.com/ISE-FIZKarlsruhe/ISE-teaching/blob/b72690d38911b37748082256b61f96cf86171ace/materials/dataset/fouriercorpus.txt

#ise2025 #nlp #lecture #climatechange #globalwarming #historyofscience #climate @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique

Slide of the Information Service ENgineering lecture 03, Natural Language Processing 02, section 2.6: Evaluation, Precision, and Recall
Headline: Experiment
Let's consider the following text corpus (FOURIERCORPUS):
1
In 1807, Fourier's work on heat transfer laid the foundation for understanding the greenhouse effect.
2
Joseph Fourier's energy balance analysis showed atmosphere's heat-trapping role.
3
Fourrier's calculations, though rudimentary, suggested that the atmosphere acts as an insulator.
4
Fourier’s greenhouse effect explains how atmospheric gases influence global temperatures.
5
Jean-Baptiste Joseph Fourier's mathematical treatment of heat flow is essential to climate modeling.
6
Climate science acknowledges that Fourier helped to understand the atmospheric absorption of heat.
7
Climate change origins often cite Fourier's mathematical work on radiative heat.
8
J. Fourier published his "Analytical theory of heat" in 1822.
9
Fourier analysis is used in signal processing.
10
Fourier series are key in heat conduction math.
11
Fourier and related algebras occur naturally in the harmonic analysis of locally compact groups.
12
The Fourier number is the ratio of time to a characteristic time scale for heat diffusion.

The corpus is available at https://github.com/ISE-FIZKarlsruhe/ISE-teaching/blob/b72690d38911b37748082256b61f96cf86171ace/materials/dataset/fouriercorpus.txt

On the right side in the background is a portrait engraving of Joseph Fourier

Last leg on our brief history of NLP (so far) is the advent of large language models with GPT-3 in 2020 and the introduction of learning from the prompt (aka few-shot learning).

T. B. Brown et al. (2020). Language models are few-shot learners. NIPS'20

https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

#llms #gpt #AI #nlp #historyofscience @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #ise2025

Slide from Information System Engineering 2025 lecture, 02 - Natural Language Processing 01, A brief history of NLP, NLP Timeline.
The NLP timeline is in the middle of the page from top to bottom. The marker is at 2020. On the left side, an original screenshot of GPT-3 is shown, giving advise on how to present a talk about "Symbolic and Subsymbolic AI - An Epic Dilemma?".
The right side holds the following text:
2020: GPT-3 was released by OpenAI, based on 45TB data crawled from the web. A “data quality” predictor was trained to boil down the training data to 550GB “high quality” data. Learning from the prompt is introduced (few-shot learning)

Bibliographical Reference:
T. B. Brown et al. (2020). Language models are few-shot learners. In Proceedings of the 34th Int. Conf. on Neural Information Processing Systems (NIPS'20). Curran Associates Inc., Red Hook, NY, USA, Article 159, 1877–1901.

Next stop in our NLP timeline is 2013, the introduction of low dimensional dense word vectors - so-called "word embeddings" - based on distributed semantics, as e.g. word2vec by Mikolov et al. from Google, which enabled representation learning on text.

T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space.
https://arxiv.org/abs/1301.3781

#NLP #AI #wordembeddings #word2vec #ise2025 #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi

Slide from the Information Service Engineering 2025 lecture, lecture 02, Natural Language Processing 01, NLP Timeline. The timeline is in the middle of the slide from top to bottom, indicating a marker at 2013. On the left, a diagram is shown, displaying vectors for "man" and "woman" in a 2D diagram. An arrow leades from the point of "man" to the point of "woman". Above it, there is also the point marked for "king" and the same difference vector is transferred from "man - > woman" to "king - ?" asking, what might be the appropriate completion.
Right of the timeline, the following text is displayed: Word2Vec neural network based framework to learn distributed representations of words as dense vectors in continuous space (word embeddings) was developed by Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean at Google.
These language models are based on the Distributional Hypothesis in linguistics i.e. words that are used and occur in the same contexts tend to purport similar meanings.

Bibliographical reference:
T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781

Building on the 90s, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.

F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA

#NLP #LanguageModels #HistoryOfAI #TextProcessing #AI #historyofscience #ISE2025 @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

Slide from Information Service Engineering 2025, LEcture 02, Natural Language PRocessing 01, A Brief History of NLP, NLP timeline. The timeline is located in the middle of the slide from top to bottom. The pointer on the timeline indicates 1990s. On the left, the formula for conditional probability of a word, following a given series of words, is given as a formula. Below, an AI generated portrait of William Shakespeare is displayed with 4 speech buubles, representing artificially generated text based on 1-grams, 2-grams, 3-grams and 4 grams. The 4-grams text example looks a lot like original Shakespeare text. On the right side the following text is displayed:
N-grams for statistical language modeling were introduced and popularised by Frederick Jelinek and Stanley F. Chen from IBM Thomas J. Watson Research Center, who developed efficient algorithms and techniques for estimating n-gram probabilities from large text corpora for speech recognition and machine translation.

Bibliographical reference:
F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA.

Next stop on our NLP timeline (as part of the #ISE2025 lecture) was Terry Winograd's SHRDLU, an early natural language understanding system developed in 1968-70 that could manipulate blocks in a virtual world.

Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.
http://dspace.mit.edu/bitstream/handle/1721.1/7095/AITR-235.pdf

#nlp #lecture #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi #AI

Slide from the Information Service Engineering 2025 lecture, Natural Language Processing 01, A Brief History of NLP, NLP Timeline. The picture depicts a timeline in the middle from top to bottom. There is a marker placed at 1970. Left of the timeline, a screenshot of the SHRDLU system is shown displaying a block world in simple line graphics. On the right side, the following text is displayed: SHRDLU was an early natural language understanding system developed by Terry Winograd in 1968-70 that could manipulate blocks in a virtual world. Users could issue commands like “Move the red block onto the green block,” and SHRDLU would execute the task accordingly. This demonstration highlighted the potential of NLP in understanding and responding to complex instructions.

Bibliographical references:
Winograd, Terry (1970-08-24). Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.

With the advent of ELIZA, Joseph Weizenbaum's first psychotherapist chatbot, NLP took another major step with pattern-based substitution algorithms based on simple regular expressions.

Weizenbaum, Joseph (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Com. of the ACM. 9: 36–45.

https://dl.acm.org/doi/pdf/10.1145/365153.365168

#nlp #lecture #chatbot #llm #ise2025 #historyofScience #AI @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

Slide from the Information Service Enguneering 2025 lecture slidedeck, lecture 02, Natural language processing 01, Excursion: A Brief History of NLP, NLP timeline
On the right side of the image, a historic text terminal screenshot of a starting ELIZA dialogue is depicted. The timeline in the middle of the picture (from top to bottom) indicates the year 1966. The text left of the timeline says: ELIZA was an early natural language processing computer program created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum which simulated conversation giving users an illusion of understanding on the part of the program based on pattern matching and pre-scripted response templates.

Bibliographical reference:
Weizenbaum, Joseph (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM. 9: 36–45.

Next stop in our NLP timeline are the (mostly) futile tries of machine translation during the cold war era. The rule-based machine translation approach was used mostly in the creation of dictionaries and grammar programs. It’s major drawback was that absolutely everything had to be made explicit.

#nlp #historyofscience #ise2025 #lecture #machinetranslation #coldwar #AI #historyofAI @tabea @enorouzi @sourisnumerique @fiz_karlsruhe @fizise

Slide from Information Service Engineering lecture 02, Natural Language Processing 1. Title: NLP Timeline
The indicated era on the timeline is 1954-1966. On the right side of the timeline, an AI generated picture of a military parade with mobile missiles in front of the Kremlin basilica is sketched, overlapped with the following machine translation example:
English: "The spirit was willing, but the flesh was weak". This sentence was automatically translated to Russian. Then, it was translated back again into English with the following result: "The vodka was good, but the meat was rotten."

The text left of the timeline says: 1954 - 1966
Futile cold-war motivated efforts in rule-based machine translation from Russian to English. The rule-based machine translation approach was used mostly in the creation of dictionaries and grammar programs. It’s major drawback was that everything had to be made explicit.

Bibliographical references:
John A.Kouwenhoven ‘The trouble with translation’ in Harper's Magazine, August 1962
and W. John Hutchins, Machine Translation: Past, Present, and Future, Longman Higher Education, 1985, p. 5.

Next step in our NLP timeline is Claude Elwood Shannon, who already laid the foundations for statistical language modeling by recognising the relevance of n-grams to model properties of language and predicting the likelihood of word sequences.

C.E. Shannon ""A Mathematical Theory of Communication" (1948) https://web.archive.org/web/19980715013250/http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

#ise2025 #nlp #lecture #languagemodel #informationtheory #historyofscience @enorouzi @tabea @sourisnumerique @fiz_karlsruhe @fizise

Slide from the Information Service ENgineering lecture 02, Natural Language Processing 01. Title: NLP Timeline.
A black & white portrait picture of Claude Elwood Shannon (1916-2001) is shown on the left side of a timeline marked with "1948". Shannon is depicted in front of an old 1950s "electronic" computer. The text on the right side of the timeline says: Claude Shannon proposed the idea of using n-grams as a means to analyse the statistical properties of language in "A Mathematical Theory of Communication" (1948). While Shannon's primary focus was on communication and information transmission, he recognised the relevance of n-grams in modeling language and predicting the likelihood of word sequences.

BIbliographical reference:
Shannon, Claude Elwood (July 1948). A Mathematical Theory of Communication, Bell System Technical Journal. 27 (3): 379–423.

We are starting #ISE2025 lecture 02 with a (very) brief history of #NLP pointing out only some selected highlights. Linguist Ferdinand de Saussure was laying the foundations of today's NLP by describing languages as “systems.” He argued that meaning is created inside language, in the relations and differences between its parts.

Course in general linguistics. https://ia600204.us.archive.org/0/items/SaussureFerdinandDeCourseInGeneralLinguistics1959/Saussure_Ferdinand_de_Course_in_General_Linguistics_1959.pdf

#linguistics #historyofscience @fiz_karlsruhe @fizise @enorouzi @tabea @sourisnumerique @KIT_Karlsruhe #AIFB

Slide from the ISE2025 lecture. Headline: NLP Timeline. On the left side a sepia-toned old portrait picture of Swiss linguist Ferdinand de Saussure (1857-1913) is shown. In the middle is a timeline depicted as a ray from top to bottom with an indicator at "1916". The text says: Ferdinand de Saussure, Professor at the University of Geneva, developed an approach describing languages as “systems.” Saussure argued that meaning is created inside language, in the relations and differences between its parts. A shared language system makes communication possible.
After his death in 1916, his colleagues Albert Sechehaye and Charles Bally published “Cours de Linguistique Générale” from Saussure’s manuscript notes and lecture notes from his students.

Bibliographical references:
Saussure, Ferdinand. Course in general linguistics. Eds. Charles Bally & Albert Sechehaye. Trans. Wade Baskin. NY: The Philosophical Society, 1959.

Today, the 2nd lecture of #ISE2025 took place with an introduction into Natural Language Processing, which will be subject of our lecture for the next 4 weeks.

#AI #nlp #informationextraction #ocr #ner #linguistics #computationallinguistics #morphology #pos #ambiguity #language @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #AIart #generativeAI #machinetranslation #languagemodels #llm

Cover slide of the slide deck presentation for the ISE 2025 lecture. It states: Information Service ENgineering, Lecture 2: Natural Language Processing 01, Prof. Dr. Harald Sack, FIZ Karlsruhe, AIFB, KIT Karlsruhe, Summer Semester 2025. It shows the two logos of FIZ Karlsruhe and KIT. In the background there is an AI-generated image of a (female) bald head connected to many wires forming a kind of graph network.

As knowledge and understanding were the main subjects of last week's first #ise2025 lecture, I was also introducing the semiotic triangle as in C.K. Ogden, I.A. Richards: The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism, 1923.

#language #understanding #philosophy #nlp #linguistics #ontology #semiotics @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique

The image is a slide from a presentation titled "The Art of Understanding 1/4. The Art of Understanding Communication and Meaning." It features a diagram illustrating the relationship between a sender and a receiver using a tin can telephone, symbolizing communication. The diagram includes four key concepts: "Concept," "Symbol," "Object," and "Context," with arrows indicating their interconnections. The word "Jaguar" is used as an example, symbolizing a concept that refers to an object, with context influencing the meaning. The slide also includes images of a race car, a jaguar, and a computer desktop with Mac OS X, representing different contexts and symbols. The text "Pragmatics" is prominently displayed, emphasizing the practical aspects of language use.

What does it mean "to know" something? Have you ever thought about it? We tried to make our students think about it in this week's first #ise2025 lecture.

#kit200 #lecture #knowledge #philosophy #knowledgerepresentation #understanding #semweb #knowledgegraph #nlp @fiz_karlsruhe @fizise @enorouzi @sourisnumerique

The image features a digital representation of a human face composed of yellow and orange alphanumeric characters, set against a dark background. The face is centrally positioned, with the eyes closed, and the text "What is Knowledge?" prominently displayed at the top in large, bold, yellow font. Below the face, three statements are presented in white text: "I know that climate change is man made," "I believe that climate change is man made," and "It is true that climate change is man made." The background is filled with vertical streams of yellow alphanumeric characters, resembling a digital rain effect, which adds a technological and data-driven atmosphere to the image. The overall color scheme is dark with contrasting yellow and white text, emphasizing the central theme of knowledge and its relationship to climate change.

Provided by @altbot, generated privately and locally using Ovis2-8B

One of the central topics discussed in today's first ISE 2025 lecture is "Knowledge". How can we define knowledge? How does it differ from data, information, or wisdom? How does the process of "understanding" work? Welcome to "The Art of Understanding", which is the title of this lecture...

#ise2025 #semweb #semanticweb #AI #nlp #philosophy #lecture @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

#ise2025

Client Info