#ise2023

2023-08-28

Last leg in our brief #timeline of (Large) #languagemodels (so far) is 2023, which saw the advent of many new and updated #LLMs:
- BARD #chatbot is introduced by Google
- LLaMA is introduced by Meta
- GPT-4 is introduced by OpenAI
- LLaMA 2 is introduced by Meta
- and many others...
#ISE2023 #lecture slides: drive.google.com/file/d/1atNvM
GPT-4 tech report: arxiv.org/pdf/2303.08774
@fizise @KIT_Karlsruhe #ai #artificialintelligence #llm #llms #gpt #openai #llama #lamda #bard

Slide from Information Service Engineering 2023 lecture, Brief History of (Large) Language Models, 2023:
- BARD is introduced by Google, a chatbot based on the Google LaMDA language model
- LLaMA is introduced by Meta (LLaMA 65B trained on 1.4T tokens)
- GPT-4 is introduced by OpenAI
- LLaMA 2 is introduced by Meta (LLaMA 70B trained on 2T tokens)
Bibliography: OpenAI: GPT-4 - Technical Report, arXiv:2303.08774 [cs.CL]
2023-08-24

Next stop on our brief #timeline of (Large) #LanguageModels is 2022:
InstructGPT is introduced by OpenAI, a GPT-3 model complemented and fine-tuned with reinforcement learning from human feedback.
ChatGPT is introduced by OpenAI as a combination of GPT-3, Codex, and InstructGPT including lots of additional engineering.
#ise2023 lecture slides: drive.google.com/file/d/1atNvM
#RLHF explained: huggingface.co/blog/rlhf
#ai #creativeai #rlhf #gpt3 #gpt #openai #chatgpt #lecture #artificialintelligence #llm

Slide from Information Service Engineering 2023 lecture, Brief History of (Large) Language Models:
2022, InstructGPT is introduced by OpenAI, a GPT-3 model complemented and fine-tuned with reinforcement learning from human feedback. Improved instruction following and a lower likelihood of hallucinated answers.
ChatGPT introduced by OpenAI, a combination of GPT-3, Codex, and InstructGPT plus a massive engineering effort.
Bibliography: N. Lambert, L. Castricato, L. von Werra, A. Havrilla (2022). Illustrating Reinforcement Learning from Human Feedback (RLHF). huggingface.co.
2023-08-22

Next stop in our brief #timeline of (large) #languagemodels is 2021:
DALL-E is released by OpenAI and raises text2img to a new level.
Codex is released by OpenAI, able to translate natural language into programming code.
WebGPT is released by OpenAI for answering open-ended questions.
LaMDA is introduced by Google.
Slides from #ise2023 #lecture: drive.google.com/file/d/1atNvM
Codex paper: arxiv.org/abs/2107.03374
DALL-E paper: arxiv.org/abs/2102.12092
@fizise #ai #generativeAI #GPT #dalle #openai
#lamda

Slide from Information Service Engineering 2023 lecture, A Brief History of (Large) Language Models:
"2021:
DALL-E is released by OpenAI, a 12B parameter version of GPT-3 trained to generate images from text descriptions.
GitHub Copilot is released, an AI pair-programmer for coding.
Codex released by OpenAI, able to translate natural language into programming code based on 159GB code and documentation.
WebGPT released by OpenAI, a fine-tuned GPT-3 for answering open-ended questions with citations and links to sources.
LaMDA (Language Model for Dialogue Application) introduced by Google."
Bibliography:
Chen, M., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374.
Ramesh, A., et al. (2021). Zero-Shot Text-to-Image Generation. arXiv:2102.12092.
2023-08-17

Next leg in our brief history of (Large) #LanguageModels is 2020, when #GPT-3 was released by OpenAI, based on 45TB data crawled from the web. A “data quality” predictor was trained to boil down the training data to 550GB of “high quality” data. Learning from the prompt was introduced (few-shot learning).
Lecture slides: drive.google.com/file/d/1atNvM
paper: proceedings.neurips.cc/paper/2
@fizise #ai #artificialintelligence #creativeai #llm #ise2023 #lecture

Lecture slide from Information Service Engineering 2023, A Brief History of (Large) Language Models: GPT-3 was released by OpenAI, based on 45TB data crawled from the web. A “data quality” predictor was trained to boil down the training data to 550GB of “high quality” data. Learning from the prompt is introduced (few-shot learning).
Bibliography: T. B. Brown et al. (2020). Language models are few-shot learners. In Proceedings of the 34th Int. Conf. on Neural Information Processing Systems (NIPS'20). Curran Associates Inc., Red Hook, NY, USA, Article 159, 1877–1901.
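To illustrate what "learning from the prompt" means: no weights are updated; the model is simply conditioned on a few demonstrations in its input and completes the pattern, as in this translation demo adapted from the Brown et al. paper:

    Translate English to French:
    sea otter => loutre de mer
    peppermint => menthe poivrée
    cheese =>

Given this prompt, GPT-3 continues with the most likely completion, "fromage", without any task-specific fine-tuning.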
2023-08-15

Next stop in our Brief History of (Large) #languagemodels is 2019: GPT-2 was released by OpenAI as a direct scale-up of GPT, comprising 1.5B parameters and trained on 8M web pages.
Slides (from #ise2023 lecture): drive.google.com/file/d/1atNvM
Paper: d4mucfpksywv.cloudfront.net/be
#llm #llms #ai #artificialintelligence #generativeai #gpt #lecture #historyofAI

Slides from the lecture Information Service Engineering 2023, Brief History of Large Language Models: 2019, GPT-2 was released by OpenAI as a direct scale-up of GPT, comprising 1.5B parameters and trained on 8M web pages.
Bibliography: Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.
2023-08-14

@bsletten The #ise2023 summer lecture was not recorded. The intention was to bring students back to university lecture halls. There is an 80% overlap with the #ise2021 lecture, which is already on #youtube. However, if you are interested in our latest lecture on #knowledgegraphs, don't miss the (free) #KG2023 online course "Knowledge Graphs - Foundations and Applications" on #OpenHPI, which starts in Oct 2023.
open.hpi.de/courses/knowledgeg
@fizise @tabea @sashabruns @Hasso_Plattner_Institute

Sample from the online course video of our free Knowledge Graphs - Foundations and Applications KG2023 OpenHPI lecture, which will be broadcast in October 2023. In the video player you can see my colleague Ann Tan and me (Harald Sack). The cover slide on the right is on knowledge graph embeddings and shows a Japanese woodcut in the style of Hiroshige, created via ArtBot with Stable Diffusion.
2023-08-05

Next stop in our brief #timeline of (large) #languagemodels from the #ise2023 lecture is the advent of Graphics Processing Units (#gpu). In 1999, Nvidia's GeForce 256 was one of the very first, enabling highly parallel computations for #neuralnetworks.
Slides: drive.google.com/file/d/1atNvM
@fizise #artificialintelligence #lecture #ai #machinelearning #llm

Slide from the Information Service Engineering 2023 lecture, A Brief History of Large Language Models: "NVIDIA introduced the first Graphics Processing Unit (GPU) card, the Nvidia GeForce 256". Bibliography: John Peddie, Famous Graphics Chips: Nvidia’s GeForce 256, IEEE Computer Society.
Link: https://www.computer.org/publications/tech-news/chasing-pixels/nvidias-geforce-256
2023-08-04

1997, with the advent of Long Short-Term Memory recurrent #neuralnetworks, marks the next step in our brief history of (large) #languagemodels from last week's #ise2023 lecture. Introduced by Sepp Hochreiter and Jürgen Schmidhuber, #LSTM #RNNs enabled efficient processing of sequences of data.
Slides: drive.google.com/file/d/1atNvM
#nlp #llm #llms #ai #artificialintelligence #lecture @fizise

Slide from Information Service Engineering 2023 Lecture, A Brief History of Large Language Models. "Long Short-Term Memory (LSTM) Recurrent Neural Networks are introduced by Sepp Hochreiter and Jürgen Schmidhuber, which enabled the efficient processing of sequences of data (instead of single data points), able to learn from data and to generate text." Depicted is a schematic view of an LSTM.
Bibliography: 
Hochreiter, Sepp; Schmidhuber, Juergen (1996). LSTM can solve hard long time lag problems. Advances in Neural Information Processing Systems, pp. 473–479.
Link: https://dl.acm.org/doi/10.5555/2998981.2999048
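For reference, the now-standard LSTM cell (including the forget gate, which was added by Gers et al. in 2000, slightly after the cited paper) computes per time step t:

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          (output gate)
    \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   (candidate cell state)
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (cell state update)
    h_t = o_t \odot \tanh(c_t)                         (hidden state)

The additive cell-state update is what lets error signals flow across long time lags, mitigating the vanishing gradient problem of plain RNNs.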
2023-08-03

Next step in our brief timeline of (large) #languagemodels from our #ise2023 lecture was statistical language modeling with n-grams based on large text corpora, as introduced and popularized by Frederick Jelinek and Stanley F. Chen, using statistical tools such as Bayes' theorem, the Markov assumption, and maximum likelihood estimation.
Slides: drive.google.com/file/d/1atNvM
@fizise #nlp #llm #llms #artificialintelligence #ai #lecture #creativeAI

Slide from the Information Service Engineering 2023 lecture, Brief Timeline of (Large) Language Models, about statistical N-gram models.
"1990s: N-grams for statistical language modeling were introduced and popularized by Frederick Jelinek and Stanley F. Chen from IBM Thomas J. Watson Research Center, who developed efficient algorithms and techniques for estimating n-gram probabilities from large text corpora for speech recognition and machine translation." Furthermore, the formula to compute the conditional probability of an n-gram is given, along with a depiction of the N-gram Shakespeare generator with 1-grams, 2-grams, 3-grams, and 4-grams. The picture of William Shakespeare has been created via ArtBot.
Bibliography: F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA.
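For completeness, the conditional probability mentioned in the slide description is the usual maximum likelihood estimate under the Markov assumption (standard textbook notation, not copied verbatim from the slide):

    P(w_n \mid w_{n-N+1}^{n-1}) = \frac{C(w_{n-N+1}^{n-1}\, w_n)}{C(w_{n-N+1}^{n-1})}

where C(\cdot) counts occurrences in the corpus; for a bigram model (N = 2) this reduces to P(w_n \mid w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}).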
2023-08-02

Went to the university office to collect the #ise2023 final exams, to be reviewed later together with the @fizise TA team. But for now, a #perfectEspresso at home… because it was too rainy today for walking to Espresso Stazione ☕️ #lecture #coffeechallenge #karlsruhe @KIT_Karlsruhe

A cup of espresso placed on a reflective black surface (in fact an oven) that mirrors parts of a window
2023-08-01

Slide 2 of our Brief Timeline of (Large) #LanguageModels from the last #ise2023 lecture introduced us to #ELIZA, Joseph Weizenbaum's simple #Chatbot from 1966 that simulates a conversation with a psychoanalyst. Weizenbaum was shocked that some people, including his secretary, attributed human-like feelings to the computer program...
Slides: drive.google.com/file/d/1atNvM
#nlp #ai #llm #artificialintelligence @fizise

Slide 2 from the last Information Service Engineering 2023 lecture, with a brief history of Large Language Models:
ELIZA was an early natural language processing computer program created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum, which simulated conversation, giving users an illusion of understanding on the part of the program.
Bibliography: Weizenbaum, Joseph (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
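ELIZA's apparent understanding boils down to keyword spotting, pronoun reflection, and canned response templates. A minimal Python sketch of the mechanism (toy rules for illustration only, not Weizenbaum's original DOCTOR script):

    import random
    import re

    # Reflect first-person words back at the user ("i am" -> "you are", ...)
    REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

    # Keyword patterns -> response templates; {0} is filled with the reflected match
    RULES = [
        (r"i am (.*)",   ["How long have you been {0}?", "Why do you say you are {0}?"]),
        (r"i feel (.*)", ["Why do you feel {0}?", "Do you often feel {0}?"]),
        (r".*\bmother\b.*", ["Tell me more about your family."]),
    ]

    def reflect(text):
        return " ".join(REFLECTIONS.get(word, word) for word in text.split())

    def eliza(utterance):
        for pattern, templates in RULES:
            match = re.match(pattern, utterance.lower())
            if match:
                reflected = [reflect(group) for group in match.groups()]
                return random.choice(templates).format(*reflected)
        return "Please tell me more."  # fallback when no keyword matches

    print(eliza("I am sad about my job"))
    # -> e.g. "Why do you say you are sad about your job?"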
2023-07-31

One of the final sections of the #ise2023 lecture was an excursion with a #timeline of (Large) #LanguageModels. We started our tour in 1948 with Claude Shannon's seminal work "A Mathematical Theory of Communication".
Slides: drive.google.com/file/d/1atNvM
@fizise #llm #ai #nlp #artificialintelligence #informationtheory #lecture

Slide from last week's Information Service Engineering 2023 lecture, A Timeline for Large Language Models, depicting a portrait photograph of Claude Elwood Shannon in front of a 1950s mainframe computer.
Text: "Claude Shannon proposed the idea of using n-grams as a means to analyze the statistical properties of language in "A Mathematical Theory of Communication" (1948). While Shannon's primary focus was on communication and information transmission, he recognized the relevance of n-grams in modeling language and predicting the likelihood of word sequences."
Bibliography:
Shannon, Claude Elwood (July 1948). A Mathematical Theory of Communication, Bell System Technical Journal. 27 (3): 379–423.
2023-07-28

As a 2nd topic of this last #ise2023 lecture, we discussed #KnowledgeGraph Completion. The simplest approach to unsupervised #linkprediction based on (here, translation-based) knowledge graph embeddings was explained using the example of Isaac Asimov.
Slides: drive.google.com/file/d/1atNvM
@fizise @enorouzi #scifi #knowledgegraphs #ai #deeplearning #embeddings

Slide from the last Information Service Engineering 2023 lecture, ISE Applications, 5.2 Knowledge Graph Completion:
Link Prediction with KG Embeddings
- Use Translational Embeddings 
 -- Unsupervised methods, e.g. TransE, use z_s + z_p to predict z_o
 -- Supervised Methods for prediction, based on embedding vectors

Vectors for "Isaac Asimov" and "occupation" are added. For the resulting vector a nearest neighbor search is conducted to find - besides others - "SciFi Writer".
2023-07-27

Ok, I tried out Runway Gen-2. I did some tests with prompts only, but also with uploaded images (mostly generated by another generative AI). Lessons learned: 1) don't expect too much...
2) you have to try very often...
3) don't expect too much ;-)
Below you can see the video generated based on my stablediffusion "Singularity" picture from the #ise2023 lecture. #generativeAI #runway #stablediffusion #stablediffusionart #aiart #singularity

2023-07-27

How can we find out the importance of a node in a #knowledgeGraph? In the last #ise2023 lecture, we were discussing graph centrality measures and how they can be applied in the context of knowledge graphs.
Slides: drive.google.com/file/d/1atNvM
SPARQL query (cf. image below: the 100 most "important" #SciFi authors according to #wikidata): w.wiki/78Un
@fizise @enorouzi #semanticweb #lecture #ai #datascience #analytics

SPARQL query from the Information Service Engineering 2023 lecture, no. 13, ISE Applications: What are the most important science fiction authors according to Wikidata? Please find the SPARQL query in the slides or in the link above.
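Since the query itself sits behind the short link, here is a hypothetical reconstruction of the idea in Python with SPARQLWrapper, using the Wikipedia sitelink count as a simple, degree-like importance proxy (the item wd:Q18844224 for "science fiction writer" and the ranking criterion are my assumptions, not necessarily what the slides use):

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Rank science-fiction writers by their number of Wikipedia sitelinks,
    # a crude centrality/popularity proxy available directly in Wikidata.
    QUERY = """
    SELECT ?author ?authorLabel ?sitelinks WHERE {
      ?author wdt:P106 wd:Q18844224 ;
              wikibase:sitelinks ?sitelinks .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY DESC(?sitelinks)
    LIMIT 100
    """

    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
    endpoint.setQuery(QUERY)
    endpoint.setReturnFormat(JSON)
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["authorLabel"]["value"], row["sitelinks"]["value"])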
2023-07-26

Topics of the last #ise2023 lecture: The Graph in #KnowledgeGraphs, Knowledge Graph Completion, A Brief History of Large Language Models, and Knowledge Graphs and Large Language Models. I will highlight some topics in the upcoming toots...
Slides: drive.google.com/file/d/1atNvM
#llms #languagemodels #deeplearning #linkprediction #kgc #lecture #machinelearning #transformers #gpt @fizise @enorouzi

Cover slide of the last Information Service Engineering 2023 lecture, ISE Applications 01. Picture created via ArtBot (Deliberate), 2023, [CC-BY-4.0]. Prompt: "The seeds of modern Artificial Intelligence were planted by philosophers who attempted to describe the process of human thinking as the mechanical manipulation of symbols. Deep learning is a class ….”
2023-07-26

Last #ise2023 lecture of this semester is about to start. 8:00AM is always tough for the students as well as for the professor 🥳 @fizise @KIT_Karlsruhe @enorouzi #ai #machinelearning #KnowledgeGraphs

ISE 2023 lecture. We see my interim laptop showing the title slide, standing on the desk at the front of the lecture hall.
2023-07-25

Last thing we discussed in the "Limits of #AI" chapter of the #ISE2023 lecture was the threat of the so-called #Singularity. What is the singularity? Under which circumstances could it possibly happen? How real is this threat, and should we already aim for potential regulations?
Slides: drive.google.com/file/d/1LUOA-
@fizise @enorouzi @KIT_Karlsruhe #machinelearning #deeplearning #philosophy #aiart #stablediffusionart #creativeai

Slide from Information Service Engineering 2023 lecture no. 12, Basic Machine Learning 03, on the Singularity. By the singularity, we understand a hypothetical future point in time at which technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization. According to the most popular version of the singularity hypothesis, I. J. Good's intelligence explosion model, an upgradable intelligent agent will eventually enter a "runaway reaction" of self-improvement cycles, each new and more intelligent generation appearing more and more rapidly, causing an "explosion" in intelligence and resulting in a powerful superintelligence that qualitatively far surpasses all human intelligence.
2023-07-24

When discussing the limits of #AI in last week's #ise2023 lecture, we also talked about the Chinese Room Problem introduced by John Searle in 1980.
Slides: drive.google.com/file/d/1LUOA-
#machinelearning #artificialintelligence #deeplearning #lecture #philosophy @fizise @enorouzi

A slide from the Information Service Engineering 2023 lecture no. 12, Basic Machine Learning 03, about the limits of AI. The picture shows a "Chinese Room". The Chinese Room problem is a thought experiment that challenges the notion of artificial intelligence's understanding: a person inside a closed room, who does not speak Chinese, can mechanically manipulate Chinese characters to produce correct responses but does not actually comprehend the language. The picture has been created via ArtBot with a prompt describing the Chinese Room problem.
