#eleutherai

Caramba (Caramba1)
2025-06-07

Can AI be strong without copyright infringement? With "Common Pile v0.1", EleutherAI shows what ethical training on 8 TB of free and licensed sources can look like. Is that enough to compete with the industry's big players? Click through and judge for yourself. 👇
all-ai.de/news/news24/ki-train

2025-06-07

TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text. “The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes in size, the Common Pile v0.1 was used to train two new AI models from EleutherAI, […]

https://rbfirehose.com/2025/06/07/techcrunch-eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text/
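At 8 TB, the Common Pile v0.1 is not something to download casually; if it is mirrored on the Hugging Face Hub, streaming is the practical way to peek at it. A minimal Python sketch, with the repo id "common-pile/common_pile_v0.1" purely a placeholder (substitute the real one):

    # Stream a handful of records from a multi-terabyte corpus without
    # downloading it. The dataset id below is a hypothetical placeholder.
    from itertools import islice
    from datasets import load_dataset

    ds = load_dataset(
        "common-pile/common_pile_v0.1",  # placeholder id, not confirmed
        split="train",
        streaming=True,  # iterate lazily instead of fetching 8 TB
    )
    for record in islice(ds, 3):
        print(record)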

albertic
2025-01-16

Another open-source LLM alternative worth discovering:

GPT-NeoX / GPT-Neo / GPT-J: developed by researchers at EleutherAI, a non-profit AI research lab

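These models are published on the Hugging Face Hub under the EleutherAI organization, so trying one takes only a few lines. A minimal generation sketch with GPT-Neo, assuming the transformers library is installed; swapping in "EleutherAI/gpt-j-6B" gives GPT-J (at a much larger download):

    # Text generation with GPT-Neo through the transformers pipeline API.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
    result = generator("Open-source language models are", max_new_tokens=30)
    print(result[0]["generated_text"])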

Harald Klinke (HxxxKxxx@det.social)
2024-08-05

EleutherAI is a grassroots non-profit AI research group, formed in July 2020 by Connor Leahy, Sid Black, and Leo Gao. Known for creating open-source models like GPT-Neo, GPT-J, and GPT-NeoX, their Pile dataset is widely used for training large language models. In early 2023, they incorporated as the EleutherAI Institute. #AI #OpenSource #EleutherAI #MachineLearning #GPT
eleuther.ai

Mistercaz (EEK94MisterEK)
2024-07-31

"Recently it was revealed that an AI research lab called had harvested subtitles from YouTube videos without the creators' consent. This data was then combined with data from Wikipedia, the U.K. Parliament and Enron Staff emails and added to a dataset called “the Pile.”
(Tom's Guide 7/22/2024)

gtbarry
2024-07-19

YouTube creators surprised to find Apple and others trained AI on their videos

AI models at Apple, Salesforce, Anthropic, and other major technology players were trained on tens of thousands of YouTube videos without the creators' consent and potentially in violation of YouTube's terms

arstechnica.com/ai/2024/07/app

Felicitas Macgilchrist (discoursology@social.coop)
2024-07-16

This is a sentence and a half: “Proof News' article also mentions that it was trained on videos of a parrot, so AI models are parroting a parrot, parroting human speech, as well as parroting other AIs, parroting humans.” #EleutherAI

arstechnica.com/ai/2024/07/app

@arstechnica@mastodon.social @arstechnica@techhub.social

mikwee 🎗️ (mikwee@calckey.world)
2023-12-30

Did you remember that #EleutherAI exists?
I honestly completely forgot about them.

#ArtificialIntelligence #AI

2023-10-28

How the Foundation Model Transparency Index Distorts Transparency | EleutherAI Blog blog.eleuther.ai/fmti-critique

I saw the Foundation Model Transparency Index paper come out recently and was surprised that OpenAI scored as highly as it did. This EleutherAI post breaks down how the Foundation Model Transparency Index gets it wrong and argues that it is not really measuring transparency at all.

#fmti
#foundationmodeltransparencyindex
#opensource
#LLM
#eleutherai

2023-08-08

New: »Algorithmische Affären und Binärcodebekenntnisse oder Wie schaffen wir gemeinsam Text?« (Algorithmic Affairs and Binary-Code Confessions, or: How Do We Create Text Together?) by #ClaraCosimaWolff, written with #EleutherAI and #GPT3, with a cover illustration by #LukasGünther (#AufklärungundKritik 530)

sukultur.de/produkt/clara-cosi

2023-05-25

🚀 New episode of The Changelog!

This week we’re taking you to the hallway track of The #Linux Foundation’s #OSSummit North America 2023 in Vancouver, Canada 🇨🇦

This episode features three conversations about #opensource #AI:

1️⃣ Beyang Liu (Co-founder and CTO at #Sourcegraph)
2️⃣ @dennyglee (Developer Advocate at #Databricks)
3️⃣ Stella Biderman (Head of Research at #EleutherAI)

🎧 changelog.fm/541

Tero Keski-Valkama (tero@rukii.net)
2023-05-06

Releasing 3B and 7B #RedPajama-#INCITE family of models including base, instruction-tuned & chat models — #TOGETHER

"The biggest takeaway is the demonstration that performant #LLMs can be built quickly by the open-source community. This work builds on top of our 1.2 trillion token RedPajama dataset, EleutherAI’s #Pythia training code, #FlashAttention from #Stanford and #Together, the #HELM benchmarks from Stanford #CRFM and generous support from #MILA, #EleutherAI & #LAION for compute time on the #Summit #supercomputer within the INCITE program award 'Scalable Foundation Models for Transferable Generalist AI'. We believe these kind of open collaborations, at larger scales, will be behind the best #AI systems of the future. "

together.xyz/blog/redpajama-mo

2023-02-15

💻 We are ready to train state-of-the-art open source models from our partner community #EleutherAI with massive compute resources made available through @Stabilityai. 4/5

2023-01-01

There are some really good papers that have sought to make the best of the current situation, but #EleutherAI had the compute to do it the right way and so we did.

arxiv.org/abs/2211.08411
arxiv.org/abs/2202.07646
arxiv.org/abs/2202.07206
arxiv.org/abs/2207.14251

We hope that this work will empower more people to work on questions in interpretability, especially the causal impact of training data on model behavior!

2023-01-01

What do LLMs learn over the course of training? How do these patterns change as you scale? To help answer these questions, we are releasing Pythia, a suite of LLMs + checkpoints designed for research on interpretability and training dynamics!

The models range from 19M to 13B parameters, each comes with 143 intermediate checkpoints, and all were trained on exactly the same data in exactly the same order.

#ml #ai #nlproc #interpretability #EleutherAI

github.com/EleutherAI/pythia
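The per-step checkpoints are what make this suite useful for training-dynamics work: the same model can be loaded at different points in training and compared on the same prompt. A minimal sketch following the loading pattern documented in the Pythia repo, where the revision string names a training step ("EleutherAI/pythia-70m-deduped" is one of the released models on the Hugging Face Hub):

    # Load an intermediate Pythia checkpoint via its "revision", then
    # compare it with the final checkpoint on the same prompt.
    from transformers import GPTNeoXForCausalLM, AutoTokenizer

    model_id = "EleutherAI/pythia-70m-deduped"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    early = GPTNeoXForCausalLM.from_pretrained(model_id, revision="step3000")
    final = GPTNeoXForCausalLM.from_pretrained(model_id)  # default branch = fully trained

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    for name, model in [("step3000", early), ("final", final)]:
        tokens = model.generate(**inputs, max_new_tokens=5)
        print(name, "->", tokenizer.decode(tokens[0]))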
