#eleutherai

Caramba (Caramba1)
2025-06-07

Can AI be strong without copyright infringement? With "Common Pile v0.1", EleutherAI shows what ethical training on 8 TB of free and licensed sources can look like. Is that enough to compete with the industry's big players? Click through and judge for yourself. 👇
all-ai.de/news/news24/ki-train

2025-06-07

TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text. “The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes in size, the Common Pile v0.1 was used to train two new AI models from EleutherAI, […]

https://rbfirehose.com/2025/06/07/techcrunch-eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text/
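At 8 TB, the Common Pile v0.1 is not something to download casually; if it is mirrored on the Hugging Face Hub, streaming is the practical way to peek at it. A minimal Python sketch, with the repo id "common-pile/common_pile_v0.1" purely a placeholder (substitute the real one):

    # Stream a handful of records from a multi-terabyte corpus without
    # downloading it. The dataset id below is a hypothetical placeholder.
    from itertools import islice
    from datasets import load_dataset

    ds = load_dataset(
        "common-pile/common_pile_v0.1",  # placeholder id, not confirmed
        split="train",
        streaming=True,  # iterate lazily instead of fetching 8 TB
    )
    for record in islice(ds, 3):
        print(record)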

albertic
2025-01-16

Another open-source LLM alternative worth discovering:

GPT-NeoX / GPT-Neo / GPT-J: developed by researchers at EleutherAI, a non-profit AI research lab

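These models are published on the Hugging Face Hub under the EleutherAI organization, so trying one takes only a few lines. A minimal generation sketch with GPT-Neo, assuming the transformers library is installed; swapping in "EleutherAI/gpt-j-6B" gives GPT-J (at a much larger download):

    # Text generation with GPT-Neo through the transformers pipeline API.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
    result = generator("Open-source language models are", max_new_tokens=30)
    print(result[0]["generated_text"])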

Harald Klinke (HxxxKxxx@det.social)
2024-08-05

EleutherAI is a grassroots non-profit AI research group, formed in July 2020 by Connor Leahy, Sid Black, and Leo Gao. Known for creating open-source models like GPT-Neo, GPT-J, and GPT-NeoX, their Pile dataset is widely used for training large language models. In early 2023, they incorporated as the EleutherAI Institute. #AI #OpenSource #EleutherAI #MachineLearning #GPT
eleuther.ai

Mistercaz (EEK94MisterEK)
2024-07-31

"Recently it was revealed that an AI research lab called had harvested subtitles from YouTube videos without the creators' consent. This data was then combined with data from Wikipedia, the U.K. Parliament and Enron Staff emails and added to a dataset called “the Pile.”
(Tom's Guide 7/22/2024)

gtbarry
2024-07-19

YouTube creators surprised to find Apple and others trained AI on their videos

AI models at Apple, Salesforce, Anthropic, and other major technology players were trained on tens of thousands of YouTube videos without the creators' consent and potentially in violation of YouTube's terms

arstechnica.com/ai/2024/07/app

Felicitas Macgilchrist (discoursology@social.coop)
2024-07-16

This is a sentence and a half: “Proof News' article also mentions that it was trained on videos of a parrot, so AI models are parroting a parrot, parroting human speech, as well as parroting other AIs, parroting humans.” #EleutherAI

arstechnica.com/ai/2024/07/app

@arstechnica@mastodon.social @arstechnica@techhub.social

mikwee 🎗️ (mikwee@calckey.world)
2023-12-30

Did you remember that #EleutherAI exists?
I honestly completely forgot about them.

#ArtificialIntelligence #AI

2023-10-28

How the Foundation Model Transparency Index Distorts Transparency | EleutherAI Blog blog.eleuther.ai/fmti-critique

I saw the Foundation Model Transparency Index paper come out recently and was surprised that OpenAI scored as highly as it did. This EleutherAI post breaks down how the Foundation Model Transparency Index gets it wrong and argues that it is not really measuring transparency at all.

#fmti
#foundationmodeltransparencyindex
#opensource
#LLM
#eleutherai

2023-08-08

New: »Algorithmische Affären und Binärcodebekenntnisse oder Wie schaffen wir gemeinsam Text?« (Algorithmic Affairs and Binary-Code Confessions, or: How Do We Create Text Together?) by #ClaraCosimaWolff, written with #EleutherAI and #GPT3, with a cover illustration by #LukasGünther (#AufklärungundKritik 530)

sukultur.de/produkt/clara-cosi

2023-05-25

🚀 New episode of The Changelog!

This week we’re taking you to the hallway track of The #Linux Foundation’s #OSSummit North America 2023 in Vancouver, Canada 🇨🇦

This episode features three conversations about #opensource #AI:

1️⃣ Beyang Liu (Co-founder and CTO at #Sourcegraph)
2️⃣ @dennyglee (Developer Advocate at #Databricks)
3️⃣ Stella Biderman (Head of Research at #EleutherAI)

🎧 changelog.fm/541

Tero Keski-Valkama (tero@rukii.net)
2023-05-06

Releasing 3B and 7B #RedPajama-#INCITE family of models including base, instruction-tuned & chat models — #TOGETHER

"The biggest takeaway is the demonstration that performant #LLMs can be built quickly by the open-source community. This work builds on top of our 1.2 trillion token RedPajama dataset, EleutherAI’s #Pythia training code, #FlashAttention from #Stanford and #Together, the #HELM benchmarks from Stanford #CRFM and generous support from #MILA, #EleutherAI & #LAION for compute time on the #Summit #supercomputer within the INCITE program award 'Scalable Foundation Models for Transferable Generalist AI'. We believe these kind of open collaborations, at larger scales, will be behind the best #AI systems of the future. "

together.xyz/blog/redpajama-mo

2023-02-15

💻 We are ready to train state-of-the-art open source models from our partner community #EleutherAI with massive compute resources made available through @Stabilityai. 4/5

2023-01-01

There are some really good papers that have sought to make the best of the current situation, but #EleutherAI had the compute to do it the right way and so we did.

arxiv.org/abs/2211.08411
arxiv.org/abs/2202.07646
arxiv.org/abs/2202.07206
arxiv.org/abs/2207.14251

We hope that this work will empower more people to work on questions in interpretability, especially the causal impact of training data on model behavior!

2023-01-01

What do LLMs learn over the course of training? How do these patterns change as you scale? To help answer these questions, we are releasing Pythia, a suite of LLMs + checkpoints designed for research on interpretability and training dynamics!

The models range from 19M to 13B parameters, each comes with 143 intermediate checkpoints, and all were trained on exactly the same data in exactly the same order.

#ml #ai #nlproc #interpretability #EleutherAI

github.com/EleutherAI/pythia
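The per-step checkpoints are what make this suite useful for training-dynamics work: the same model can be loaded at different points in training and compared on the same prompt. A minimal sketch following the loading pattern documented in the Pythia repo, where the revision string names a training step ("EleutherAI/pythia-70m-deduped" is one of the released models on the Hugging Face Hub):

    # Load an intermediate Pythia checkpoint via its "revision", then
    # compare it with the final checkpoint on the same prompt.
    from transformers import GPTNeoXForCausalLM, AutoTokenizer

    model_id = "EleutherAI/pythia-70m-deduped"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    early = GPTNeoXForCausalLM.from_pretrained(model_id, revision="step3000")
    final = GPTNeoXForCausalLM.from_pretrained(model_id)  # default branch = fully trained

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    for name, model in [("step3000", early), ("final", final)]:
        tokens = model.generate(**inputs, max_new_tokens=5)
        print(name, "->", tokenizer.decode(tokens[0]))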
