#TrainingData

☮ ♥ ♬ 🧑‍💻 peterrenshaw@ioc.exchange
2025-07-05

“Suno, for those of you not familiar, is an #AI #SongGenerator: enter a text prompt (such as “a jazz, reggae, EDM pop song about my imagination”) and a song comes back. Like many #GenerativeAI companies, it is also being sued by all and sundry for ingesting #copyrighted #material. The parties in the suit — including major labels and the #RIAA — don’t have a smoking gun, since they can’t directly peek at Suno’s #TrainingData. But they have managed to generate some suspiciously similar-sounding AI generated materials, #mimicking (among others) “Johnny B. Goode,” “Great Balls of Fire,” and Jason Derulo’s habit of singing his own name.

#Suno essentially admits these songs were #regurgitated from #copyrighted source material, but it says such use was legal. “It is no secret that the tens of millions of #recordings that Suno’s model was trained on presumably included recordings whose rights are owned by the Plaintiffs in this case,” it says in its own legal filing. Whether AI training data constitutes fair use is a common but unsettled legal argument, and the plaintiffs contend Suno still amounts to “pervasive #illegal #copying” of artists’ works.”

#NYA / #music / #ElizabethLopatto / #amazon / #DataTheft <neilyoungarchives.com/news/3/a>

Kevin Dominik Korte kdkorte@fosstodon.org
2025-06-29

It appears that we are not providing sufficient training data for Meta's AI tools. I mean, you can now upload your private stories to get AI suggestions for your next Instagram story.
#AI #trainingdata
thehackernews.com/2025/06/face

eicker.news ᳇ tech news technews@eicker.news
2025-06-20

#MIT researchers developed a method called #SelfAdapting #LanguageModels (#SEAL) that enables large language models to continuously #learn and #improve by generating #synthetic #trainingdata and updating their parameters based on new information. wired.com/story/this-ai-model- #tech #media #news

☮ ♥ ♬ 🧑‍💻 peterrenshaw@ioc.exchange
2025-06-19

“Broad didn’t train his #AI on #Rothko; he didn’t train it on any #data at all. By hacking a #NeuralNetwork, and locking elements of it into a #recursive #loop, he was able to induce this AI into producing #images without any #TrainingData at all — no inputs, no influences.

Depending on your perspective, Broad’s art is either a pioneering display of pure artificial creativity, a look into the very soul of AI, or a clever but meaningless electronic by-product, closer to guitar feedback than music.

In any case, his work points the way toward a more creative and ethical use of #GenerativeAI beyond the large-scale manufacture of #DerivativeSlop now oozing through our visual culture.”

#Art / #TerenceBroad / #UnstableEquilibrium <theverge.com/ai-artificial-int>

N-gated Hacker News ngate
2025-05-25

🐢 Oh, look! A thrilling 120-page document about Claude 4 System Cards, because in the world, longer is clearly better... right? 📜 Filled with steaming details for those who miss "Person of Interest" fan fiction and enjoy deciphering cryptic training data. 🎭 Meanwhile, landing pages remain a mythical creature in Anthropic's universe. 🦄
simonwillison.net/2025/May/25/

2025-05-20

Facial recognition algorithms developed in East Asia performed better on Asian subjects, while Western algorithms performed better on White subjects. This discrepancy is attributed to different racial distribution in training sets. #TrainingData #AIAccuracy

Erik Jonker ErikJonker
2025-04-25

Just wondering how you collect as much Mastodon content as possible for AI training purposes?

2025-04-20

Open Web Crawl is such a security vulnerability that I don’t know why it isn’t at the top of the news every day.

If you turn on a general suction hose, how do you not realise there’s going to be a party of attackers right there feeding it all the #propaganda they possibly can?

How can you be so nonchalant about it? How do you not realise you created the biggest attack vector in the history of computing?

#ai #trainingdata #crawlers

2025-03-26

🚀 The EU’s #AI Challenge: Can Europe Compete Without Enough #TrainingData?

Daniel Friedlaender explains why AI #innovation depends on #data diversity – and how outdated #privacy approaches are a disadvantage in the global AI race.

#AIAct #DataProtection #EuropeanAIroundtable

youtube.com/watch?v=hEAzCWVjdu

Kevin Dominik Korte kdkorte@fosstodon.org
2025-03-13

The recent copyright decisions against AI have made it imperative to rethink how we train AI. Ultimately, we should aim to build a free training data repository to train genuinely free AI.
#AI #trainingdata
korte.co/3iqx

olеg lаvrоvsky loleg@hachyderm.io
2025-03-01

Many critics maintain that AI cannot be open sourced in principle (cited: Lessing, Casado, Stoica). To me it seems clear that all people investing in public uses for AI have a duty to demand legal clarity and open access to #trainingdata. @simonschlauri delivers a constructive and balanced exposé at #Winterkongress winterkongress.ch/2025/talks/o

What about training data? The alternative position: training data must be open. 
The reasons:
- In Software, Open Source means the right to see the code.
- The same must also apply to the training data of AI models, because only with the right to see inside can:
> Hallucinations or corruptions be understood
> Illegal sources be found
> Security research be done
> Copyright be protected
- Fine tuning does not bring the same results as complete access to training data.

(My own quick translation from German of the slide content)

Coach Pāṇini ® paninid@mastodon.world
2025-02-20

@RuthMalan

The worst-case scenario here is you get sued?

Apparently, this author/text was not included in the Books3 dataset of pirated content used for LLM #trainingdata.

PPC Land ppcland
2025-02-12

Copyright debate intensifies over AI training data use: Analysis of Andreessen Horowitz's position on AI model training and fair use in response to US Copyright Office inquiry ppc.land/copyright-debate-inte
