“Suno, for those of you not familiar, is an #AI #SongGenerator: enter a text prompt (such as “a jazz, reggae, EDM pop song about my imagination”) and a song comes back. Like many #GenerativeAI companies, it is also being sued by all and sundry for ingesting #copyrighted #material. The parties in the suit — including major labels and the #RIAA — don’t have a smoking gun, since they can’t directly peek at Suno’s #TrainingData. But they have managed to generate some suspiciously similar-sounding AI generated materials, #mimicking (among others) “Johnny B. Goode,” “Great Balls of Fire,” and Jason Derulo’s habit of singing his own name.

#Suno essentially admits these songs were #regurgitated from #copyrighted source material, but it says such use was legal. “It is no secret that the tens of millions of #recordings that Suno’s model was trained on presumably included recordings whose rights are owned by the Plaintiffs in this case,” it says in its own legal filing. Whether AI training data constitutes fair use is a common but unsettled legal argument, and the plaintiffs contend Suno still amounts to “pervasive #illegal #copying” of artists’ works.”

#NYA / #music / #ElizabethLopatto / #amazon / #DataTheft <https://neilyoungarchives.com/news/3/article?id=Music%20-%20Amazon%20is%20blundering%20into%20an%20AI%20copyright%20nightmare>

Ethical AI Image Generator Bria Releases Next-Gen Model That Didn’t Steal Your Data https://petapixel.com/2025/07/01/ethical-ai-image-generator-bria-releases-next-gen-model-that-didnt-steal-your-data/ #aiimagegenerator #trainingdata #Technology #ethicalai #News #bria

It appears that we are not providing sufficient training data for Meta's AI tools. I mean, you can now upload your private stories to get AI suggestions for your next Instagram story.
#AI #trainingdata
https://thehackernews.com/2025/06/facebooks-new-ai-tool-requests-photo.html

Getty Images Drops Main Copyright Claims Against Stability AI in UK Legal Case https://petapixel.com/2025/06/26/getty-images-drops-main-copyright-claims-against-stability-ai-in-uk-legal-case/ #aiimagegenerator #gettyvsstability #machinelearning #stablediffusion #trainingdata #gettyimages #stabilityai #Technology #aitraining #copyright #aiimage #News #Law

Anthropic destroyed millions of print books to build its AI models https://arstechni.ca/YLWE #InternetArchive #machinelearning #AIdevelopment #bookscanning #legalrulings #trainingdata #AIcompanies #googlebooks #AIresearch #AItraining #Anthropic #copyright #AIethics #scanning #fairuse #Biz&IT #Policy #Claude #AIlaw #AI

#MIT researchers developed a method called #SelfAdapting #LanguageModels (#SEAL) that enables large language models to continuously #learn and #improve by generating #synthetic #trainingdata and updating their parameters based on new information. https://www.wired.com/story/this-ai-model-never-stops-learning/?eicker.news #tech #media #news

“Broad didn’t train his #AI on #Rothko; he didn’t train it on any #data at all. By hacking a #NeuralNetwork, and locking elements of it into a #recursive #loop, he was able to induce this AI into producing #images without any #TrainingData at all — no inputs, no influences.

Depending on your perspective, Broad’s art is either a pioneering display of pure artificial creativity, a look into the very soul of AI, or a clever but meaningless electronic by-product, closer to guitar feedback than music.

In any case, his work points the way toward a more creative and ethical use of #GenerativeAI beyond the large-scale manufacture of #DerivativeSlop now oozing through our visual culture.”

#Art / #TerenceBroad / #UnstableEquilibrium <https://www.theverge.com/ai-artificial-intelligence/688576/feed-ai-nothing>

Disney and Universal Studios Sue AI Image Generator Midjourney Over Copyright https://petapixel.com/2025/06/11/disney-and-universal-studios-sue-ai-image-generator-midjourney-over-copyright/ #aiimagegenerator #universalstudios #generativeai #trainingdata #aicopyright #midjourney #disney #News #Law

🐢 Oh, look! A thrilling 120-page #novella about Claude 4 System Cards, because in the #AI world, longer is clearly better... right? 📜 Filled with steaming details for those who miss "Person of Interest" fan fiction and enjoy deciphering cryptic training data. 🎭 Meanwhile, landing pages remain a mythical creature in Anthropic's universe. 🦄
https://simonwillison.net/2025/May/25/claude-4-system-card/ #Claude4 #FanFiction #TrainingData #MythicalCreatures #HackerNews #ngated

Facial recognition algorithms developed in East Asia performed better on Asian subjects, while Western algorithms performed better on White subjects. This discrepancy is attributed to different racial distribution in training sets. #TrainingData #AIAccuracy

Just wondering how you collect as much Mastodon content as possible for AI training purposes?
#AI #Mastodon #trainingdata

"Spurious correlations":
#ai #llm #machinelearning #trainingdata #scienceandtechnology
🤖

https://techxplore.com/news/2025-04-technique-spurious-problem-ai.html

Open Web Crawl is such a security vulnerability that I don’t know why it isn’t at the top of the news every day.

If you turn on a general suction hose, how do you not realise there’s going to be a party of attackers right there feeding it all the #propaganda they possibly can?

How can you be so nonchalant about it? How do you not realise you created the biggest attack vector in the history of computing?

#ai #trainingdata #crawlers

🚀 The EU’s #AI Challenge: Can Europe Compete Without Enough #TrainingData?

Daniel Friedlaender explains why AI #innovation depends on #data diversity – and how outdated #privacy approaches are a disadvantage in the global AI race.

#AIAct #DataProtection #EuropeanAIroundtable

https://www.youtube.com/watch?v=hEAzCWVjdu4

The recent copyright decisions against AI have made it imperative to rethink how we train AI. Ultimately, we should aim to build a free training data repository to train genuinely free AI.
#AI #trainingdata
https://www.korte.co/3iqx

Many critics maintain that AI cannot be open sourced in principle (cited: Lessing, Casado, Stoica). To me it seems clear that everyone investing in public uses for AI has a duty to demand legal clarity and open access to #trainingdata. @simonschlauri delivers a constructive and balanced exposé at #Winterkongress https://winterkongress.ch/2025/talks/open_source_artificial_intelligence/

What about training data? The alternative position: training data must be open.
The reasons:
- In software, Open Source means the right to see the code.
- The same must also apply to the training data of AI models, because only with the right to look inside can:
> Hallucinations or corruptions be understood
> Illegal sources be found
> Security research be done
> Copyright be protected
- Fine-tuning does not deliver the same results as complete access to the training data.

(My own quick translation from German of the slide content)

@RuthMalan

The worst-case scenario here is you get sued?

Apparently, this author/text was not included in the Books3 dataset of pirated content used for LLM #trainingdata.

First Legal Ruling on AI, Copyright, and Training Data Goes the Way of Creators https://petapixel.com/2025/02/19/first-legal-ruling-on-ai-copyright-and-training-data-goes-the-way-of-creators/ #artificialintelligence #aitrainingdata #thomsonreuters #generativeai #trainingdata #copyright #fairuse #legal #News #Law

Copyright debate intensifies over AI training data use: Analysis of Andreessen Horowitz's position on AI model training and fair use in response to US Copyright Office inquiry https://ppc.land/copyright-debate-intensifies-over-ai-training-data-use/ #Copyright #AI #FairUse #DataUse #TrainingData
