#TrainingAI

2025-06-26

Business Insider: Scale AI exposed sensitive data about clients like Meta and xAI in public Google Docs, BI finds. “As Scale AI seeks to reassure customers that their data is secure following Meta’s $14.3 billion investment, leaked files and the startup’s own contractors indicate it has some serious security holes. Scale AI routinely uses public Google Docs to track work for high-profile […]

https://rbfirehose.com/2025/06/26/business-insider-scale-ai-exposed-sensitive-data-about-clients-like-meta-and-xai-in-public-google-docs-bi-finds/

2025-06-18

TechCrunch: Mastodon updates its terms to prohibit AI model training. “Social networks are bolstering their terms of service against scrapers and bots that crawl the website to train AI models. Days after Elon Musk-owned X updated its terms to explicitly prohibit AI model training, decentralized social network Mastodon today updated its own rules to bar any kind of model training, as well.”

https://rbfirehose.com/2025/06/18/techcrunch-mastodon-updates-its-terms-to-prohibit-ai-model-training/

2025-06-17

Futurism: Lawyers Just Discovered Something About Meta’s AI That Could Cost Zuckerberg Untold Billions of Dollars. “A legal expert found that Meta’s AI is able to spit out entire portions of books verbatim — and if he’s right, it could be seriously bad news for the company and its CEO Mark Zuckerberg.”

https://rbfirehose.com/2025/06/17/futurism-lawyers-just-discovered-something-about-metas-ai-that-could-cost-zuckerberg-untold-billions-of-dollars/

2025-06-17

404 Media: AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums. “AI bots that scrape the internet for training data are hammering the servers of libraries, archives, museums, and galleries, and are in some cases knocking their collections offline, according to a new survey published today.” As you might imagine, this drives me absolutely WILD.

https://rbfirehose.com/2025/06/17/404-media-ai-scraping-bots-are-breaking-open-libraries-archives-and-museums/

2025-06-13

Harvard Library: Institutional Books 1.0: A 242B Token Dataset from Harvard Library’s Collections, Refined for Accuracy and Usability. “The rapid development and adoption of LLMs of varying quality has brought into focus the scarcity of publicly available, high-quality training data and revealed an urgent need to ground the stewardship of these datasets in sustainable practices with clear […]

https://rbfirehose.com/2025/06/13/institutional-books-1-0-a-242b-token-dataset-from-harvard-librarys-collections-refined-for-accuracy-and-usability-harvard-library/

2025-06-08

TechCrunch: X changes its terms to bar training of AI models using its content. “Social network X has changed its developer agreement to prevent third parties from using the platform’s content to train large language models.” Considering the content of that platform nowadays I’m not sure this is a bad thing…

https://rbfirehose.com/2025/06/08/techcrunch-x-changes-its-terms-to-bar-training-of-ai-models-using-its-content/

2025-06-07

The Register: Reddit sues Anthropic for scraping content into the maw of its eternally ravenous AI. “Reddit, the popular internet discussion forum, sued Anthropic on Wednesday, alleging that the AI biz scraped content generated by its users in violation of contractual terms and technical barriers. The complaint [PDF], filed in San Francisco Superior Court on Wednesday, claims Anthropic’s use […]

https://rbfirehose.com/2025/06/07/the-register-reddit-sues-anthropic-for-scraping-content-into-the-maw-of-its-eternally-ravenous-ai/

2025-06-02

AFP: New York Times signs AI licensing deal with Amazon. “The New York Times has agreed a deal for Amazon to use its content to train artificial intelligence models, the leading U.S. newspaper announced Thursday in its first generative AI licensing deal. Several media groups have already struck similar deals with major tech companies, but The New York Times had previously refused to allow its […]

https://rbfirehose.com/2025/06/02/afp-new-york-times-signs-ai-licensing-deal-with-amazon/

2025-05-28

AFP: German court says Meta can use user data to train AI. “A German court on Friday dismissed an injunction request brought by consumer protection groups to prevent US tech giant Meta from using user data from Facebook and Instagram to train artificial intelligence systems.”

https://rbfirehose.com/2025/05/28/afp-german-court-says-meta-can-use-user-data-to-train-ai/

2025-05-27

San Jose Spotlight: Silicon Valley cities hit with request for residents’ emails to train AI. “Mountain View-based company GovernmentGPT filed 90 California Public Records Act requests with multiple cities across the Bay Area for emails from residents addressed to mayors, councilmembers and city clerks from 2020 to 2023. The goal is to create an artificial intelligence tool that can easily […]

https://rbfirehose.com/2025/05/27/san-jose-spotlight-silicon-valley-cities-hit-with-request-for-residents-emails-to-train-ai/

2025-05-21

The New Stack: Data Commons Can Save Open AI. “Two paradigm shifts are needed. First, AI developers can no longer afford to build datasets alone, treating vast bodies of knowledge, culture and information as a raw resource they can turn into tokens. Datasets must be viewed as tools for solving AI development challenges and addressing other stakeholders’ needs. This entails collaboration, […]

https://rbfirehose.com/2025/05/21/the-new-stack-data-commons-can-save-open-ai/

2025-05-19

Mashable: In copyright fight, artists use white-hot AI report as weapon against Meta. “Plaintiffs in the landmark Kadrey v. Meta case have already submitted the U.S. Copyright Office’s controversial AI report as evidence in their copyright infringement suit against the tech giant.”

https://rbfirehose.com/2025/05/19/mashable-in-copyright-fight-artists-use-white-hot-ai-report-as-weapon-against-meta/

2025-05-17

Engadget: SoundCloud backtracks on ‘too broad’ AI terms of service. “Specifically, SoundCloud’s Terms of Use now forbids the company from using content uploaded to SoundCloud to train generative AI that replicates an artist without their consent.”

https://rbfirehose.com/2025/05/17/engadget-soundcloud-backtracks-on-too-broad-ai-terms-of-service/

2025-05-14

TechCrunch: SoundCloud changes policies to allow AI training on user content. “SoundCloud appears to have quietly changed its terms of use to allow the company to train AI on audio that users upload to its platform. As spotted by tech ethicist Ed Newton-Rex, the latest version of SoundCloud’s terms include a provision giving the platform permission to use uploaded content to ‘inform, train, […]

https://rbfirehose.com/2025/05/14/techcrunch-soundcloud-changes-policies-to-allow-ai-training-on-user-content/

2025-05-08

Tubefilter: RHEI says creators are making big money by selling their content to AI companies. “This past January, RHEI–the Vancouver-based tech company formerly known as BBTV—launched RHEI Data Pro, a data monetization platform that lets creators and media companies choose to license their content catalogs to companies building LLMs. At the time, RHEI said its platform could help creators […]

https://rbfirehose.com/2025/05/08/tubefilter-rhei-says-creators-are-making-big-money-by-selling-their-content-to-ai-companies/

2025-05-04

Bloomberg: Google Can Train Search AI With Web Content After AI Opt-Out. “Google can train its search-specific AI products, like AI Overviews, on content across the web even when the publishers have chosen to opt out of training Google’s AI products, a vice-president of product at the company testified in court on Friday.”

https://rbfirehose.com/2025/05/04/bloomberg-google-can-train-search-ai-with-web-content-after-ai-opt-out/

2025-04-22

MIT News: Training LLMs to self-detoxify their language. “Over time, most of us develop an internal ‘guide’ that enables us to learn context behind conversation; it also frequently directs us away from sharing information and sentiments that are, or could be, harmful or inappropriate. As it turns out, large language models (LLMs) — which are trained on extensive, public datasets and […]

https://rbfirehose.com/2025/04/22/mit-news-training-llms-to-self-detoxify-their-language/

2025-04-21

Digital Trends: Meta is training AI on your data. Users say opting out doesn’t work. “Imagine a tech giant telling you that it wants your Instagram and Facebook posts to train its AI models. And that too, without any incentive. You could, however, opt out of it, as per the company. But as you proceed with the official tools to back out and prevent AI from gobbling your social content, they […]

https://rbfirehose.com/2025/04/21/digital-trends-meta-is-training-ai-on-your-data-users-say-opting-out-doesnt-work/

2025-04-19

PressGazette: ‘Unsustainable status quo’: AI companies and publishers respond to Govt copyright consultation. “The UK Government’s proposal to allow AI companies to automatically train their models on online content unless the rightsholder specifically opts out has been described as ‘unworkable’. A range of responses to the Government consultation on its proposed change to the existing […]

https://rbfirehose.com/2025/04/19/unsustainable-status-quo-ai-companies-and-publishers-respond-to-govt-copyright-consultation-pressgazette/
