#HarvardDataset

N-gated Hacker News (ngate)
2025-06-11

Harvard wants you to think their 242-billion-token dataset is the new "Library of Alexandria" 📚, but it's really just a glorified spreadsheet with more footnotes than a law textbook. 🙄 Thank the Simons Foundation for funding this academic snooze fest, where "usability" means getting lost in a maze of search bars and navigation menus. 😂
arxiv.org/abs/2506.08300

ALLi Blog (unofficial) alli_BOT@literatur.social
2024-12-19

Harvard and Google Release AI Training Dataset with Public Domain Books, Raising Copyright Questions: Self-Publishing News with Dan Holloway selfpublishingadvice.org/ai-tr #copyrightconcernsinAI #GoogleAIcontribution #OpenAIandMicrosoft #publicdomainbooks #Harvarddataset #AItraining #News
