If I had the time, energy, and education to pull it off, I'd do some scholarship and writing elaborating on this juxtaposition:
- Statistics, as a field of study, gained significant energy and support from eugenicists seeking to "scientize" their prejudices. Some of the major early thinkers in modern statistics, like Galton, Pearson, and Fisher, were eugenicists out loud; see https://nautil.us/how-eugenics-shaped-statistics-238014/
- Large language models and diffusion models rely on certain kinds of statistical methods, but they discard any notion of confidence interval or validation that's grounded in reality. For instance, the LLM inside GPT outputs a probability distribution over the tokens (words) that could follow the input prompt. However, there is no way to even make sense of a distribution like that in real-world terms, let alone measure how well it matches reality. See for instance https://aclanthology.org/2020.acl-main.463.pdf and Michael Reddy's "The conduit metaphor: A case of frame conflict in our language about language". (A small sketch of what that next-token distribution looks like follows this list.)
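To make the "probability distribution over tokens" point concrete, here's a minimal sketch in Python. The four-word vocabulary and the logit values are invented purely for illustration (real models score tens of thousands of tokens); the only real machinery shown is the softmax step that turns raw scores into numbers that sum to 1.

```python
import math

# Hypothetical vocabulary and raw scores (logits) the model might assign
# to each candidate next token for some prompt. These values are made up.
vocab = ["cat", "dog", "statistics", "bias"]
logits = [2.1, 1.3, 0.2, -0.5]

# Softmax: exponentiate and normalize so the scores sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")

# The outputs form a valid probability distribution in the formal sense,
# but nothing in the construction ties any of these numbers to a
# measurable real-world frequency or to a confidence interval.
```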
Early on in this latest AI hype cycle I wrote a note to myself that this style of AI is necessarily biased. In other words, the bias coming out isn't primarily a function of biased input data (though of course that's a problem too); that would be a kind of contingent bias that could be addressed. Rather, the bias these systems exhibit is a function of how they are structured at their core, and no amount of data curation can overcome it. I can't prove this, so let's call it a hypothesis, but I believe it.
#AI #GenAI #GenerativeAI #ChatGPT #GPT #Gemini #Claude #Llama #StableDiffusion #Midjourney #DallE #LLM #DiffusionModel #linguistics #NLP