In a recent post, I perpetuated the fallacy that the notion of a #flatEarth was endemic during the #MiddleAges. Someone correctly pointed out that this was not 100% factually accurate, and suggested that no one actually thought this during the Middle Ages. While "no one" may also be a misrepresentation of how widespread the knowledge of a spherical Earth was at the time, let me explain why my factually flawed lead-in about a putatively widespread belief actually reinforces the original post's central point about the inherent bias caused by large-scale #dataAggregation when training #AI.
Thanks to the Greeks, scholars and the well-educated had known about a spherical Earth since about 500 BCE, and even during the Middle Ages the educated elite (who were nevertheless a minority) widely accepted it as fact. What the typically uneducated general public thought about it at the time may be a different story, though. Regardless, my inaccurate assumption about the beliefs of the time actually reinforces my original point about how certain factual inaccuracies and data biases, especially when amplified by repetition, negatively impact the usefulness of the current generation of #LLM and #GenAI systems.
In some ways, geocentrism may have been a better example. However, whichever example you choose, in this context its veracity is less important than how often the statement is made, because repetition determines the frequency or weighting of the statement within the corpus used to train an #AI or #ML system.
References to historical beliefs in a flat Earth, or in the solar system revolving around the Earth, are widely repeated, and that's all that's necessary for them to become a dominant data point within a large and uncurated #ML #dataset. In other words, if enough separate data sources repeat a given statement frequently enough, that's often sufficient to skew the resulting data set. This problem is closely related to the very human cognitive bias toward believing commonly heard statements, sometimes called the illusory truth effect.
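To make the frequency argument concrete, here is a minimal sketch in Python. It is only an illustration, not how any real #LLM is trained: the toy corpus, the 9-to-1 ratio, and the `claim_weights` helper are all assumptions I made up for this example. It simply shows that when weighting follows raw counts, the most-repeated statement dominates regardless of whether it is true.

```python
from collections import Counter


def claim_weights(corpus):
    """Return each distinct statement's share of the corpus (relative frequency).

    Hypothetical helper for illustration; real training pipelines are far
    more complex, but raw repetition still influences what a model sees most.
    """
    counts = Counter(statement.strip().lower() for statement in corpus)
    total = sum(counts.values())
    return {claim: n / total for claim, n in counts.items()}


# Made-up toy corpus: nine sources repeat the inaccurate claim,
# while only one source states the accurate one.
corpus = (
    ["Medieval people believed the Earth was flat."] * 9
    + ["Medieval scholars knew the Earth was a sphere."]
)

weights = claim_weights(corpus)

# The statement with the largest share wins on frequency alone;
# nothing in this calculation checks whether it is true.
most_common = max(weights, key=weights.get)
print(most_common, weights[most_common])
```

Nothing in the calculation looks at veracity, which is the point: without curation or reweighting, the share of each statement is set purely by how often it is repeated.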
To support the historical points about who may or may not have believed in a flat Earth or geocentrism during the Middle Ages, I've attached some relevant links. Meanwhile, I'll post later about why any "belief in a belief" or frequently repeated datum poses an existential problem for many of today's very large AI/ML systems, and what we can collectively do about it.