You CANNOT REVERSE ENGINEER Google's processes from emails disclosed in court. You will learn bits and pieces. RESIST THE TEMPTATION to create your own picture with those bits and pieces. Take this email as an example. The highlighted section says: "Those signals will be very helpful for us to upweighting good, authoritative pages and downweighting the spammy, untrustworthy ones." But read the rest of the message.
It would be challenging to train a Large Language Model on quality scores. They're not words and phrases. They're numbers.
A pretraining step could use the scores to filter out documents they don't want to train on, or to ensure that documents they do want ARE included. The scores could also be aggregated per document and attached to the documents chosen for training (maybe as weights that adjust how much each document contributes when the model learns the relationships between words and phrases across the training corpus).
The second paragraph makes it clear they didn't want to directly integrate these signals into the training data.
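To make the filtering and weighting possibilities above concrete, here is a minimal Python sketch. It is NOT Google's process, and nothing in the email describes an implementation. The `Doc` class, the `quality` field, the 0.3 threshold, and every function name are hypothetical, invented only for illustration. The point is that the scores decide which text gets used and how often, while the numbers themselves never become training text.

```python
import random
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    text: str
    quality: float  # hypothetical aggregated quality score in [0, 1]


def filter_docs(docs, min_quality=0.3, always_include=frozenset()):
    """Keep documents at or above the threshold, plus any explicitly allowed URLs."""
    return [d for d in docs if d.quality >= min_quality or d.url in always_include]


def sampling_weights(docs):
    """Turn quality scores into sampling probabilities that sum to 1."""
    total = sum(d.quality for d in docs)
    if total == 0:
        # No usable signal: fall back to uniform sampling.
        return [1.0 / len(docs) for _ in docs]
    return [d.quality / total for d in docs]


def sample_training_texts(docs, k, seed=0):
    """Draw k training texts; higher-scored documents are drawn more often."""
    rng = random.Random(seed)
    chosen = rng.choices(docs, weights=sampling_weights(docs), k=k)
    # Only the text is returned. The score shaped the draw but is not in the data.
    return [d.text for d in chosen]


docs = [
    Doc("https://example.com/a", "authoritative page", 0.9),
    Doc("https://example.com/b", "spammy page", 0.1),
    Doc("https://example.com/c", "average page", 0.5),
]
kept = filter_docs(docs)
weights = sampling_weights(kept)
batch = sample_training_texts(kept, k=5)
```

In this sketch the low-scored page is dropped before sampling, and the remaining pages are drawn in proportion to their scores. Whether anything like this matches what Google actually does is exactly the kind of picture you can't confirm from a disclosed email.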
#ai #google #gemini #searchengines #pagerank #seo #searchengineoptimization #webmarketing #digitalmarketing #machinelearning #llms