#AI #GenerativeAI #Journalism #News #LLMs #RAGs #Science #ScienceJournalism #Media: "Surprisingly, GPT-4 with the abstracts performs a little better than GPT-4 with RAG over the article text, both in terms of accuracy (96.6% vs. 93.5%) and win percentage (29.2% vs. 27.8%; the rest were ties). This suggests that more context from the scientific article did not necessarily lead to higher accuracy or better understandability (i.e. our first assumption about using RAG here, as described above, was not met). A deeper investigation of the retrieved snippets and their actual relevance to the jargon term may help us understand whether this is an issue with the quality of the context, or whether there are other causes, such as the similarity threshold we used. To apply RAG successfully, it’s essential to explore and test different parameters. Without this kind of careful experimentation, RAG on its own might not provide the desired results.
It may also be the case that the large size of GPT-4’s pre-training dataset enables it to draw from other sources to generate definitions. This can be as much a concern as a benefit, though: it can make it harder to override irrelevant information from the pre-training data, or to escape the limitations imposed by the model’s training cutoff date.
We also found that the effectiveness of these approaches varied based on the reader’s expertise. For instance, the less experienced annotator found similar value in both methods (higher tie percentage), while the more expert reader noticed more differences. Further evaluation with a larger set of annotators may help to replicate and understand these differences."
https://generative-ai-newsroom.com/making-sense-of-science-llms-for-helping-reporters-understand-complex-research-f1292720a09c
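To make the quoted point about "explore and test different parameters" concrete, here is a minimal sketch of threshold-based snippet retrieval for a jargon term. The embedding model, the threshold values, and the `retrieve_snippets` helper are illustrative assumptions, not details taken from the article; the idea is only that the similarity threshold is a tunable parameter worth sweeping before concluding that RAG underperforms.

```python
# Minimal sketch of threshold-based retrieval for jargon-term context.
# Assumptions (not from the article): the embedding model, the default
# threshold, and the example chunks are illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_snippets(jargon_term: str, chunks: list[str],
                      similarity_threshold: float = 0.45,
                      top_k: int = 3) -> list[str]:
    """Return up to top_k chunks whose cosine similarity to the term clears
    the threshold. Too strict a threshold starves the prompt of context;
    too loose a threshold floods it with irrelevant text."""
    term_emb = model.encode(jargon_term, convert_to_tensor=True)
    chunk_embs = model.encode(chunks, convert_to_tensor=True)
    scores = util.cos_sim(term_emb, chunk_embs)[0]  # one score per chunk
    ranked = sorted(zip(chunks, scores.tolist()),
                    key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, score in ranked if score >= similarity_threshold][:top_k]

if __name__ == "__main__":
    chunks = [
        "CRISPR-Cas9 is a genome-editing tool derived from a bacterial immune system.",
        "The study enrolled 412 participants across three clinical sites.",
        "Guide RNAs direct the Cas9 nuclease to a specific DNA sequence.",
    ]
    # Sweep the threshold to see how much context survives at each setting.
    for t in (0.3, 0.5, 0.7):
        print(t, retrieve_snippets("CRISPR-Cas9", chunks, similarity_threshold=t))
```

Sweeping the threshold this way shows when relevant snippets start dropping out of the prompt, which is one simple way to probe the "quality of the context" question raised in the quote.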