#scholarlycommunications

2025-04-07

The Ethical Grey Areas of Machine Writing in Higher Education

Once we start to examine how academics actually use conversational agents in real settings, it becomes harder to draw a clear distinction between problematic and unproblematic use. Entirely substituting machine writing for your own, while presenting it under your own name, would strike most people as problematic. But this scenario bears little relationship to how machine writing is drawn upon in practice, at least by academics, not least because substantive direction is necessary to produce outputs which aren’t generic or vacuous.

Unless you’re willing to explain to the machine what you want it to do, its capacity to meet your needs as an academic will be limited. The specific characteristics of academic work, the extremely specialized forms of output we are expected to produce for equally precise purposes, mean that at least some engagement will be necessary in the process. At this point we are faced with the difficulty of distinguishing between fully accepted resources we draw upon in our intellectual work and those, such as machine writing, which are seen as potentially contentious. This is a question which Atwell poses very succinctly:

How does collaboration with others and using all the resources we now see as legitimate (the internet, research papers, colleagues work/advice) to do the best work we can, differ from utilising GenAI tools?

https://nationalcentreforai.jiscinvolve.org/wp/2024/08/08/do-you-feel-guilty-about-using-genai/

Through this framing, Atwell draws attention to what machine writing shares with other resources we might draw upon in our intellectual work. There are weaknesses to any resource, which we will ideally always review in light of our understanding of where it is limited. In this sense, machine writing could be seen as simply another resource amongst others. Atwell argues for a “symbiotic relationship where technological efficiency is balanced with human insight, creativity and ethical judgement.”

The challenge is that detecting machine writing is fundamentally unreliable. There will always be plausible deniability, and an obvious cost to making accusations on the basis of false positives. The statistical “burstiness” and “perplexity” which a detector like GPTZero uses are features which can be found in some human writers at least some of the time. The formal character of academic writing and the specialized vocabulary in use mean that it will tend towards elevated perplexity, in the sense that the word choices in a specialized sequence will tend to be unusual relative to less specialized forms of writing.
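To make these measures concrete, here is a minimal sketch of how they can be computed. This is an illustration rather than GPTZero’s actual method, which is proprietary: it assumes the Hugging Face transformers library and uses small GPT-2 as a stand-in for whatever model a commercial detector actually runs.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and
# small GPT-2 as a stand-in for whatever model a real detector uses.
import math
import statistics

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """How 'surprised' the model is by the text; lower means more predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Supplying labels makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def burstiness(sentences: list[str]) -> float:
    """A crude proxy: the spread of sentence-level perplexity. Uniformly
    'flat' text (a low score here) is what detectors treat as machine-like."""
    return statistics.stdev(perplexity(s) for s in sentences)
```

Scored this way, a formulaic methods section written by a human can look every bit as ‘flat’ as machine output, which is exactly the unreliability at issue.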

As my own prose demonstrates, academics are prone to using complex constructions which might be edited out of monographs but are likely to figure in journal articles. Obviously these conventions vary between disciplines and fields. I’ve often been struck by how different the writing styles are between the social theory journals which are my natural home and the computational social science and linguistics journals I increasingly explore due to my interest in GenAI. Some fields have conventions which involve a low level of burstiness, such as presenting an argument as a sequence of propositions, which could lead to a text being misdiagnosed as machine written. These are just examples from my own narrow experience, but they illustrate the complexity involved in identifying machine writing in scholarly publishing. The problem isn’t that the detectors fail to measure features of the texts, but rather that what they’re measuring varies across academic writing in rather convoluted ways.

If AI detectors were fully integrated into the workflows of scholarly publishing, it simply wouldn’t be possible to reliably infer the presence of machine writing in a fully automated way. There would need to be a conversation about what the AI detectors had inferred about the writing. Given there’s a widely acknowledged crisis of review in scholarly publishing, it’s difficult to see how this labor would be integrated into a system already struggling under the weight of submissions. Would already overburdened managing editors be expected to take up this role? Would it be delegated to the editorial board? What if the author simply denies the allegation?

It is very difficult to conceive of how such a system could work in practice, at least in the context of scholarly publishing as it is currently organized. In the event it was enforced automatically, we would likely see an undesirable shift in the character of academic writing in order to minimize characteristics which lead text to be flagged by detectors. Furthermore, academics who wanted to ensure their machine writing could pass undetected could simply ask conversational agents to modify the features of the text to evade these measures. Asking ChatGPT or Claude to modify the perplexity and burstiness of a paragraph can be an effective way to grasp what these measures point to in practice, if these concepts remain abstract to you. But it also indicates how the very flexibility of conversational agents, the fact you can specify in great detail what kind of text you want to be produced, limits how effective these detectors can ever plausibly be.
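For instance, here is a minimal sketch of such a request made through the API rather than the chat interface. It assumes the openai Python client; the model name is illustrative and the prompt wording is my own rather than a tested recipe.

```python
# A minimal sketch, assuming the `openai` Python client; the model name is
# illustrative and the prompt wording is mine, not a validated recipe.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

paragraph = "..."  # paste any paragraph you want to experiment with

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following paragraph twice: once with higher "
            "perplexity (more unusual word choices) and once with higher "
            "burstiness (more variation between sentences). Label each "
            f"version.\n\n{paragraph}"
        ),
    }],
)
print(response.choices[0].message.content)
```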

In the absence of reliable detection, there has been a tendency to seize upon questionable evidence in the hope we might find some way of identifying machine writing. It was widely reported that ‘delve’ figured more heavily in ChatGPT’s output than in human writing. The original claim seems to come from AI Phrase Finder, a search engine optimization (SEO) firm which offers a free tool to identify “common AI phrases” that might lead machine writing to be downgraded in search results. It invokes a dataset of 50,000 ChatGPT responses as evidence for this claim without providing any information about the dataset or how it was collected.

It’s odd that the ‘delve’ claim provoked so much attention online given that it only figured ninth on the list of ten most common ChatGPT words, behind words like ‘leverage’ and ‘resonate’. I chose those words, fourth and sixth on the list respectively, because I used both of them in a short conversation before sitting down to write this section. This might reflect the peculiar conversations I often have, as well as the peculiar words I tend to use in them. But if you’re reading a blog post on academic writing and AI, then I suspect you share this peculiarity, to the extent that you’re more likely to write ‘captivate’ (#2), ‘dynamic’ (#7) and ‘delve’ (#9) than most ChatGPT users. The reason ‘delve’ attracted so much attention is that tech guru Paul Graham, co-founder of the startup accelerator Y Combinator which launched the career of OpenAI’s Sam Altman, claimed confidently on Twitter that ‘delve’ was a sure sign of an e-mail being AI generated.
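Claims like this could in principle be checked transparently. As a hypothetical illustration, the sketch below compares how often supposed ‘tells’ occur per 10,000 tokens in two corpora; the file names are placeholders, since no such dataset was published alongside the original claim.

```python
# A hypothetical sanity check on single-word 'tells'. The corpus files are
# placeholders: the dataset behind the original claim was never released.
import re
from collections import Counter

def rate_per_10k(path: str, words: set[str]) -> dict[str, float]:
    """Occurrences of each word per 10,000 tokens in a plain-text corpus."""
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(t for t in tokens if t in words)
    return {w: 10_000 * counts[w] / len(tokens) for w in words}

tells = {"delve", "leverage", "resonate", "captivate", "dynamic"}
print("human:", rate_per_10k("human_corpus.txt", tells))
print("model:", rate_per_10k("model_corpus.txt", tells))
```

Even with real corpora, a difference in rates would show a stylistic tendency at the population level, never proof that any individual text was machine generated.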

It’s a comforting idea that ChatGPT has red flags that give away its use, as if it were an initially overwhelming poker player whose tells we gradually identify as we proceed with the game. The problem is that we will never be able to infer confidently from those signs that what we are reading is machine generated. This is a state of affairs which is unsettling in its novelty, calling into question assumptions which would have barely been visible to us until recently.

These claims can still have an impact even if they lack a firm foundation. If scholarly publishers integrated AI detection into their workflows, I expect lists of words to avoid in academic writing would similarly circulate amongst academics, regardless of whether there were solid grounds to believe these would in fact be flagged. If we accept that we can’t conclusively know whether a text has been machine generated, we are left in an uncomfortable position. It is easy to see how paranoia could spread under these circumstances, when we know that machine writing is circulating but cannot establish where it is or who is producing it.

If we outsource ever-increasing amounts of our writing to automated systems, which are by their nature epistemic black boxes, how could we sustain trust in the knowledge we are producing? If we are unsure which academic writing is produced by other scholars and which is produced by machines, how will this change how we relate to what we read? How will we distinguish between the different forms which machine writing can take, ranging from skillful co-production through to lazy outsourcing, if we make such a distinction at all? Will we gravitate towards writing which feels authentically human, even if the markers we draw upon to inform such a judgment are liable to be intensely unreliable?

Whose writing will be imbued with the dignity of human authenticity and whose will be written off as machine generated, even when there’s no such machine at work in the process? There’s a risk that publication profiles which don’t match received expectations, particularly those by scholars who don’t match the hegemonic vision of an academic, might find themselves dismissed and repudiated on the assumption that machine writing explains the quantity or quality of what they have written.

Consider the forcefulness with which academics, certain that a wrong has been committed, have levelled what are fundamentally intuitions of malpractice against their students. Now imagine that same forcefulness directed towards colleagues, inflected through the prevailing competitive individualism of the academy: the trawling of academic profiles, the nocturnal consulting of fundamentally unreliable AI detectors, and the academic gossip liable to accompany existing vendettas once the legitimacy of a scholar’s writing can suddenly be called into question in a fundamentally unverifiable way.

How do you prove you’ve not used machine writing in your work? Unless you’ve effectively self-surveilled and “human marked” your work, to use GPTZero’s (2024) terminology, there’s no way to prove this negative. You might establish on the balance of probability that you have plausibly written what you claim to have written, but even finding yourself in the position where this is at issue would itself be unpleasant.

#higherEducation #journals #malpractice #peerReview #plagiarism #review #scholarlyCommunications #scholarlyPublishing #writing

2024-12-06

Just read "Before Progress. On the Power of Utopian Thinking for Open Access Publishing" (culturemachine.net/vol-23-publ) by @jpooley -- very inspiring! #openaccess #scholarlycommunications #future #publishing

2024-09-13

Hi Everyone. Two great new positions are open at @crossref --Director of Technology and Director of Programs & Services: crossref.org/jobs/ I'm on the Board and would be happy to speak with anyone about the organization and these opportunities! #jobs #scholarlycommunications #openscience

2024-08-25

new "New Books Network" book interview #podcast in library science: Michael LaMagna talking with Monica Berger about "Predatory Publishing and Global Scholarly Communications" newbooksnetwork.com/predatory- #PredatoryPublishing #ScholarlyCommunications

NASIG (@nasig)
2024-08-07

⏰ Just a reminder: The NASIG Autumn Virtual Conference Call for Proposals is open until August 19th! If you haven't submitted your proposal yet, there's still time to share your ideas and research. Don’t miss out on this opportunity! More details here: nasig.wordpress.com/2024/07/11

Laura J. Wilkinson (@ljwilkinson)
2024-07-26

From @investinopen

Report: The state of open infrastructure grant funding investinopen.org/state-of-open

Webinar recording: State of Open Infrastructure Community Conversation: Grant Funding youtube.com/watch?v=y4CQdG-xjuM

NASIG (@nasig)
2024-07-12

🚀 Calling all innovators and thought leaders! The Virtual Conference Call for Proposals is now open!

If you have groundbreaking ideas, research, or insights to share, we want to hear from you. Don’t miss this chance to contribute to our dynamic conference lineup.

For more details, check out the most recent post on the NASIG Blog: nasig.wordpress.com/2024/07/11

2024-07-09

Plum job for the right person.

Associate Vice Provost for Collections & #ScholarlyCommunications at the #UPennsylvania.
wd1.myworkdaysite.com/en-US/re

I don't think I've ever seen a #ScholComm job at this level. Direct a team of 90 staff. Salary range, $110.8k - $240k.

2024-05-17

Editing an academic journal as a faculty member provides an opportunity to make a real difference in how scholarly knowledge is created and valued.

Here is our advice on how faculty journal editors can use their journal style guide to help build more just worlds.

ideasonfire.net/journal-style-

#AcademicJournals #JournalPublishing #ScholarlyCommunications #FeministPublishing

[Image: stack of journals on a blue background; text reads "Creating a social justice-focused style guide for your journal", with the Ideas on Fire logo]
2024-05-03

It's that time of year when we invite you to help shape the future of Crossref. Join the Crossref board and be part of our mission to broaden #metadata records and strengthen the #scholarly record. Find out more: crossref.org/blog/this-years-c #researchnexus #posi #isr #openinfrastructure #scholarlypublishing #scholarlycommunications #scholcomms

2024-04-18

Very happy to share this exciting news that the California Digital Library (my workplace), Lyrasis, and the Big Ten Academic Alliance Libraries are joining together to advance Diamond OA in the US! osc.universityofcalifornia.edu #openaccess #diamondoa #scholarlypublishing #librarypublishing #scholarlycommunications

2024-04-17

Help shape the future of DOAJ by filling in the @DOAJ community consultation; it doesn't take long: surveymonkey.com/r/DOAJconsult #research #scholarlypublishing #openaccess #opensource #scholarlycommunications #librarypublishing

💧🌏 Greg Cocks (@GregCocks@techhub.social)
2024-03-05

More Than 2 Million Research Papers Have Disappeared From The Internet - An Analysis Of DOIs Suggests That Digital Preservation Is Not Keeping Up With Burgeoning Scholarly Knowledge.
--
nature.com/articles/d41586-024 <-- shared technical article
--
doi.org/10.31274/jlsc.16288 <-- shared paper
--
#source #research #papers #digitalpreservation #persistentidentifiers #scholarlycommunications #DOI #doinumber #archiving #crossref #registration #access #scientificpaper #scholarship #journal #journals #publishorperish #publishedpaper #lost

[Images: hard-copy journals on a library shelf; annotated screenshot showing how to reference a paper; a student annotating a journal paper; cover of the first issue of the journal Nature]
2024-01-30

Check out research published in @jlscpub by @mpe that raises alarms about the #digitalpreservation of #scholarly journals. iastatedigitalpress.com/jlsc/a #persistentidentifiers #scholarlycommunications

[Figure 1: percentages of Crossref members in different preservation categories]
Public Knowledge Project (@PublicKnowledgeProject)
2024-01-23

💡 Did you know that the Indonesian scholarly publishing community accounts for the largest Open Journal Systems (OJS) user base in the world? How about the fact that Indonesia ranks high in @ORCID_Org records and affiliations?

Learn about the connections between the Indonesian community, ORCID, and OJS in a new PKP News Blog post, including a webinar recording release: pkp.sfu.ca/2024/01/22/indonesi

Coalition for Networked Information (@cni)
2023-12-20

The Scholarly Kitchen blog recently shared a fascinating report titled “Mind the Gap – Understanding China’s Perspective on Research Integrity and Open Access,” which provides insights into issues that CNI has been tracking since late 2020 (see cni.org/publications/cliffs-pu ) regarding the potential divergence of practices in China and the US/Europe.

See scholarlykitchen.sspnet.org/20

2023-10-23

The Office of Scholarly Communication at the University of California has just released a great new #DEI #scholarlycommunications #scholcomm resource: Diversity, Equity, and Inclusion in Scholarly Communication osc.universityofcalifornia.edu

2023-10-18

We're looking to hire a #SeniorSoftwareDeveloper. This role isn't just about coding - though you will specify, design, and implement improvements, features, and services - it's also about understanding the needs of our diverse community. We especially encourage applications from people with backgrounds historically under-represented in #research and #scholarlycommunications. If this sounds interesting to you, please apply or share with your networks: crossref.org/jobs/2023-10-12-s #jobopening #hiring

2023-08-30

📢 DORA is hiring a new Policy Associate!

🌐 Apply if you're keen to gain experience in #ScholarlyCommunications, #SciencePolicy, and #ScienceDiplomacy at a non-profit initiative

📋 This is a paid part-time position for 6 months. Remote work is possible.

sfdora.org/2023/08/30/announce

2023-06-05

Selecting a journal: a minefield without due diligence

#QUTLibrary blogpost by Sandra Fry, Scholarly Communications Librarian (acting).

blogs.qut.edu.au/library/2023/

#ScholarlyCommunications #ScholarlyPublishing #ScholarlyImpact
