#detectors

2025-10-13

Has there been any improvement in AI detectors over the last 12 months? A GPT-5 Pro literature review

I ran this report to explore whether my 2023/24 arguments from Generative AI for Academics still hold. I'm sharing it here in case others find it useful.

TL;DR: Over the last 12 months, the centre of gravity has moved further away from “catch‑and‑punish” AI detection toward assessment redesign, process evidence, and transparency. Regulators (e.g., TEQSA in Australia) now say reliable detection “is all but impossible,” several universities have disabled AI detection features, and independent guidance for instructors argues detectors don’t work well enough to be relied on. Vendors keep publishing high headline accuracy numbers, and there’s active research on watermarking and authorship verification—but none of this has translated into a dependable, classroom‑safe detector for typical student writing. teqsa.gov.au; Inside Higher Ed

From the “assessment panic” to a new normal

In the book we described the 2023–24 “great assessment panic”—a rush to outsource academic judgment to detectors in hopes of restoring order overnight. The last year shows that order won’t be restored by a score. What is emerging instead is a culture shift: instructors accept that students will use GenAI, and programmes re‑emphasize authentic tasks, process artefacts, and viva/oral components to evidence learning.

That turn is also consistent with your broader argument about not over‑automating interpersonal judgment: even where automation looks tempting, we should beware brittle tools that offload risk onto students and staff. Detection has become the latest test case for that etiquette.

What changed in 2024–25

1) Policy and regulator signals hardened

  • TEQSA (Australia) now says “detecting gen AI use with certainty in assessments is, at this point, all but impossible” and urges providers to build in at least one secure assessment per unit and redesign tasks rather than lean on detectors. teqsa.gov.au
  • UK sector bodies (e.g., QAA) continued to steer institutions toward reconsidering assessment and away from simplistic technological fixes. qaa.ac.uk
  • Inside Higher Ed reporting in early 2024 captured the mood among many faculty: proceed with caution; detectors risk more harm than good. Inside Higher Ed

2) Universities kept switching off or downgrading detectors

  • University of Waterloo announced it is discontinuing Turnitin’s AI detection feature in 2025. University of Waterloo
  • UMass Amherst deactivated Turnitin’s AI detection feature; its teaching centre cites significant limitations in GenAI detectors. UMass Amherst
  • In Australia, ACU paused Turnitin’s indicator after false accusations and lengthy investigations; other institutions signalled similar moves. staff.acu.edu.au

3) Vendors continued strong claims; independent guidance stayed sceptical

  • Major tools (GPTZero, Originality.AI, Turnitin) publicised ~98–99% accuracy on their own benchmarks. But instructor‑facing guidance at MIT Sloan EdTech and elsewhere remains clear: don’t rely on detectors as proof. MIT Sloan Teaching; GPTZero; Originality.ai
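To see why headline accuracy is not the same as classroom safety, here is a minimal base-rate sketch in Python. All of the numbers (cohort size, prevalence of fully AI-written work, and the detector's error rates) are illustrative assumptions, not figures reported by any vendor.

```python
# Minimal base-rate sketch: why a "99% accurate" detector is not classroom-safe.
# All numbers below are illustrative assumptions, not vendor-reported figures.

cohort_size = 2000          # submissions screened in a term (assumed)
prevalence = 0.10           # fraction of submissions that are fully AI-written (assumed)
true_positive_rate = 0.99   # detector sensitivity on unedited AI text (assumed)
false_positive_rate = 0.01  # rate at which genuine student work is flagged (assumed)

ai_written = cohort_size * prevalence
human_written = cohort_size - ai_written

true_flags = ai_written * true_positive_rate
false_flags = human_written * false_positive_rate

# Probability that a flagged submission really is AI-written (positive predictive value).
ppv = true_flags / (true_flags + false_flags)

print(f"Flagged overall:            {true_flags + false_flags:.0f}")
print(f"Falsely accused students:   {false_flags:.0f}")
print(f"P(AI-written | flagged):    {ppv:.2%}")
```

Even under these fairly generous assumptions, roughly 18 genuine submissions are flagged each term; on hybrid or paraphrased writing the sensitivity drops while the false flags remain, so the picture only worsens.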

4) The equity problem didn’t go away

  • Peer‑reviewed work and round‑ups reiterated that detectors mislabel non‑native English writing at higher rates; that bias remained a practical and ethical blocker. Stanford HAI

Has AI detection gotten more reliable?

Short answer: not in the way that matters for day‑to‑day teaching.

  • Whole‑cloth, unedited AI essays can often be flagged by multiple tools. But the real world is hybrid (drafts revised by students, paraphrased, or partly AI‑assisted), where accuracies drop sharply and error profiles become unacceptable for high‑stakes use. Stress‑testing papers show attacks like light paraphrasing reliably break detectors—including watermark‑based ones. arXiv
  • Even in 2025, sector guidance aimed at instructors (MIT; IHE; many CTLs) continues to caution that detectors “don’t work” as evidence, and OpenAI’s own AI text classifier remains discontinued for low accuracy. MIT Sloan Teaching; Inside Higher Ed
  • Institutional risk has become clearer: ACU’s experience (and student protests at Buffalo) illustrates how a single high score can trigger disproportionate harm and erode trust. Adelaide Now

Bottom line: There’s no compelling evidence of a step‑change in detector reliability over the last year for typical coursework (short, hybrid, multi‑draft, multilingual). What has changed is policy clarity: “use with caution—never as sole evidence.”

What’s promising (but not there yet): provenance & watermarking

If there’s progress, it’s more on provenance than on retroactive text detection.

  • C2PA / Content Credentials is gaining adoption across media platforms and tooling (Adobe, Google, Cloudflare, OpenAI joined the steering committee). This helps with images, audio and video by attaching verifiable “nutrition labels,” but text remains tricky—most university workflows strip or transform metadata, and text models rarely ship with robust, persistent marks. Content Authenticity Initiative; C2PA; The Verge
  • Watermarking research in 2024 (e.g., Nature paper on scalable schemes) is encouraging, but isn’t widely deployed for text in mainstream LLMs and can be defeated by editing/paraphrase. Nature
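To make the "defeated by editing" point concrete, here is a toy Python sketch of the green-list style of watermark detection described in the research literature. The word-level tokens, the SHA-256 hash, and the green fraction are simplifying assumptions for illustration; this is not the Nature paper's algorithm or any vendor's implementation.

```python
# Toy sketch of "green-list" watermark detection for LLM text.
# The general idea: hash the previous token to pick a pseudo-random "green" subset
# of the vocabulary, then test how often the observed text lands in it.
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary treated as "green" at each step (assumed)

def is_green(prev_token: str, token: str) -> bool:
    # Deterministic pseudo-random decision seeded on the previous token,
    # mimicking how a watermarking generator would pick its green subset.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GREEN_FRACTION

def watermark_z_score(text: str) -> float:
    # z-score of the observed green-token count against the no-watermark expectation.
    tokens = text.split()
    if len(tokens) < 2:
        return 0.0
    trials = len(tokens) - 1
    greens = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * trials
    stddev = math.sqrt(trials * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - expected) / stddev

# Unwatermarked text should sit near z ≈ 0; text generated with a green-list bias
# would score several standard deviations higher. Light paraphrasing swaps tokens
# and pulls the score back toward zero, which is exactly the fragility noted above.
print(watermark_z_score("Students revise drafts and paraphrase sources before submitting them."))
```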

Implication: Expect forward‑looking provenance (label at creation) to matter more than after‑the‑fact detection. That helps in journalism and platform governance; it’s much less helpful when grading a student draft pasted from an unknown source.

Where detectors might fit (narrow, safeguarded use)

If your institution still exposes an AI score, keep it in a triage role only:

  1. Corroboration, not conviction: Treat any score as a lead to investigate alongside process evidence (draft history, notes, references, code commits). Never as proof. MIT Sloan Teaching
  2. Due process & equity: Document how a concern is raised (patterns across drafts, mismatched level with prior work), offer student reflection opportunities, and avoid one‑off scores—especially for non‑native writers. Stanford HAI
  3. Opt‑in authorship verification: If you adopt stylometry/“authorship” tools, use them with consent and for portfolio baselining over time, not to police isolated assignments. The field is evolving; explainability is better than black‑box “AI %” scores, but it’s not a silver bullet. PMC
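For illustration only, here is a minimal Python sketch of the kind of transparent, feature-level comparison an opt-in baselining tool might surface. The feature set, the placeholder texts, and the workflow are hypothetical; this is not a description of any published stylometry system, and its output is a conversation starter, not evidence.

```python
# Toy sketch of explainable, opt-in authorship baselining (not a real tool).
# It puts a few transparent style features of a new draft next to the same features
# from a student's prior, consented work, so any difference can be discussed openly.
import re

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is", "was", "for"]

def style_features(text: str) -> dict[str, float]:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {}
    feats = {
        "avg_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }
    for w in FUNCTION_WORDS:
        feats[f"rate_{w}"] = words.count(w) / len(words)
    return feats

# Hypothetical inputs: earlier consented drafts by the same student, and the new draft.
portfolio_text = "Earlier consented drafts by the same student would go here. More prior writing here."
new_draft = "The new submission being discussed with the student would go here."

baseline = style_features(portfolio_text)
current = style_features(new_draft)
for name in baseline:
    print(f"{name:22s} baseline={baseline[name]:.3f} new_draft={current.get(name, 0.0):.3f}")
# Differences are prompts for a conversation alongside process evidence
# (draft history, notes), never a verdict on their own.
```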

This aligns with your book’s argument to use GAI as a meta‑collaborator for process (drafts, notes, meeting digests) rather than as a shortcut to verdicts. Building a traceable writing process—version histories, short oral defenses, design logs—yields better authorship evidence than any detector.

What universities actually did instead (2024–25)

  • Assessment redesign: TEQSA urges at least one secure/authentic task per unit (in‑class writing, supervised practicals, viva). UK QAA and many CTLs stress process‑rich tasks and staged submissions. teqsa.gov.au
  • Clear student communications: Institutions published guidance discouraging reliance on detectors and explaining acceptable AI use; several disabled AI scoring in Turnitin outright. UMass Amherst
  • Case handling reforms: After false positives, some universities tightened evidence standards and timelines to reduce harm. (See ACU’s March 2025 changes.) staff.acu.edu.au

Why the consensus calls detection a “dead end” (for now)

  1. Adversarial drift: Generators improve faster than detectors; light paraphrase or hybrid editing defeats many heuristics. arXiv
  2. Context mismatch: Benchmarks seldom reflect short, multilingual, hybrid student writing; vendors’ headline metrics don’t generalize to real courses. Inside Higher Ed
  3. Ethical and legal exposure: Bias against non‑native writers, unclear thresholds, and opaque models make high‑stakes decisions indefensible. Stanford HAI
  4. Better alternatives exist: Redesigning assessment and teaching works with GenAI rather than pretending it can be perfectly policed—exactly the direction we explored under your “communication and etiquette” lens.

A practical playbook (what we recommend now)

  • Name acceptable use in each assessment (permitted tools, disclosure expectations, citation of AI assistance).
  • Require process artefacts: outlines, reading logs, draft‑to‑final diffs (a minimal sketch follows this list), short audio reflections, and targeted viva where appropriate.
  • Shift weight to authentic tasks: data labs, site‑specific briefs, oral explanations of decisions, micro‑presentations.
  • If detection is visible on your campus, write it down: “AI scores are used for triage only, never as sole evidence.”
  • Invest in staff support for GenAI‑positive pedagogy; prefer formative uses of GAI (clarify goals, rehearse arguments) over punitive tooling. This builds the professional culture your book argues for—augment the human, don’t outsource judgment.
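As flagged in the playbook above, a draft-to-final diff can be produced with nothing more than Python's standard library. The file names below are placeholders, and any version-control tool or LMS submission history serves the same purpose.

```python
# Minimal sketch: turning a draft and a final submission into a reviewable
# "process artefact". Assumes the two placeholder files exist alongside the script.
import difflib
from pathlib import Path

draft_lines = Path("essay_draft1.txt").read_text(encoding="utf-8").splitlines(keepends=True)
final_lines = Path("essay_final.txt").read_text(encoding="utf-8").splitlines(keepends=True)

diff = difflib.unified_diff(
    draft_lines,
    final_lines,
    fromfile="essay_draft1.txt",
    tofile="essay_final.txt",
)

# The resulting diff shows, line by line, how the submission evolved between drafts:
# authorship evidence that a single detector score cannot provide.
Path("essay_draft_to_final.diff").write_text("".join(diff), encoding="utf-8")
```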

What to watch next

  • Provenance stack maturity (C2PA, Content Credentials) across more platforms and LMS integrations. Content Authenticity Initiative
  • Model‑side watermarking research; any mainstream text model that ships with robust, edit‑resistant marks would change the detection calculus—but that’s speculative today. Nature
  • Authorship verification as an opt‑in portfolio capability, not a gotcha detector. Evidence‑informed, explainable methods may help students reflect on their writing while protecting equity. PMC

Bottom line

The weight of evidence this year backs your intuition: AI detection has not become reliably trustworthy for routine academic integrity decisions. The consensus hasn’t just declared it a dead end—it has pivoted toward making assessment resilient to AI rather than attempting to police it perfectly. That shift fits the values you map throughout the project: keep the human at the centre, build better processes, and treat GAI as a tool for learning—not a trap for students.

References & further reading (selection)

  • TEQSA, Enacting assessment reform in a time of AI (Sept 2025): “detecting gen AI use with certainty… is all but impossible.” teqsa.gov.au
  • MIT Sloan EdTech: AI detectors don’t work—what to do instead. MIT Sloan Teaching
  • Inside Higher Ed: Professors proceed with caution using AI-detection tools (Feb 2024). Inside Higher Ed
  • OpenAI: retired its AI text classifier for low accuracy (July 2023). OpenAI
  • Stanford HAI: AI detectors biased against non‑native English writers (2023; still cited in 2024–25 practice). Stanford HAI
  • Nature (2024): scalable watermarking for LLMs (research outlook). Nature
  • QAA: sector advice on reconsidering assessment in the ChatGPT era. qaa.ac.uk

#AIDetection #assessmentIntegrity #detectors #GPT5Pro

Katharine O'Moore-Klopf, ELS (KOKEdit)
2025-05-31

Are artificial intelligence (AI) detectors reliable? What can happen when they’re not? Many universities are using them to try to ascertain whether students are having programs write their classroom assignments for them. cbsnews.com/video/colleges-try

MPI für Radioastronomie (MPIfR_Bonn@astrodon.social)
2025-03-04

Three weeks ago, the scientific journal Nature @nature reported the discovery of the most energetic #neutrino ever observed. Its energy is 16,000 times greater than the strongest particle collisions created by the Large Hadron Collider and corresponds to 30 times the energy needed to press a computer key.

The neutrino was discovered in an underwater observatory in the Mediterranean, one of three neutrino detectors in water - two in the Mediterranean and one at Lake Baikal. At the geographic South Pole, there is the #IceCube neutrino detector under the ice. Other detectors exist underground in China, Italy, and Japan.

All these #detectors are not located on the Earth's surface because the Earth itself acts like a #telescope for neutrinos. Neutrinos are extremely light, electrically neutral particles that interact very weakly with matter and pass through the Earth. When they collide with atomic nuclei, charged particles are produced that move faster than light in water or ice, emitting blue light that is captured.

Water and ice are ideal media for detecting neutrinos because they provide large volumes to detect these particles while shielding against cosmic radiation and other disturbances. IceCube even utilizes 1 cubic kilometer of ice.

Neutrinos are the second most abundant particles in the universe, after photons, but are difficult to study because they interact so little with matter. Interestingly, dark matter and dark energy, which make up 95% of the universe, also interact very weakly with normal matter; the remaining 5% is ordinary matter such as #hydrogen and #helium, and only about 0.5% of the universe is visible matter (such as #stars).
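As a quick check of the comparisons in the first paragraph, here is the arithmetic in Python, assuming the event energy of roughly 220 PeV reported for the record neutrino, the LHC's 13.6 TeV collision energy, and a rough everyday estimate of about one millijoule to press a key.

```python
# Back-of-the-envelope check of the energy comparisons above.
# Assumed inputs: ~220 PeV for the record neutrino (reported value), 13.6 TeV per
# LHC proton-proton collision, and ~1 mJ as a rough estimate for pressing a key.
EV_TO_JOULE = 1.602e-19

neutrino_energy_eV = 220e15      # ~220 PeV (assumed here)
lhc_collision_eV = 13.6e12       # 13.6 TeV per collision
key_press_joule = 1.2e-3         # rough everyday estimate

neutrino_energy_joule = neutrino_energy_eV * EV_TO_JOULE

print(f"Times the LHC collision energy: {neutrino_energy_eV / lhc_collision_eV:,.0f}")   # ≈ 16,000
print(f"Neutrino energy in joules:      {neutrino_energy_joule:.3f}")                    # ≈ 0.035 J
print(f"Times a key press:              {neutrino_energy_joule / key_press_joule:.0f}")  # ≈ 30
```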

2025-02-13

"On Wednesday, a team of researchers announced that they got extremely lucky. The team is building a detector on the floor of the Mediterranean Sea that can identify those rare occasions when a neutrino happens to interact with the seawater nearby. And while the detector was only 10 percent of the size it will be on completion, it managed to pick up the most energetic neutrino ever detected."

arstechnica.com/science/2025/0

#Neutrinos #Detectors #ParticlePhysics

Claudio Pires (claudiocamposp)
2025-01-22

What Are AI Detectors? An Ultimate Guide visualmodo.com/what-are-ai-det 🤖🕵️‍♀️🔍

2024-10-02

[The “JENI” of #JUICE 🤩] There are #detectors that capture the invisible: #atoms emitted by energetic ions trapped in the Earth's #magnetosphere, in this case. The result is this image of the hot #plasma surrounding our #Earth ...

This image was captured by NASA's #JENI (Jovian Energetic Neutrals and Ions) instrument, to which IRAP contributed, as the #JUICE probe moved away from Earth last August. It is the clearest image yet of the #Earth's #radiation belts.

Details+: irap.omp.eu/en/2024/10/juice-p

Stephen Brooks 🦆 (sjb@mstdn.io)
2024-09-02

Handling an aerogel used in a particle detector.
#physics #accelerators #detectors
[video]
RT x.com/kek_ipns/status/18304042

Stephen Brooks 🦆 (sjb@mstdn.io)
2024-07-14

Pictures from entering SuperKamiokaNDE, starting from the condensation around the cold tunnel entrance.
#physics #detectors #neutrino
RT x.com/guruguruuzumaki/status/1

Rovedar Publication (rovedar)
2023-12-28

Best AI Detectors
Artificial intelligence (AI) detectors have become indispensable in numerous domains, aiding in moderation, fraud detection, cybersecurity, and more. With a plethora of options available, it’s essential to compare the features, capabilities, and limitations of both free and premium tools across different categories to help users select the most suitable one for their needs.

ps.rovedar.com/best-ai-detecto

