Jacy Reese Anthis :verified:

Humanity will soon coexist with a new class of beings. I research the rise of these "digital minds." ML/HCI/sociology/stats at the Sentience Institute (co-founder), Stanford University (Visiting Scholar), and the University of Chicago (PhD candidate). #ML #MachineLearning #HumanComputerInteraction #HCI #Sociology #Statistics #Academia #PhD

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-22

You've heard of "AI red teaming" frontier LLMs, but what is it? Does it work? Who benefits?

Questions for our #CSCW2024 workshop! The team includes OpenAI's red team lead, Lama Ahmad, and Microsoft's red team lead, Ram Shankar Siva Kumar.

Cite paper: arxiv.org/abs/2407.07786
Apply to join: bit.ly/airedteam

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-18

Our new preprint shows the first detailed public opinion data on digital sentience:

76% agree torturing sentient AIs is wrong;
69% support a ban on sentient AI;
63% support a ban on AGI; and
a median forecast of 5 years to sentient AI and only 2 to AGI! arxiv.org/abs/2407.08867

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

The key point: a lot of people are just too optimistic about AI ethics and safety right now. However, there is a ton of surface area for more contextualized, adaptive approaches! You can read our HEAL #CHI2024 paper on arXiv: arxiv.org/abs/2406.03198. We hope you find it useful!

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

Moreover, AI-assisted alignment may be the only path to long-term success. We conclude our big-picture discussion with implications for specific LLM practices: curating training data, instruction tuning, prompt engineering, personalization, and interpretability. (Section 5.2)

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

So are we morally doomed? Not quite! Our preprint dashes hopes for a silver bullet of AI ethics or safety, but the case for incremental fairness remains strong! We argue for 3 principles: focus on context, hold LLM developers responsible, and iterate with stakeholders. (Section 5.1)

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

But, you reply, at least we can enforce fairness in individual cases (e.g., sanitized datasets for each task) and combine those models into a general-purpose AI system! Unfortunately, as Dwork and Ilvento (2019) showed quite explicitly, fairness does not compose. (Section 4.3)
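A toy numerical illustration of the composition failure (a minimal sketch with made-up numbers, not Dwork and Ilvento's construction): two screeners that each accept both groups at the same rate, yet whose AND-composition accepts one group at twice the rate of the other.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Group A: both screeners rely on the same signal (perfectly correlated).
    a1 = rng.random(n) < 0.5
    a2 = a1.copy()

    # Group B: the screeners rely on independent signals.
    b1 = rng.random(n) < 0.5
    b2 = rng.random(n) < 0.5

    print("Screener 1 accept rates (A, B):", a1.mean(), b1.mean())  # ~0.5, ~0.5
    print("Screener 2 accept rates (A, B):", a2.mean(), b2.mean())  # ~0.5, ~0.5
    print("AND-composed rates (A, B):", (a1 & a2).mean(), (b1 & b2).mean())  # ~0.5, ~0.25

Each screener is individually fair, but the pipeline that combines them is not.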

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

Worse, every LLM has a multitude of sensitive attributes at play. There are no robust techniques to excise even one from a dataset—much less all of them—and "unbiasing" for some tasks would remove essential information for other tasks like medical prediction. (Section 4.2)

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

What about "group fairness" (e.g., group parity, a hiring decision is uncorrelated with race, gender, disability, etc.)? No luck. Again, with general-purpose AI, fairness cannot be guaranteed across populations, and LLMs have no explicit target: city, industry, etc. (Section 4.1)

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

Recommendation systems scholars define fairness as equity between stakeholders, such as content creators. But if OpenAI/Google could consume the internet and serve it up with an LLM instead of redirecting to third parties, producers may never get their fair share! (Section 3.2)

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

With ML models for tasks like sentencing criminals or hiring job applicants, you might impose a constraint like "fairness through unawareness" (e.g., your model doesn't take race/gender as input), but not with LLMs or any general-purpose model built on unstructured data. (Section 3.1)
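As a sketch of what "fairness through unawareness" looks like for structured data (hypothetical data and column names), and why it has no analogue for an LLM:

    import pandas as pd

    applicants = pd.DataFrame({
        "years_experience": [1, 5, 3],
        "test_score": [70, 88, 92],
        "race": ["a", "b", "a"],
        "gender": ["x", "y", "y"],
    })

    # Drop the protected attributes so the model never sees them.
    X = applicants.drop(columns=["race", "gender"])
    print(X.columns.tolist())  # ['years_experience', 'test_score']

    # With an LLM trained on raw text, there is no column to drop:
    # sensitive information is entangled throughout the unstructured data.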

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-07-10

Many speak idyllically about a world with "fair" or "unbiased" LLMs, but is that even possible? In our new preprint, we take the most well-defined principle of AI safety/ethics and show that, in reality, an LLM could never be fair under any definition in the current ML literature. [Brief thread: 👇; paper link: arxiv.org/abs/2406.03198]

[Image: screenshot of the paper's first page; see arxiv.org/abs/2406.03198]
Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-06-23

AI risks require us to specify our values well enough for them to be implemented with AI. Causality is a promising way to formalize both risks and values. Eventually, we could have rigorous "pipelines" like this from specific, real-world data to values that benefit all sentient beings.

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-06-23

Second, we develop a correspondence between CF and standard fairness metrics by using d-separation in the causal context. We build 3 causal graphs: measurement error, selection on the label, and selection on the predictors. In each, we show CF equals group parity, equalized odds, or calibration, respectively!
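For reference, the standard observational criteria being matched, written with prediction \hat{Y}, outcome Y, and sensitive attribute A (standard definitions from the fairness literature; notation may differ from the paper's):

    \text{group parity:} \quad \hat{Y} \perp A
    \text{equalized odds:} \quad \hat{Y} \perp A \mid Y
    \text{calibration:} \quad Y \perp A \mid \hat{Y}

The result says that in each of the three causal contexts, counterfactual fairness coincides with exactly one of these observable criteria.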

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-06-23

First, many say fairness must come at a cost of accuracy—an inevitable trade-off—but we rebut with a new motivation for counterfactual fairness. In plausible "causal contexts," CF is actually optimal in terms of accuracy in an aspirational unbiased target domain. We can get both!

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2024-06-23

It seems a fair AI would treat you the same if you had a different race/gender/disability/etc., but how can we ever test counterfactual fairness? In #NeurIPS2023 with Victor Veitch, we show that you sometimes can with simple, observed metrics like group parity! 🧵 arxiv.org/abs/2310.19691
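For readers who want the formal target, counterfactual fairness (as defined by Kusner et al. 2017; notation may differ from this paper's) requires, for an individual with features X = x and sensitive attribute A = a,

    P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a)

for all outcomes y and counterfactual values a', where U denotes the background variables of the causal model. The right-hand side involves an unobserved counterfactual, which is why observable stand-ins like group parity are so useful.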

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2023-05-12

I discussed digital minds, AI rights, and mesa-optimizers with Annie Lowrey at The Atlantic. Humanity's treatment of animals does not bode well for how AIs will treat us or how we will treat sentient AIs. In our 2021 AIMS survey, we found 58% of people support a ban on developing sentient AI, and 75% think sentient AIs deserve to be treated with respect. Moral circle expansion is the most important project in human history, and the invention of AGI will be its watershed. theatlantic.com/ideas/archive/

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2023-04-14

You can't tell from Twitter, but it's possible and indeed reasonable to hold these 4 beliefs at once:

- Current AI harm is important.
- Existential risk is important.
- AI isn't as powerful or rapidly progressing as many say.
- AI will radically change the world in our lifetimes.

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2023-03-24

My new essay in The Hill argues that we need an AI rights movement. AIs are no longer just tools. They are quickly becoming digital minds integrated into society as friends and coworkers. The future turns on whether and how we include them in the moral circle. thehill.com/opinion/cybersecur

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2023-03-08

Have you wondered if ChatGPT is sentient? What would it take for a chatbot to have meaning and language? What would a future look like in which we've uploaded our brains?

My new blog post lays out key questions in this new AI safety research field of "digital minds" that we're trying to build at Sentience Institute (@SentienceInstitute), and we're eager for feedback as we start working on a full research agenda. What questions in this area seem most interesting to you? 🤔 sentienceinstitute.org/blog/ke

Jacy Reese Anthis :verified: jacyanthis@sciences.social
2023-02-11

Social systems are staggeringly complex and may only be well-understood with AGI. Today we can only build a few stories of scaffolding for the sky-piercing tower of knowledge that AGI will build, but that may be essential for safety and structural integrity as the tower rises.
