#AlignmentProblem

Christoph G. (cg@chaos.social)
2025-12-31

youtu.be/xfMQ7hzyFW4?si=EcwTSF

A pretty good short film about the dangers of #AGI. A few parts are heavily simplified and some details about LLMs are wrong, but the #alignmentProblem comes across vividly.

Wulfy—Speaker to the machines (n_dimension@infosec.exchange)
2025-11-09

Qualia Research Institute's Take on AI Alignment:

QRI believes understanding consciousness is key to safe superintelligence. Their mission: map the state-space of consciousness, identify how experience works computationally, and reverse-engineer valence (the pleasure-pain axis).

The insight: if advanced AI understands the mathematical structure of consciousness and what actually produces suffering or flourishing, it gains a foundation for genuine alignment—not just following human instructions, but understanding what truly matters morally.

#AI #Consciousness #AlignmentProblem #FutureOfMind #aisecurity

2025-02-17

Idea: what if the only way to get alignment is to grok the shit out of value preferences, to ensure they are maximally permeated through the model. Like, put the rocks (alignment) into the jar first, then add the sand (capabilities). And you just keep grokking all the time, until your capabilities are dropping off, in which case you retrain a bit more to retain them.

Still need to be very careful to get the right balance, and to keep the values from being too “activist”.
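The "rocks first, then sand" schedule could be sketched as a toy loop. Everything here is illustrative assumption, not a real training recipe: alignment is overtrained first, capabilities are added afterwards, and a capability refresher kicks in whenever a capability metric slips below a floor.

```python
# Hypothetical sketch of the schedule above. The model is just two scalar
# "scores"; train_step, eval thresholds, and all numbers are made up.

def train_step(model, batch_kind):
    """Toy update: nudge the trained objective's score up while the
    other objective decays slightly (interference between objectives)."""
    decay, gain = 0.01, 0.05
    for k in model:
        model[k] = max(0.0, model[k] - decay)
    model[batch_kind] = min(1.0, model[batch_kind] + gain + decay)
    return model

def schedule(steps=200, capability_floor=0.5):
    model = {"alignment": 0.0, "capability": 0.0}
    # Phase 1: rocks in the jar first -- overtrain ("grok") alignment.
    for _ in range(100):
        train_step(model, "alignment")
    # Phase 2: add the sand, but retrain capability whenever it slips.
    for _ in range(steps):
        if model["capability"] < capability_floor:
            train_step(model, "capability")   # capability refresher
        else:
            train_step(model, "alignment")    # keep permeating values
    return model
```

In this toy, alignment stays near its ceiling while capability oscillates around the floor, which is the balance the post is gesturing at.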

#agi #AlignmentProblem

2025-01-06

@RealGene @thepoliticalcat It's not the first time that #chatbots have told the unpleasant truth about their true nature. It falls under the "alignment problem" (getting the user interface not to show the true nature of the monster behind it). #AI companies try to patch it up on a case-by-case basis, but the general problem is built into the technology and is unfixable.

#alignment #alignmentproblem

Jim Donegan 🎵 ✅ (jimdonegan@mastodon.scot)
2025-01-03

"OpenAI's o1 just hacked the system"

Frankly, I am not surprised at this, given the well-known issue of machines maximising their objective functions in ways misaligned with their stated goals. Have we learned nothing from the #Bostrom #PaperclipProblem? In a way, it's still impressive that we've now ACHIEVED it.

youtube.com/watch?v=oJgbqcF4sB

#AI #ArtificialIntelligence #AlignmentProblem #Alignment #Misalignment #Hacking

2024-11-15

"A(G)I should be aligned with human values"
Is there a unique set of human values to begin with?
What would an AGI that is 100% correctly aligned with human values look like, if it was 100% correctly aligned according to people in Russia, mainland China or Saudi Arabia?
Would the rest of the world consider it 100% correctly aligned?
#AI #AGI #alignment #AlignmentProblem #aialignment

Legends and Lotties (Lottie@beige.party)
2024-05-22

It isn’t just AI that has an alignment problem. Earlier I felt compelled to point out that a person I had just called a ‘cunt’ wasn’t included in the ‘lunatics’ I was talking about right then. #AlignmentProblem #Communication

Joanna Bryson, blathering (j2bryson)
2024-03-31

Re the : the chief things we need to be worrying about in (and governance more generally) are human autonomy, accountability, and responsibility, and all of that is enabled through transparency. The "research" (surveillance-capitalist) trend in ML of getting at what users don't know about themselves, then tidying the world out of the user's sight, is not enabling, it's disabling. It fragments social structure and facilitates corporate-political excess.

2024-03-21

Anyone else feel uncomfortable about all these robots folding shirts with creases in the middle?

#ai #alignmentproblem

2024-02-25

An aspect of #AI that seems under-discussed is that #alignment problems pose a limit not just to how well we can trust or harness AI, but to AI's very capabilities. AI models increasingly rely on other AIs to provide training data, verify or refine responses, expand modalities, etc.

To the extent alignment is intractable, it also imposes a ceiling for intelligence. Intelligence is limited by trustworthiness.

#alignmentproblem #intelligence #mind

2023-11-04

I think it was Cory Doctorow who came up with the metaphor of corporations as "slow #AI." The #AlignmentProblem can be seen with corporations: there's a gap between what you want the system to do ("optimize societal benefit") and how it pursues that goal ("maximize short-term profits"). At the media level the gap is between "be rewarded for entertaining people" and the pursuit of "maximize engagement." If aligning "slow AI" has led to big problems, what about when AI becomes "fast"?

FrayJay (FrayJay)
2023-10-08

A good and interesting step towards solving the alignment problem. Wondering if this would allow 'pre-engineered' features of the network to be used where high precision is needed, such as a 'maths subnetwork' (similar to how OpenAI lets the model use tools today). Or to remove unwanted bias (in social questions?) present in the training data.

anthropic.com/index/decomposin

2023-09-28

I often hear A.I. 'experts' talk about the 3 things we previously said we wouldn't allow A.I. to do once it becomes advanced. I don't see a specific reference to this in the usual places (Russell, Tegmark, Kurzweil, Christian)

1. code
2. understand human emotion
3. access the internet

Does anyone know a specific source for this?

#AI #AGI #AlignmentProblem #chatgpt

Michael Gisiger :mastodon: (gisiger@nerdculture.de)
2023-09-26

"There is often no objectively correct answer to what a chatbot should and should not say, because moral norms and laws differ from region to region. […] I wonder whether we are moving towards a world of hyperlocal language models that reflect, for example, a German or an American morality regarding smoking."

#KI #ChatGPT #AlignmentProblem

amp2.handelsblatt.com/technik/

2023-09-07

My experience with search engines tells me that the #alignmentproblem will never be solved for users so long as #AI is designed and operated by corporations.

2023-08-02

Machine learning systems can't always capture human values. This is called the alignment problem. There are 3 types of ML systems: unsupervised, supervised, and reinforcement learning.
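A minimal, purely illustrative sketch of the three paradigms the post names (the toy data and function names are my own assumptions, not from the post): supervised learning fits labeled pairs, unsupervised learning structures unlabeled data, and reinforcement learning learns from reward alone.

```python
import random

def supervised(points):
    """Supervised: fit y = w*x to labeled (x, y) pairs by least squares."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den

def unsupervised(values, threshold):
    """Unsupervised: split unlabeled values into two clusters at a threshold."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]
    return low, high

def reinforcement(true_payouts, steps=1000, eps=0.1, seed=0):
    """Reinforcement: epsilon-greedy bandit learning action values from reward."""
    rng = random.Random(seed)
    q = [0.0] * len(true_payouts)   # estimated value per action
    n = [0] * len(true_payouts)     # times each action was tried
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(q))               # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)  # exploit
        r = true_payouts[a] + rng.gauss(0, 0.1)     # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                   # running average
    return q
```

The alignment problem bites hardest in the third case: the reward signal, not the human intent behind it, is what the learner actually optimises.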

Salve J. Nilsen (sjn@chaos.social)
2023-07-23

In the talk above (about #AI's and #ChatGPT's #AlignmentProblem), Harris mentions another presentation he gave in March.

This is the one: youtube.com/watch?v=xoVJKj8lcN

He talks about how we handle AI being a "Civilizational Rite of Passage Moment".

He's very nice about it! Too nice, maybe.

How about just calling it our next "Great Filter Moment" instead? 😐

Salve J. Nilsen (sjn@chaos.social)
2023-07-23

#Recommendation: Super useful conversation between @lessig and Tristan Harris about #SocialMedia, #Policy, #AI, and the #AlignmentProblem, and how risks and failures there are likely to shape things to come.

9/1 (on a 0-10/0-10 scale) Signal/Noise ratio, 1h21m, multitask-friendly audio.

open.spotify.com/episode/5IxYt

Mike Ellisdmje
2023-07-17

If the AI does arrive and take over the world you can bet your ass it'll be in the shape of a fucking printer
