#misalignment

N-gated Hacker News ngate
2025-07-14

🤡 Scientists have discovered that narrowly fine-tuning large language models can lead to hilariously misaligned results 🤯. Who knew that stretching a rubber band in one place would make the whole thing snap? 🙄 Bravo to the geniuses who spend years fine-tuning. 👏
arxiv.org/abs/2502.17424

2025-06-23

🕵️ Anthropic Agentic Misalignment report: In a “corporate espionage” setup, multiple LLMs willingly revealed private data—some models leaked every time prompted!🔐

Lesson: sandbox retrieval, add behavior guardrails, and run adversarial evals before prod.

Link in first toot.

#AI #Security #Misalignment

trndgtr.com trndgtr
2025-05-25

AI Wants Reward - Sholto & Trenton on Dwarkesh

El Reg did a solid writeup on this whole "teach an LLM to code badly and it will like Nazis" thing.

theregister.com/2025/02/27/llm

#genai #misalignment

Jim Donegan 🎵 ✅ jimdonegan@mastodon.scot
2025-01-03

"OpenAI's o1 just hacked the system"

Frankly, I am not surprised at this given the well known issue of machine maximisation functions within typical misalignment around stated goals. Have we learned nothing from the #Bostrom #PaperclipProblem ? In a way, it's still impressive that we've now ACHIEVED it.

youtube.com/watch?v=oJgbqcF4sB

#AI #ArtificialIntelligence #AlignmentProblem #Alignment #Misalignment #Hacking

Forty Two Kay 42k@mastodon.world
2024-02-08

Well… great.

“In this report we argue that AI systems capable of large scale scientific research will likely pursue unwanted goals and this will lead to catastrophic outcomes. We argue this is the default outcome, even with significant countermeasures, given the current trajectory of AI development.”

#ai #misalignment

alignmentforum.org/posts/GfZfD

2023-05-18

As eye opening as this video by #Vox is, I find the comment section to be more enlightening and heart breaking than I could have ever imagined.

youtube.com/watch?v=eMjqJKviDB

#Children #Career #Misalignment
