#pcpablation

Research Network Digi-Oek.ch (@DigiOekCH@social.tchncs.de)
2023-05-02

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models

#ResearchHighlights

"We can successfully insert a weak backdoor mechanism in the benign model, even without also editing the embeddings of the trigger words."

"Our framework can reverse-engineer backdoor mechanisms in toy and large models for the first time, scale the strength of the backdoor mechanism ..."

arxiv.org/abs/2302.12461

#ai #llm #pcpablation #mlp #toymodel #largemodel #backdoor #backdooredlanguagemodel #chatgpt
