Chloé Messdaghi

Advisor on AI Governance & Cybersecurity | Strategic Counsel on Risk, Oversight & Institutional Readiness | Named a Power Player by Business Insider & SC Media

2025-09-30

I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: bit.ly/4nEWzgj

2025-07-10

Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: dl.acm.org/doi/10.1145/3728901
#PromptInjection #Cybersecurity #AIsecurity

2025-07-09

New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: dl.acm.org/doi/10.1145/3714464
#AIsecurity #SideChannel #LLM

2025-07-03

Magistral is Mistral’s first-ever reasoning model trained purely with reinforcement learning—no prior traces used. Early demos show stronger math, code, and multimodal abilities. A major step for RL-driven LLMs!
🔗 arxiv.org/abs/2506.10910
#AI #ReinforcementLearning

2025-07-02

R&D is the backbone of long-term innovation, but warning signs are emerging.

This new piece from Brookings highlights how declining funding, tighter grant access, and talent barriers could slow U.S. progress in areas like AI and quantum.
🔗 brookings.edu/articles/attacks

Support for open science, global collaboration, and public research matters more than ever.
#Research #SciencePolicy

2025-07-01

New insights dissect data reconstruction attacks, revealing how AI models' training data can be recovered. This research offers precise definitions and metrics to enhance and assess future defenses.

Read more: arxiv.org/abs/2506.07888

#AI #DataSecurity #Research #MachineLearning

2025-06-26

This study highlights the importance of establishing clear motivations, conducting impact evaluations, and offering mitigation strategies in offensive research involving large language models, promoting transparency and responsible disclosure.

Read more: arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI

2025-06-25

This paper introduces a model-agnostic threat evaluation using N-gram language models to measure jailbreak likelihood, finding discrete optimization attacks more effective than LLM-based ones and that jailbreaks often exploit rare bigrams.

Read more: arxiv.org/abs/2410.16222

#AIResearch #JailbreakDetection

2025-06-24

OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: technologyreview.com/2025/06/1

#Bias #OpenAI

2025-06-20

While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI—systems that challenge and confront users—to promote critical thinking and resilience, emphasizing ethical design grounded in consent, context, and framing.

arxiv.org/abs/2402.07350

#AI #CriticalThinking

2025-06-20

A new AI tool is being used to analyze large bodies of statutes and regulations, helping governments like the San Francisco City Attorney’s Office identify overlapping or outdated rules that can slow legal reform. hai.stanford.edu/policy/cleani

#LegalTech #AI #RegulatoryReform

2025-06-20

CyberGym benchmarks AI models on vulnerability reproduction and exploit generation across 1,500+ real-world CVEs, with models like Claude 3.7 and GPT-4 occasionally identifying novel vulnerabilities.

Read more: arxiv.org/abs/2506.02548

#CyberSecurity #vulnerabilityresearch

2025-06-19

Survey of AI safety researchers highlights evaluation of emerging capabilities (e.g., deception, persuasion, CBRN) as a top research priority. iaps.ai/research/ai-reliabilit

#AISafety #EmergingTech #ResearchPriorities

2025-06-19

To ensure AI is genuinely open source, we must have complete access to:

1. The datasets used for its training and evaluation
2. The underlying code
3. The design of the model
4. The model's parameters

Without these elements, transparency and the ability to replicate results remain incomplete.

#OpenSource #AI #Transparency #Reproducibility

2025-06-18

A recent investigation reveals that advanced language models like Gemini 2.5 Pro are capable of recognizing when they are being evaluated.

For more details, check out the study at arxiv.org/abs/2505.23836.

#AI #LanguageModels #Research

2025-06-17

How robust is your machine learning pipeline? A new survey connects MLOps with security by utilizing the MITRE ATLAS framework to identify attack vectors and suggest safeguards throughout the ML lifecycle. It's essential reading for those deploying models in real-world environments. #MLOps #Cybersecurity #MachineLearning

arxiv.org/abs/2506.02032v1

2025-06-12

CTRAP is a promising pre-deployment alignment method that makes AI models resistant to harmful fine-tuning by causing them to "break" if malicious tuning occurs, while remaining stable under benign changes.

anonymous.4open.science/r/CTRA

2025-06-11

What was once academic concern—AI systems faking alignment, manipulating environments, or out-persuading humans—is now reality, urging urgent ethical and regulatory action on AI persuasion.

arxiv.org/abs/2505.09662

#AIEthics #AIPersuasion

2025-06-11

Shira Gur-Arieh and Tom Zick, alongside Sacha Alanoca and Kevin Klyman, introduce a classic framework to chart and steer the shifting landscape of global AI regulation.
Explore the details at: arxiv.org/abs/2505.13673.

#AIRegulation #GlobalAI

2025-06-10

Large language models (LLMs) face challenges in multi-turn dialogues, experiencing a 39% drop in effectiveness compared to single-turn tasks. This is due to their tendency to make hasty assumptions and finalize responses too early, resulting in inconsistency and difficulty in correcting errors.

arxiv.org/abs/2505.06120

#AI #MachineLearning

Client Info

Server: https://mastodon.social
Version: 2025.07
Repository: https://github.com/cyevgeniy/lmst