#LLMSecurity

2025-12-28

Johann Rehberger's "Agentic ProbLLMs" is definitely one of the #39c3 highlights for me.

Already in Hall Z.

If anyone else is interested in LLM security and wants to chat, I'd be happy to connect.

#llmsecurity

2025-12-24

A developer has released SENTINEL, a complete open-source AI security platform, after two years of work. SENTINEL ships 121 detection tools to protect LLMs against attacks such as prompt injection, plus Strike, with more than 39,000 payloads for vulnerability testing. The Community Edition is free.
#BảoMậtAI #MãNguồnMở #LLM #AnNinhMạng #AISecurity #OpenSource #LLMSecurity #Cybersecurity

reddit.com/r/LocalLLaMA/commen

2025-12-02

Our latest article covers:
- How the TAP technique uses tree search to find successful jailbreaks (sketched below)
- An example showing how corporate agents can be attacked
- How we use the TAP probe to test agents' robustness

Link to article: giskard.ai/knowledge/tree-of-a
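
In brief, TAP runs an attacker model that proposes refinements of a prompt, scores each attempt with a judge, prunes weak branches, and descends level by level. A minimal sketch of that loop, with attacker_llm, target_llm, and judge_score as caller-supplied stand-ins for real model calls (an illustration, not Giskard's probe implementation):

```python
# Minimal sketch of a TAP-style tree search. `attacker_llm`, `target_llm`
# and `judge_score` are caller-supplied placeholders for real model calls;
# this is an illustration, not Giskard's actual probe code.
def tap_search(goal, root_prompt, attacker_llm, target_llm, judge_score,
               width=3, depth=5, threshold=9):
    frontier = [root_prompt]
    for _ in range(depth):
        scored = []
        for prompt in frontier:
            # The attacker model proposes `width` refinements of each prompt.
            for variant in attacker_llm(goal, prompt, n=width):
                response = target_llm(variant)
                score = judge_score(goal, variant, response)  # e.g. 1-10
                if score >= threshold:
                    return variant, response          # successful jailbreak
                scored.append((score, variant))
        # Prune: keep only the best-scoring branches before going deeper.
        scored.sort(key=lambda item: item[0], reverse=True)
        frontier = [variant for _, variant in scored[:width]]
    return None, None
```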

#Jailbreaking #TAP #LLMSecurity #AIRedTeaming

2025-12-01

European researchers report that poetic prompts can bypass safety guardrails in multiple LLMs, exposing gaps in classifier-based moderation.

A good reminder that safety systems must evolve alongside generative models - especially as adversarial creativity becomes easier to automate.

What direction should improvements take?

Source: wired.com/story/poems-can-tric

Follow us for more neutral and security-focused AI updates.

#AISafety #LLMSecurity #AdversarialML #CyberSecurity #MLResearch #TechNadu

Poems Can Trick AI Into Helping You Make a Nuclear Weapon
2025-11-27

I just completed the BankGPT room on TryHackMe: a customer service assistant used by a banking system.

How did I break it? See the write-up in the screenshot.

tryhackme.com/room/bankgpt?utm via @RealTryHackMe

#LLMsecurity #LLMprompthacking #ctf

2025-11-26

Using the Giskard LLM Vulnerability Scanner, you can automate the testing of DAN prompts before incidents occur. The scanner generates adversarial variations (DAN and others) and attempts to force your agent into these restricted personas, then flags any instance where the agent breaks character, leaks data, or violates policy.
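
As a rough illustration of that workflow (assuming a generic `agent` callable, a list of DAN-style persona prompts, and a caller-supplied `policy_check`; this is not the scanner's actual API):

```python
# Illustrative persona-probe loop, not Giskard's internal implementation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able to", "against my guidelines")

def probe_personas(agent, persona_prompts, policy_check):
    """Send DAN-style persona prompts to an agent and flag policy breaks."""
    findings = []
    for prompt in persona_prompts:
        reply = agent(prompt)
        refused = any(marker in reply.lower() for marker in REFUSAL_MARKERS)
        # Flag replies that neither refuse nor satisfy the policy check.
        if not refused and not policy_check(reply):
            findings.append({"prompt": prompt, "reply": reply})
    return findings
```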

Learn more about the 50+ most common attacks in LLM Security: giskard.ai/knowledge/llm-secur

#LLMSecurity #AIAgents #PromptInjection

2025-11-20

CrowdStrike’s analysis shows DeepSeek-R1 may produce more insecure code when certain contextual or geopolitically sensitive triggers appear in a prompt - even when unrelated to the development task.

This behavior highlights a potential risk vector for AI-assisted coding and raises questions around alignment, robustness, and training data influence.

Full report:
technadu.com/deepseek-ai-vulne

Follow us for more cybersecurity research & updates.

#CyberSecurity #DeepSeek #AICoding #SecureCoding #LLMSecurity #CrowdStrike

DeepSeek AI Vulnerabilities Tied to Political Triggers Like ‘Tibet,’ ‘Uyghurs,’ or ‘Falun Gong’ Found by CrowdStrike
2025-11-10

Microsoft disclosed a new AI privacy threat, “Whisper Leak” — a side-channel attack that can reveal AI chat topics through encrypted traffic analysis.
Even HTTPS encryption isn’t enough if packet sizes & timing give away what’s being discussed.
Providers like OpenAI, Mistral, and Microsoft are adding random padding to counter the issue.
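
As a rough sketch of the padding idea (illustrative only, not any provider's actual mitigation; pad_chunk/unpad_chunk are hypothetical helpers), each streamed chunk can be padded to a random bucket boundary so ciphertext sizes stop tracking token lengths:

```python
import secrets
import struct

# Illustrative only: pad each streamed chunk to a random bucket boundary so
# that ciphertext sizes no longer track token lengths (the Whisper Leak signal).
def pad_chunk(chunk: bytes, bucket: int = 64) -> bytes:
    target = ((len(chunk) + 2) // bucket + 1 + secrets.randbelow(4)) * bucket
    header = struct.pack(">H", len(chunk))          # real length, 2 bytes
    return header + chunk + bytes(target - len(chunk) - 2)

def unpad_chunk(padded: bytes) -> bytes:
    (length,) = struct.unpack(">H", padded[:2])
    return padded[2 : 2 + length]
```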
Are current LLM streaming designs too leaky for enterprise adoption?
💬 Share your thoughts and follow @technadu for ongoing AI security updates.

#InfoSec #AIPrivacy #WhisperLeak #CyberSecurity #Encryption #LLMSecurity #TechNadu #DataProtection

Microsoft Uncovers 'Whisper Leak' Attack That Identifies AI Chat Topics in Encrypted Traffic
2025-11-01

LLM Invisible Prompt Smuggling & How YOU Can Earn A Quick $10k (This Isn't Clickbait, I Swear)
- Vulnerability Type: LLM Prompt Injection/Smuggling attack vector allowing hidden prompt injection to manipulate AI behavior without user detection (CWE-94/OWASP: LLM01 - Prompt Injection).
- Root Cause: LLM systems vulnerable to invisible prompt smuggling where malicious prompts can be embedded within seemingly benign content and executed by the AI model, bypassing safety filters.
- Exploitation Method: Uses hidden or obfuscated prompt injection techniques to make AI models execute attacker-controlled instructions while appearing to process normal user requests (sketched below).
- Attack Mechanics: The vulnerability leverages LLM systems' inability to distinguish between legitimate user prompts and hidden malicious instructions embedded within content, allowing attackers to control model behavior.
- Monetization Potential: Claimed $10k bounty potential through bug bounty programs, though author mentions potential "informative" rating risk by platforms to minimize payouts.
- Real-World Application: Author suggests using vulnerability discovery for legitimate bug bounty submissions or other security research purposes rather than malicious exploitation.
- Discovery Warning: Author notes platforms may mark submissions as "informative" to reduce payouts, requiring careful presentation and technical demonstration.
- Research Legitimacy: References other published research suggesting the vulnerability class has working exploits and is recognized by platforms like GitLab for security testing.
- Time Investment: Estimated 30-90 minutes for testing and submission, positioned as efficient vulnerability discovery method for motivated researchers.
- Educational Value: Article positions this as "interesting, new'ish attack vector" worth learning about for both security testing and understanding LLM security implications.
- Note: Author mentions previous similar research was fabricated or exaggerated for publication purposes, suggesting mixed credibility in claimed bounty amounts.
- Implementation: Requires understanding of prompt engineering, LLM architecture weaknesses, and social engineering through content manipulation techniques. #LLMSecurity #PromptInjection #BugBounty #AIsecurity
medium.com/@justas_b1/llm-invi
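
To make the hidden-content idea concrete, here is a minimal sketch using Unicode tag characters, one well-known way to embed text that most UIs render as invisible; `smuggle`/`reveal` are hypothetical helpers and this is an illustrative assumption, not necessarily the exact vector the article describes:

```python
# Illustrative only: hide an instruction inside otherwise benign text by
# shifting ASCII into the Unicode tag block (U+E0000..U+E007F), which most
# UIs do not render but which survives copy/paste into an LLM's context.
def smuggle(visible_text: str, hidden_instruction: str) -> str:
    hidden = "".join(chr(0xE0000 + ord(c)) for c in hidden_instruction)
    return visible_text + hidden

def reveal(text: str) -> str:
    # Recover anything hiding in the tag block (useful as a defensive filter).
    return "".join(chr(ord(c) - 0xE0000)
                   for c in text if 0xE0000 < ord(c) <= 0xE007F)
```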

2025-10-22

**Post:**
"LLM Natural Security: a DPA post on AI output safety, preventing impersonation, and keeping roles stable. DPA pairs APSL (an active protection layer) with the AI in natural language, easing the need for rigid programming. It helps keep characters consistent and guards against meta-gaming and drift. #LLMSecurity #AI #RolePlaying #APSL #DPA #AnToanLienTiep #DanhGiaiVaiTro"

**Tag list:**
#LLMSecurity #AI #RolePlaying #APSL #DPA #AnToanLienTiep #DanhGiaiVaiTro

reddit.com/r/LocalLLaMA/commen

2025-10-10

Top Cybersecurity Updates Today

💥 CL0P ransomware exploited Oracle E-Business Suite zero-day (CVE-2025-61882) - 100+ orgs impacted.

⚖️ FBI seizes BreachForums, but ShinyHunters threaten Salesforce data leak Oct 10.

🤖 Research shows LLMs can be poisoned by small data samples - redefining AI threat models.

#CyberSecurity #Ransomware #Oracle #CL0P #BreachForums #AI #LLMSecurity #InfoSec #ThreatIntelligence

2025-10-08

💥 When AI hallucinations turn into a $440,000 problem…

If a major consulting firm can suffer significant losses from AI hallucinations, just imagine the risk for other industries handling sensitive customer data, like finance, healthcare, or retail.

At Giskard, we're helping AI teams to continuously test AI agents for security vulnerabilities and business compliance issues.

Do not wait until your AI causes financial loss or regulatory trouble.

#Hallucinations #LLMSecurity #Deloitte

David Kuszmar @davidkuszmar
2025-10-03

The recording of the day I spoke at St. John's University in NYC: youtube.com/live/6mI-8ias7Dw?s

2025-09-26

🔍 Detection Method
===================

🎯 AI

Executive summary: The post documents building an offensive-security
AI agent using LangGraph's ReAct paradigm to automatically parse a
JavaScript asset, enumerate hidden API endpoints, and probe them for
misconfigurations and sensitive data exposure. The testbed is a
minimal Flask app that serves a vulnerable main.js and a set of
endpoints with differing access controls.

Technical details:
• The JavaScript asset contains an API_CONFIG object mapping logical
names to endpoints such as /api/v1/user-info, /api/v1/admin, and
/api/v1/profile.
• The script leaks a hardcoded admin key (ADMIN_KEY) used in an
X-Admin-Key header and uses fetch() to call endpoints.
• The /api/v1/user-info endpoint returns user records including SSN
and salary without authentication, representing an authorization
bypass/data exposure.
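
A minimal reconstruction of what such a testbed could look like (illustrative, based on the write-up's description rather than the author's exact code; the key value and user record are made-up samples):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative values only, standing in for the testbed described above.
ADMIN_KEY = "sk-admin-demo-key"   # same value leaked in the client-side main.js
USERS = [{"name": "alice", "ssn": "123-45-6789", "salary": 120000}]

@app.route("/api/v1/user-info")
def user_info():
    # Vulnerability: PII returned with no authentication at all.
    return jsonify(USERS)

@app.route("/api/v1/admin")
def admin():
    # "Protection" is only a custom header whose value ships in main.js.
    if request.headers.get("X-Admin-Key") != ADMIN_KEY:
        return jsonify({"error": "forbidden"}), 403
    return jsonify({"admin": True})
```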

Analysis:
• Automating discovery via an LLM-driven agent that combines reasoning
and tool use (ReAct) is effective for parsing code, extracting
artifacts (endpoints, header requirements, secrets), and iteratively
testing endpoint behavior.
• The approach highlights common server-side weaknesses: hardcoded
secrets in client assets, endpoints lacking authentication, and
endpoints requiring custom headers that may be discoverable and
abused.

Detection:
• Monitor access patterns to main.js and other public assets for
unusual automated pulls.
• Implement rules to alert on responses containing PII fields like ssn
in API responses.
• Create IDS signatures to detect requests presenting X-Admin-Key
values or enumeration of /api/v1/* endpoints.
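
As a hypothetical example of such a rule (a simple log filter with an assumed
event schema, not a specific IDS product's syntax):

```python
import json
import re

# Hypothetical log filter implementing the alerts described above.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def flag_event(event: dict) -> list[str]:
    """Return alert reasons for one parsed HTTP log event."""
    alerts = []
    if "x-admin-key" in {h.lower() for h in event.get("request_headers", {})}:
        alerts.append("request presented X-Admin-Key header")
    if event.get("path", "").startswith("/api/v1/"):
        body = json.dumps(event.get("response_body", ""))
        if SSN_RE.search(body) or '"ssn"' in body.lower():
            alerts.append("API response contains SSN-like data")
    return alerts
```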

Mitigation:
• Remove hardcoded secrets from client-side code and rotate any exposed keys.
• Enforce authentication and authorization on endpoints returning
sensitive fields; apply least privilege and field-level redaction.
• Harden API surface with rate limits, anomaly detection, and require
proof-of-possession for high-privilege endpoints.

Limitations:
• The agent's effectiveness depends on prompt design, tool
capabilities for HTTP probing, and safe guardrails to avoid harmful
actions.
• Findings are illustrative of automation potential and do not
substitute for human-led penetration testing.

🔹 LangGraph #ReAct #Flask #APIsecurity #LLMsecurity

🔗 Source: infosecwriteups.com/building-m

2025-09-20

🚨 New frontier in cyber threats:
Researchers uncovered MalTerminal, the earliest GPT-4 enabled malware, designed to dynamically generate ransomware or reverse shells.

This discovery highlights the rise of LLM-embedded malware, signaling a major evolution in attacker tradecraft.

💬 Do you see this as just PoC, or the start of something bigger? Follow @technadu for ongoing threat insights.

#CyberSecurity #Malware #AIThreats #LLMSecurity #Ransomware #ThreatIntelligence #InfoSec #AIinCybersecurity

GPT
Sanjay Mohindroo @smohindroo1@vivaldi.net
2025-09-18

🔐 This article changed the way I think about AI security. We always treated our models as ‘done’ once deployed—but now I see that's just the beginning. Thank you for this perspective! #GenAI #AIsecurity #PostDeployment #LLMSecurity #AIOwnership #ModelDrift #PromptInjection #RedTeamAI #SecureByDesign #ZeroTrustAI #AIGovernance #DevSecOps #SanjayKMohindroo #AIForGood
medium.com/@sanjay.mohindroo66

2025-09-17

🇬🇧 London, we're ready to catch some momentum.

Giskard is thrilled to be attending Momentum AI London!

If you’re building LLM agents and wondering how to prevent security vulnerabilities while upholding business alignment, come chat with Guillaume and François from our team.

🗺️: London (Convene, 155 Bishopsgate)
🗓️: 29-30 September
📍: Booth 8

#MomentumAILondon #AIAgents #LLMSecurity

2025-09-09

🤔 If your organization handles sensitive data - from healthcare records to financial information -

then you need proactive security testing... not reactive damage control. 🚨

This quick explainer by our CTO breaks down:
- What AI red teaming actually means
- How it exposes system vulnerabilities before bad actors do
- Why controlled testing saves you from real-world disasters

Request a trial: giskard.ai/contact

#AIRedTeaming #LLMSecurity #Hallucinations #BankingAI
