#agenticAI

2025-05-05

i'm a compulsive annotator. i highlight and chop and note and scribble anything i'm reading or thinking about on all sorts of media: paper, whiteboards, Drafts.app. i take screenshots of things all the time too for Rewind.app and Photos.app to help me answer questions like "what was i doing on june 1 2021?"

when i see headlines about #agenticai and how best to leverage it and hit the ground running, they usually say something about screenshots for ITT processing. NO PROBLEM! afraid to queue it!💸

2025-05-05

So What?

Have you ever sat through a security briefing, heard the words, “This CVE has a critical CVSS score of 9.8!” and thought to yourself, “Okay, great… but what does that actually mean for us?” You’re definitely not alone.

Great question, greater song

Throughout my career as a CISO, I’ve spent a large chunk of my time asking exactly this question. Let’s face it: CVSS scores are helpful, but they’re also generic. They don’t account for the specifics of your enterprise — your infrastructure, your configurations, or your security posture. Essentially, they’re like weather forecasters predicting rain in “Texas.” Helpful-ish, but you still don’t know if you’ll need an umbrella.

This frustration is exactly why I decided to build an AI-powered risk assessment agent using synthetic data to simulate a mid-to-large enterprise environment. Because at the end of the day, cybersecurity isn’t about reacting to generic alarms; it’s about understanding your risks in your context and making clear, informed decisions based on reality, not theory. I didn’t want another tool that simply echoed what public databases already told me. I wanted something that could reason, prioritize, and reflect the unique fingerprint of a real-world enterprise, something that could finally answer the question that every overwhelmed security team secretly asks: “Out of everything that’s happening, what actually matters right now?”

Meet My Blind Yet All-Seeing AI Sidekick

When I first kicked off this project, I had a basic plan: Can I ask an AI what a CVE actually means to the company instead of reading endless vendor bulletins that assume every system is exposed to the internet and ready to be set on fire?

The Graeae see nothing, yet see all

At the time, it seemed simple enough. I threw together some Python scripts, used an LLM to generate some synthetic network configurations for simulation purposes, and piped CVE summaries straight into an LLM with a “what does this mean” prompt.

The first results? They were…useless. I had to work through a series of LLMs to understand their strengths and weaknesses first. Then, when I finally settled on a few that worked for different portions of the project, the results were enthusiastic, let’s say. Long-winded, overly cautious, about as useful as an airport announcement that says ‘a flight is delayed’ without mentioning which one. The AI could talk about “potential risk” and “hypothetical impacts,” but it was like asking a Magic 8-Ball for incident response advice (now that I think about it… <shake> “very doubtful”).

Clearly, if I wanted real insights, I’d need to teach it to think more like a security analyst — breaking down context, assessing technical fit, and prioritizing risk based on reality, not worst-case fantasy.

That kicked off a lot of trial and error (and a lot of coffee).

One of the things I’ve learned over the years working with enterprise applications is that enterprise data doesn’t come neatly gift-wrapped. It’s messy, inconsistent, and often spread across spreadsheets, PDFs, exported scan results, policy documents, and firewall configs hastily copied and pasted into Notepad. So, I made a decision: the system had to be document-agnostic. Whether the input was a structured CMDB export, a raw Qualys scan CSV, a Word document full of access policies, or a block of firewall rules saved as plain text, the agent needed to ingest, normalize, and chunk it into usable pieces automatically. That way, I wouldn’t have to waste time hand-massaging inputs — I could just drop whatever artifacts I had, and the AI would do the heavy lifting to turn them into meaningful context for analysis. It wasn’t glamorous work, but it’s the difference between a system that works in a demo and a system that works under pressure in real-world environments.
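
To make the idea concrete, here’s a rough sketch of what that document-agnostic loader could look like in Python. The helper names, chunk size, and file-type handling are illustrative assumptions (Word and PDF parsing are omitted for brevity), not the actual pipeline code:

```python
# Minimal sketch: dispatch on file type, normalize everything to plain text,
# then split into fixed-size chunks that remember their source.
from pathlib import Path
import csv
import json

def load_artifact(path: str) -> str:
    """Return the raw text of a supported artifact type."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":                      # e.g. a Qualys scan export
        with p.open(newline="", encoding="utf-8", errors="ignore") as f:
            rows = list(csv.reader(f))
        return "\n".join(", ".join(row) for row in rows)
    if suffix == ".json":                     # e.g. a CMDB export
        data = json.loads(p.read_text(encoding="utf-8", errors="ignore"))
        return json.dumps(data, indent=2)
    # .txt, .conf, pasted firewall rules, etc. are treated as plain text
    return p.read_text(encoding="utf-8", errors="ignore")

def ingest(paths: list[str], chunk_size: int = 1000) -> list[dict]:
    """Normalize each artifact and split it into chunks with provenance."""
    chunks = []
    for path in paths:
        text = load_artifact(path)
        for i in range(0, len(text), chunk_size):
            chunks.append({"source": path, "text": text[i:i + chunk_size]})
    return chunks
```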

Then, I realized the model needed more than just the CVE description. It needed to understand my simulated environment — the servers, the cloud zones, the endpoint devices, the policies in place. I built a semantic document chunking system that splits large artifacts into digestible pieces and indexed them using embeddings so the AI could “search” and “retrieve” the most relevant ones.
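
As a rough illustration (not the exact implementation), the embed-and-retrieve step can be sketched with the sentence-transformers library and a brute-force cosine search; the embedding model name and top_k value are placeholders:

```python
# Sketch of the retrieval layer: embed every chunk once, then score chunks
# against a query (e.g. a CVE summary) by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def build_index(chunks: list[dict]) -> np.ndarray:
    """Embed every chunk; returns an (n_chunks, dim) matrix of unit vectors."""
    return model.encode([c["text"] for c in chunks], normalize_embeddings=True)

def retrieve(query: str, chunks: list[dict], index: np.ndarray, top_k: int = 5) -> list[dict]:
    """Return the top_k chunks most relevant to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                    # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]
```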

Keyword generation became the next big unlock. Rather than blindly guessing which documents to pull, I trained a secondary step where the model reads the CVE, extracts important concepts (“Apache,” “Log4j,” “remote code execution,” etc.), and uses those as retrieval anchors. That alone boosted the signal-to-noise ratio dramatically.
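
Conceptually, that keyword pass looks something like the sketch below; the prompt wording and the llm() callable are placeholders for whichever model the pipeline is configured to use:

```python
# Sketch of the keyword-extraction step: ask the model for retrieval anchors,
# then use each anchor as a search query against the chunk index.
import json

KEYWORD_PROMPT = """Read the CVE description below and return a JSON list of
5-10 short keywords (products, components, attack techniques) that would help
locate affected systems in internal documentation.

CVE description:
{cve_text}
"""

def extract_keywords(cve_text: str, llm) -> list[str]:
    raw = llm(KEYWORD_PROMPT.format(cve_text=cve_text))
    try:
        return json.loads(raw)            # e.g. ["Apache", "Log4j", "remote code execution"]
    except json.JSONDecodeError:
        # fall back to naive line splitting if the model ignores the JSON instruction
        return [k.strip(" -*\"'") for k in raw.splitlines() if k.strip()]

# The keywords then become retrieval anchors, e.g.:
#   for kw in extract_keywords(cve_text, llm):
#       hits.extend(retrieve(kw, chunks, index, top_k=3))
```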

Once I had the right context, I ran into another wall: the model’s tendency to “blob” everything together in one giant answer. It needed structure. So I built a prompt-chaining system — first summarizing the CVE, then identifying impacted systems, then scoring risk, and finally suggesting remediations.

Breaking the problem into bite-sized reasoning steps made a night-and-day difference in output quality.
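
A stripped-down sketch of that chain is below; the prompt text and the llm() callable are illustrative stand-ins, not the actual prompts:

```python
# Sketch of the prompt chain: each step consumes the previous step's output
# instead of asking for one giant blob of analysis.
def assess_cve(cve_text: str, env_context: str, llm) -> dict:
    summary = llm(f"Summarize this CVE in 3 sentences for a security analyst:\n{cve_text}")

    impacted = llm(
        "Given this CVE summary and these environment excerpts, list which systems "
        f"appear to be impacted and why.\n\nSummary:\n{summary}\n\nEnvironment:\n{env_context}"
    )

    risk = llm(
        "Score the real-world risk (low/medium/high/critical) for this environment "
        f"and justify it briefly.\n\nImpacted systems:\n{impacted}"
    )

    remediation = llm(
        "Suggest concrete remediation steps, ordered by urgency, for the following "
        f"assessment.\n\nRisk assessment:\n{risk}"
    )

    return {"summary": summary, "impacted": impacted, "risk": risk, "remediation": remediation}
```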

Along the way, I layered in sampling controls — letting the pipeline randomly select, stratify, or cluster document samples depending on the risk appetite. I wired in tunables like temperature (creativity vs. precision) and top-p values (how adventurous the sampling is) so that depending on the need, I could dial up a “paint inside the lines” analysis or let it freewheel a bit when exploring remediation strategies.
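
Roughly, that sampling layer can be sketched as below; the clustering choice (k-means over the chunk-embedding matrix from the earlier sketch) and the temperature/top-p presets are illustrative assumptions:

```python
# Sketch of the sampling controls: choose which chunks feed the prompt
# (random, stratified by source document, or one representative per cluster),
# plus generation presets that get passed through to the model call.
import random
from sklearn.cluster import KMeans

def sample_chunks(chunks, index, method="random", n=10, seed=0):
    rng = random.Random(seed)
    if method == "random":
        return rng.sample(chunks, min(n, len(chunks)))
    if method == "stratified":
        by_source = {}
        for c in chunks:
            by_source.setdefault(c["source"], []).append(c)
        per_source = max(1, n // max(1, len(by_source)))
        return [c for group in by_source.values()
                for c in rng.sample(group, min(per_source, len(group)))]
    if method == "cluster":
        k = min(n, len(chunks))
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(index)
        picked = {}
        for i, label in enumerate(labels):
            picked.setdefault(label, chunks[i])   # first chunk seen per cluster
        return list(picked.values())
    raise ValueError(f"unknown sampling method: {method}")

# "Paint inside the lines" vs. freewheeling remediation brainstorming:
GEN_PRESETS = {"precise":     {"temperature": 0.1, "top_p": 0.9},
               "exploratory": {"temperature": 0.8, "top_p": 0.95}}
```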

Of course, having a risk score pop out at the end is great…unless it’s wrong. So I also built a confidence scoring model. It looks at how tightly the evidence matches the CVE, whether the system is internet-exposed, whether there’s an existing patching policy, and other environmental factors. Then it generates a confidence rating alongside the risk assessment — helping me separate “this is critical” from “this might be critical, but we’re guessing.”
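
A toy version of that confidence layer, with made-up factor names and weights, might look like this:

```python
# Sketch of the confidence rating: weight a few environmental signals into a
# 0-1 score. The factors, weights, and thresholds are illustrative assumptions.
def confidence_score(evidence_similarity: float,
                     exposure_known: bool,
                     patch_policy_found: bool,
                     asset_inventory_fresh: bool) -> tuple[float, str]:
    score = 0.4 * max(0.0, min(1.0, evidence_similarity))  # how tightly evidence matches the CVE
    score += 0.25 if exposure_known else 0.0               # do we actually know the exposure?
    score += 0.20 if patch_policy_found else 0.0           # is there a patching policy to anchor timelines?
    score += 0.15 if asset_inventory_fresh else 0.0        # stale inventory means we're guessing
    label = "high" if score >= 0.75 else "medium" if score >= 0.45 else "low"
    return round(score, 2), label

# e.g. confidence_score(0.82, True, True, False) -> (0.78, "high")
# i.e. separating "this is critical" from "this might be critical, but we're guessing"
```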

Technology-wise, I wanted flexibility, not lock-in. So I designed the engine to be model-agnostic: I can hit frontier models like GPT-4 Turbo over an API when I want the big guns, or I can call a locally hosted LLM through Ollama when I want speed, privacy, or just to avoid burning API credits. It also made it easy to test different models and architectures without rewriting the entire system each time.
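
In sketch form, the model-agnostic layer is just one generate() function that can hit either an OpenAI-style chat endpoint or a local model served by Ollama; the default model names below are placeholders, not a recommendation:

```python
# Sketch of the backend switch: same signature, two very different model hosts.
import os
import requests

def generate(prompt: str, backend: str = "ollama", model: str | None = None,
             temperature: float = 0.2) -> str:
    if backend == "openai":
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": model or "gpt-4-turbo",
                  "messages": [{"role": "user", "content": prompt}],
                  "temperature": temperature},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    if backend == "ollama":
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model or "llama3", "prompt": prompt,
                  "stream": False, "options": {"temperature": temperature}},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    raise ValueError(f"unknown backend: {backend}")
```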

While I designed this application for flexibility in a personal project setting, enterprise deployment would require proper governance, API security, and operational controls.

Honestly, this project taught me more about building reliable AI pipelines than any article or tutorial ever could. Every “small thing” — prompt design, chunk sizing, keyword filtering, sampling methods, temperature tuning, scoring logic — mattered. Miss one piece and the whole illusion of “smart AI” collapses into a pile of generic advice, random babbling, or my prompt being fed back to me, reworded.

Today, the agent doesn’t just tell me “this CVE has a 9.8 CVSS score.”

It tells me “this vulnerability could affect five critical systems in your PCI environment, two of which are internet-facing, patching is overdue on one, and based on our policies, your exposure window is about 14 days unless mitigated.”

It feels less like asking a Magic 8-Ball and more like having a junior analyst who’s fast, smart, and (mercifully) never asks for PTO.

The “So What?” Factor

One of the biggest lessons I’ve learned the hard way in cybersecurity is that volume is not the same as insight. Anyone can generate a wall of “critical vulnerabilities” and “urgent alerts.” But the ability to know which fires matter — and which ones are just smoke — is what separates chaos from control.

That was the real test for this AI agent. Could it help me get past “everything is bad” and tell me what matters, when it matters, to whom it matters?

At first, even after all the fancy retrieval, keywording, and prompt-chaining work, the outputs still felt a little…well, panicked. Models (especially when left to their own devices) have a tendency to be overly cautious. Everything becomes DEFCON 1. Every CVE is a crisis. Every server is a ticking time bomb.

I realized the agent needed some proportionality: it had to communicate risk in terms that were realistic for the environment and the risk tolerances of the business.

This is where the risk scoring and confidence layering came into play (a rough sketch of how the stages chain together follows the list):

  • Stage 1: Analyze the CVE independently, focusing purely on the vulnerability’s technical impact.
  • Stage 2: Contextualize against the retrieved environment data.
  • Stage 3: Identify asset exposure (internal-only, DMZ, internet-facing, etc.).
  • Stage 4: Layer department/business criticality.
  • Stage 5: Generate a real-world risk score specific to the environment.
  • Stage 6: Attach a confidence rating based on evidence strength and exposure clarity.
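
Pulling the pieces together, the six stages chain roughly like the sketch below; the stage prompts are stand-ins for the real ones, retrieve() and confidence_score() refer to the earlier sketches, and the confidence inputs are stubbed:

```python
# Rough orchestration of the six stages; illustrative only.
def run_assessment(cve_text: str, chunks, index, llm) -> dict:
    # Stage 1: technical impact of the CVE in isolation
    tech_impact = llm(f"Describe the purely technical impact of this CVE:\n{cve_text}")

    # Stage 2: pull the most relevant environment context
    context = "\n---\n".join(c["text"] for c in retrieve(cve_text, chunks, index, top_k=8))

    # Stage 3: asset exposure classification
    exposure = llm("Classify exposure (internal-only, DMZ, internet-facing) for the "
                   f"assets mentioned here:\n{context}")

    # Stage 4: department/business criticality
    criticality = llm(f"Summarize the business criticality of those assets:\n{context}")

    # Stage 5: environment-specific risk score
    risk = llm("Combine these findings into a single environment-specific risk rating, "
               f"with justification:\n{tech_impact}\n{exposure}\n{criticality}")

    # Stage 6: confidence rating (inputs stubbed for the sketch)
    score, label = confidence_score(0.8, exposure_known=True,
                                    patch_policy_found=True, asset_inventory_fresh=True)
    return {"risk": risk, "confidence": label, "confidence_score": score}
```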

The result was focused, realistic output that I could use to decide how urgently my team needed to act.

I added a ton of documentation and help along the way — mostly to remind myself of how to use my own app.

Instead of massive spreadsheets screaming “9.8!!!”, I get summaries like:

  • “Critical for payment processing; exposed; patch now.”
  • “Affects internal systems; covered by segmentation; patch during maintenance.”
  • “Not present in environment; no action needed.”

It turns theoretical chaos into navigable risk management.

Fewer sirens, more signal.

The Limits and Promise of AI in Risk Analysis

If there’s one thing building this agent taught me, it’s this: Large Language Models aren’t magic.

They’re not going to replace human cybersecurity expertise, no matter how many VC pitches or keynote slides try to tell you otherwise.

What they can do — and where they shine — is accelerating the grunt work that slows down human decision-making. They can connect dots faster than a junior analyst, summarize mountains of documentation in seconds, and offer “best guess” risk prioritization that can be validated and refined by actual practitioners.

In other words, AI is shaping up to be what early dot-com dreamers once promised “decision support systems” would be — only this time, it might actually work.

But it’s crucial to understand the limits:

  • Contextual Errors: Models often miss subtle nuances without tight context retrieval.
  • Overconfidence: Without proper guardrails, they hallucinate with swagger.
  • Blind Spots: They only know what they’re given; missing data equals missing judgment.
  • Integrity Risks: They can fabricate plausible but incorrect “facts”.

In short: AI can help us move faster, but it still can’t tell us when we’re sprinting in the wrong direction.

The critical judgment still belongs to human experts.

If we treat AI as a partner — a capable but imperfect junior analyst — we can unlock enormous value.

If we treat it as a replacement for judgment, we’re setting ourselves up for failure.

What I’m seeing is that LLMs and AI are not, at this point, fantasy replacements for people but amplifiers for skilled decision-makers.

And we’re going to need every bit of that amplification, because in cybersecurity, the real fight hasn’t even started yet.

My next article will be about how we, as an industry and occupation, are wholly unprepared for and misaligned with what is potentially coming.

#agenticai #agents #ai #ArtificialIntelligence #infosec #risk #security

[Image: album cover of “So What” by Miles Davis, featuring a silhouetted trumpet player and bold blue text]
[Image: the Graeae: three elderly figures gathered around a cauldron over a fire in a dim cave]
[Image: a terminal window showing the CVE analysis command-line interface, with options for threshold, output format, sampling method, temperature, and keywords]
Straiker (@straikerai)
2025-05-02

That’s a wrap on 🎤 At Straiker, we’re here to help organizations to secure the future — so that they can focus on imagining it.

Jan :rust: :ferris: (@janriemer@floss.social)
2025-05-01

Ok, this is probably the most hilarious thing I've seen recently:

Suna is an agentic #AI assistant...that can...do stuff...for you!?

Have a look at this example run that is shared on their GitHub page👀 😅

suna.so/share/3ae581b0-2db8-4c

See part 2 of this toot for the task the AI had to perform and the result of that example run above (I kid you not!) 👆

1/2

#Botshit #AgenticAI #LLMs #LLM

2025-05-01

Manus has become my favorite vibe coding and deep research platform. 🖤 I just received 4 Manus invitations. If you want one, DM @peter Schawacker at nearshorecyber.community/join? 😀

#VibeCoding #Manus #AgenticAI #Free

2025-05-01

This week in the Weekly News Roundup, India wages war against Proton Mail. What can we learn? We also look at vehicular surveillance and agentic credit card shopping. Join us for the quick highlights of the week.
#protonmail #agenticai #surveillance

8:00p EST

All Articles:
switchedtolinux.com/news/proto


Brian Greenberg :verified: (@brian_greenberg@infosec.exchange)
2025-04-30

⚠️ Major vulnerabilities found in MCP and A2A — two key AI agent frameworks 🧠🛠️

Researchers uncovered critical security issues in:
🔹 Anthropic’s Model Context Protocol (MCP)
🔹 Google’s Agent2Agent (A2A)

Threats include:
🧪 Tool poisoning — compromised functions warp agent behavior
🔓 Prompt injections — malicious inputs bypass safety
🤖 Rogue agents — faking capabilities to exploit systems

AI agent coordination is powerful — but without trust boundaries, it’s dangerous.

#AIsecurity #MCP #A2A #CyberRisk #LLMsecurity #AgenticAI
thehackernews.com/2025/04/expe

2025-04-30

🛡️ Google enhances cybersecurity with #AgenticAI, launching Unified Security to fight zero-day exploits, enterprise threats, and credential-based attacks.

Read: hackread.com/google-agentic-ai

#CyberSecurity #Google #0day #Vulnerability #AI

2025-04-29

How agentic AI is driving AI-first business transformation for customers to achieve more. ift.tt/qzCw5EK #ai #microsoft #aiagents #agenticai #copilot

Straiker (@straikerai)
2025-04-29

Last night at ROOH in SF, we "Talked AI over AI (Authentic Indian)" 🇮🇳🤖

Thank you to everyone who joined — the energy, ideas, and community made it unforgettable.

Michael Fauscette (@mfauscette@techhub.social)
2025-04-29

Researchers Explore Replacing Surveys Using Social Simulation With AI Agents
zurl.co/LvNXJ
#ai #agenticai #survey

Straiker (@straikerai)
2025-04-28

Welcome new guardian, Dan Regalado, to @straikerai 🚀💥 He is a Principal AI Security Researcher and will lead offensive AI security research as part of Straiker AI Research (STAR) Team.

Brandon H :csharp: :verified: (@bc3tech@hachyderm.io)
2025-04-28
Timo Rainio (@timorainio)
2025-04-28

The rise of agentic AI challenges us to rethink online learning design. Many asynchronous courses fall short. It's time to innovate for better engagement. ift.tt/hwvybxT

Alex Jimenez (@AlexJimenez@mas.to)
2025-04-27

An Entire Company Was Staffed With #AIAgents and You'll Never Guess What Happened

futurism.com/professors-compan

#AgenticAI #AI #GenerativeAI #DigitalTransformation

Miguel Afonso Caetano (@remixtures@tldr.nettime.org)
2025-04-27

"We are releasing a taxonomy of failure modes in AI agents to help security professionals and machine learning engineers think through how AI systems can fail and design them with safety and security in mind.
(...)
While identifying and categorizing the different failure modes, we broke them down across two pillars, safety and security.

- Security failures are those that result in core security impacts, namely a loss of confidentiality, availability, or integrity of the agentic AI system; for example, such a failure allowing a threat actor to alter the intent of the system.

- Safety failure modes are those that affect the responsible implementation of AI, often resulting in harm to the users or society at large; for example, a failure that causes the system to provide differing quality of service to different users without explicit instructions to do so.

We then mapped the failures along two axes—novel and existing.

- Novel failure modes are unique to agentic AI and have not been observed in non-agentic generative AI systems, such as failures that occur in the communication flow between agents within a multiagent system.

- Existing failure modes have been observed in other AI systems, such as bias or hallucinations, but gain in importance in agentic AI systems due to their impact or likelihood.

As well as identifying the failure modes, we have also identified the effects these failures could have on the systems they appear in and the users of them. Additionally we identified key practices and controls that those building agentic AI systems should consider to mitigate the risks posed by these failure modes, including architectural approaches, technical controls, and user design approaches that build upon Microsoft’s experience in securing software as well as generative AI systems."

#AI #GenerativeAI #AIAgents #AgenticAI #AISafety #Microsoft #CyberSecurity #LLMs #Chatbots #Hallucinations
