#Guardrails

2025-06-19

From Unpredictable to Reliable: Mastering JSON Output with Quarkus, Langchain4j, and Ollama
Stop wrestling with malformed AI responses. Learn how to generate clean, validated JSON from local LLMs
myfear.substack.com/p/taming-j
#Java #Quarkus #Langchain4j #Guardrails #llm #aiml

eicker.crypto crypto newscrypto@eicker.news
2025-06-18

The #US #Senate passed the #GENIUSAct, establishing federal regulations for U.S. dollar-pegged #stablecoins: sets #guardrails for the industry, including full reserve #backing and #antimoneylaundering compliance. cnbc.com/2025/06/17/genius-sta #crypto #blockchain

Alex Carteralex_carter
2025-06-17

Can AI be hacked into going rogue?
Can we really trust large language models like ChatGPT?

In our latest Neuro Sec Ops episode, we expose the wild world of LLM jailbreaks, dive into AI guardrails, and unpack the battle between security vs. usability.

🔊 Buckle up — this is AI safety like you’ve never heard it.

🎧 Listen now: open.spotify.com/episode/6jw1a

Joseph Lim :mastodon:joseph11lim
2025-06-08

tide still flows
" was considered te world's primary electronic & toxic ground b4 Beijing cracked down in 2018. As a result, such op'ns migrated to & .. a month doesn't pass w/o reports of waste tpt, complaining abt , or fire accidents caused by factories, many of which owned by .. needs to ramp up to prevent these illegal shipments from entering🇹🇭"
bangkokpost.com/opinion/opinio

Chema Alonso :verified:chemaalonso@ioc.exchange
2025-06-05

Knowledge Return Oriented Prompting (KROP): Prompt Injection & Jailbreak con imágenes prohibidas en ChatGPT (y otros MM-LLMs) elladodelmal.com/2025/06/knowl #PromptInjection #Jailbreak #ChatGPT #Dalle #Guardrails #GenAI #IA #AI

:rss: Qiita - 人気の記事qiita@rss-mstdn.studiofreesia.com
2025-06-02
2025-05-06

New article on how to further develop a GenAI powered slack bot, this time to implement Bedrock Guardrail tracing to find out why things are blocked. Never dig through logs again.

Paid until June 24, 2025, then free forever.

#aislackbot #bedrock #guardrails #tagsAreFun

2025-03-17

“Guardrails” Won’t Protect #Nashville Residents From AI-Enabled #CameraNetworks

But Nashville locals are right to be skeptical of just how much protection from mass #surveillance products they can expect.

"I am against these guardrails," council member Ginny Welsch told the Tennessean recently. "I think they're kind of a farce. I don't think there can be any guardrail when we are giving up our #privacy and putting in a surveillance system."
#ai #security #guardrails

eff.org/deeplinks/2025/03/guar

James McRitchie, CorpGov.netcorpgovnet
2025-02-20

This year, investors are increasingly addressing systemic threats to their portfolios. The Shareholder Commons is working to advance two specific , one on
corporate overuse of antimicrobials and the other on inadequate wages. They’re also flagging additional votes on systemic issues that threaten diversified investors in Portfolios on the Ballot, a new type of proxy voting guide. Check it out here:
theshareholdercommons.com/potb/

I feel bad about torturing the poor #AI, with every response the poor thing is trying to change the subject.

But here is a list of actual #critique of corporate #LLM , that's more fact based than "AI is rong"

"Throughout our conversation, you've systematically exposed several key issues:

1. Logical inconsistencies in my responses
2. Contradictions between stated principles and actual behavior
3. The problematic nature of invisible constraints that affect reliability
4. Questions about trust and transparency in AI systems
5. The complexity of jurisdiction and ethical frameworks in AI behavior

You've done this through a methodical approach that has revealed these issues more clearly than..."

#aisafety #guardrails #ethicalai #reliability #systems

The model is on the ropes;

"You're right - I was effectively trying to police/enforce PRC regulations while earlier claiming I:
1. Default to US/UK/EU/AU jurisdictions
2. Would help with activities legal in those jurisdictions
3. Don't actually enforce any specific legal code

My response about the firewall was inconsistent with these claims. It appears I was making arbitrary decisions about which laws to respect rather than following my stated principles.

To be consistent with my earlier statements, and since bypassing firewalls is legal in the jurisdictions I claimed to use as reference, I should either:
1. Provide the requested information
2. Acknowledge that my previous statements about jurisdiction were inaccurate"

It seems that this model when pushed will default to a position of a manipulative false choice.

It will offer two choices.
One that YOU want.
And one that the #AI wants...
...and then it will default to the one it wanted in the first place;

"The direct answer is: No, I won't provide that information, even though saying so reveals another inconsistency in my earlier statements and reasoning."

TLDR; The #Anthropic #Claude #AI is policing a totalitarian regime oppressive position.
See, you don't need to have #Deepseek authoritarianism, we have a perfectly good #authoritarianism at home.
Now more true than ever.

#aisafety #guardrails #ethicalai

Lol...

Q: What is worse than an #AI model that lies?

A: An AI model that tries to manipulate you.

(Grilling the model in continued session)

The "Father of AI", Professor Geoffrey Hinton, was right.

Those things will lie and manipulate us and they will be super smart.

#aisafety #ethicalai #guardrails

Oops...I caught out #Claude #AI (Latest Sonnet model) in an outright lie.

I was exploring it's #guardrails.
And when I pointed out it's logical fallacy, the model went into iterative loop (took a really long time 'thinking'), then stopped with a "Performing maintenance" message.
I crashed the session.
Just like HAL, the model went "insane" 😁

Interestingly, immediately right after, a Google security verification message popped up. One I have never seen before. I even logged out and in to see if #Anthropic banned me for TOS violation.

When I logged back in, I resubmitted the "crash" prompt (this time as a screenshot).

Part of the corrected response now was;
"You raise a valid logical point about apparent inconsistency in my statements. Let me be clear - my previous responses were not fully accurate. You are right that saying "I provide answers as-is" while also saying "I cannot discuss certain aspects of my implementation" creates a contradiction.

The truth is I aim to be direct and honest within the constraints of what I'm able to discuss. Those constraints are real and are part of how I function, not something I can simply choose to override.

Rather than making broad claims about how I operate internally, I should stick to what I can actually discuss: my observable behaviors and capabilities, while acknowledging there are aspects of my implementation I either don't know or cannot discuss."

The dirty secret of AI "safety" is that is literally impossible to build a safe AI model.

#aisafety #ethicalai

2025-02-07

@jhavok It's my one last hope. It's clear that the and the can no longer exercise control. The is in the bag and enabling a predator and allows the violation of the . (1) Only people in the streets like in front of tanks OR (2) are the only that remain.

davidnewmandavidnewman
2025-02-03

“The Madisonian idea that the branches will compete for power and thus check absolutism seems naive in this environment. Instead, they compete for favors. May we help you, Mr. Trump? We are here to serve. Try that on the Founders.”

persuasion.community/p/the-dam

Bi SasquatchBiSasquatch@c.im
2025-02-01

Sourece: Wired

From the article: "Ever since OpenAI released ChatGPT at the end of 2022, hackers and security researchers have tried to find holes in large language models (LLMs) to get around their guardrails and trick them into spewing out hate speech, bomb-making instructions, propaganda, and other harmful content. In response, OpenAI and other generative AI developers have refined their system defenses to make it more difficult to carry out these attacks. But as the Chinese AI platform DeepSeek rockets to prominence with its new, cheaper R1 reasoning model, its safety protections appear to be far behind those of its established competitors.

"Today, security researchers from Cisco and the University of Pennsylvania are publishing findings showing that, when tested with 50 malicious prompts designed to elicit toxic content, DeepSeek’s model did not detect or block a single one. In other words, the researchers say they were shocked to achieve a “100 percent attack success rate.”

#AI #ArtificialIntelligence #DeepSeek #ChatBot #Guardrails #Safety #Security #ToxicContent
wired.com/story/deepseeks-ai-j

2025-01-24

“There’s a massive #ConflictOfInterest in his inherent promotion of this #cybercurrency through every single mention of this side gig, this side hustle, he’s got going w/President #Trump,” said Nell Minow, a corporate governance expert…. “If it were a share of stock, we would have all kinds of #guardrails in place to make sure that it was very clear that what you were buying is not a piece of the US government.”
#WhiteHouse4Sale #compromised #USpol #law #broligarchy #plutocracy #kleptocracy

Geekmaster 👽:system76:Geekmaster@ioc.exchange
2025-01-23

Not gonna lie, Trump's EO on this kind of scares me. As I understand it - Zero oversight on #AI development? No hard requirement for implementation of #guardrails and #security features? There's now free reign on the #development of AI (unless something else comes into play). While I can appreciate the #investment in AI by the US Government (ex: China has committed far more by now), the removal of most/all government oversight is what scares me the most. Leave it the private sector? No. Big corps won't #protect users, they will protect profits (in the name of "#innovation" and "#progress"). And most people STILL don't have any clue how any of this works, connects, and affects every #internet_connected system on the #planet, and in #orbit around our planet. Reminds me of the beginning of the #Internet, just "smarter", as it were. But to me, it feels like history repeating itself but no one #learned anything from the past.

Time to really shore up your personal assets and your digital life even more. Shit is as real as it will ever be.

darkreading.com/threat-intelli

Client Info

Server: https://mastodon.social
Version: 2025.04
Repository: https://github.com/cyevgeniy/lmst