#guardrails

Cassie Kozyrkov kozyr
2025-12-15

👍 Which processes are we willing to redesign around AI agents?
👍 What will we allow our to break?
👍 How do we orchestrate data, tools, and so we can trust them no matter their propensity for hallucination?

Go deeper in my newsletter 🔗 decision.substack.com/p/what-d

AI Daily Post aidailypost
2025-12-10

ACE unveils a new AI platform that embeds human‑in‑the‑loop controls and robust guardrails across the SDLC, leveraging AWS and Azure. Built with Xebia’s open‑source ethos, it aims to keep AI trustworthy while boosting productivity. Curious how this balance works? Dive into the details.

🔗 aidailypost.com/news/ace-launc

h o ʍ l e t t homlett@mamot.fr
2025-11-26

→ Roblox is a problem — but it’s a symptom of something worse
platformer.news/roblox-ceo-int

“Over and over again, we have seen leaders in [#Roblox CEO]'s position choose growth over #guardrails.”

“[L]ook at Meta […]: the company "stalled internal efforts to prevent child #predators from contacting minors for years due to #growth concerns," […]; "recognized that optimizing its products to increase teen #engagement resulted in serving them more #harmful content, but did so anyway”

#CEO #Meta

2025-11-26

AI security in practice: attacks and basic approaches to defense

Hello, Habr! I'm Alexander Lebedev, a senior AI systems developer at Innostage. In this article I'll walk through several interesting cases of attacks on AI services and basic ways to defend against them. At the end we'll try launching our own service and running a few simple attacks on it, the kind that can turn into serious losses for companies. We'll also look at how to protect against them.

habr.com/ru/companies/innostag

#ai_security #безопасность_ии #безопасность_llm #guardrails #alignment #mlops #ml #ai

2025-11-26

"Manufactured risks are the product of human activity." Ulrich Beck

"A "risk society" is a sociological concept describing a stage of modernity where society is increasingly defined by and organized around managing the hazards and insecurities it has produced itself through modernization and industrialization.

Self-produced risks: The primary source of risk is not nature, but human actions and the unintended consequences of modernization itself.

Individualization: Individuals are increasingly forced to confront these risks personally, with a greater emphasis on individual choice and responsibility in the face of larger, systemic problems

Ulrich Beck defines it as "a systematic way of dealing with hazards and insecurities induced and introduced by modernisation itself". "
>>
en.wikipedia.org/wiki/Risk_soc

"The Study of Existential Risk frames the rises and falls of civilisation around the idea of the “Goliath”, a hierarchy that dominates labour and energy through coercion and violence."

"Our research shows how there are nascent attempts to embed citizens into decision making by re-plumbing democracy, flattening hierarchies and constructing more dynamic feedback loops that have real consequence."
>>
theguardian.com/australia-news
#risks #diy #FossilFuels #energy #extinctions #climate #PlanetaryBoundaries #SafeLimits #guardrails #pollution #waste #catastrophe #ExistentialRisk #ManufacturedRisks #uncertainty #insecurity #individualisation #modernity #crisis #austerity #democracy #indifference

2025-11-24

Astonishing and troubling:

Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain. (FUTURISM, 23 Nov 2025)
futurism.com/artificial-intell

#AI #BigTech #MediaLit #edtech #guardrails #edtechSR

PressMind Labs pressmind
2025-11-21

Chatbots and eating disorders: how AI can harm users

Do we really need an AI assistant that knows how to hide an eating disorder? Unfortunately, it already can.

Read more:
pressmind.org/chatboty-i-zabur

Illustration showing a chatbot giving unhealthy diet advice in a dark setting.

2025-11-20

OpenAI Guardrails: protecting AI applications from attacks

Hi everyone! In this article we'll look at OpenAI Guardrails, one of the most effective tools for securing AI systems. It continues a series on protecting and controlling AI agents; the first part covered the request moderation tool. Guardrails offers far more powerful protection, letting you build a multi-layer system for validating input and output data.

habr.com/ru/articles/968516/

#ии #ииагенты #ииассистент #openai #guardrails
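
A rough sketch of what multi-layer validation of inputs and outputs can look like in general. It does not use the OpenAI Guardrails API; every function name here (`guarded_call`, `no_injection_phrase`, and so on) is hypothetical, and the checks are deliberately naive heuristics of the sort that real attacks can bypass.

```python
import re
from typing import Callable, List, Optional

# A check returns None when the text passes, or a reason string when it fails.
Check = Callable[[str], Optional[str]]


def no_injection_phrase(text: str) -> Optional[str]:
    # Naive keyword heuristic, for illustration only.
    if re.search(r"ignore (all )?previous instructions", text, re.IGNORECASE):
        return "possible prompt injection"
    return None


def max_length(limit: int) -> Check:
    # Build a check that rejects text longer than `limit` characters.
    def check(text: str) -> Optional[str]:
        return "input too long" if len(text) > limit else None
    return check


def no_email_address(text: str) -> Optional[str]:
    # Output-side check: withhold answers that contain an email address.
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        return "output contains an email address"
    return None


def first_failure(text: str, checks: List[Check]) -> Optional[str]:
    # Run the checks in order and return the first failure reason, if any.
    for check in checks:
        reason = check(text)
        if reason is not None:
            return reason
    return None


def guarded_call(prompt: str, model: Callable[[str], str],
                 input_checks: List[Check], output_checks: List[Check]) -> str:
    # Layer 1: validate the input before the model sees it.
    reason = first_failure(prompt, input_checks)
    if reason:
        return f"[input rejected: {reason}]"
    answer = model(prompt)
    # Layer 2: validate the output before the user sees it.
    reason = first_failure(answer, output_checks)
    if reason:
        return f"[output withheld: {reason}]"
    return answer
```

Here `model` is any callable that maps a prompt to a reply, so a stub function can stand in for a real LLM call while testing the layers.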

2025-11-17

Researchers say #EchoGram exploit flips AI guardrails in major LLMs, letting harmful prompts through. ⚠️🤖

Read: hackread.com/echogram-flaw-byp

#Cybersecurity #AIsecurity #LLM #AI #Guardrails

Hacker News h4ckernews
2025-11-12

XML-Lib – An over-engineered XML workflow with guardrails and proofs

github.com/farukalpay/xml-lib

Ars Technica News arstechnica@c.im
2025-11-05

5 AI-developed malware families analyzed by Google fail to work and are easily detected arstechni.ca/hvdB #generativeai #guardrails #ransomware #Security #malware #Biz&IT #AI

Wulfy—Speaker to the machines n_dimension@infosec.exchange
2025-10-27

"Syntactic anti classifier" is a tool developed to bypass #GenAI image generation #AI guardrails by using tokens/words that are not encoded in the guardrail.

#aisecurity #guardrails #aihacking

2025-10-13

Researchers have found that OpenAI’s new Guardrails can be bypassed using a simple prompt injection, tricking its AI “judges” and allowing harmful outputs.

Read: hackread.com/openai-guardrails

#OpenAI #AISecurity #Guardrails #Cybersecurity #ChatGPT

2025-10-12

Guardrails for Java AI apps with Quarkus + LangChain4j.
Stop prompt injection, enforce JSON, and keep outputs safe — all in one hands-on guide.

the-main-thread.com/p/java-qua

#Java #Quarkus #LangChain4j #AI #Guardrails

2025-10-10

📢 Zenity Labs reveals structural weaknesses in the guardrails of OpenAI AgentKit
📝 Source: Zenity Labs. In a research publication, Zenity Labs takes an in-depth look at the guardrails of OpenAI AgentKit and ...
📖 cyberveille : cyberveille.ch/posts/2025-10-1
🌐 source : labs.zenity.io/p/breaking-down
#Guardrails #OpenAI_AgentKit #Cyberveille

OctoLaunch octolauch
2025-10-07

Guardrails guide behavior; gates stop unsafe actions. Use guardrails for daily autonomy and gates for production-critical ops.
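
One way to picture the distinction in this post, as a minimal sketch: a guardrail warns but lets the action proceed, while a gate blocks until a human approves. The action names and the `guardrail`, `gate`, and `run` helpers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical action names, chosen only for illustration.
PRODUCTION_CRITICAL = {"deploy_to_prod", "drop_table"}
DISCOURAGED = {"force_push"}


@dataclass
class Decision:
    allowed: bool
    note: str = ""


def guardrail(action: str) -> Decision:
    """Soft check: steer behavior with a warning, never block."""
    if action in DISCOURAGED:
        return Decision(True, f"warning: '{action}' is discouraged")
    return Decision(True)


def gate(action: str, approved_by_human: bool) -> Decision:
    """Hard check: block production-critical actions without approval."""
    if action in PRODUCTION_CRITICAL and not approved_by_human:
        return Decision(False, f"blocked: '{action}' needs human approval")
    return Decision(True)


def run(action: str, approved_by_human: bool = False) -> Decision:
    """Apply the gate first, then the guardrail."""
    verdict = gate(action, approved_by_human)
    if not verdict.allowed:
        return verdict
    return guardrail(action)
```

Under these assumptions, `run("force_push")` is allowed with a warning note, while `run("deploy_to_prod")` is refused unless `approved_by_human=True` is passed.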

Dave Volek davevolek
2025-10-02

Book Review: How Democracies Die

Is this 2017 book a warning or an instruction manual?

tiereddemocraticgovernance.org


