What's Changed
This release introduces IORails, a new optimized input/output rail engine that supports parallel execution of NemoGuard rails (content-safety, topic-safety, and jailbreak detection) with logging and unique request IDs. A new check_async method in LLMRails enables standalone input/output rails validation without requiring a full conversation flow. The guardrails server is now fully OpenAI-compatible (including a new v1/models endpoint), and a new GuardrailsMiddleware enables seamless integration with LangChain agents. New community integrations include PolicyAI for content moderation, CrowdStrike AIDR, and regex-based detection rails. Embedding index initialization is now lazy, improving startup performance. Streaming internals have been cleaned up, and the documentation has undergone a major revamp.
🚀 Features
- (library) Update Trend Micro Vision One AI Guard official endpoint (#1546)
- (llmrails) Add check_async method for input/output rails validation (#1605)
- (server) Make guardrails server OpenAI compatible (#1340)
- (integration) Add GuardrailsMiddleware for LangChain agent (#1606)
- (library) Update Fiddler Guardrails API to match new specification (#1619)
- (library) Add CrowdStrike AIDR community integration (#1601)
- (iorails) Introduce IORails, an optimized Input/Output rail engine; supports parallel execution of non-streaming NemoGuard input/output rails (content-safety, topic-safety, jailbreak detection) (#1638, #1649, #1654, #1656, #1658, #1660, #1661, #1674)
- (server) Add OpenAI compatible v1/models endpoint (#1637)
- (benchmark) Add Locust stress-test (#1629)
- (jailbreak) Validate Jailbreak Detection config at create-time (#1675)
- (library) Add PolicyAI Integration for Content Moderation (#1576)
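Since the guardrails server is now OpenAI-compatible (#1340, #1637), it should accept standard OpenAI-style chat-completions request bodies. Below is a minimal sketch of such a payload; the config name `"my-guardrails-config"` standing in for the `model` field, and the host/port in the usage note, are illustrative assumptions rather than details confirmed by this changelog:

```python
import json

# Hypothetical chat-completions request body in the OpenAI wire format.
# Using a guardrails config name as the "model" value is an assumption.
payload = {
    "model": "my-guardrails-config",  # assumed: a guardrails config name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

With a server running locally, a body like this could be POSTed to its /v1/chat/completions route (for example via the official openai client with a custom base_url), and the new v1/models endpoint can be queried the same way to list what is available.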
🐛 Bug Fixes
- (server) Make openai an optional server-only dependency (#1623)
- (actions) Rename generate_next_step to generate_next_steps for task-specific LLM support (#1603)
- (library) Add valid alias to action results in GuardrailsAI integration (#1578) (#1611)
- (llm) Filter stop parameter for OpenAI reasoning models (#1653)
- (logging) Show cache hits in Stats log and fix duplicate metadata restore (#1666)
- (cache) Make cache stats log visible in verbose mode (#1667)
- (library) Use bot refuse to respond in GLiNER PII detection flows (#1671)
- (streaming) Handle None stop tokens in streaming handler (#1685)
- (streaming) Handle dict chunks in RollingBuffer.format_chunks (#1687)
- (middleware) Handle MODIFIED status in GuardrailsMiddleware instead of silently dropping it (#1714)
🚜 Refactor
- (streaming) Remove LangChain callback dependencies from StreamingHandler (#1547)
- (streaming) Remove ChatNVIDIA streaming patch (#1607)
- (streaming) [breaking] Remove stream_usage and fix streaming metadata capture (#1624)
⚡ Performance
- (actions) Lazy initialization of embedding indexes (#1572)
⚙️ Miscellaneous Tasks
- Update Pangea User-Agent repo URL (#1595) (#1610)
- (jailbreak) Update dependencies for jailbreak detection docker container (#1596)
- Remove multi_kb example (#1673)
- (iorails) Increase work queue concurrency and depth (#1674)
- (docs) Remove AI Virtual Assistant Blueprint notebook (#1682)
- Update dependencies ahead of v0.21 release (#1617)
New Contributors
- @trend-anurag-das made their first contribution in #1546
- @christinaexyou made their first contribution in #1340
- @kevinfiddlerai made their first contribution in #1619
- @ichbinlucaskim made their first contribution in #1610
- @vitchor made their first contribution in #1576
Full Changelog: v0.20.0...v0.21.0