What's Changed
This release introduces IORails, a new optimized input/output rail engine that supports parallel execution of NemoGuard rails (content-safety, topic-safety, and jailbreak detection) with logging and unique request IDs. A new check_async method in LLMRails enables standalone input/output rails validation without requiring a full conversation flow. The guardrails server is now fully OpenAI-compatible (including a new v1/models endpoint), and a new GuardrailsMiddleware enables seamless integration with LangChain agents. New community integrations include PolicyAI for content moderation, CrowdStrike AIDR, and regex-based detection rails. Embedding index initialization is now lazy, improving startup performance. Streaming internals have been cleaned up, and the documentation has undergone a major revamp.
🚀 Features
- (library) Update Trend Micro Vision One AI Guard official endpoint (#1546)
- (llmrails) Add check_async method for input/output rails validation (#1605)
- (server) Make guardrails server OpenAI compatible (#1340)
- (integration) Add GuardrailsMiddleware for LangChain agent (#1606)
- (library) Update Fiddler Guardrails API to match new specification (#1619)
- (library) Add CrowdStrike AIDR community integration (#1601)
- (iorails) Introduce IORails, an optimized Input/Output rail engine; supports parallel execution of non-streaming NemoGuard input/output rails (content-safety, topic-safety, jailbreak detection) (#1638, #1649, #1654, #1656, #1658, #1660, #1661, #1674)
- (server) Add OpenAI compatible v1/models endpoint (#1637)
- (benchmark) Add Locust stress-test (#1629)
- (jailbreak) Validate Jailbreak Detection config at create-time (#1675)
- (library) Add PolicyAI Integration for Content Moderation (#1576)
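Since the guardrails server is now OpenAI-compatible (#1340, #1637), it should accept standard OpenAI-style chat-completions request bodies. Below is a minimal sketch of such a payload; the config name `"my-guardrails-config"` standing in for the `model` field, and the host/port in the usage note, are illustrative assumptions rather than details confirmed by this changelog:

```python
import json

# Hypothetical chat-completions request body in the OpenAI wire format.
# Using a guardrails config name as the "model" value is an assumption.
payload = {
    "model": "my-guardrails-config",  # assumed: a guardrails config name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

With a server running locally, a body like this could be POSTed to its /v1/chat/completions route (for example via the official openai client with a custom base_url), and the new v1/models endpoint can be queried the same way to list what is available.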
🐛 Bug Fixes
- (server) Make openai an optional server-only dependency (#1623)
- (actions) Rename generate_next_step to generate_next_steps for task-specific LLM support (#1603)
- (library) Add valid alias to action results in GuardrailsAI integration (#1578) (#1611)
- (llm) Filter stop parameter for OpenAI reasoning models (#1653)
- (logging) Show cache hits in Stats log and fix duplicate metadata restore (#1666)
- (cache) Make cache stats log visible in verbose mode (#1667)
- (library) Use bot refuse to respond in GLiNER PII detection flows (#1671)
- (streaming) Handle None stop tokens in streaming handler (#1685)
- (streaming) Handle dict chunks in RollingBuffer.format_chunks (#1687)
- (middleware) Handle MODIFIED status in GuardrailsMiddleware instead of silently dropping it (#1714)
🚜 Refactor
- (streaming) Remove LangChain callback dependencies from StreamingHandler (#1547)
- (streaming) Remove ChatNVIDIA streaming patch (#1607)
- (streaming) [breaking] Remove stream_usage and fix streaming metadata capture (#1624)
⚡ Performance
- (actions) Lazy initialization of embedding indexes (#1572)
⚙️ Miscellaneous Tasks
- Update Pangea User-Agent repo URL (#1595) (#1610)
- (jailbreak) Update dependencies for jailbreak detection docker container (#1596)
- Remove multi_kb example (#1673)
- (iorails) Increase work queue concurrency and depth (#1674)
- (docs) Remove AI Virtual Assistant Blueprint notebook (#1682)
- Update dependencies ahead of v0.21 release (#1617)
New Contributors
- @trend-anurag-das made their first contribution in #1546
- @christinaexyou made their first contribution in #1340
- @kevinfiddlerai made their first contribution in #1619
- @ichbinlucaskim made their first contribution in #1610
- @vitchor made their first contribution in #1576
Full Changelog: v0.20.0...v0.21.0