What's Changed
This release introduces BotThinking events, allowing you to apply guardrails to an LLM's reasoning traces. We added three new embedding providers: Azure OpenAI, Cohere, and Google. We also added an integration with Cisco AI Defense. For performance, in-memory caching can now be used for the Nemoguard content-safety, topic-control, and jailbreak models to reduce latency. Other additions include automatic provider inference for LangChain LLMs and custom HTTP header support for ChatNVIDIA.
Reasoning trace extraction has been refactored to use LangChain's additional_kwargs. Traces are no longer prepended to the bot's message; they now arrive in a separate reasoning field, so please update any custom parsing logic. Additionally, stream_async now correctly raises an error if output rail streaming is disabled.
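To make the migration concrete, here is a hypothetical sketch of parsing logic that handles both layouts. The dictionary shape, the `reasoning` and `content` keys, and the `extract_reasoning` helper are illustrative assumptions, not the exact NeMo Guardrails response schema:

```python
# Hypothetical migration sketch: field names and message layout below are
# assumptions for illustration, not the exact library response schema.

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def extract_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, content), preferring the new separate field."""
    # New layout: reasoning ships in its own field.
    if "reasoning" in response:
        return response["reasoning"], response["content"]
    # Old layout: reasoning was prepended to the bot message in <think> tags.
    content = response["content"]
    if content.startswith(THINK_OPEN) and THINK_CLOSE in content:
        reasoning, _, rest = content.partition(THINK_CLOSE)
        return reasoning[len(THINK_OPEN):].strip(), rest.lstrip()
    return "", content

old = {"content": "<think>User asks about weather.</think>It is sunny."}
new = {"content": "It is sunny.", "reasoning": "User asks about weather."}
assert extract_reasoning(old) == extract_reasoning(new)
```

Code that still expects a prepended trace keeps working during the transition, while new responses are read from the separate field.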
We've fixed several stability bugs, especially for parallel execution. This includes ensuring the stop flag is now correctly set for policy violations and exceptions in parallel mode. We also added config validation at creation time for content-safety and topic-control rails. A fallback for legacy <think> tag extraction and a capitalization fix for Snowflake are also included.
We removed support for Python 3.9, ahead of its EOL in October 2025. We invested in code quality by adding type annotations and pre-commit checks across a majority of the codebase.
🚀 Features
- (bot-thinking) Implement BotThinking events to process reasoning traces in Guardrails (#1431), (#1432), (#1434).
- (embeddings) Add Azure OpenAI embedding provider (#702).
- (embeddings) Add Cohere embedding integration (#1305).
- (embeddings) Add Google embedding integration (#1304).
- (library) Add Cisco AI Defense integration (#1433).
- (cache) Add in-memory LFU caches for content-safety, topic-control, and jailbreak detection models (#1436), (#1456), (#1457), (#1458).
- (llm) Add automatic provider inference for LangChain LLMs (#1460).
- (llm) Add custom HTTP headers support to ChatNVIDIA provider (#1461).
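The new caches use a least-frequently-used (LFU) eviction policy. As a standalone illustration of that policy only (not the library's implementation), a minimal sketch:

```python
# Minimal LFU cache sketch illustrating the eviction policy named in the
# release notes; this is NOT the NeMo Guardrails implementation.
from collections import defaultdict

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values: dict = {}
        self.counts: dict = defaultdict(int)  # access frequency per key

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used entry.
            victim = min(self.values, key=lambda k: self.counts[k])
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1

cache = LFUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now used more often than "b"
cache.put("c", 3)   # evicts "b", the least frequently used key
assert cache.get("b") is None
assert cache.get("a") == 1 and cache.get("c") == 3
```

LFU suits repeated moderation checks well: prompts that recur often stay cached, so hot inputs skip the model call entirely.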
🐛 Bug Fixes
- (config) Validate content safety and topic control configs at creation time (#1450).
- (jailbreak) Fix capitalization of `Snowflake` in use of `snowflake-arctic-embed-m-long` name (#1464).
- (runtime) Ensure stop flag is set for policy violations in parallel rails (#1467).
- (llm) [breaking] Extract reasoning traces to separate field instead of prepending (#1468).
- (streaming) [breaking] Raise error when stream_async used with disabled output rails streaming (#1470).
- (llm) Add fallback extraction for reasoning traces from tags (#1474).
- (runtime) Set stop flag for exception-based rails in parallel mode (#1487).
🚜 Refactor
- [breaking] Replace reasoning trace extraction with LangChain additional_kwargs (#1427).
📚 Documentation
- (examples) Add Nemoguard in-memory cache configuration example (#1459), (#1480).
- Add guide for bot reasoning guardrails (#1479).
- Update LLM reasoning traces configuration (#1483).
🧪 Testing
- Add mock embedding provider tests (#1446).
- (cli) Add comprehensive CLI test suite and reorganize files (#1339).
- Skip FastEmbed tests when not in live mode (#1462).
- Fix flaky stats logging interval timing test (#1463).
- Restore test that was skipped due to Colang 2.0 serialization issue (#1449).
⚙️ Miscellaneous Tasks
- Resolve PyPI publish workflow trigger and reliability issues (#1443).
- Fix sparse checkout for publish PyPI workflow (#1444).
- Drop Python 3.9 support ahead of October 2025 EOL (#1426).
- (types) Add type annotations and pre-commit checks for tracing (#1388), logging (#1395), kb (#1385), cli (#1380), embeddings (#1383), server (#1397), and llm (#1394) code.
- Update insert-license pre-commit hook to use current year (#1452).
- (library) Remove unused vllm requirements.txt files (#1466).