deepset-ai/haystack v2.30.0-rc1 on GitHub

⭐️ Highlights

🐍 Syntax-aware Python code splitting with `PythonCodeSplitter`

The new PythonCodeSplitter is a syntax-aware splitter for Python source files, built for code-RAG and code-search pipelines where naive line-based splitting tends to cut through functions and lose structural context. It parses sources with the ast module and greedily merges units, such as module docstring, import blocks, top-level functions, class headers, methods, and nested classes, into chunks of roughly max_effective_lines, keeping whole functions and methods together. For functions that exceed oversized_factor * max_effective_lines, it falls back to a line-based secondary split with overlap.

Two options make the resulting chunks more useful downstream: strip_docstrings=True moves docstrings into chunk metadata, and preserve_class_definition=True prepends the enclosing class signature to chunks whose members live in a later chunk. Each chunk also carries rich metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.

from haystack.components.preprocessors import PythonCodeSplitter

splitter = PythonCodeSplitter(
    max_effective_lines=80,
    strip_docstrings=True,
    preserve_class_definition=True,
)
result = splitter.run(documents=[doc])

💬 Pass a plain string to any `ChatGenerator`

All Haystack ChatGenerator components now accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically wrapped in a ChatMessage with the user role. This makes switching from a Generator to a ChatGenerator a one-line change. The change applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator, and will soon be rolled out to the ChatGenerators in Haystack Core Integrations.

from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)

⬆️ Upgrade Notes

DALLEImageGenerator has been updated to account for OpenAI's retirement of the DALL-E models. The default model is now gpt-image-2 (previously dall-e-3). To migrate:
- Update model value: besides gpt-image-2, gpt-image-1 and gpt-image-1-mini are also supported.
- Update quality value: the new accepted values are auto, high, medium, or low (previously standard or hd).
- Update size value: the new accepted values are 1024x1024, 1024x1536, 1536x1024, or auto. gpt-image-2 also supports arbitrary sizes.
- The response_format parameter is now ignored. The component always returns base64-encoded JSON.
LLM.run and LLM.run_async no longer accept messages and streaming_callback as positional arguments — they must now be passed as keyword arguments. Update any direct calls accordingly:
```
# Before
llm.run([message], my_callback)

# After
llm.run(messages=[message], streaming_callback=my_callback)
```

🚀 New Features

Introduced the PythonCodeSplitter component, a syntax-aware splitter for Python source files:
- Parses sources with the ast module and merges units (module docstring, import blocks, top-level functions, class headers, methods, nested classes, and remaining statements) greedily into chunks of roughly max_effective_lines.
- Keeps whole functions and methods together; falls back to a line-based secondary split (using DocumentSplitter) with overlap only for functions whose effective length exceeds oversized_factor * max_effective_lines.
- Optionally strips docstrings into chunk metadata via strip_docstrings=True, and prepends the enclosing class signature to chunks whose members live in a later chunk via preserve_class_definition=True.
- Emits per-chunk metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.
All Haystack ChatGenerator components now also accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically converted into a list containing a ChatMessage with the user role. This is done to simplify switching from Generators to ChatGenerators; Generators might be removed in Haystack 3.0.
This applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator.
The same change will be soon applied to ChatGenerators available in Haystack Core Integrations.
Example:
```
from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)
```

⚡️ Enhancement Notes

Added run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. These components now execute natively as coroutines in AsyncPipeline, delegating to each wrapped component's run_async when available and falling back to a thread executor otherwise.
Fix grammar in the AzureOpenAIGenerator and AzureOpenAIChatGenerator docstring code examples ("<this a model name..." → "<this is a model name...") so that copy-pasted snippets read correctly.
LLM now supports two usage modes:
1. Template-variable mode: provide a user_prompt with Jinja2 variables (e.g. {{ query }}). Those variables become pipeline inputs and messages is optional. The rendered user_prompt is always appended after any messages provided at runtime.
2. Pass-through mode: omit user_prompt or provide one with no template variables. messages becomes a required input, allowing a fully-constructed list of ChatMessages to be passed from upstream.
Update ToolsType to improve type checking for the tools parameter. Any class that inherits from either Tool or Toolset is now accepted in any sequence (list, tuple, etc).
Pipeline.draw() and Pipeline.show() now validate the Mermaid server response before writing it to disk. The response body is checked against the expected output format (PNG, JPEG, WebP, SVG, or PDF) via its magic-byte signature, and the Content-Type header is checked as well. If the response is empty or does not match the requested format, a PipelineDrawingError is raised and no file is written. This prevents a misconfigured or untrusted server_url from causing arbitrary content (for example an HTML error page) to be saved verbatim to the output path.

🐛 Bug Fixes

Prevent Document.from_dict() from mutating the input dictionary during deserialization.
Prevent DocumentLanguageClassifier from crashing when Document.content=None by marking them as unmatched and logging a warning.
Fixed a bug where Agent would not exit when the model emitted multiple tool calls in a single turn and the configured exit-condition tool was not the first one in the list. Previously, only the first tool call in each assistant message was checked against exit_conditions, so a reply like [search, finish] (with exit_conditions=["finish"]) would silently fail to stop the loop and keep iterating until max_agent_steps was reached. Since parallel tool calls are now the norm for frontier models, this could quietly turn a single successful turn into dozens of wasted LLM calls. The Agent now inspects every tool call in the message, so the exit condition is honored regardless of ordering.
Fix AnswerBuilder.run() mutating the meta dict of input Document objects. source_index (and referenced when reference_pattern is set) are now only added to the document copies inside GeneratedAnswer.documents, not to the originals.
Fixed DocumentJoiner in concatenate mode so that documents with a score of exactly 0.0 are no longer treated as unscored during deduplication. Previously a truthiness check coerced score=0.0 to -inf, which could cause a worse, negatively-scored duplicate to be kept instead of the 0.0-scored document. The merge mode was updated to the same explicit is not None check for consistency; its observable behavior is unchanged.
Fixed in-place mutation of ExtractedAnswer.meta in ExtractiveReader._add_answer_page_number when the answer's meta was None. Now uses dataclasses.replace to avoid triggering the dataclass mutation warning.
Fixed ExtractiveReader raising ValueError when the number of valid answer spans for a sequence was smaller than answers_per_seq (for example with short documents or when answers_per_seq exceeded the number of upper-triangular, non-masked (start, end) token pairs). _postprocess now filters the per-sequence probabilities by the same validity mask it already applied to the start/end token indices, so the three structures always have matching lengths.
HierarchicalDocumentSplitter no longer mutates the metadata of the input Document. _add_meta_data now returns a new Document with a copied meta dict via dataclasses.replace instead of writing __block_size, __parent_id, __children_ids and __level onto the caller's Document.
Fixed a bug in LLMMetadataExtractor.run_async where the asyncio.Semaphore intended to bound concurrent LLM calls to max_workers was acquired once around the outer gather(...) call instead of inside each task. As a result, max_workers had no effect in run_async and all LLM requests for a batch were issued simultaneously. The semaphore is now acquired per task, so max_workers correctly caps in-flight requests.
expand_page_range() now raises a ValueError: too many values to unpack when a page range string contained more than one hyphen (e.g. "10-20-30"). The parser now validates the format and raises a clear ValueError with an explanatory message for invalid inputs.
LLMMetadataExtractor now raises a clear ValueError when the prompt contains no template variables. Previously this case raised an unhelpful IndexError: list index out of range. The error message now consistently explains that the prompt must contain exactly one variable called document.
Fixed HuggingFaceAPIDocumentEmbedder serialization to preserve the configured concurrency_limit.
Fix TransformersZeroShotDocumentClassifier.to_dict to include the classification_field and multi_label init parameters, which were previously dropped on serialization and reset to defaults after from_dict.

💙 Big thank you to everyone who contributed to this release!

@Aarkin7, @anakin87, @bogdankostic, @davidsbatista, @etairl, @HamidOna, @julian-risch, @kota-wilson, @maxdswain, @MechaCritter, @pragnyanramtha, @rautaditya2606, @sachinn854, @sjrl