DSPy 3.3.0b1 Release Notes

DSPy 3.3.0b1 is a beta release including a new ReActV2 module, a new BaseLM System, updating to GEPA 0.1.1, and fewer dependencies framework wide.

Most existing DSPy programs should keep working without changes. Review the breaking changes if you depend on NumPy from the base dspy install, inspect GEPA result internals, or catch provider-specific LM exceptions directly.

We would really appreciate feedback on ReActV2 and the new LM system! Please try them out and let us know if you run into any issues.

Highlights

ReActV2 and Native Tool-Calling History - @isaacbmiller

dspy.ReActV2 is a new version of ReAct built around native tool calling. It is currently marked as experimental.

The signature now uses dspy.History, dspy.Tool, and dspy.ToolCalls(which can now optionally store dspy.ToolCallResults), rather than the custom next_tool_args and custom trajectory syntax. Using dspy.History also means that messages are now broken up into user/assistant/tool groups rather than one long user message with the trajectory.

This changes the execution model in a few concrete ways:

parallel_tool_calls support: DSPy preserves each call/result pair by ID. You can do this in native mode or in non-native mode
Multi-turn native tool call support: Prior tool calls and results can be replayed as assistant and tool messages instead of being flattened into prompt text.
Each turn lives in dspy.History as structured messages rather than one ever-growing trajectory string, so providers with prompt caching can reuse stable prefixes more effectively. We have seen up to 50% decreases in cost for some tasks when testing this internally.

ReActV2 converts callables to dspy.Tool, adds an internal submit tool for final outputs, handles unknown tools and tool exceptions, accepts serialized history input, and can force final submission when the model does not call submit.

PRs: #9823, #9824, #9825, #9835

Typed, Provider-Neutral LM Boundary - @MaximeRivest

DSPy is moving from an untyped LM boundary based on prompt, messages, and provider-shaped kwargs toward a typed, provider-neutral contract:

def forward(self, request: dspy.LMRequest) -> dspy.LMResponse:
    ...

The resulting API is a cleaner LM extension point:

LiteLLM can become an optional compatibility fallback in the planned 3.5+ path, instead of a required part of the core LM contract.
Custom LM authors can implement one typed LMRequest -> LMResponse path instead of guessing which OpenAI/LiteLLM-shaped inputs will arrive.
Custom LMs can translate between DSPy's typed objects and their own provider, local runtime, gateway, or inference stack.
Adapters can start to depend on DSPy's representation of messages, multimodal content, tool calls, reasoning, citations, usage, cache controls, metadata, and stream events.

Most users do not need to change anything in 3.3. Existing lm(...), modules, and programs keep their current behavior by default.

Try out the typed return path with dspy.context(experimental=True), and the public migration plan explains the staged transition for custom LM and adapter authors.

See the full plan here

PRs: #9786, #9802, #9828

Smaller Base Install - @isaacbmiller

We have been whittling away at dependencies!

The base install is lighter: numpy is now optional via dspy[numpy], and direct dependencies on asyncer, xxhash, and typeguard were removed in favor of standard-library paths.

Users who do not need NumPy-backed retrieval, embeddings, or optimizers get a smaller default install with fewer transitive dependencies. Users who need NumPy-backed features can install dspy[numpy].

PRs: #9659, #9733, #9734, #9735

BaseLM Runtime, Save/Load, Errors, and LiteLLM Decoupling - @MaximeRivest

BaseLM now owns shared runtime state and supports sanitized state serialization through dump_state() and load_state(). Serialized LM state excludes API keys, preserves legacy saved states, and requires explicit opt-in before importing trusted custom LM classes.

Saved programs with custom LMs are easier to reason about, LM copies isolate DSPy-owned mutable state, and callers can catch dspy.LMError or a narrower DSPy subclass instead of depending on provider-specific exception classes. LiteLLM imports are lazy, which keeps the core LM API less coupled to a specific provider bridge at import time.

PRs: #9752, #9820, #9821, #9826

Custom Objects for RLM Sandboxes - @kmad

dspy.RLM can now accept custom sandbox-serializable values through SandboxSerializable.

Users can pass richer objects, such as DataFrames, into the sandbox with explicit setup, serialization, assignment, and preview behavior instead of forcing everything through prompt text.

PRs: #9411

GEPA 0.1.1 Support - @BenMcH

DSPy now supports gepa[dspy]==0.1.1, including updated DspyGEPAResult behavior, tests, and docs.

Users can move to the current GEPA DSPy integration in this beta. The breaking changes below call out the result-shape changes for code that inspects detailed GEPA outputs.

PRs: #9673

Breaking Changes

`numpy` Is Now Optional

numpy is no longer installed with base dspy. Features that need NumPy now require the numpy extra:

pip install "dspy[numpy]"

Affected areas include embeddings, KNN/KNNFewShot, SIMBA, and other NumPy-backed optimizer or retrieval paths. (#9659 by @isaacbmiller)

GEPA Result Shapes Changed With `gepa[dspy]==0.1.1`

The upstream GEPA 0.1.1 API changed several result structures, and DspyGEPAResult now mirrors those shapes. Users who inspect optimized_program.detailed_results may need to update code:

DspyGEPAResult.candidates is now a list of compiled DSPy modules, not instruction dictionaries.
DspyGEPAResult.best_candidate now returns a compiled DSPy module.
val_subscores is now list[dict[Any, float]], keyed by validation instance id.
per_val_instance_best_candidates is now dict[Any, set[int]].
best_outputs_valset is now dict[Any, list[tuple[int, Prediction]]] when tracked.
highest_score_achieved_per_val_task now returns a dictionary keyed by validation instance id.

If you pass custom GEPA reflection templates directly, note that GEPA 0.1.1 renamed default placeholders from <curr_instructions> / <inputs_outputs_feedback> to <curr_param> / <side_info>. In dspy.GEPA, passing reflection_prompt_template through gepa_kwargs now raises a clear ValueError; use instruction_proposer for custom proposal behavior instead. (#9673 by @BenMcH)

LM Error Types Are Now DSPy-Normalized

LM failures are now mapped into DSPy exception classes. This should make LM error handling more consistent, but code that catches provider-specific or LiteLLM-specific errors directly may need to catch dspy.LMError or a narrower DSPy subclass. (#9826 by @MaximeRivest)

LM Runtime Changes

Added core typed LM objects for provider-neutral requests, responses, messages, parts, tool specs, reasoning config, cache config, usage, history entries, stream events, and stream assembly. (#9786 by @MaximeRivest)
Added OpenAI/LiteLLM compatibility conversion helpers so current provider-shaped calls can move through LMRequest and LMResponse internally. (#9802 by @MaximeRivest)
Routed adapter __call__ and acall through the normalized LM boundary while converting back to legacy parser inputs for compatibility. (#9802 by @MaximeRivest)
Current BaseLM calls still receive OpenAI/LiteLLM-shaped kwargs. Internally, adapters now move through adapter messages -> LMRequest -> OpenAI/LiteLLM kwargs -> current BaseLM -> LMResponse -> existing adapter postprocess path. (#9802 by @MaximeRivest)
Added exact adapter-format regression coverage before and during the boundary work, including Chat, JSON, XML, BAML, TwoStep, tools, reasoning, citations, multimodal content, custom types, demos, and history. (#9791, #9792 by @MaximeRivest)
Made BaseLM the owner of shared runtime state and changed BaseLM.copy() to an explicit shallow runtime copy. (#9821 by @MaximeRivest)
Added sanitized BaseLM.dump_state() and BaseLM.load_state() support, including trusted custom LM class loading through allow_unsafe_lm_state=True. (#9820 by @MaximeRivest)
Added structured DSPy LM exceptions and wired them into dspy.LM and adapter fallback behavior. (#9826 by @MaximeRivest)
Made LiteLLM imports lazy so importing DSPy does not eagerly import the LiteLLM bridge. (#9752 by @MaximeRivest)
Added the public typed LM API migration plan for custom LM and adapter authors. (#9828 by @MaximeRivest)

ReActV2 and Tool Calling

Added dspy.ReActV2, a native-tool-aware ReAct predictor with dspy.Tool conversion, an internal submit tool, serialized history input support, unknown-tool and tool-exception handling, and forced final submission when needed. (#9825 by @isaacbmiller)
Enabled ReActV2-style agents to use provider-side parallel tool calls when the adapter is configured with parallel_tool_calls, preserving each model-requested call and each observation by call ID. (#9823, #9824, #9825 by @isaacbmiller)
Moved ReActV2 history from a formatted trajectory string into structured dspy.History, so native-tool providers can see prior turns as assistant/tool messages and prompt-caching providers can reuse stable prompt prefixes more effectively. (#9824, #9825 by @isaacbmiller)
Preserved provider tool-call IDs on ToolCalls.ToolCall and added ToolCallResults for call IDs, tool names, values, and error flags. (#9823 by @isaacbmiller)
Taught adapters to replay prior native assistant tool calls and matching tool results as native LM messages when native function calling is enabled, without changing non-native history rendering. (#9824 by @isaacbmiller)
Rendered tool calls in inspect_history for assistant messages and LM output records, including provider-style function payloads and tool-call outputs without text. (#9835 by @isaacbmiller)

Bug Fixes

Render tool calls in inspect_history and avoid crashing on assistant messages with content=None or output records with tool calls but no text. (#9835 by @isaacbmiller)
Repair cached provider Pydantic serializers when fresh processes read cached OpenAI/LiteLLM response objects with stale MockValSer serializers. (#9830 by @isaacbmiller)
Suppress missing-input warnings for omitted signature inputs whose annotations allow None, while preserving warnings for genuinely missing required inputs. (#9834 by @isaacbmiller)
Handle usage=None from the OpenAI Responses API on truncated responses. (#9718 by @isaacbmiller)
Make module load_state transactional by validating before mutating state. (#9741 by @ashishSoni1234)
Await async tool functions in PythonInterpreter. (#9754 by @Archelunch)
Resolve symlinks correctly for Deno --allow-read paths in PythonInterpreter. (#9748 by @npow)
Remove deprecated Avatar prefix arguments from internal signatures. (#9767 by @nullhack)
Fix a mismatched closing code fence in the README. (#9771 by @abhicris)
Fix the FAQ learn-guide link. (#9798 by @cosmopolitan033)

Docs, Testing, CI, and Dependencies

Added and expanded exact adapter-format message coverage. (#9791, #9792 by @MaximeRivest)
Isolated DSPy cache usage in tests. (#9795 by @isaacbmiller)
Added the normalized LM API migration plan. (#9828 by @MaximeRivest)
Hardened workflows with actionlint/zizmor and SARIF permissions updates. (#9742, #9743, #9745 by @isaacbmiller)
Removed git-auto-commit-action from the release workflow. (#9746 by @isaacbmiller)
Updated actions/setup-python from 3.1.4 to 6.2.0. (#9602 by @dependabot[bot])
Updated runtime/development dependencies including anyio, cachetools, tenacity, weaviate-client, mistune, and zizmor-action. (#9730, #9731, #9732, #9760, #9759, #9787 by @dependabot[bot])

Contributors

@isaacbmiller
@MaximeRivest
@BenMcH
@abhicris
@kmad (first contribution, #9411)
@ashishSoni1234 (first contribution, #9741)
@Archelunch (first contribution, #9754)
@npow (first contribution, #9748)
@nullhack (first contribution, #9767)
@cosmopolitan033 (first contribution, #9798)

Full Changelog: 3.2.1...3.3.0b1

stanfordnlp/dspy 3.3.0b1 on GitHub