mlflow 3.9.0 on Python PyPI

We're excited to announce MLflow 3.9.0, which includes several notable updates:

Major New Features:

🔮 MLflow Assistant: Figuring out the next steps to debug your apps and agents can be challenging. We're excited to introduce the MLflow Assistant, an in-product chatbot that can help you identify, diagnose, and fix issues. The assistant is backed by Claude Code, and directly passes context from the MLflow UI to Claude. Click on the floating "Assistant" button in the bottom right of the MLflow UI to get started!
📈 Trace Overview Dashboard: You can now get insights into your agent's performance at a glance with the new "Overview" tab in GenAI experiments. Many pre-built statistics are available out of the box, including performance metrics (e.g. latency, request count), quality metrics (based on assessments), and tool call summaries. If there are any additional charts you'd like to see, please feel free to raise an issue in the MLflow repository!
✨ AI Gateway: We're revamping our AI Gateway feature! AI Gateway provides a unified interface for your API requests, allowing you to route queries to your LLM provider(s) of choice. In MLflow 3.9.0, the Gateway server is now located directly in the tracking server, so you don't need to spin up a new process. Additional features such as passthrough endpoints, traffic splits, and fallback models are also available, with more to come soon! For more detailed information, please take a look at the docs.
🔎 Online Monitoring with LLM Judges: Configure LLM judges to automatically run on your traces, without having to write a line of code! You can either use one of our pre-defined judges, or provide your own prompt and instructions to create custom metrics. Head to the new "Judges" tab within the GenAI Experiment UI to get started.
🤖 Judge Builder UI: Define and iterate on custom LLM judge prompts directly from the UI! Within the new "Judges" tab, you can create your own prompt for an LLM judge, and test-run it on your traces to see what the output would be. Once you're happy with it, you can either use it for online monitoring (as mentioned above), or use it via the Python SDK for your evals.
🔗 Distributed Tracing: Trace context can now be propagated across different services and processes, allowing you to truly track request lifecycles from end to end. The related APIs are defined in the mlflow.tracing.distributed module (with more documentation to come soon).
📚 MemAlign - a new judge optimizer algorithm: We're excited to introduce MemAlignOptimizer, a new algorithm that makes your judges smarter over time. It learns general guidelines from past feedback while dynamically retrieving relevant examples at runtime, giving you more accurate evaluations.
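
To make the Distributed Tracing item above more concrete, here is a minimal sketch of how trace context is commonly carried between services. It does not use the mlflow.tracing.distributed API, which these notes do not document. It assumes a W3C Trace Context style "traceparent" header, and the helper names (new_traceparent, parse_traceparent, child_traceparent) are hypothetical.

```python
import re
import secrets

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags,
# all lowercase hex. Assumption: used here only to illustrate propagation.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Upstream service: start a trace and build the header value to send."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Downstream service: validate and split an incoming header value."""
    match = _TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    version, trace_id, span_id, flags = match.groups()
    # All-zero identifiers are reserved as invalid by the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("traceparent contains an all-zero identifier")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "flags": flags,
    }


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace id, but mint a new span id for this service."""
    parent = parse_traceparent(incoming)
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"


# Service A creates the header; service B continues the same trace.
header_a = new_traceparent()
header_b = child_traceparent(header_a)
```

Both headers share one trace id, which is what lets a tracing backend stitch spans from separate processes into a single end-to-end trace.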

Features:

[Gateway] Add LiteLLM provider to support many other providers (#19394, @TomeHirata)
[Gateway] Add passthrough support for Anthropic Messages API (#19423, @TomeHirata)
[Gateway] Add passthrough support for Gemini generateContent and streamGenerateContent APIs (#19425, @TomeHirata)
[Gateway] Add routing strategy and fallback configuration support for gateway endpoints (#19483, @TomeHirata)
[Gateway] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
[Gateway / UI] Create List API Keys landing page (#19441, @BenWilson2)
[Gateway / UI] Add Create API Keys functionality (#19442, @BenWilson2)
[Gateway / UI] Add delete and update capabilities for API Keys (#19446, @BenWilson2)
[Gateway / UI] Add endpoint listing page and tab layout (#19474, @BenWilson2)
[Gateway / UI] Add Create endpoint page and enhance provider select (#19475, @BenWilson2)
[Gateway / UI] Add Model select functionality for endpoint creation (#19477, @BenWilson2)
[Gateway / UI] Add Auth config to endpoint creation (#19494, @BenWilson2)
[Gateway / UI] Add the Endpoint Edit Page (#19502, @BenWilson2)
[Gateway / UI] Refactor the provider display for better UX (#19503, @BenWilson2)
[Gateway / UI] Create Endpoint details page (#19537, @BenWilson2)
[Gateway / UI] Add security notice banner (#19538, @BenWilson2)
[Gateway / UI] Create common editable combo box with extra modal select (#19546, @BenWilson2)
[Evaluation] Introduce MemAlign as a new optimizer for judge alignment (#19598, @smoorjani)
[Evaluation] Parallelize LLM calls in MemAlign guideline distillation (#20291, @veronicalyu320)
[Evaluation] Add GePaAlignmentOptimizer for judge instruction optimization (#19882, @alkispoly-db)
[Evaluation] Add Fluency scorer for evaluating text quality (#19414, @alkispoly-db)
[Evaluation] Add KnowledgeRetention built-in scorer (#19436, @alkispoly-db)
[Evaluation] Implement automatic discovery for builtin scorers (#19443, @alkispoly-db)
[Evaluation] Add Phoenix (Arize) third-party scorer integration (#19473, @debu-sinha)
[Evaluation] Add gateway provider support for scorers (#19470, @danielseong1)
[Evaluation] Introduce a conversation simulator into mlflow.genai (#19614, @smoorjani)
[Evaluation] Integrate conversation simulation into mlflow.genai.evaluate (#19760, @smoorjani)
[Evaluation] Make conversation simulator work with datasets (#19845, @SomtochiUmeh)
[Evaluation] Support for conversational datasets with persona, goal, and context (#19686, @SomtochiUmeh)
[Evaluation] Introduce conversational guidelines scorer (#19729, @smoorjani)
[Evaluation] Update tool call correctness judge to accept expected tool calls (#19613, @smoorjani)
[Evaluation] Support trace parsing fallback using Databricks model (#19654, @AveshCSingh)
[Evaluation] Documentation for online evaluation / scoring (#20103, @dbczumar)
[Evaluation] Job backend: Update job backend to use static names rather than function full names (#19430, @WeichenXu123)
[Evaluation] Job backend: support job cancellation (#19565, @WeichenXu123)
[Tracing] Support distributed tracing (#19920, @WeichenXu123)
[Tracing] Trace Metrics backend (#19271, @serena-ruan)
[Tracing] Add IS NULL / IS NOT NULL comparator support for trace metadata filtering (#19720, @dbczumar)
[Tracing] Auto-navigate to Events tab when clicking error spans (#20188, @anshuman-sahu)
[Tracing] Support shift+select for Traces (#20125, @B-Step62)
[Tracing] SpringAI Integration (#19949, @joelrobin18)
[Tracing] Reasoning in Chat UI for OpenAI, Anthropic, Gemini, LangChain, and PydanticAI (#19535, #19541, #19627, #19651, #19657, @joelrobin18)
[UI] Merge MLflow Assistant branch (#20011, @B-Step62)
[UI] Current Page context to assistant (#20139, @joelrobin18)
[UI] Assistant regenerate button (#20066, @joelrobin18)
[UI] Copy button Assistant (#20063, @joelrobin18)
[UI] Overview tab for GenAI experiments (#19521, @serena-ruan)
[UI] Enable Scorers UI feature flags (#19842, @danielseong1)
[UI] Improve LLM judge creation modal UX and variable ordering (#19963, @danielseong1)
[UI] Hide instructions section for built-in LLM judges (#19883, @danielseong1)
[UI] Change model provider and name to dropdown list (#19653, @chenmoneygithub)
[Prompts] Support Jinja2 template in prompt registry (#19772, @B-Step62)
[Prompts] Support metaprompting in mlflow.genai.optimize_prompts() (#19762, @chenmoneygithub)
[Prompts] Add option to delegate saving dspy model to dspy.module.save API (#19704, @WeichenXu123)
[Prompts / UI] Add traces mode to prompts details page and implement filtered traces (#19599, @TomeHirata)
[Tracking] Support mlflow.genai.to_predict_fn for app invocation endpoints (#19779, @jennsun)
[Tracking] Add log_stream API for logging binary streams as artifacts (#19104, @harupy)
[Tracking] Add import_checkpoints API for databricks SGC Checkpointing with MLflow (#19839, @WeichenXu123)
[Tracking] Support GC clean up for Historical Jobs (#19626, @joelrobin18)
[Tracking] Add JupyterNotebookRunContext for Tracking local Jupyter notebook as the source (#19162, @iyashk)
[Tracking] Full docker image support with db (#19979, @serena-ruan)
[Tracking] Add react route handling to communicate with the tracking server (#19010, @BenWilson2)
[Tracking] [TypeScript SDK] Simplify Databricks auth by delegating to Databricks SDK (#19434, @simonfaltum)
[Models] Safe model serialization: Support saving pytorch model via torch.export.save, add skops serialization format, and deprecate unsafe pickle/cloudpickle formats (#18759, #18832, #19692, #20151, @WeichenXu123)
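
For background on the safe model serialization entry above, the sketch below shows why pickle-based formats are generally considered unsafe: unpickling can call arbitrary callables. It uses only the Python standard library and is not MLflow's implementation. The Payload class, NoGlobalsUnpickler, and restricted_loads are hypothetical names for illustration.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object hidden inside a serialized model."""

    def __reduce__(self):
        # pickle records this callable and its arguments, then invokes
        # the callable at load time. A harmless eval is used here; an
        # attacker could substitute any importable function.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs eval("1 + 1") instead.
result = pickle.loads(blob)  # result == 2


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load plain containers and scalars only; reject anything callable."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

With this restriction, plain dictionaries and lists still load, while the payload above raises pickle.UnpicklingError. Formats such as torch.export and skops avoid the problem by design rather than by filtering.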

Bug fixes:

[Gateway] Fix Anthropic and Gemini streaming for LiteLLM providers (#20398, @TomeHirata)
[Build] Include git submodule contents in Python package build (#20394, @copilot-swe-agent)
[Tracing] Fix duplicate traces in semantic kernel autolog (#20206, @harupy)
[Tracing] Fix Claude autolog to prioritize settings.json over OS environment variables (#20376, @alkispoly-db)
[Evaluation] Fix temperature/json issues with ConversationSimulator on managed (#20236, @xsh310)
[Tracing / UI] Add support for OpenAI function calling inputs in chat UI parsing (#20058, @daniellok-db)
[Tracking] Update checking code for pickle deserialization (#20267, @WeichenXu123)
[Gateway] Fix Vertex AI model configuration (#20242, @TomeHirata)
[UI] Store gateway<>scorer binding correctly (#20176, @TomeHirata)
[Evaluation] Support SparkDF trace handling in eval (#20207, @BenWilson2)
[Evaluation] Fix tool name extraction for tool call correctness (#20201, @smoorjani)
[Prompts] Fix scorers issue in metaprompting (#20173, @chenmoneygithub)
[UI] Propagate Run id context to Assistant (#20138, @joelrobin18)
[Model Registry] Allow for model registration to use KMS auth from different workspace (#20156, @BenWilson2)
[UI] Improve scorer trace picker UX and validation (#20178, @danielseong1)
[Evaluation] Improve MemAlign optimizer for incremental judge alignment (#20049, @veronicalyu320)
[Evaluation] Fix bug with max tokens using max output tokens (#20174, @smoorjani)
[Evaluation] Fix a race condition bug when using DF inputs for genai eval (#20079, @BenWilson2)
[Tracking] Fix DATABRICKS_CONFIG_PROFILE env var detection when fetching databricks credentials (#20112, @daniellok-db)
[Gateway] Move gateway invocation validation to fastapi middleware (#20111, @TomeHirata)
[Prompts] Fix the length check in mlflow.genai.optimize_prompts() (#19993, @chenmoneygithub)
[UI] Fix trace selection not registering in SelectTracesModal (#20099, @joelrobin18)
[UI] Fix LimitOverrunError in Assistant streaming (#20078, @joelrobin18)
[Tracing] CC Token usage (#20022, @joelrobin18)
[Gateway] Remove MLflow-specific auth_mode from LiteLLMConfig (#20059, @TomeHirata)
[UI] Assistant UI fix for dark theme (#20056, @joelrobin18)
[Tracing] Isolate runtime context between opentelemetry and mlflow (#19797, @B-Step62)
[UI] Prevent spurious 404 requests for relative image URLs in markdown (#20003, @harupy)
[Tracing] Support MLflow tracing with OpenTelemetry auto-instrumentation (#19501, @serena-ruan)
[UI] Fix session selector table column resizing and link behavior (#19927, @danielseong1)
[Gateway] Add Azure provider support in gateway configuration (#19933, @TomeHirata)
[Gateway] Propagate extra auth config to LiteLLM provider (#19931, @TomeHirata)
[Evaluation / UI] Add missing retrieval context error for retrieval scorers (#19895, @danielseong1)
[Evaluation / UI] Improve trace selection UX in scorer/judge UI (#19913, @danielseong1)
[Model Registry / Models] Fix infer_code_paths to capture transitive imports of functions/classes (#19814, @copilot-swe-agent)
[Tracking] Fix REST API call latency in Databricks job runs (#19886, @WeichenXu123)
[UI] Enable {{trace}} variable support in sample judge evaluation (#19851, @danielseong1)
[Scoring] Check security before extracting tar file (#19557, @WeichenXu123)
[Gateway] Fix authorization header duplication (#19853, @TomeHirata)
[Gateway] Fix Gateway error handling to translate MlflowException to HTTPException (#19728, @danielseong1)
[Gateway] Remove gateway_deprecated decorator - AI Gateway is not deprecated (#19821, @copilot-swe-agent)
[Tracking] Make local artifact location creation lazy to support read-only proxy environments (#19678, @BenWilson2)
[Evaluation] Fix Databricks-hosted LLM failure caused by response_schema injection (#19741, @sinanshamsudheen)
[Evaluation] Add @overload annotations to @scorer decorator for proper type inference (#19570, @mr-brobot)
[Tracking] Add debug logging for 500 errors in catch_mlflow_exception (#19781, @harupy)
[Tracing] Support searching traces by string feedback / expectation values (#19719, @dbczumar)
[Tracing / UI] Fix scorer creation UX issues (#19756, @danielseong1)
[Evaluation] Fix KnowledgeRetention model parameter not propagating to inner scorer (#19753, @danielseong1)
[Tracking] Fix serve-artifacts not being enabled in docker-compose (issue #19700) (#19701, @zjffdu)
[Tracing] Fix type signature loss in @trace_disabled decorator (#19569, @mr-brobot)
[Tracking] Fix: Return 400 instead of 500 for invalid experiment_id (#19655, @copilot-swe-agent)
[Models] Fix schema enforcement for pandas StringDtype (#19518, @harupy)
[Tracing] Fix Python 3.12 DeprecationWarning from generator.throw() in tracing (#19629, @mr-brobot)
[Evaluation] Fix structured outputs for databricks serving endpoints (#19572, @smoorjani)
[Models / Scoring] Add dict to PyFuncOutput type alias for ResponsesAgent/ChatAgent/ChatModel (#19560, @copilot-swe-agent)
[Tracking] Fix enable_git_model_versioning to work from subdirectories (#19529, @copilot-swe-agent)
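
The "Check security before extracting tar file" fix listed above (#19557) addresses a well-known class of problem: archive members whose paths escape the extraction directory. The sketch below shows one common way to detect such members with the standard library. It is an illustration under stated assumptions, not the code from that PR. The unsafe_members helper and the destination path are hypothetical.

```python
import io
import os
import tarfile


def unsafe_members(tar: tarfile.TarFile, dest: str) -> list[str]:
    """Return names of members that should not be extracted into dest."""
    dest_root = os.path.realpath(dest)
    flagged = []
    for member in tar.getmembers():
        # Resolve where the member would land, including any ".." parts.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        escapes = os.path.commonpath([dest_root, target]) != dest_root
        # Links and device files are rejected outright in this sketch.
        if escapes or member.issym() or member.islnk() or member.isdev():
            flagged.append(member.name)
    return flagged


# Build a small in-memory archive with one benign and one hostile entry.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for name in ["model/weights.txt", "../evil.txt"]:
        info = tarfile.TarInfo(name)
        payload = b"data"
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    flagged = unsafe_members(archive, "/tmp/extract_here")
```

A caller would refuse to extract when flagged is non-empty. On Python 3.12 and later, tarfile's extraction filters (for example filter="data" on extractall) apply comparable checks natively.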

Documentation updates:

[Docs] fix: Remove multi_class argument from scikit-learn's LogisticRegression in docs (#20266, @SOORAJTS2001)
[Docs] Add doc for distributed tracing (#20027, @WeichenXu123)
[Docs] Add Judge Builder UI documentation (#20163, @danielseong1)
[Docs] Add framework integration examples for AI Gateway query-endpoint page (#20137, @TomeHirata)
[Docs] Add "Evaluation Examples" article (#19722, @achen530)
[Docs] [1/3] Add gateway tracing guide for LiteLLM, OpenRouter, and Vercel AI Gateway (#20031, @B-Step62)
[Docs] Update prompt optimization doc to include metaprompting (#19966, @chenmoneygithub)
[Docs] Reorganize gateway page structure (#19968, @TomeHirata)
[Build / Docs] Fix broken auth REST API documentation links (#19872, @copilot-swe-agent)
[Docs] Add setup and query documentation for new AI Gateway (#19804, @TomeHirata)
[Docs] Add additional eval dataset serialization examples (#19697, @BenWilson2)
[Docs] ML-60766: Add dataset schema from managed content to SDK reference page (#19676, @achen530)
[Docs / Prompts] Fix duplicate tags argument in register_prompt documentation example (#19591, @copilot-swe-agent)
[Docs] Fix ML-59546 eval quickstart links to wrong place, add notebook version of eval quickstart (#19511, @achen530)
[Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)

Small bug fixes and documentation updates:

#20406, #20122, #20317, #20333, #20361, #20274, #20362, #20249, #20169, #20345, #20252, #20314, #20214, #20215, #20210, #20212, #20142, #20183, #20121, #20141, #20140, #20124, #20073, #20062, #20065, #19893, #19912, #19464, #19857, #19401, #19600, #19555, #19400, #19392, #19393, @B-Step62; #20323, #20263, #19982, #20218, #20143, #20146, #20145, #20064, #20117, #20144, #20110, #20050, #20017, #20116, #20118, #19989, #19953, #19836, #19915, #19955, #19952, #19940, #19939, #19938, #19937, #19877, #19874, #19869, #19867, #19865, #19837, #19835, #19834, #19864, #19873, #19833, #19825, #19876, #19799, #19798, #19793, #19771, #19770, #19635, #19634, #19633, #19632, #19624, #19622, #19621, #19620, #19631, #19619, #19747, #19609, #19608, #19607, #19606, #19604, #19603, #19602, #19601, #19588, #19587, #19581, #19585, #19610, #19590, #19580, #19579, #19578, #19577, #19576, #19234, @serena-ruan; #20378, #20385, #20205, #20237, #20193, #20171, #20155, #20170, #20132, #20097, #20100, #20101, #19736, #19717, #19716, #19759, #19718, #19714, #19713, #19712, #19711, #19840, #19710, #19709, #19708, #19777, #19707, @dbczumar; #20387, #19981, #19964, @bbqiu; #20390, #20334, #20208, #19978, #19980, #19875, #19854, #19816, #19815, #19796, #19806, #19785, #19789, #19769, #19748, #19773, #19782, #19706, #19523, #19505, #19450, #19482, #19458, #19433, #19431, #19455, #19417, #19426, #19424, @harupy; #20355, #20245, #20120, #20229, #20114, #20053, #20012, #19972, #20002, #19991, #19990, #19977, #19986, #19985, #19967, #19957, #19960, #19954, #19945, #19941, #19934, #19917, #19916, #19905, #19904, #19903, #19900, #19899, #19897, #19894, #19892, #19890, #19888, #19887, #19861, #19828, #19818, #19803, #19802, #19791, #19788, #19795, #19790, #19786, #19783, #19767, #19768, #19746, #19735, #19733, #19732, #19726, #19561, #19549, #19544, #19543, #19510, #19486, #19487, #19463, #19871, @copilot-swe-agent; #20308, #20264, #20109, #20181, #20180, #20177, #20134, #20107, #20015, #20007, #20008, 
#19930, #20006, #20005, #19965, #19942, #19944, #19950, #19936, #19947, #19948, #19946, #19870, #19824, #19823, #19856, #19863, #19858, #19860, #19849, #19822, #19765, #19792, #19764, #19763, #19618, #19453, #19452, #19404, #19390, #19290, @TomeHirata; #20350, #20203, #19675, #19677, #19674, #19476, #19447, @BenWilson2; #20286, #20157, #20051, #20216, #20200, #20213, #20194, #20072, #20195, #20175, #20039, #19844, #19935, #19696, #19451, #19409, @smoorjani; #20209, #20131, #19742, #19969, #19734, #19480, #19351, @daniellok-db; #20204, #20164, #20192, #19997, #19925, #19850, #19914, #19774, #19721, #19673, #19623, #19668, #19496, #19554, #19471, @danielseong1; #20037, #19884, #19846, #19843, #19813, #19454, #19391, #19322, #19388, #19307, #19382, @xsh310; #20130, @iyashk; #20147, #20030, #19962, #19826, @kevin-lyn; #20108, #20071, #19743, #20045, #20042, #19959, #19880, @SomtochiUmeh; #20025, #19662, #19749, #19738, #19419, @WeichenXu123; #19847, @jaceklaskowski; #19820, @Abhiii47; #19800, @shreenidhi2205; #19703, #19693, #19689, #19688, #19664, #19663, #19660, #19534, #19533, #19532, #19531, @hubertzub-db; #19652, @AMRUTH-ASHOK; #19493, #19495, @alkispoly-db; #16372, @mohammadsubhani; #19522, @pmeier
