We're excited to announce MLflow 3.10.0, which includes several notable updates:
Major New Features:
🏢 Organization Support in MLflow Tracking Server: MLflow now supports multi-workspace environments. Users can group experiments, models, and prompts into workspaces, a coarser unit of organization, and logically isolate them within a single tracking server. (#20702, #20657, @mprahl, @Gkrumbach07, @B-Step62)
💬 Multi-turn Evaluation & Conversation Simulation: MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested. (#20243, #20377, #20289, @smoorjani)
💰 Trace Cost Tracking: Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. (#20327, #20330, @serena-ruan)
🎯 Navigation bar redesign: We've redesigned the navigation to provide a frictionless experience. A new workflow type selector in the top-level navbar lets you quickly switch between GenAI and Classical ML contexts, with streamlined sidebars that reduce visual clutter. (#20158, #20160, #20161, #20699, @ispoljari, @daniellok-db)
🎮 MLflow Demo Experiment: New to MLflow GenAI? With one click, launch a pre-populated demo and explore tracing, evaluation, and prompt management in action. No configuration, no code required. (#19994, #19995, #20046, #20047, #20048, #20162, @BenWilson2)
📊 Gateway Usage Tracking: Monitor your AI Gateway endpoints with detailed usage analytics. A new usage page shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end observability. (#20357, #20358, #20642, @TomeHirata)
⚡ In-UI Trace Evaluation: Users can now run custom or pre-built LLM judges directly from the traces and sessions UI. This enables quick evaluation of individual traces and sessions without context switching to the Python SDK. (#20360, @hubertzub-db, @danielseong1)
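As a rough illustration of the Trace Cost Tracking item above, the sketch below shows how a per-span cost could be derived from token counts and a per-model price table, then summed over a trace. It is not MLflow's implementation: the `SpanUsage` class, the `span_cost` and `trace_cost` functions, the model names, and the prices are all hypothetical and exist only for this example.

```python
import math
from dataclasses import dataclass

# Hypothetical prices in USD per one million tokens (illustrative values only).
PRICES_PER_MILLION = {
    "example-model-small": {"input": 0.50, "output": 1.50},
    "example-model-large": {"input": 5.00, "output": 15.00},
}


@dataclass
class SpanUsage:
    """Token usage extracted from one LLM span (hypothetical shape)."""
    model: str
    input_tokens: int
    output_tokens: int


def span_cost(usage):
    """Return the span cost in USD, or None if the model has no known price."""
    price = PRICES_PER_MILLION.get(usage.model)
    if price is None:
        return None
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


def trace_cost(spans):
    """Sum the cost of every span whose model has a known price."""
    costs = (span_cost(span) for span in spans)
    return sum(cost for cost in costs if cost is not None)


spans = [
    SpanUsage("example-model-small", input_tokens=2_000, output_tokens=1_000),
    SpanUsage("unknown-model", input_tokens=500, output_tokens=500),
]
total = trace_cost(spans)
assert math.isclose(total, 0.0025)
```

Spans whose model is missing from the price table are skipped rather than counted as zero-cost, so an unknown model does not silently look free at the span level.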
Features:
- [UI] Add sliding animation to workflow switch component (#20831, @daniellok-db)
- [Tracing] Display cached tokens in trace UI (#20957, @TomeHirata)
- [Evaluation] Move select traces button to be next to Run judge (#20992, @PattaraS)
- [Gateway] Distributed tracing for gateway endpoints (#20864, @TomeHirata)
- [Gateway] Add user selector in the gateway usage page (#20944, @TomeHirata)
- [Docs] [MLflow Demo] Docs for GenAI Demo (#20240, @BenWilson2)
- [UI] Move Getting Started above experiments list and make collapsible (#20691, @B-Step62)
- [Model Registry / Tracking] Add `mlflow migrate-filestore` command (#20615, @harupy)
- [UI] Add visual indicator for demo experiment in experiment list (#20787, @B-Step62)
- [Scoring] Enable parquet content_type in the scoring server input for pyfunc (#20630, @TFK1410)
- [UI] feat(ui): Add workspace landing page, multi-workspace support, and qu… (#20702, @Gkrumbach07)
- [Tracking] Merge workspace feature branch into master (#20657, @B-Step62)
- [Gateway] Add Gateway Usage Page (#20642, @TomeHirata)
- [Gateway] Add usage section in endpoint page (#20357, @TomeHirata)
- [UI] [ MLflow Demo ] UI updates for MLflow Demo interfaces (#20162, @BenWilson2)
- [Build] Support comma-separated rules in `# clint: disable=` comments (#20651, @copilot-swe-agent)
- [Build / Docs / Models / Projects / Scoring] Replace `virtualenv` with `python -m venv` in virtualenv env_manager path (#20640, @copilot-swe-agent)
- [Tracing] Add per-decorator `sampling_ratio_override` parameter to `@mlflow.trace` (#19784, @harupy)
- [Evaluation / Tracking] Add `mlflow datasets list` CLI command (#20167, @alkispoly-db)
- [Gateway] Add trace ingestion for Gateway endpoints (#20358, @TomeHirata)
- [Tracing] feat(typescript-anthropic): add streaming support (#20384, @rollyjoel)
- [Evaluation] Add delete dataset records API (#19690, @joelrobin18)
- [UI] Add tooltip link to navigate to traces tab with time range filter (#20466, @serena-ruan)
- [Tracking] [MLflow Demo] Add mlflow demo cli command (#20048, @BenWilson2)
- [Evaluation] Add an SDK for distillation from conversation to goal/persona (#20289, @smoorjani)
- [Tracing] Livekit Agents Integration in MLflow (#20439, @joelrobin18)
- [Tracing / UI] Enable running scorers/judges from trace details drawer in UI (#20518, @danielseong1)
- [Gateway] link gateway and experiment (#20356, @TomeHirata)
- [Prompts] Add optimization backend APIs to auth control (#20392, @chenmoneygithub)
- [Tracing] Add an SDK for search sessions to get complete sessions (#20288, @smoorjani)
- [Tracing] Reasoning in Chat UI Mistral + Chat UI (#19636, @joelrobin18)
- [Evaluation] Add TruLens third-party scorer integration (#19492, @debu-sinha)
- [Evaluation / Tracing] Add Guardrails AI scorer integration (#20038, @debu-sinha)
- [Tracking] [MLflow Demo] Add Prompt demo data (#20047, @BenWilson2)
- [Tracking] [MLflow Demo] Add Eval simulation data (#20046, @BenWilson2)
- [Tracking] [MLflow Demo] Add trace data for demo (#19995, @BenWilson2)
- [Tracking] Support get_dataset(name=...) in OSS environments (#20423, @alkispoly-db)
- [UI] Add session comparison UI with goal/persona matching (#20377, @smoorjani)
- [UI] Model and cost rendering for spans (#20330, @serena-ruan)
- [UI] [1/x] Support span model extraction and cost calculation (#20327, @serena-ruan)
- [Evaluation] Make conversation simulator public and easily subclassable (#20243, @smoorjani)
- [Prompts] Add progress tracking for prompt optimization job (#20374, @chenmoneygithub)
- [Prompts] Prompt Optimization backend PR 3: Add Get, Search, and Delete prompt optimization job APIs (#20197, @chenmoneygithub)
- [Prompts] Track intermediate candidates and evaluation scores in gepa optimizer (#20198, @chenmoneygithub)
- [Tracking] [MLflow Demo] Base implementation for demo framework (#19994, @BenWilson2)
- [Prompts] Prompt Optimization backend PR 2: Add CreatePromptOptimizationJob and CancelPromptOptimizationJob (#20115, @chenmoneygithub)
- [Tracing] Support shift+select for Traces (#20125, @B-Step62)
- [UI] Ml61127/remove experiment type selector inside experiment page (#20161, @ispoljari)
- [UI] Ml61126/remove nested sidebars within gateway and experiments tab (#20160, @ispoljari)
- [UI] [ML-61124]: add selector for workflow type in top level navbar (#20158, @ispoljari)
- [Prompts / UI] Feat/render md in prompt registry (#19615, @iyashk)
- [Prompts] [Prompt Optimization Backend PR #1] Wrap prompt optimize in mlflow job (#20001, @chenmoneygithub)
- [Tracking] Add --experiment-name option to mlflow experiments get command (#19929, @alkispoly-db)
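To illustrate the idea behind the per-decorator `sampling_ratio_override` entry in the list above, here is a self-contained sketch of a tracing decorator whose sampling ratio falls back to a global default unless a decorator-level override is given. Only the parameter name comes from these release notes. The `traced` decorator, the `DEFAULT_SAMPLING_RATIO` constant, the `RECORDED` list, and the exact semantics are assumptions made for this example and do not describe how `@mlflow.trace` is implemented.

```python
import functools
import random

DEFAULT_SAMPLING_RATIO = 1.0  # hypothetical global default: sample every call
RECORDED = []                 # stands in for a trace backend in this sketch


def traced(func=None, *, sampling_ratio_override=None):
    """Hypothetical decorator: record a call with the effective sampling ratio."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The per-decorator override wins over the global default when set.
            ratio = (
                DEFAULT_SAMPLING_RATIO
                if sampling_ratio_override is None
                else sampling_ratio_override
            )
            # random.random() is in [0, 1), so 1.0 always samples and 0.0 never does.
            if random.random() < ratio:
                RECORDED.append(fn.__name__)
            return fn(*args, **kwargs)

        return wrapper

    # Support both @traced and @traced(sampling_ratio_override=...).
    return decorator(func) if func is not None else decorator


@traced(sampling_ratio_override=0.0)
def health_check():
    return "ok"


@traced
def handle_request(x):
    return x * 2


for i in range(20):
    health_check()
    handle_request(i)
```

In this sketch a noisy function such as `health_check` can opt out of sampling entirely while other functions keep the global ratio. The decorated function always runs; only the recording step is sampled.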
Bug fixes:
- [Tracing / UI] Fix infinite fetch loop in trace detail view when num_spans metadata mismatches (#20596, @coldzero94)
- [UI] fix:implement dark mode in experiment correctly (#20974, @intelliking)
- [Evaluation] Fix 'Select traces' do not show new traces in Judge UI (#20991, @PattaraS)
- [Tracing / Tracking] Fix RecursionError in strands, semantic_kernel, and haystack autologgers with shared tracer provider (#20809, @cgrierson-smartsheet)
- [Tracking] fix(tracking): Fix IntegrityError in log_batch when duplicate metrics span multiple key batches (#20807, @aws-khatria)
- [Tracing] Support native tool calls in CrewAI 1.9.0+ autolog tests (#20742, @TomeHirata)
- [Evaluation] Fix retrieval_relevance assessments logged to wrong span with missing chunk index (#20998, @smoorjani)
- [Evaluation] Fix missing session metadata on failed session-level scorer assessments (#20988, @smoorjani)
- [Tracking] Enhance path validation in check_tarfile_security for windows (#20924, @TomeHirata)
- [Docs] Fix admonition link underlines not rendering (#20990, @copilot-swe-agent)
- [Tracking] Rebuild `SearchTracesV2` request body on `ENDPOINT_NOT_FOUND` fallback (#20963, @brendanmaguire)
- [Build] Add model version search filtering based on user permissions (#20964, @TomeHirata)
- [Tracing] Display notebook trace viewer when workspace is on (#20947, @TomeHirata)
- [Tracing] Add `MLFLOW_GATEWAY_RESOLVE_API_KEY_FROM_FILE` flag to prevent local file inclusion in API gateway (#20965, @TomeHirata)
- [Tracking] Fix Claude Agent SDK tracing by capturing messages from receive_messages (#20778, @smoorjani)
- [Build / Tracking] Add missing authentication for fastapi routes (#20920, @TomeHirata)
- [Evaluation] Fix guardrails scorer compatibility with guardrails-ai 0.9.0 (#20934, @smoorjani)
- [UI] Fix duplicated title and add icons to Experiments/Prompts page headers (#20813, @B-Step62)
- [Tracing] Trace UI papercut: highlight searched text and change search box hint's wording. (#20841, @PattaraS)
- [Prompts] Fix arbitrary file read via prompt tag validation bypass in Model Registry (#20833, @TomeHirata)
- [Tracking] Fix `RestException` crash on null `error_code` and incorrect except clause (#20903, @copilot-swe-agent)
- [UI] Fix Disable action button in Traces Tab (#20883, @joelrobin18)
- [UI] Fix experiment rename modal not refreshing experiment details (#20882, @joelrobin18)
- [Build] Skip workspace header when workspace is disabled (#20904, @TomeHirata)
- [UI] Block CORS for ajax paths (#20832, @TomeHirata)
- [UI] [UI] Improve empty states across Experiments, Models, Prompts, and Gateway pages (#20044, @ridgupta26)
- [UI] UI: Improve empty states for Traces and Sessions tabs (#20034, @ridgupta26)
- [Build] Validate webhook url to fix SSRF vulnerability (#20747, @TomeHirata)
- [Scoring / Tracing] Fix TypeError in online scoring config endpoint when basic-auth is enabled (#20783, @copilot-swe-agent)
- [Tracing] Fix `experiment_id` type error in gateway config resolver (#20764, @copilot-swe-agent)
- [UI] Fix docs link to respect workflow type (GenAI vs ML) (#20752, @copilot-swe-agent)
- [Tracking] Fix: Do not emit pickle warning when user calls `mlflow.pyfunc.log_model` with `loader_module` param (#20727, @WeichenXu123)
- [Tracing] Change cache config to prevent search bounce (#20688, @PattaraS)
- [Evaluation] Fix multiple align() calls on MemoryAugmentedJudge (#20708, @smoorjani)
- [Evaluation] Batch embedding calls for Databricks endpoints to avoid size limit errors (#20685, @smoorjani)
- [Evaluation] Fix the UI for MemAlign-ed scorers (#20632, @smoorjani)
- [Tracing] Fix type hints lost with @mlflow.trace decorator (#20648, @veeceey)
- [Evaluation] Use JSONAdapter for best-effort structured outputs in MemAlign predictions (#20679, @smoorjani)
- [Tracking] Fix `mlflow demo` URL to use experiment ID instead of name (#20678, @copilot-swe-agent)
- [Tracking] Fix circular import in FileStore caused by PromptVersion import (#20677, @copilot-swe-agent)
- [Scoring / Tracing] Fix error handling for streaming request (#20610, @TomeHirata)
- [Models] Fix warning message: add space and documentation link for pickle security (#20656, @copilot-swe-agent)
- [Evaluation] Fix SHAP compatibility for shap >= 0.47 (#20623, @copilot-swe-agent)
- [Prompts] Fix the deadlock between run linking and trace linking (#20620, @TomeHirata)
- [Tracking] Fix FTP artifact path handling on Windows with Python 3.11+ (#20622, @copilot-swe-agent)
- [Evaluation] Fix failed judge call error propagation (#20601, @AveshCSingh)
- [Tracking] Fix off-by-one error in `_validate_max_retries` and `_validate_backoff_factor` (#20597, @vb-dbrks)
- [Prompts] Fix bug: linking prompt to experiments does not work for default experiments (#20588, @PattaraS)
- [Build] Fix Docker full image tags not being published for versioned releases (#20589, @copilot-swe-agent)
- [Prompts] Implement locking mechanism to prevent race conditions during prompt linking (#20586, @TomeHirata)
- [Prompts] Revert "Fix bug: linking prompt to experiments does not work for defa… (#20585, @PattaraS)
- [Prompts] Fix bug: linking prompt to experiments does not work for default experiments (#20562, @PattaraS)
- [Model Registry] Fix N+1 query issue in search_registered_models (#20493, @Karim-siala)
- [Tracking] Fix optimistic pagination in SQLAlchemy store `_search_runs` and handle `max_results=None` (#20547, @copilot-swe-agent)
- [UI] Add cancel button for LLM judge evaluations in trace details drawer (#20519, @danielseong1)
- [UI] Fix incorrect 'Trace level' label in session judges modal (#20520, @danielseong1)
- [Tracing] fix: allow overriding notebook trace iframe base URL (#20485, @TatsuyaHayashino)
- [Prompts] Include the prompt model config in the optimized prompt (#20431, @chenmoneygithub)
- [Tracing / UI] Fix Anthropic trace UI rendering for tool_result with image content (#20190, @joncarter1)
- [Tracking] Enforce authorization on AJAX proxy artifact APIs (#20035, @mprahl)
- [Tracking] Ensure server-provided artifact root is reused on MLflowClient calls (#19336, @mprahl)
- [UI] Fix trace selection not registering in SelectTracesModal (#20099, @joelrobin18)
Documentation updates:
- [Docs] Add documentation for `mlflow migrate-filestore` command (#20616, @harupy)
- [Docs] Document X-MLFLOW-WORKSPACE header for AI Gateway endpoints with workspace fallback behavior (#20984, @copilot-swe-agent)
- [Docs] Fix outdated server-features references to server-info (#20948, @copilot-swe-agent)
- [Docs / Tracing] Remove span attributes filtering from search traces documentation (#20858, @copilot-swe-agent)
- [Docs] Add Modal as a supported deployment target with full documentation (#20032, @debu-sinha)
- [Docs] Add gateway usage tracking doc page (#20748, @TomeHirata)
- [Docs / Evaluation] Fix MemAlign bug bash issues (#20712, @veronicalyu320)
- [Docs] Fix docs: trace spans are stored in database, not artifact storage (#20668, @B-Step62)
- [Prompts] Change header level for "Automatic Prompt Linking" section in `use-prompts-in-apps.mdx` (#20661, @PattaraS)
- [Docs] Multi-turn evaluation launch documentation (#20443, @smoorjani)
- [Prompts] Update `use-prompts-in-apps.mdx` with a section for prompt linking under traced method (#20593, @PattaraS)
- [Docs] docs: Add missing targets arg in huggingface dataset docs (#20637, @KarelZe)
- [Build] Display rule names instead of IDs in clint error output (#20592, @copilot-swe-agent)
- [Docs] Detailed guide for setting up SSO with mlflow-oidc-auth plugin (#20556, @WeichenXu123)
- [Prompts] Mark prompt registry APIs as stable. (#20507, @PattaraS)
- [Docs] code-based scorer examples (#20407, @SomtochiUmeh)
- [Docs] Custom judges section (#20393, @SomtochiUmeh)
- [Docs] (mostly) copy over eval datasets article from managed docs (#19787, @achen530)
- [Docs] Add the RAG built-in judges section (#20369, @SomtochiUmeh)
- [Docs] Fix `ToolAgent` name formatting in ag2 documentation and examples (#20470, @Umakanth555)
- [Docs] Add collection_name parameter to CrewAI knowledge configuration in docs and example (#20469, @Umakanth555)
- [Docs] Update index and predefined judges pages (#20368, @SomtochiUmeh)
- [Docs] docs: Clarify -full Docker image availability from v3.9.0 onwards (#20223, @copilot-swe-agent)
- [Docs] Generalize Knowledge Cutoff Note in CLAUDE.md beyond model names (#20165, @copilot-swe-agent)
Small bug fixes and documentation updates:
#20959, #20915, #20986, #20956, #20912, #20955, #20943, #20919, #20776, #20826, #20781, #20767, #20761, #20760, #20763, #20762, #20687, #20746, #20682, #20667, #20658, #20578, #20559, #20495, #20497, @TomeHirata; #21006, #20980, #20707, #20777, @bbqiu; #20950, #21008, #20877, #20822, #20817, #20813, #20816, #20796, #20815, #20765, #20716, #20689, #20744, #20690, #20451, #20502, #20252, #20314, #20210, @B-Step62; #21000, #20975, #20806, #20449, #20686, #20603, #20573, #20572, #20584, #20551, #20526, #20550, #20523, #20525, #20453, #20478, #20452, #20438, #20474, #20460, #20457, #20459, #20456, #20444, #20418, #20285, #20284, #20283, #20282, #20281, #20280, #20051, @smoorjani; #21005, #21007, #20880, #20857, #20802, #20779, #20717, #20713, #20714, #20692, #20693, #20683, #20675, #20665, #20674, #20673, #20663, #20662, #20659, #20652, #20649, #20650, #20647, #20646, #20641, #20638, #20635, #20634, #20633, #20626, #20625, #20621, #20619, #20618, #20617, #20606, #20564, #20581, #20570, #20568, #20566, #20558, #20560, #20543, #20554, #20537, #20536, #20532, #20530, #20528, #20512, #20505, #20501, #20498, #20496, #20491, #20490, #20489, #20487, #20486, #20484, #20483, #20482, #20441, #20436, #20427, #20417, #20400, #20399, #20397, #20395, #20396, #20391, #20342, #20341, #20332, #20326, #20316, #20315, #20305, #20300, #20299, #20297, #20293, #20268, #20262, #20260, #20251, #20250, #20244, #20235, #20228, #20227, #20226, #20220, #20202, #20186, #20172, #20152, #20150, #19984, #20102, #20098, #20095, #20093, #20094, #20091, #20090, #20089, #20088, #20087, #20086, #20085, #20084, #20083, #20082, #20081, #20080, #20077, #20076, #20075, #20070, #20067, #20069, #20020, #20026, @copilot-swe-agent; #20793, #20791, #20768, @WeichenXu123; #20979, #20701, #20609, #20608, #20569, #20535, #20481, #20318, #20224, #20149, #20119, #20068, #20014, #20016, #20019, @harupy; #20973, @Gkrumbach07; #21003, #20936, #20730, #20041, #20381, @xsh310; #20989, #20830, #20766, #20759, #20758, #20757, 
#20756, #20699, #20697, #20696, #20695, #20694, #20255, #20254, #20253, #20248, #20247, #20010, #20009, #19999, #19998, #19976, #19975, #19974, #19973, #19971, @daniellok-db; #20976, @aravind-segu; #20725, #20339, #20565, #20660, #20455, #20440, #20404, #20403, #20402, #20567, #20542, #20541, #20540, #20557, #20503, #20506, #20500, #20499, #20467, #20338, #20337, #20331, #20462, #20329, #20328, #20323, @serena-ruan; #20737, @jamesbxwu; #20862, #20861, @PattaraS; #20805, #20705, #20373, @mprahl; #20773, @etirelli; #20753, @etscript; #20629, #19758, @justinwei-db; #20711, @kevin-lyn; #20576, @nisha2003; #20553, #20521, @danielseong1; #20548, @bartosz-grabowski; #20504, @smivv; #20527, @BenWilson2; #20363, #20364, @rollyjoel; #20494, @dbczumar; #20360, #20340, #20313, #20312, #20276, #20275, #20261, #20233, #19484, @hubertzub-db; #20359, @LiberiFatali; #20386, @chenmoneygithub; #20159, @ispoljari