In DeepEval's latest release, we are introducing multimodal G-Eval, plus 7+ multimodal metrics!
Previously we had great support for single-turn, text evaluation in the form of LLMTestCases, but now we're adding MLLMTestCase, which accepts images:
from deepeval.metrics import MultimodalGEval
from deepeval.test_case import MLLMTestCaseParams, MLLMTestCase, MLLMImage
from deepeval import evaluate

# A multimodal test case: actual_output interleaves text with local images
m_test_case = MLLMTestCase(
    input=["Show me how to fold an airplane"],
    actual_output=[
        "1. Take the sheet of paper and fold it lengthwise",
        MLLMImage(url="./paper_plane_1", local=True),
        "2. Unfold the paper. Fold the top left and right corners towards the center.",
        MLLMImage(url="./paper_plane_2", local=True),
    ],
)

# A custom multimodal G-Eval metric that judges whether the text and images agree
text_image_coherence = MultimodalGEval(
    name="Text-Image Coherence",
    criteria="Determine whether the images and text are coherent in the actual output.",
    evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
)

evaluate(test_cases=[m_test_case], metrics=[text_image_coherence])
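If you want tighter control over how the judge scores coherence, the text-only G-Eval pattern of passing explicit evaluation_steps instead of a high-level criteria should carry over. Here's a minimal sketch, assuming MultimodalGEval accepts the same evaluation_steps parameter as GEval:

# Sketch: the same metric defined with explicit steps instead of a criteria
# (assumes MultimodalGEval mirrors GEval's evaluation_steps parameter)
text_image_coherence = MultimodalGEval(
    name="Text-Image Coherence",
    evaluation_steps=[
        "Check that each image depicts the step described in the text right before it.",
        "Penalize outputs where an image contradicts or is unrelated to its step.",
    ],
    evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
)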
Docs here: https://deepeval.com/docs/multimodal-metrics-g-eval
PS. This also includes platform support.
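(A hedged note on that: if you're logged in to Confident AI, e.g. via the deepeval login CLI command, the evaluate() call above should also sync the multimodal test case results to the platform.)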
