In DeepEval's latest release, we are introducing multimodal G-Eval, plus 7+ multimodal metrics!
Previously we had great support for single-turn, text evaluation in the form of LLMTestCases, but now we're adding MLLMTestCase, which accepts images:
from deepeval.metrics import MultimodalGEval
from deepeval.test_case import MLLMTestCaseParams, MLLMTestCase, MLLMImage
from deepeval import evaluate

# A multimodal test case: actual_output interleaves text with local images
m_test_case = MLLMTestCase(
    input=["Show me how to fold an airplane"],
    actual_output=[
        "1. Take the sheet of paper and fold it lengthwise",
        MLLMImage(url="./paper_plane_1", local=True),
        "2. Unfold the paper. Fold the top left and right corners towards the center.",
        MLLMImage(url="./paper_plane_2", local=True),
    ],
)

# A custom multimodal G-Eval metric that judges whether the text and images agree
text_image_coherence = MultimodalGEval(
    name="Text-Image Coherence",
    criteria="Determine whether the images and text are coherent in the actual output.",
    evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
)

evaluate(test_cases=[m_test_case], metrics=[text_image_coherence])
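If you want tighter control over how the judge scores coherence, the text-only G-Eval pattern of passing explicit evaluation_steps instead of a high-level criteria should carry over. Here's a minimal sketch, assuming MultimodalGEval accepts the same evaluation_steps parameter as GEval:

# Sketch: the same metric defined with explicit steps instead of a criteria
# (assumes MultimodalGEval mirrors GEval's evaluation_steps parameter)
text_image_coherence = MultimodalGEval(
    name="Text-Image Coherence",
    evaluation_steps=[
        "Check that each image depicts the step described in the text right before it.",
        "Penalize outputs where an image contradicts or is unrelated to its step.",
    ],
    evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
)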
Docs here: https://deepeval.com/docs/multimodal-metrics-g-eval
PS. This also includes platform support.
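(A hedged note on that: if you're logged in to Confident AI, e.g. via the deepeval login CLI command, the evaluate() call above should also sync the multimodal test case results to the platform.)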
