github google-ai-edge/mediapipe v0.10.18
MediaPipe v0.10.18


Build changes

  • Follow up the open-sourcing of WebGPU by open-sourcing one of its dependencies, third_party/emscripten
  • Add pillow, pyyaml, and requests to model_maker BUILD

Framework and core calculator improvements

  • Load resources through calculator and subgraph contexts and configure them through kResourcesService.
  • Use std::make_unique
  • Moves OnDiskCacheHelper class into a separate file / compilation target
  • Pools: report buffer specs on failure, fix status propagation, fix includes
  • Open-Source MediaPipe's WebGPU helpers.
  • BatchMatMul uses the transpose parameter.
  • Introduce Resource to represent a generic resource (file content, embedded/in-memory resource) for reading.
  • Bump up the version number to 0.10.16
  • Migrate from AdapterProperties to AdapterInfo
  • Migrate from Resource::ReadContents to Resources::Get (using ForEachLine where required)
  • Update Resources docs to mention ForEachLine (so devs don't fall back to ReadContents in such cases)
  • Adjust WebGPU device registration
  • Fix includes/copies/checks for BuildLabelMapFromFiles
  • Migrate to BuildLabelMapFromFiles.
  • Update Python version requirements in setup.py
  • Introduce Resources with mapping, so graphs can use placeholders instead of actual resource paths.
  • Remove Resources::ReadContents & add Resource::TryReleaseAsString.
  • Fix ports for multi side outputs.
  • Update solution android apps with explicit exported attribute.
  • Ensure kResourcesService is set before CalculatorGraph is initialized (otherwise subgraphs/nodes may get the wrong default resources).
  • Switch inference tests to ResourceProviderCalculator & update builder to refer to MODEL_RESOURCE.
  • Migrate modules to use ResourceProviderCalculator.
  • Support single tensor input in TensorsToImageCalculator
  • Migrate TfLiteModelLoader to use MP Resources.
  • Remove deprecated TfLiteModelLoader::LoadFromPath.
  • Fix for isIOS() platform util on worker and non-worker contexts
  • Support single tensor input in TensorsToSegmentationCalculator
  • Makes CalculatorContext::GetGraphServiceManager() private
  • BatchMatMul can handle cases where ndims != 4, as well as quantization
  • RmsNorm has an optional scale parameter.
  • Allowed variable audio packet size by setting num_samples to null.
  • Fix technically correct but confusing example in top level comments.
  • Removing ReturnType helper, since it's part of the standard now.
  • Update XNNPack to 9/24
  • Enable LoRA conversion support for Gemma2-2B
  • Improve warning when InferenceCalculator backends are not linked
  • Bump MediaPipe version to 0.10.17.
  • Update OpenCV to a version that compiles with C++ 17
  • Force xnnpack when CPU inference is enforced
  • Install PyBind before TensorFlow to get the MediaPipe version
  • Change MP version to 0.10.18
  • Add validation to the LLM bundler.
  • Add an alternative takePicture method to support a custom thread executor.
  • Add a CopySign op.
  • Add a const Spec() method to OutputStreamManager.
  • Add support for converting SRGBA ImageFrame to YUVImage.
  • Add model configuration parameters for Gemma2-2B.
  • Add a menu for the default demo app and an option to close the processor/graph and exit gracefully.
  • Add ngrammer, per-layer embeddings, and Relu1p5 fields to llm_params and update from Proto.
  • Add a special in-memory Resources (the current use case is in tests, but it may be needed for some simple things as well).
  • Add ResourceProviderCalculator (a replacement for LocalFileContentsCalculator).
  • Add Resource support to TfliteModelCalculator.
  • Add a flag to set the default number of XNNPACK threads.
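
Several of the items above revolve around the new Resources abstraction: graphs can reference a placeholder that a mapping resolves to a real resource, and line-oriented consumers should go through ForEachLine rather than re-splitting ReadContents output. A minimal Python sketch of that idea (all names here are illustrative, not the actual MediaPipe C++ API):

```python
# Illustrative sketch of a Resources-style lookup with placeholder mapping.
# The class and method names are hypothetical, not MediaPipe's real API.

class InMemoryResources:
    """Resolves resource IDs (possibly placeholders) to their contents."""

    def __init__(self, contents, mapping=None):
        self._contents = contents      # resource id -> contents
        self._mapping = mapping or {}  # placeholder -> real resource id

    def get(self, resource_id):
        # Resolve a placeholder such as "$MODEL" to its mapped path first.
        resolved = self._mapping.get(resource_id, resource_id)
        return self._contents[resolved]

    def for_each_line(self, resource_id, fn):
        # Single source of truth for line splitting (handles the trailing
        # newline consistently for every caller).
        for line in self.get(resource_id).splitlines():
            fn(line)

resources = InMemoryResources(
    contents={"models/labels.txt": "label_a\nlabel_b\n"},
    mapping={"$LABELS": "models/labels.txt"},
)
labels = []
resources.for_each_line("$LABELS", labels.append)
```

With this shape, a graph config can carry `$LABELS` while tests or deployments swap the backing content without touching the graph.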

MediaPipe Tasks update

This section highlights changes made specifically for one platform that don't propagate to
other platforms.

Android

  • Initialize new members in LlmModelSettings
  • Create an implicit session for all requests to generateResponse()
  • Change session management so that all JNI calls come from the same thread.
  • Add Session API support to LLM Java API
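
The session-management change above confines all JNI calls to a single thread. A common way to achieve that is to funnel every request through one dedicated worker; a hedged Python sketch of the pattern (not the MediaPipe Java implementation):

```python
# Sketch of the "all calls from one thread" pattern: requests are queued and
# executed by a single dedicated worker thread, mimicking how native (JNI)
# calls can be confined to one thread. Purely illustrative.
import queue
import threading

class SingleThreadExecutor:
    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn, result = self._tasks.get()
            if fn is None:  # shutdown sentinel
                break
            result["value"] = fn()
            result["done"].set()

    def submit(self, fn):
        # Block the caller until the worker has executed fn.
        result = {"done": threading.Event()}
        self._tasks.put((fn, result))
        result["done"].wait()
        return result["value"]

    def shutdown(self):
        self._tasks.put((None, None))

executor = SingleThreadExecutor()
# Three submissions all report the same worker-thread id.
worker_ids = {executor.submit(threading.get_ident) for _ in range(3)}
executor.shutdown()
```

Serializing calls this way removes the need for per-call locking in the native layer, at the cost of one hop through the queue per request.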

iOS

  • Updated name of iOS audio classifier delegate
  • Fixed incorrect stream mode in iOS audio classifier options
  • Added method to iOS audio task runner
  • Updated iOS audio classifier BUILD file
  • Fixed buffer length calculation in iOS MPPAudioData
  • Updated iOS audio data tests to fix issue in buffer length calculation
  • Revert "Added method for getting interleaved float32 pcm buffer from audio file"
  • Updated comments in iOS LlmInference
  • Dropped Refactored suffix for modified files in iOS genai
  • Updated documentation of LlmTaskRunner
  • Removed allocation of LlmInference Options
  • Updated the response generation queue to be serial in iOS LlmInference
  • Updated documentation of iOS LlmInference, documentation of LlmInference+Session
  • Fixed marking of response generation completed control flow in LlmInference+Session.
  • LlmInference.Options: remove unnecessary numOfSupportedLoraRanks parameter.
  • Add activation data type to LlmInference.Options.
  • Added more methods to iOS AVAudioPCMBuffer+TestUtils.
  • Added a few basic iOS audio classifier tests.
  • Added options tests to iOS audio classifier.
  • Added utils for AVAudioFile.
  • Added a test for score threshold to MPPAudioClassifierTests.
  • Added constants in MPPAudioClassifierTests.
  • Added a close method to iOS audio classifier.
  • Added iOS MPPAudioData test utils.
  • Added stream mode tests for iOS audio classifier.
  • Added iOS audio classifier to the cocoapods build.
  • Added audio record creation tests to MPPAudioClassifierTests.
  • Added a close method to MPPAudioEmbedder.
  • Added iOS audio embedder tests.
  • Added more utility methods to MPPAudioEmbedderTests.
  • Added stream mode tests for iOS audio embedder.
  • Added iOS audio embedder to the cocoapods build.
  • Added comments to MPPAudioClassifierTests.
  • Added iOS audio embedder header and implementation.
  • Added iOS audio classifier implementation file.
  • Added a method for getting an interleaved float32 PCM buffer from an audio file.
  • Added refactored iOS LlmTaskRunner.
  • Added iOS LlmSessionRunner.
  • Added more errors to GenAiInferenceError.
  • Added refactored LlmInference.
  • Added iOS session runner to build files.
  • Added extra safeguards for the response context in LlmSessionRunner.
  • Added LlmInference+Session.swift.
  • Added documentation regarding session and inference lifetimes to iOS LLM Inference.
  • Fixed issue with iOS audio embedder result parsing.
  • Fixed iOS audio embedder options processing.
  • Fixed index error in AVAudioFile+TestUtils.
  • Fixed audio classifier result processing in stream mode.
  • Fixed error handling in MPPAudioData.
  • Fixed microphone recording issues in iOS MPPAudioRecord.
  • Fixed documentation of iOS Audio Record.
  • Fixed iOS audio record and audio data tests by avoiding audio engine running state checks.
  • Fixed iOS audio embedder result helpers.
  • Fixed a bug due to simultaneous response generation calls across sessions.
  • Updated method signatures in iOS audio classifier tests
  • Fixed flow limiting in iOS audio classifier
  • Removed duplicate test from MPPAudioClassifierTests
  • Updated comments in AVAudioFile+TestUtils
  • Changed the name of iOS audio classifier async test helper
  • Update comment for LlmInference.Session.clone() method.
  • Marked inits unavailable in MPPFloatBuffer
  • Updated documentation of iOS audio record
  • Adds LlmInference.Metrics for providing key performance metrics (initialization time, response generation time) of the LLM inference.
  • Removed unwanted imports from iOS audio data tests
  • Cleaned up iOS audio test utils BUILD file
  • Remove the activation data type from the Swift API. We don't expect users to set it directly.
  • Use seconds instead of milliseconds for latency metrics.
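
The metrics items above boil down to timing two phases (engine initialization and response generation) and reporting both in seconds rather than milliseconds. A small Python sketch of that bookkeeping (the class and field names are illustrative, not the actual iOS LlmInference.Metrics API):

```python
# Illustrative sketch of a metrics holder reporting latencies in seconds.
# Names are hypothetical, not the real LlmInference.Metrics Swift API.
import time

class Metrics:
    def __init__(self, init_seconds, response_seconds):
        self.initialization_time = init_seconds          # seconds, not ms
        self.response_generation_time = response_seconds # seconds, not ms

def timed(fn):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

_, init_s = timed(lambda: sum(range(1000)))       # stand-in for engine init
_, resp_s = timed(lambda: "".join(["tok"] * 10))  # stand-in for generation
metrics = Metrics(init_s, resp_s)
```

Reporting seconds as floats keeps the unit consistent with platform timing APIs and avoids callers dividing by 1000.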

Javascript

  • Add comments to generateResponses method.
  • Migrate to ForEachLine to have a single source of truth for getting file contents lines.
  • Workaround for multi-output web LLM issue where last response can get corrupted when numResponses is odd.
  • Quick fix for wrong number of multi-outputs sometimes when streaming
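
The two multi-output streaming fixes above suggest that each streamed chunk must be attributed to its response by index rather than by arrival order, so the last response survives regardless of how many responses are requested. A hedged sketch of that demultiplexing (the chunk format here is hypothetical, not MediaPipe's web API):

```python
# Sketch: accumulate streamed (response_index, chunk) pairs into per-response
# buffers so every response, including the last, is assembled intact even
# when chunks for different responses interleave. Purely illustrative.

def collect_responses(num_responses, stream):
    buffers = [[] for _ in range(num_responses)]
    for index, chunk in stream:
        buffers[index].append(chunk)
    return ["".join(parts) for parts in buffers]

# Interleaved stream for an odd number of responses (the tricky case).
stream = [(0, "Hel"), (1, "Hi "), (2, "Hey"),
          (0, "lo"), (1, "there"), (2, "!")]
responses = collect_responses(3, stream)
```

Keying by index makes the assembly order-independent, which is exactly the property a quick positional fix can miss.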

Python

  • Add a flag in the converter config for generating fake weights. When it is set to true, all weights are filled with zeros.
  • Update text embedder test to match the output after XNNPack upgrade.
  • Update remaining data in text embedder test to match the output after XNNPack upgrade.
  • Update the expected value of the text embedder test.
  • Add python pip deps to WORKSPACE
  • Fix pip_deps targets.
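
The fake-weights flag above lets you exercise the conversion pipeline without real checkpoint data: every weight tensor is replaced with zeros of the same shape. A minimal Python sketch of the behavior (field and function names are illustrative, not the actual converter config):

```python
# Sketch of a converter config with a fake-weights flag: when enabled, every
# weight tensor is replaced by zeros of the same length. Names are
# hypothetical, not the real MediaPipe converter API.
from dataclasses import dataclass

@dataclass
class ConverterConfig:
    use_fake_weights: bool = False

def convert_weights(weights, config):
    if config.use_fake_weights:
        return {name: [0.0] * len(values) for name, values in weights.items()}
    return weights

weights = {"layer0/w": [0.5, -1.2, 3.3], "layer0/b": [0.1]}
faked = convert_weights(weights, ConverterConfig(use_fake_weights=True))
```

This is handy for testing bundle layout and conversion speed without shipping (or downloading) real model weights.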

Model Maker changes

  • Undo dynamic sequence length for the export_model API because it doesn't work with MediaPipe.
  • Replace mock with unittest.mock in model_maker tests.
  • Move tensorflow lite python calls to ai-edge-litert.
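
The mock migration above is mechanical: the third-party `mock` package and the standard library's `unittest.mock` expose the same interface, so it is typically just an import change (`import mock` becomes `from unittest import mock`). For example:

```python
# Using the standard library's unittest.mock instead of the third-party
# `mock` package; patch.object works identically in both.
from unittest import mock

class ModelLoader:
    def load(self, path):
        raise IOError("would hit the filesystem in a real test")

# Inside the patch, load() returns the stub value instead of raising.
with mock.patch.object(ModelLoader, "load", return_value="fake-model"):
    loaded = ModelLoader().load("any/path")
```

Dropping the third-party package removes one pip dependency with no behavior change on Python 3.3+.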

MediaPipe Dependencies

  • Update WASM files
