ML-Agents Beta 0.10.0


New Features

  • Soft Actor-Critic (SAC) has been added as a new trainer option, complementing Proximal Policy Optimization (PPO). SAC is an off-policy algorithm and is therefore more sample-efficient (i.e., it requires fewer environment steps). For environments that take a long time to execute a step (roughly 0.1 seconds or more), this can yield dramatic training speedups of around 3-5x over PPO. Beyond sample efficiency, SAC has been shown to be robust to small variations in the environment and effective at exploring the environment to find optimal behaviors. See the SAC documentation for more details, and the launch sketch after this list.
  • Example environments have been updated with a new dark-theme visual style, and colors have been standardized across all environments.
  • Unity environment command line arguments can now be passed through mlagents-learn. See the documentation on how to use this feature, and the passthrough example after this list.
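
A minimal sketch of selecting the new SAC trainer from the command line. The configuration file name and run ID below are assumptions based on the 0.10.0 repository layout, not something these notes specify:

```sh
# Train with the SAC trainer rather than PPO. The config path and run ID
# are assumptions based on the 0.10.0 repository layout.
mlagents-learn config/sac_trainer_config.yaml --run-id=sac-3dball --train
```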
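A hedged example of the new argument passthrough. The --env-args separator reflects our reading of the 0.10.0 documentation, and the custom argument after it is purely hypothetical, standing in for whatever your built environment parses:

```sh
# Everything after --env-args is forwarded to the Unity executable.
# 3DBall stands in for your built environment; --my-custom-flag is a
# hypothetical argument that your executable would parse itself.
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=ball-01 \
    --train --env-args --my-custom-flag
```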

Fixes and Improvements

  • ML-Agents is now compatible with Python 3.7 and with newer versions of TensorFlow, up to 1.14. An install sketch follows this list.
  • Fixed an issue that occurred when using recurrent networks and agents were destroyed. (#2549)
  • Fixed a memory leak during inference. (#2541)
  • The UnitySDK.log file is no longer written, which fixes an issue with 2019.x versions of the Unity Editor. (#2580)
  • The Academy class no longer has a Done() method; all Done() calls should now be handled by the Agent. (#2519) See Migrating for more information, and the C# sketch after this list.
  • C# code was updated to follow Unity coding conventions.
  • Fixed a crash that occurred when enabling VAIL with a GAIL reward signal. (#2598)
  • Other minor documentation enhancements and bug fixes.
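
For the Python/TensorFlow compatibility note above, here is one known-good install combination implied by these notes; the exact version pins are an assumption, not an official requirement:

```sh
# Inside a Python 3.7 environment. TensorFlow is pinned to the newest
# 1.x release these notes name as supported (the pin is an assumption).
pip install mlagents==0.10.0 tensorflow==1.14.0
```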
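To illustrate the Academy.Done() removal, here is a minimal C# sketch of ending an episode from the Agent instead. The agent class, failure condition, and reward value are hypothetical, and the AgentAction signature reflects the 0.10-era API as we recall it:

```csharp
using MLAgents;

// Hypothetical agent: episode termination is now signaled from the Agent,
// not from the Academy.
public class RollerAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ... apply actions and assign rewards here ...
        if (transform.position.y < 0f)  // hypothetical failure condition
        {
            SetReward(-1f);
            Done();  // ends this agent's episode (replaces Academy.Done())
        }
    }
}
```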

Acknowledgements

  • Thanks to @tomatenbrei and everyone at Unity who contributed to v0.10.0.
