- Adds a Soft Actor-Critic (SAC) trainer (supports dictionary observation and action spaces)
- Simplifies the reward aggregation interface (now also supports multi-agent training)
- Extends PPO and A2C to multi-agent-capable actor-critic trainers (individual agents vs. centralized critic)
- Adds option for custom rollout evaluators
- Adds option for shared weights in actor-critic settings
- Adds experiment and multi-run support to the RunContext Python API
- Ensures compatibility with PyTorch 1.9
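To illustrate the kind of simplification the reward aggregation change enables, here is a minimal sketch of an aggregation interface that covers both the single-agent case (one scalar) and the multi-agent case (one scalar per agent). All class and method names below are hypothetical placeholders, not the library's actual API:

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RewardAggregator(ABC):
    """Hypothetical interface: collapses named reward components into
    a scalar (single-agent) or a per-agent list (multi-agent)."""

    @abstractmethod
    def aggregate(self, rewards):
        ...


class SumAggregator(RewardAggregator):
    """Single-agent case: sum all reward components into one scalar."""

    def aggregate(self, rewards: Dict[str, float]) -> float:
        return sum(rewards.values())


class MultiAgentSumAggregator(RewardAggregator):
    """Multi-agent case: sum each agent's reward components
    independently, returning one scalar per agent in a fixed order."""

    def __init__(self, agent_ids: List[str]):
        self.agent_ids = agent_ids

    def aggregate(self, rewards: Dict[str, Dict[str, float]]) -> List[float]:
        return [sum(rewards[agent].values()) for agent in self.agent_ids]
```

The point of such an interface is that trainers only ever call `aggregate(...)`, so switching between single- and multi-agent training is a matter of swapping the aggregator, not changing the training loop.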