This is our second alpha version which we hope to be the last before the full Gymnasium v1.0.0 release. We summarise the key changes, bug fixes and new features added in this alpha version.
Key Changes
Atari environments
ale-py, which provides the Atari environments, has been updated in v0.9.0 to use Gymnasium as the API backend. Furthermore, the pip install contains the ROMs, so all that should be necessary for installing Atari is pip install "gymnasium[atari]" (as a result, gymnasium[accept-rom-license] has been removed). A reminder that for Gymnasium v1.0 to register the external environments (e.g., ale-py), you will be required to import ale_py before creating any of the Atari environments.
Collecting seeding values
It was possible to seed both environments and spaces with None to use a random initial seed value; however, it was not possible to know what these initial seed values were. We have addressed this for Space.seed and reset.seed in #1033 and #889. For Space.seed, we have changed the return type to be specialised for each space such that the following code will work for all spaces.
seeded_values = space.seed(None)
initial_samples = [space.sample() for _ in range(10)]
reseed_values = space.seed(seeded_values)
reseed_samples = [space.sample() for _ in range(10)]
assert seeded_values == reseed_values
assert initial_samples == reseed_samples
Additionally, for environments, we have added a new np_random_seed attribute that will store the most recent np_random seed value from reset(seed=seed).
Environment Version changes
- It was discovered recently that the MuJoCo-based Pusher was not compatible with MuJoCo >= 3 due to bug fixes that found the model density for a block that the agent had to push was the density of air. This obviously began to cause issues for users with MuJoCo v3+ and Pusher. Therefore, we have disabled the v4 environment with MuJoCo >= 3 and updated the model in the MuJoCo v5 environment so that it produces more expected behaviour, like v4 with MuJoCo < 3 (#1019).
- Alpha 2 includes new v5 MuJoCo environments as a follow-up to the v4 environments added two years ago, fixing inconsistencies, adding new features and updating the documentation. We have decided to mark the mujoco-py based (v2 and v3) environments as deprecated and plan to remove them from Gymnasium in the future (#926).
- Lunar Lander version increased from v2 to v3 due to two bug fixes. The first fixes the determinism of the environment: the world object was not completely destroyed on reset, causing non-determinism in particular cases (#979). Second, the wind generation (turned off by default) was not randomly regenerated on each reset; we have updated this to gain statistical independence between episodes (#959).
Box Samples
It was discovered that spaces.Box would allow low and high values outside the dtype's range (#774), which could result in some very strange edge cases that were very difficult to detect. We hope that the changes made to address this improve debugging and the detection of invalid inputs to the space; however, let us know if your environment raises issues related to this.
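The kind of problem described here can be illustrated with a small pre-check written in plain NumPy. This is a sketch only: bounds_fit_dtype is a hypothetical helper, not part of Gymnasium, and it is not a description of how Box validates its inputs internally.

```python
import numpy as np


def bounds_fit_dtype(low, high, dtype) -> bool:
    """Return True if all finite low/high values are representable in dtype.

    Hypothetical helper for illustration; not part of Gymnasium.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        limits = np.iinfo(dtype)
    elif np.issubdtype(dtype, np.floating):
        limits = np.finfo(dtype)
    else:
        raise TypeError(f"Unsupported dtype: {dtype}")

    low = np.atleast_1d(np.asarray(low, dtype=np.float64))
    high = np.atleast_1d(np.asarray(high, dtype=np.float64))
    # Infinite bounds mean "unbounded", so only finite values are checked here.
    finite_low = low[np.isfinite(low)]
    finite_high = high[np.isfinite(high)]
    return bool(np.all(finite_low >= limits.min) and np.all(finite_high <= limits.max))


print(bounds_fit_dtype(0, 255, np.uint8))  # bounds lie inside the uint8 range
print(bounds_fit_dtype(0, 300, np.uint8))  # 300 exceeds the uint8 maximum of 255
```

Running a check like this on an environment's bounds before constructing the space is one way to find out early whether it relied on out-of-range values.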
Bug Fixes
- Updates CartPoleVectorEnv for the new autoreset API (#915)
- Fixed wrappers.vector.RecordEpisodeStatistics episode length computation from new autoreset API (#1018)
- Remove mujoco-py import error for v4+ MuJoCo environments (#934)
- Fix make_vec(**kwargs) not being passed to vector entry point envs (#952)
- Fix reading shared memory for Tuple and Dict spaces (#941)
- Fix MultiDiscrete.from_jsonable for Windows (#932)
- Remove play rendering normalisation (#956)
New Features
- Added Python 3.12 support
- Add a new OneOf space that provides exclusive unions of spaces (#812)
- Update Dict.sample to use standard Python dicts rather than OrderedDict due to dropping Python 3.7 support (#977)
- Jax environments return jax data rather than numpy data (#817)
- Add wrappers.vector.HumanRendering and remove human rendering from CartPoleVectorEnv (#1013)
- Add more helpful error messages if users use a mixture of Gym and Gymnasium (#957)
- Add sutton_barto_reward argument for CartPole that changes the reward function to not return 1 on terminating states (#958)
- Add visual_options rendering argument for MuJoCo environments (#965)
- Add exact argument to utils.env_checker.data_equivalence (#924)
- Update wrappers.NormalizeObservation observation space and change observation to float32 (#978)
- Catch exception during env.spec if kwarg is unpickleable (#982)
- Improve ImportError for Box2D (#1009)
- Added metadata field to VectorEnv and VectorWrapper (#1006)
- Fix make_vec for sync or async when modifying make arguments (#1027)
Full Changelog: v1.0.0a1...v1.0.0a2 v0.29.1...v1.0.0a2