
Mountain Car

Lerax port of Gymnasium's Mountain Car environment. An underpowered car sits in a 1D valley and must reach a flag at the top of the right hill. The car cannot climb directly and must build momentum by swinging back and forth in the valley.

Observation space

2-dimensional float vector [position, velocity]. Position is clipped to [-1.2, 0.6] and velocity to [-0.07, 0.07].

Action space

Discrete(3): 0 = push left, 1 = no push, 2 = push right. The applied force is (action - 1) * force, with force = 0.001 by default.
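The action-to-force rule above can be sketched in plain Python. The function name `action_to_force` is hypothetical and not part of Lerax's API; it only illustrates the (action - 1) * force mapping.

```python
def action_to_force(action: int, force: float = 0.001) -> float:
    """Map a Discrete(3) action to a signed force: (action - 1) * force."""
    if action not in (0, 1, 2):
        raise ValueError(f"action must be 0, 1 or 2, got {action}")
    return (action - 1) * force


# 0 -> push left (negative), 1 -> no push (zero), 2 -> push right (positive)
for a in (0, 1, 2):
    print(a, action_to_force(a))
```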

Reward

-1.0 on every step regardless of action or outcome.

Termination

Terminates when position >= goal_position (default 0.5) and velocity >= goal_velocity (default 0.0). No built-in truncation.
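A minimal sketch of the termination test, written as a plain-Python predicate. `is_terminal` is a hypothetical helper, not Lerax's API. Because the reward is a constant -1.0 per step, an episode's return is simply minus its length.

```python
def is_terminal(position: float, velocity: float,
                goal_position: float = 0.5, goal_velocity: float = 0.0) -> bool:
    """True once the car is at or past the flag with at least goal_velocity."""
    return position >= goal_position and velocity >= goal_velocity


print(is_terminal(0.5, 0.0))     # True: exactly at the flag, not rolling back
print(is_terminal(0.55, -0.01))  # False: past the flag but rolling back
print(is_terminal(0.49, 0.07))   # False: not yet at the flag
```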

Deviations from Gymnasium

Dynamics are integrated with Diffrax (default solver: Tsit5); pass solver=diffrax.Euler() for dynamics identical to Gymnasium's. Unlike Gymnasium's MountainCar-v0, there is no built-in 200-step truncation.
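Since there is no built-in truncation, a time limit like Gymnasium's 200 steps must be imposed by the caller. The following is a library-agnostic sketch. `run_episode` and the `step_fn(state, action) -> (state, reward, terminated)` signature are hypothetical stand-ins, not Lerax's `step` API, which also takes a PRNG key and returns six values.

```python
from typing import Any, Callable, Tuple

# Hypothetical simplified step signature: (state, action) -> (state, reward, terminated)
StepFn = Callable[[Any, int], Tuple[Any, float, bool]]


def run_episode(step_fn: StepFn, state: Any, policy: Callable[[Any], int],
                max_steps: int = 200) -> Tuple[Any, float, int, bool, bool]:
    """Roll out until termination or until max_steps steps have been taken.

    Returns (final_state, total_reward, steps, terminated, truncated).
    """
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        state, reward, terminated = step_fn(state, policy(state))
        total_reward += reward
        if terminated:
            return state, total_reward, t, True, False
    # Step budget exhausted without terminating: report truncation.
    return state, total_reward, max_steps, False, True


# Toy step function that never terminates, to exercise the truncation path.
def never_done(state, action):
    return state + 1, -1.0, False


print(run_episode(never_done, 0, policy=lambda s: 1))  # (200, -200.0, 200, False, True)
```

Keeping the limit outside the environment mirrors Gymnasium, where the 200-step cap comes from a separate time-limit wrapper rather than from the dynamics.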

lerax.env.classic_control.MountainCar

Bases: AbstractClassicControlEnv[MountainCarState, Int[Array, ''], Float[Array, '2']]

Mountain Car environment matching Gymnasium's MountainCar environment.

Note

To reproduce Gymnasium's dynamics exactly, set solver=diffrax.Euler().

Action Space

The action space is discrete with 3 actions:

  • 0: Push left
  • 1: No push
  • 2: Push right

Actions 0 and 2 apply a fixed-magnitude force (the force parameter) to the car in the specified direction for the duration of the time step; action 1 applies no force.

Observation Space

The observation space is a 2-dimensional continuous space representing the position and velocity of the car:

Index  Observation   Min Value  Max Value
0      Car Position  -1.2       0.6
1      Car Velocity  -0.07      0.07

These values are the environment's physical limits. They can be changed via the min_position, max_position, and max_speed parameters (the velocity bounds are -max_speed and max_speed).

Reward

The reward is -1 for every step taken until the goal position is reached.

Termination

The episode terminates when the car reaches the goal position (position >= goal_position, 0.5 by default) with a velocity of at least goal_velocity (0.0 by default).

Parameters:

  • min_position (Float[ArrayLike, ''], default -1.2): Minimum position of the car.
  • max_position (Float[ArrayLike, ''], default 0.6): Maximum position of the car.
  • max_speed (Float[ArrayLike, ''], default 0.07): Maximum speed of the car; velocity is clipped to [-max_speed, max_speed].
  • goal_position (Float[ArrayLike, ''], default 0.5): Position the car must reach (position >= goal_position) for the episode to terminate.
  • goal_velocity (Float[ArrayLike, ''], default 0.0): Minimum velocity at the goal position required to terminate the episode.
  • force (Float[ArrayLike, ''], default 0.001): Magnitude of the force applied when pushing the car.
  • gravity (Float[ArrayLike, ''], default 0.0025): Gravity constant affecting the car's movement.
  • dt (Float[ArrayLike, ''], default 1.0): Time step of the environment dynamics.
  • solver (diffrax.AbstractSolver | None, default None): Diffrax solver used to integrate the dynamics; None selects the default (Tsit5).
  • stepsize_controller (diffrax.AbstractStepSizeController | None, default None): Step size controller for adaptive solvers.
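For reference, here is a plain-Python sketch of how these parameters enter Gymnasium's MountainCar update, the update the Note refers to when it says solver=diffrax.Euler() gives identical dynamics. It is not Lerax's implementation. Gymnasium uses a unit time step, which corresponds to the default dt = 1.0; how Lerax uses dt with other solvers is not shown here. The name `gym_step` is hypothetical, and the left-wall velocity reset is Gymnasium's rule rather than something documented on this page.

```python
import math


def gym_step(position, velocity, action, *, min_position=-1.2, max_position=0.6,
             max_speed=0.07, force=0.001, gravity=0.0025):
    """One Gymnasium-style MountainCar update with a unit time step."""
    # Velocity: engine force plus gravity along the slope (hill height ~ sin(3x)).
    velocity += (action - 1) * force - gravity * math.cos(3 * position)
    velocity = min(max(velocity, -max_speed), max_speed)
    # Position uses the *updated* velocity (semi-implicit Euler).
    position += velocity
    position = min(max(position, min_position), max_position)
    # Gymnasium's left wall is inelastic: a car driving into it stops.
    if position == min_position and velocity < 0:
        velocity = 0.0
    return position, velocity


pos, vel = gym_step(-0.5, 0.0, action=2)  # push right from near the valley floor
print(pos, vel)
```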

unwrapped property

unwrapped: Self

Return the unwrapped environment.

action_mask

action_mask(
    state: StateType, *, key: Key[Array, ""]
) -> None

transition

transition(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> StateType

truncate

truncate(state: StateType) -> Bool[Array, '']

state_info

state_info(state: StateType) -> dict

transition_info

transition_info(
    state: StateType, action: ActType, next_state: StateType
) -> dict

render_states

render_states(
    states: Sequence[StateType],
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render one frame per state from a sequence of states.

Parameters:

  • states (Sequence[StateType], required): A sequence of environment states to render.
  • renderer (AbstractRenderer | Literal['auto'], default 'auto'): The renderer to use. If "auto", the default renderer is used.
  • dt (float, default 0.0): Delay between consecutive frames, in seconds.

render_stacked

render_stacked(
    states: StateType,
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render multiple frames from stacked states.

Stacked states are typically batched states stored in a single pytree, with a leading axis that indexes the frames.

Parameters:

  • states (StateType, required): A pytree of stacked environment states to render.
  • renderer (AbstractRenderer | Literal['auto'], default 'auto'): The renderer to use. If "auto", the default renderer is used.
  • dt (float, default 0.0): Delay between consecutive frames, in seconds.
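The difference between the inputs of `render_states` and `render_stacked` is one of layout. The former takes a list of per-step states; the latter takes one structure whose leaves hold a leading axis. The sketch below uses plain Python, with a hypothetical dict standing in for MountainCarState, whose actual fields are not documented here. With real JAX pytrees one would typically stack via `jax.tree_util.tree_map(lambda *xs: jnp.stack(xs), *states)` instead.

```python
from typing import Dict, List, Sequence


def stack_states(states: Sequence[Dict[str, float]]) -> Dict[str, List[float]]:
    """Turn a sequence of per-step states into one state with a leading axis."""
    keys = states[0].keys()
    return {k: [s[k] for s in states] for k in keys}


def unstack_states(stacked: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Inverse of stack_states: split the leading axis back into a sequence."""
    n = len(next(iter(stacked.values())))
    return [{k: v[i] for k, v in stacked.items()} for i in range(n)]


# Hypothetical trajectory; the field names are illustrative only.
trajectory = [
    {"position": -0.50, "velocity": 0.000},
    {"position": -0.49, "velocity": 0.001},
    {"position": -0.47, "velocity": 0.002},
]
stacked = stack_states(trajectory)  # shape expected by render_stacked-style APIs
print(stacked)
```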

reset

reset(
    *, key: Key[Array, ""]
) -> tuple[StateType, ObsType, dict]

Wraps the functional logic in a Gym-style reset method.

Parameters:

  • key (Key[Array, ''], required): A JAX PRNG key for any stochasticity in the reset.

Returns:

  • tuple[StateType, ObsType, dict]: The initial state, initial observation, and additional info.
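This page does not document MountainCar's initial-state distribution. For comparison, Gymnasium samples the starting position uniformly from [-0.6, -0.4] with zero velocity. The sketch below reproduces that rule with Python's `random` module. The name `gym_reset` is hypothetical, and Lerax's reset draws from a JAX PRNG key instead, so its samples will not match these.

```python
import random


def gym_reset(seed: int) -> tuple:
    """Gymnasium-style MountainCar start: position ~ U(-0.6, -0.4), velocity = 0."""
    rng = random.Random(seed)  # seeded for reproducibility, analogous to a PRNG key
    position = rng.uniform(-0.6, -0.4)
    velocity = 0.0
    return position, velocity


print(gym_reset(0))
print(gym_reset(0) == gym_reset(0))  # True: same seed, same start
```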

step

step(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> tuple[
    StateType,
    ObsType,
    Float[Array, ""],
    Bool[Array, ""],
    Bool[Array, ""],
    dict,
]

Wraps the functional logic in a Gym-style step method.

Parameters:

  • state (StateType, required): The current environment state.
  • action (ActType, required): The action to take.
  • key (Key[Array, ''], required): A JAX PRNG key for any stochasticity in the step.

Returns:

  • tuple[StateType, ObsType, Float[Array, ''], Bool[Array, ''], Bool[Array, ''], dict]: The next state, observation, reward, terminated flag, truncated flag, and additional info.