Acrobot

Lerax port of Gymnasium's Acrobot environment. Two rigid links are connected in series, with the first link pinned to the base and only the joint between the two links actuated. The agent applies torque at the middle joint to swing the lower tip up above a target height.

Observation space

6-dim float vector [cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]. Velocities are clipped to ±4π and ±9π respectively.
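The layout above can be sketched in plain Python, independent of Lerax. The names `observe` and `clip` are hypothetical helpers. Applying the velocity clip while building the observation is an assumption: the text states only the bounds, not where Lerax enforces them.

```python
import math

MAX_VEL_1 = 4 * math.pi  # documented bound on theta1_dot
MAX_VEL_2 = 9 * math.pi  # documented bound on theta2_dot


def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def observe(theta1, theta2, theta1_dot, theta2_dot):
    """Build the 6-dim observation from raw joint angles and velocities.

    Assumption: velocities are clipped here; Lerax may instead clip them
    inside the state after integration.
    """
    return [
        math.cos(theta1),
        math.sin(theta1),
        math.cos(theta2),
        math.sin(theta2),
        clip(theta1_dot, -MAX_VEL_1, MAX_VEL_1),
        clip(theta2_dot, -MAX_VEL_2, MAX_VEL_2),
    ]


# First link hanging straight down, second link bent by 90 degrees,
# first joint spinning faster than its bound allows.
obs = observe(0.0, math.pi / 2, 20.0, -1.0)
```

Encoding each angle as a cosine/sine pair keeps the observation continuous when an angle wraps around ±π.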

Action space

Discrete(3) selecting a torque from torques = [-1.0, 0.0, 1.0] (applied to the joint between the two links).

Reward

-1.0 on every non-terminal step, 0.0 on the terminal step. Computed as done.astype(float) - 1.0.

Termination

Terminates when the tip of the second link rises above 1.0, i.e. -cos(theta1) - cos(theta1 + theta2) > 1.0. No built-in truncation.
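As a minimal sketch of the termination test and the reward rule, again in plain Python rather than Lerax's own code: `tip_height`, `is_terminal`, and `reward` are hypothetical names, and the height formula is taken directly from the condition above, which corresponds to the default unit link lengths.

```python
import math


def tip_height(theta1, theta2):
    """Height of the second link's tip relative to the base pivot.

    Both angles equal to zero means both links hang straight down (height -2).
    """
    return -math.cos(theta1) - math.cos(theta1 + theta2)


def is_terminal(theta1, theta2, target=1.0):
    """True once the tip is strictly above the target height."""
    return tip_height(theta1, theta2) > target


def reward(done):
    """-1.0 on non-terminal steps, 0.0 on the terminal step."""
    return float(done) - 1.0


hanging = is_terminal(0.0, 0.0)      # both links down: tip at -2
upright = is_terminal(math.pi, 0.0)  # both links up: tip at +2
```

Because every non-terminal step costs -1.0, the undiscounted return of an episode is the negative of the number of steps taken before the terminal one.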

Deviations from Gymnasium

Dynamics are integrated with Diffrax (default Tsit5); pass solver=diffrax.Euler() for Gymnasium-identical dynamics. No built-in 500-step truncation.

lerax.env.classic_control.Acrobot

Bases: AbstractClassicControlEnv[AcrobotState, Int[Array, ''], Float[Array, '4']]

Acrobot environment matching the Gymnasium Acrobot environment.

Note

To match Gymnasium's dynamics exactly, set solver=diffrax.Euler().

Action Space

The action space is discrete with three actions:

  • 0: Apply -1 torque to the joint between the two links
  • 1: Apply 0 torque
  • 2: Apply +1 torque to the joint between the two links

The action applies a fixed-magnitude torque to the joint for the duration of the time step.
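The mapping from action index to torque can be sketched as a table lookup. The helper `applied_torque` is hypothetical. The uniform noise model is an assumption borrowed from Gymnasium's Acrobot; the parameter list only says that torque_max_noise is the maximum noise added to the applied torque.

```python
import random

TORQUES = (-1.0, 0.0, 1.0)  # documented default for the torques parameter


def applied_torque(action, torque_max_noise=0.0, rng=None):
    """Look up the torque for a discrete action, optionally adding noise.

    Assumption: noise is drawn uniformly from
    [-torque_max_noise, +torque_max_noise], as in Gymnasium.
    """
    if action not in (0, 1, 2):
        raise ValueError(f"action must be 0, 1, or 2, got {action!r}")
    torque = TORQUES[action]
    if torque_max_noise > 0.0:
        rng = rng or random.Random()
        torque += rng.uniform(-torque_max_noise, torque_max_noise)
    return torque


negative = applied_torque(0)  # -1.0
neutral = applied_torque(1)   # 0.0
positive = applied_torque(2)  # +1.0
```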

Observation Space

The observation space is a 6-dimensional continuous space representing the state of the two links:

| Index | Observation | Min Value | Max Value |
| --- | --- | --- | --- |
| 0 | Cosine of Joint Angle 1 | -1.0 | 1.0 |
| 1 | Sine of Joint Angle 1 | -1.0 | 1.0 |
| 2 | Cosine of Joint Angle 2 | -1.0 | 1.0 |
| 3 | Sine of Joint Angle 2 | -1.0 | 1.0 |
| 4 | Joint Velocity 1 | -4π | 4π |
| 5 | Joint Velocity 2 | -9π | 9π |

Reward

Non-terminal steps yield a reward of -1.0 and the terminal step yields a reward of 0.0.

Termination

The episode terminates when the tip of the second link rises more than 1.0 unit above the base, i.e. when -cos(theta1) - cos(theta1 + theta2) > 1.0.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gravity | Float[ArrayLike, ''] | Gravitational acceleration. | 9.8 |
| link_length_1 | Float[ArrayLike, ''] | Length of the first link. | 1.0 |
| link_length_2 | Float[ArrayLike, ''] | Length of the second link. | 1.0 |
| link_mass_1 | Float[ArrayLike, ''] | Mass of the first link. | 1.0 |
| link_mass_2 | Float[ArrayLike, ''] | Mass of the second link. | 1.0 |
| link_com_pos_1 | Float[ArrayLike, ''] | Center of mass position of the first link. | 0.5 |
| link_com_pos_2 | Float[ArrayLike, ''] | Center of mass position of the second link. | 0.5 |
| link_moi | Float[ArrayLike, ''] | Moment of inertia of the links. | 1.0 |
| max_vel_1 | Float[ArrayLike, ''] | Maximum angular velocity for the first joint. | 4 * jnp.pi |
| max_vel_2 | Float[ArrayLike, ''] | Maximum angular velocity for the second joint. | 9 * jnp.pi |
| torque_max_noise | Float[ArrayLike, ''] | Maximum noise to add to the applied torque. | 0.0 |
| torques | Float[ArrayLike, '3'] | Array of possible torques corresponding to each action. | jnp.array([-1.0, 0.0, 1.0]) |
| dt | Float[ArrayLike, ''] | Time step for integration. | 0.2 |
| solver | diffrax.AbstractSolver \| None | Diffrax solver to use for integration. | None |
| stepsize_controller | diffrax.AbstractStepSizeController \| None | Step size controller for adaptive solvers. | None |

unwrapped property

unwrapped: Self

Return the unwrapped environment.

action_mask

action_mask(
    state: StateType, *, key: Key[Array, ""]
) -> None

transition

transition(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> StateType

truncate

truncate(state: StateType) -> Bool[Array, '']

state_info

state_info(state: StateType) -> dict

transition_info

transition_info(
    state: StateType, action: ActType, next_state: StateType
) -> dict

render_states

render_states(
    states: Sequence[StateType],
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render a sequence of frames from multiple states.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| states | Sequence[StateType] | A sequence of environment states to render. | required |
| renderer | AbstractRenderer \| Literal['auto'] | The renderer to use for rendering. If "auto", uses the default renderer. | 'auto' |
| dt | float | The time delay between rendering each frame, in seconds. | 0.0 |

render_stacked

render_stacked(
    states: StateType,
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render multiple frames from stacked states.

Stacked states are typically batched states stored in a pytree structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| states | StateType | A pytree of stacked environment states to render. | required |
| renderer | AbstractRenderer \| Literal['auto'] | The renderer to use for rendering. If "auto", uses the default renderer. | 'auto' |
| dt | float | The time delay between rendering each frame, in seconds. | 0.0 |

reset

reset(
    *, key: Key[Array, ""]
) -> tuple[StateType, ObsType, dict]

Wrap the functional logic into a Gym API reset method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| key | Key[Array, ''] | A JAX PRNG key for any stochasticity in the reset. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[StateType, ObsType, dict] | A tuple of the initial state, initial observation, and additional info. |

step

step(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> tuple[
    StateType,
    ObsType,
    Float[Array, ""],
    Bool[Array, ""],
    Bool[Array, ""],
    dict,
]

Wrap the functional logic into a Gym API step method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| state | StateType | The current environment state. | required |
| action | ActType | The action to take. | required |
| key | Key[Array, ''] | A JAX PRNG key for any stochasticity in the step. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[StateType, ObsType, Float[Array, ''], Bool[Array, ''], Bool[Array, ''], dict] | A tuple of the next state, observation, reward, terminal flag, truncate flag, and additional info. |
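Since the environment has no built-in step limit, the caller has to bound episode length. The sketch below shows one way to do that against the documented reset and step signatures. The names `run_episode`, `policy`, and `keys` are hypothetical, and `env` is any object that exposes those two methods; nothing here is Lerax's own rollout code.

```python
def run_episode(env, keys, policy, max_steps=500):
    """Roll out one episode, stopping at termination, truncation, or max_steps.

    env:    object with reset(key=...) and step(state, action, key=...)
    keys:   iterable yielding at least max_steps + 1 PRNG keys
    policy: callable mapping an observation to an action
    """
    keys = iter(keys)
    state, obs, _info = env.reset(key=next(keys))
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        state, obs, reward, terminated, truncated, _info = env.step(
            state, policy(obs), key=next(keys)
        )
        total_reward += float(reward)
        steps += 1
        # Stop early if the environment reports the episode is over.
        if bool(terminated) or bool(truncated):
            break
    return total_reward, steps
```

With Lerax, `env` would presumably be an Acrobot instance and `keys` could come from splitting a JAX PRNG key into `max_steps + 1` subkeys; both are assumptions about usage rather than documented recipes. The default of 500 mirrors Gymnasium's step limit mentioned above. This is an ordinary Python loop, so it is not suitable for use inside jit-compiled code.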