Acrobot

Lerax port of Gymnasium's Acrobot environment. Two rigid links are connected in series, with the first link pinned to the base and only the joint between the two links actuated. The agent applies torque at the middle joint to swing the lower tip up above a target height.

Observation space

6-dim float vector [cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]. The angular velocities theta1_dot and theta2_dot are clipped to ±4π and ±9π rad/s, respectively.
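As an illustration of the layout above, the following sketch packs joint angles and velocities into the 6-dim vector. The function name `observe` and the constants are hypothetical, and clipping at observation time is an assumption; the environment may instead bound the velocities when it updates its state.

```python
import jax.numpy as jnp

MAX_VEL_1 = 4 * jnp.pi  # default `max_vel_1`
MAX_VEL_2 = 9 * jnp.pi  # default `max_vel_2`


def observe(theta1, theta2, theta1_dot, theta2_dot):
    """Assemble [cos t1, sin t1, cos t2, sin t2, t1_dot, t2_dot]."""
    # Assumption: velocities are clipped here; the real environment
    # may clip them inside the state transition instead.
    v1 = jnp.clip(theta1_dot, -MAX_VEL_1, MAX_VEL_1)
    v2 = jnp.clip(theta2_dot, -MAX_VEL_2, MAX_VEL_2)
    return jnp.stack(
        [
            jnp.cos(theta1),
            jnp.sin(theta1),
            jnp.cos(theta2),
            jnp.sin(theta2),
            v1,
            v2,
        ]
    )


# Both links hanging straight down, with out-of-range velocities.
obs = observe(0.0, 0.0, 100.0, -100.0)
```

Encoding each angle as a cosine and sine pair keeps the observation continuous when an angle wraps around.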

Action space

Discrete(3) selecting a torque from torques = [-1.0, 0.0, 1.0] (applied to the joint between the two links).

Reward

-1.0 on every non-terminal step, 0.0 on the terminal step. Computed as done.astype(float) - 1.0.
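The expression above maps a boolean termination flag to the reward. A minimal sketch, where the function name `reward` is only illustrative:

```python
import jax.numpy as jnp


def reward(done):
    """True (terminal) -> 0.0, False (non-terminal) -> -1.0."""
    return done.astype(float) - 1.0


# Three consecutive steps, the last of which terminates the episode.
rewards = reward(jnp.array([False, False, True]))
```

An episode that terminates on step T therefore has an undiscounted return of -(T - 1), so faster swing-ups score higher.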

Termination

Terminates when the tip of the second link rises above 1.0, i.e. -cos(theta1) - cos(theta1 + theta2) > 1.0. No built-in truncation.
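The condition can be checked directly from the two joint angles. This sketch assumes unit link lengths, as the formula above implies; `tip_height` and `terminated` are illustrative names, not part of the Lerax API.

```python
import jax.numpy as jnp


def tip_height(theta1, theta2):
    """Height of the second link's tip relative to the base (unit links)."""
    return -jnp.cos(theta1) - jnp.cos(theta1 + theta2)


def terminated(theta1, theta2):
    """True once the tip is more than 1.0 above the base."""
    return tip_height(theta1, theta2) > 1.0


hanging = terminated(0.0, 0.0)      # both links pointing straight down
upright = terminated(jnp.pi, 0.0)   # both links pointing straight up
```

With both angles at zero the tip sits at height -2.0, so the agent has to raise it by more than three link lengths to finish the episode.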

Deviations from Gymnasium

Dynamics are integrated with Diffrax (default Tsit5); pass solver=diffrax.Euler() for Gymnasium-identical dynamics. No built-in 500-step truncation.

lerax.env.classic_control.Acrobot

Bases: AbstractClassicControlEnv[AcrobotState, Int[Array, ''], Float[Array, '4']]

Acrobot environment matching the Gymnasium Acrobot environment.

Note

To achieve identical dynamics to Gymnasium set solver=diffrax.Euler().

Action Space

The action space is discrete with three actions:

0: Apply -1 torque to the joint between the two links
1: Apply 0 torque
2: Apply +1 torque to the joint between the two links

The selected action applies a fixed-magnitude torque to the joint for the duration of the time step.
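The mapping from action index to torque amounts to an array lookup. The sketch below also shows one way `torque_max_noise` could enter: uniform noise in [-torque_max_noise, torque_max_noise], as Gymnasium does. That noise model is an assumption about Lerax, and `applied_torque` is a hypothetical helper.

```python
import jax
import jax.numpy as jnp

TORQUES = jnp.array([-1.0, 0.0, 1.0])  # default `torques`


def applied_torque(action, key, torque_max_noise=0.0):
    """Look up the torque for a discrete action, plus optional noise."""
    torque = TORQUES[action]
    # Assumption: uniform noise, mirroring Gymnasium's Acrobot.
    noise = jax.random.uniform(
        key, (), minval=-torque_max_noise, maxval=torque_max_noise
    )
    return torque + noise


key = jax.random.PRNGKey(0)
negative = applied_torque(0, key)                     # action 0, no noise
noisy_zero = applied_torque(1, key, torque_max_noise=0.5)
```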

Observation Space

The observation space is a 6-dimensional continuous space representing the state of the two links:

Index	Observation	Min Value	Max Value
0	Cosine of Joint Angle 1	-1.0	1.0
1	Sine of Joint Angle 1	-1.0	1.0
2	Cosine of Joint Angle 2	-1.0	1.0
3	Sine of Joint Angle 2	-1.0	1.0
4	Joint Velocity 1	-4π	4π
5	Joint Velocity 2	-9π	9π

Reward

Non-terminal steps yield a reward of -1.0 and the terminal step yields a reward of 0.0.

Termination

The episode terminates when the tip of the second link reaches a height greater than 1.0 unit above the base. This corresponds to the condition: -cos(theta1) - cos(theta1 + theta2) > 1.0

Parameters:

Name	Type	Description	Default
`gravity`	`Float[ArrayLike, '']`	Gravitational acceleration.	`9.8`
`link_length_1`	`Float[ArrayLike, '']`	Length of the first link.	`1.0`
`link_length_2`	`Float[ArrayLike, '']`	Length of the second link.	`1.0`
`link_mass_1`	`Float[ArrayLike, '']`	Mass of the first link.	`1.0`
`link_mass_2`	`Float[ArrayLike, '']`	Mass of the second link.	`1.0`
`link_com_pos_1`	`Float[ArrayLike, '']`	Center of mass position of the first link.	`0.5`
`link_com_pos_2`	`Float[ArrayLike, '']`	Center of mass position of the second link.	`0.5`
`link_moi`	`Float[ArrayLike, '']`	Moment of inertia of the links.	`1.0`
`max_vel_1`	`Float[ArrayLike, '']`	Maximum angular velocity for the first joint.	`4 * jnp.pi`
`max_vel_2`	`Float[ArrayLike, '']`	Maximum angular velocity for the second joint.	`9 * jnp.pi`
`torque_max_noise`	`Float[ArrayLike, '']`	Maximum noise to add to the applied torque.	`0.0`
`torques`	`Float[ArrayLike, '3']`	Array of possible torques corresponding to each action.	`jnp.array([-1.0, 0.0, 1.0])`
`dt`	`Float[ArrayLike, '']`	Time step for integration.	`0.2`
`solver`	`diffrax.AbstractSolver \| None`	Diffrax solver to use for integration.	`None`
`stepsize_controller`	`diffrax.AbstractStepSizeController \| None`	Step size controller for adaptive solvers.	`None`
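To show how the physical parameters enter the dynamics, here is a sketch of the equations of motion in the "book" form used by Gymnasium's reference implementation. Whether Lerax uses exactly this form internally is an assumption, and `acrobot_derivatives` and `euler_step` are hypothetical names. The second function is a single explicit Euler update over `dt`, not Lerax's Diffrax-based integration.

```python
import jax.numpy as jnp


def acrobot_derivatives(state, torque, *, gravity=9.8, link_length_1=1.0,
                        link_mass_1=1.0, link_mass_2=1.0,
                        link_com_pos_1=0.5, link_com_pos_2=0.5,
                        link_moi=1.0):
    """Time derivative of [theta1, theta2, theta1_dot, theta2_dot]."""
    theta1, theta2, dtheta1, dtheta2 = state
    m1, m2, l1 = link_mass_1, link_mass_2, link_length_1
    lc1, lc2, inertia = link_com_pos_1, link_com_pos_2, link_moi
    # Inertia terms (both links share the same moment of inertia).
    d1 = (m1 * lc1**2
          + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * jnp.cos(theta2))
          + 2 * inertia)
    d2 = m2 * (lc2**2 + l1 * lc2 * jnp.cos(theta2)) + inertia
    # Gravity and Coriolis/centrifugal terms.
    phi2 = m2 * lc2 * gravity * jnp.cos(theta1 + theta2 - jnp.pi / 2)
    phi1 = (-m2 * l1 * lc2 * dtheta2**2 * jnp.sin(theta2)
            - 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * jnp.sin(theta2)
            + (m1 * lc1 + m2 * l1) * gravity * jnp.cos(theta1 - jnp.pi / 2)
            + phi2)
    ddtheta2 = ((torque + d2 / d1 * phi1
                 - m2 * l1 * lc2 * dtheta1**2 * jnp.sin(theta2) - phi2)
                / (m2 * lc2**2 + inertia - d2**2 / d1))
    ddtheta1 = -(d2 * ddtheta2 + phi1) / d1
    return jnp.stack([dtheta1, dtheta2, ddtheta1, ddtheta2])


def euler_step(state, torque, dt=0.2):
    """One explicit Euler update with the default parameters."""
    return state + dt * acrobot_derivatives(state, torque)


rest = jnp.zeros(4)  # both links hanging down, at rest
after_push = euler_step(rest, 1.0)
```

In this formulation the second link's length does not appear; only its mass and center-of-mass position do.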

unwrapped `property`

unwrapped: Self

Return the unwrapped environment.

action_mask

action_mask(
    state: StateType, *, key: Key[Array, ""]
) -> None

transition

transition(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> StateType

truncate

truncate(state: StateType) -> Bool[Array, '']

state_info

state_info(state: StateType) -> dict

transition_info

transition_info(
    state: StateType, action: ActType, next_state: StateType
) -> dict

render_states

render_states(
    states: Sequence[StateType],
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render a sequence of frames from multiple states.

Parameters:

Name	Type	Description	Default
`states`	`Sequence[StateType]`	A sequence of environment states to render.	required
`renderer`	`AbstractRenderer \| Literal['auto']`	The renderer to use for rendering. If "auto", uses the default renderer.	`'auto'`
`dt`	`float`	The time delay between rendering each frame, in seconds.	`0.0`

render_stacked

render_stacked(
    states: StateType,
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render multiple frames from stacked states.

Stacked states are typically batched states stored in a pytree structure.

Parameters:

Name	Type	Description	Default
`states`	`StateType`	A pytree of stacked environment states to render.	required
`renderer`	`AbstractRenderer \| Literal['auto']`	The renderer to use for rendering. If "auto", uses the default renderer.	`'auto'`
`dt`	`float`	The time delay between rendering each frame, in seconds.	`0.0`
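The difference between the input of render_states (a sequence of states) and render_stacked (one pytree whose leaves carry a leading batch axis) can be illustrated with plain JAX tree utilities. The dictionary below is a hypothetical stand-in for AcrobotState, and `stack_states` and `unstack_states` are illustrative helpers, not part of Lerax.

```python
import jax
import jax.numpy as jnp


def stack_states(states):
    """Stack a non-empty sequence of same-structure pytrees leaf by leaf."""
    return jax.tree_util.tree_map(lambda *leaves: jnp.stack(leaves), *states)


def unstack_states(stacked):
    """Split a stacked pytree back into a list of per-step pytrees."""
    leaves, treedef = jax.tree_util.tree_flatten(stacked)
    length = leaves[0].shape[0]
    return [
        jax.tree_util.tree_unflatten(treedef, [leaf[i] for leaf in leaves])
        for i in range(length)
    ]


# Hypothetical stand-in for a state pytree; real states are AcrobotState.
frames = [
    {"theta": jnp.array([0.1 * i, 0.0]), "theta_dot": jnp.zeros(2)}
    for i in range(5)
]
stacked = stack_states(frames)      # each leaf now has shape (5, 2)
restored = unstack_states(stacked)  # back to a list of 5 pytrees
```

Stacked states are what a scanned or vectorized rollout typically produces, which is presumably why a separate rendering entry point exists for them.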

reset

reset(
    *, key: Key[Array, ""]
) -> tuple[StateType, ObsType, dict]

Wrap the functional logic into a Gym API reset method.

Parameters:

Name	Type	Description	Default
`key`	`Key[Array, '']`	A JAX PRNG key for any stochasticity in the reset.	required

Returns:

Type	Description
`tuple[StateType, ObsType, dict]`	A tuple of the initial state, initial observation, and additional info.

step

step(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> tuple[
    StateType,
    ObsType,
    Float[Array, ""],
    Bool[Array, ""],
    Bool[Array, ""],
    dict,
]

Wrap the functional logic into a Gym API step method.

Parameters:

Name	Type	Description	Default
`state`	`StateType`	The current environment state.	required
`action`	`ActType`	The action to take.	required
`key`	`Key[Array, '']`	A JAX PRNG key for any stochasticity in the step.	required

Returns:

Type	Description
`tuple[StateType, ObsType, Float[Array, ''], Bool[Array, ''], Bool[Array, ''], dict]`	A tuple of the next state, observation, reward, termination flag, truncation flag, and additional info.
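Since reset and step are functional (state goes in, state comes out) and the environment has no built-in truncation, a caller typically threads the state and PRNG keys through a loop and enforces its own step cap. The sketch below follows the signatures documented above. `rollout`, `policy`, and `max_steps` are hypothetical names, and the 500-step default simply mirrors Gymnasium's limit.

```python
import jax


def rollout(env, policy, key, max_steps=500):
    """Run one episode; stop on termination, truncation, or max_steps."""
    key, reset_key = jax.random.split(key)
    state, obs, info = env.reset(key=reset_key)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        # Fresh subkeys for the policy and the environment on every step.
        key, action_key, step_key = jax.random.split(key, 3)
        action = policy(obs, action_key)
        state, obs, reward, terminated, truncated, info = env.step(
            state, action, key=step_key
        )
        total_reward += float(reward)
        steps += 1
        if bool(terminated) or bool(truncated):
            break
    return total_reward, steps
```

Here `env` would be an Acrobot instance and `policy` any callable taking an observation and a key and returning an action in {0, 1, 2}. This Python loop is meant for clarity; a jitted rollout would need `jax.lax.scan` or `jax.lax.while_loop` instead.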
