Acrobot

lerax.env.classic_control.Acrobot

Bases: AbstractClassicControlEnv[AcrobotState, Int[Array, ''], Float[Array, '4']]

Acrobot environment matching the Gymnasium Acrobot environment.

Note

To reproduce Gymnasium's dynamics exactly, set `solver=diffrax.Euler()`.

Action Space

The action space is discrete with three actions:

0: Apply -1 torque to the joint between the two links
1: Apply 0 torque
2: Apply +1 torque to the joint between the two links

The selected torque is applied to the joint and held constant for the duration of the time step.
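As a minimal sketch of the mapping above, assuming the default `torques` array `[-1.0, 0.0, 1.0]`, an action index selects a torque by position. The helper name `action_to_torque` is illustrative and not part of lerax.

```python
# Default torque table, mirroring the documented default for `torques`.
TORQUES = (-1.0, 0.0, 1.0)


def action_to_torque(action: int) -> float:
    """Return the torque selected by a discrete action (hypothetical helper)."""
    if action not in range(len(TORQUES)):
        raise ValueError(f"action must be 0, 1, or 2, got {action}")
    return TORQUES[action]
```

Passing a custom `torques` array to the environment would change the values in this table but not the index-based lookup.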

Observation Space

The observation space is a 6-dimensional continuous space representing the state of the two links:

Index	Observation	Min Value	Max Value
0	Cosine of Joint Angle 1	-1.0	1.0
1	Sine of Joint Angle 1	-1.0	1.0
2	Cosine of Joint Angle 2	-1.0	1.0
3	Sine of Joint Angle 2	-1.0	1.0
4	Joint Velocity 1	-4π	4π
5	Joint Velocity 2	-9π	9π
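The table above can be sketched as a function of the two joint angles and velocities. This is an illustration in plain Python, not the library's implementation: the function name `observe` and its argument names are assumptions, and angles are assumed to be in radians.

```python
import math


def observe(theta1: float, theta2: float, vel1: float, vel2: float) -> tuple[float, ...]:
    """Build the 6-element observation in the index order of the table.

    Hypothetical helper: angles are assumed to be in radians and the
    velocities are assumed to be already within their documented bounds.
    """
    return (
        math.cos(theta1),  # index 0
        math.sin(theta1),  # index 1
        math.cos(theta2),  # index 2
        math.sin(theta2),  # index 3
        vel1,              # index 4
        vel2,              # index 5
    )
```

Encoding each angle as a cosine and sine pair keeps the observation continuous when an angle wraps around a full turn.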

Reward

Non-terminal steps yield a reward of -1.0 and the terminal step yields a reward of 0.0.
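Under this reward scheme, the undiscounted return of an episode depends only on how many steps it took to terminate. The sketch below works through that arithmetic; `episode_return` is a hypothetical helper, and it assumes the episode ends by termination rather than truncation.

```python
def episode_return(num_steps: int) -> float:
    """Undiscounted return of an episode that terminates on step `num_steps`.

    Every step before the last yields -1.0 and the terminal step yields 0.0,
    so the total is -(num_steps - 1).
    """
    if num_steps < 1:
        raise ValueError("an episode has at least one step")
    return float(1 - num_steps)
```

For example, an episode that terminates on its 100th step has a return of -99.0, so maximizing return is equivalent to reaching the target height in as few steps as possible.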

Termination

The episode terminates when the tip of the second link rises more than 1.0 unit above the base, that is, when `-cos(theta1) - cos(theta1 + theta2) > 1.0`.
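The termination condition can be checked directly from the two joint angles. The following is a sketch in plain Python rather than the library's code; `tip_height` and `is_terminal` are hypothetical names, and angles are assumed to be in radians, with `theta1 = 0` corresponding to the first link hanging straight down (as the formula implies).

```python
import math


def tip_height(theta1: float, theta2: float) -> float:
    """Height of the second link's tip relative to the base, per the formula above."""
    return -math.cos(theta1) - math.cos(theta1 + theta2)


def is_terminal(theta1: float, theta2: float) -> bool:
    """True when the tip is more than 1.0 unit above the base."""
    return tip_height(theta1, theta2) > 1.0
```

With both links hanging down (`theta1 = theta2 = 0`) the tip height is -2.0, and with both links pointing straight up (`theta1 = pi`, `theta2 = 0`) it is 2.0, so only the second configuration is terminal.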

Parameters:

Name	Type	Description	Default
`gravity`	`Float[ArrayLike, '']`	Gravitational acceleration.	`9.8`
`link_length_1`	`Float[ArrayLike, '']`	Length of the first link.	`1.0`
`link_length_2`	`Float[ArrayLike, '']`	Length of the second link.	`1.0`
`link_mass_1`	`Float[ArrayLike, '']`	Mass of the first link.	`1.0`
`link_mass_2`	`Float[ArrayLike, '']`	Mass of the second link.	`1.0`
`link_com_pos_1`	`Float[ArrayLike, '']`	Center of mass position of the first link.	`0.5`
`link_com_pos_2`	`Float[ArrayLike, '']`	Center of mass position of the second link.	`0.5`
`link_moi`	`Float[ArrayLike, '']`	Moment of inertia of the links.	`1.0`
`max_vel_1`	`Float[ArrayLike, '']`	Maximum angular velocity for the first joint.	`4 * jnp.pi`
`max_vel_2`	`Float[ArrayLike, '']`	Maximum angular velocity for the second joint.	`9 * jnp.pi`
`torque_max_noise`	`Float[ArrayLike, '']`	Maximum noise to add to the applied torque.	`0.0`
`torques`	`Float[ArrayLike, '3']`	Array of possible torques corresponding to each action.	`jnp.array([-1.0, 0.0, 1.0])`
`dt`	`Float[ArrayLike, '']`	Time step for integration.	`0.2`
`solver`	`diffrax.AbstractSolver \| None`	Diffrax solver to use for integration.	`None`
`stepsize_controller`	`diffrax.AbstractStepSizeController \| None`	Step size controller for adaptive solvers.	`None`
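The `max_vel_1` and `max_vel_2` parameters match the velocity bounds in the observation table. Gymnasium's Acrobot enforces its bounds by clipping the velocities after each integration step; the sketch below assumes this environment does the same, which the text above does not confirm. The name `clip` and the constants are illustrative only.

```python
import math

# Documented defaults for `max_vel_1` and `max_vel_2`.
MAX_VEL_1 = 4 * math.pi
MAX_VEL_2 = 9 * math.pi


def clip(value: float, limit: float) -> float:
    """Clamp `value` to the symmetric interval [-limit, limit] (assumed behavior)."""
    return max(-limit, min(limit, value))
```

A velocity inside the interval passes through unchanged, while one outside it is replaced by the nearest bound.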

unwrapped `property`

unwrapped: Self

Return the unwrapped environment.

action_mask

action_mask(
    state: StateType, *, key: Key[Array, ""]
) -> None

transition

transition(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> StateType

truncate

truncate(state: StateType) -> Bool[Array, '']

state_info

state_info(state: StateType) -> dict

transition_info

transition_info(
    state: StateType, action: ActType, next_state: StateType
) -> dict

render_states

render_states(
    states: Sequence[StateType],
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render a sequence of frames from multiple states.

Parameters:

Name	Type	Description	Default
`states`	`Sequence[StateType]`	A sequence of environment states to render.	required
`renderer`	`AbstractRenderer \| Literal['auto']`	The renderer to use for rendering. If "auto", uses the default renderer.	`'auto'`
`dt`	`float`	The time delay between rendering each frame, in seconds.	`0.0`

render_stacked

render_stacked(
    states: StateType,
    renderer: AbstractRenderer | Literal["auto"] = "auto",
    dt: float = 0.0,
)

Render multiple frames from stacked states.

Stacked states are typically batched states stored in a pytree structure.

Parameters:

Name	Type	Description	Default
`states`	`StateType`	A pytree of stacked environment states to render.	required
`renderer`	`AbstractRenderer \| Literal['auto']`	The renderer to use for rendering. If "auto", uses the default renderer.	`'auto'`
`dt`	`float`	The time delay between rendering each frame, in seconds.	`0.0`

reset

reset(
    *, key: Key[Array, ""]
) -> tuple[StateType, ObsType, dict]

Wrap the functional logic into a Gym API reset method.

Parameters:

Name	Type	Description	Default
`key`	`Key[Array, '']`	A JAX PRNG key for any stochasticity in the reset.	required

Returns:

Type	Description
`tuple[StateType, ObsType, dict]`	A tuple of the initial state, initial observation, and additional info.

step

step(
    state: StateType,
    action: ActType,
    *,
    key: Key[Array, ""],
) -> tuple[
    StateType,
    ObsType,
    Float[Array, ""],
    Bool[Array, ""],
    Bool[Array, ""],
    dict,
]

Wrap the functional logic into a Gym API step method.

Parameters:

Name	Type	Description	Default
`state`	`StateType`	The current environment state.	required
`action`	`ActType`	The action to take.	required
`key`	`Key[Array, '']`	A JAX PRNG key for any stochasticity in the step.	required

Returns:

Type	Description
`tuple[StateType, ObsType, Float[Array, ''], Bool[Array, ''], Bool[Array, ''], dict]`	A tuple of the next state, observation, reward, termination flag, truncation flag, and additional info.
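Because `reset` and `step` are functional, the caller threads the state and a fresh PRNG key through every call. The sketch below shows one way to run an episode using only the signatures documented above. The function `rollout`, its arguments, and the eager Python loop are assumptions for illustration, not part of lerax.

```python
from typing import Any, Callable, Sequence


def rollout(env: Any, keys: Sequence[Any], policy: Callable[[Any], Any]) -> float:
    """Run one episode eagerly and return the undiscounted sum of rewards.

    Hypothetical helper: `env` is any object with the documented `reset` and
    `step` signatures, `keys` holds one PRNG key for the reset plus one per
    step, and `policy` maps an observation to an action.
    """
    state, obs, _info = env.reset(key=keys[0])
    total = 0.0
    for key in keys[1:]:
        action = policy(obs)
        # `step` returns the next state explicitly instead of mutating `env`.
        state, obs, reward, terminated, truncated, _info = env.step(
            state, action, key=key
        )
        total += float(reward)
        if bool(terminated) or bool(truncated):
            break
    return total
```

In practice the keys would typically come from `jax.random.split`. The Python-level `break` means this loop is meant to run eagerly; a jitted rollout would need a different control-flow construct.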
