Acrobot
A Lerax port of Gymnasium's Acrobot environment. Two rigid links are connected in series: the first link is pinned to the base, and only the joint between the two links is actuated. The agent applies torque at that joint to swing the free end of the second link above a target height.
Observation space
A 6-dimensional float vector [cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]. The angular velocities theta1_dot and theta2_dot are clipped to ±4π and ±9π, respectively.
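As an illustration of this layout, the following is a minimal plain-Python sketch of how such an observation could be assembled from the joint angles and velocities. The helper names `clip` and `make_observation` are hypothetical and are not part of Lerax. Applying the clipping at observation time is also an assumption, since the library may clip the stored state instead.

```python
import math

MAX_VEL_1 = 4 * math.pi  # documented default for max_vel_1
MAX_VEL_2 = 9 * math.pi  # documented default for max_vel_2


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(value, high))


def make_observation(theta1, theta2, theta1_dot, theta2_dot):
    """Hypothetical helper: build the 6-element observation described above."""
    return [
        math.cos(theta1),
        math.sin(theta1),
        math.cos(theta2),
        math.sin(theta2),
        clip(theta1_dot, -MAX_VEL_1, MAX_VEL_1),
        clip(theta2_dot, -MAX_VEL_2, MAX_VEL_2),
    ]
```

Encoding each angle as a cosine and sine pair keeps the observation continuous across the ±π wrap-around.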
Action space
Discrete(3) selecting a torque from torques = [-1.0, 0.0, 1.0] (applied to the joint between the two links).
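The mapping from action index to torque amounts to a table lookup. The sketch below shows the idea in plain Python. `action_to_torque` is a hypothetical name used only for illustration, and the range check is an addition of this sketch rather than documented behavior.

```python
TORQUES = (-1.0, 0.0, 1.0)  # documented default for the torques parameter


def action_to_torque(action, torques=TORQUES):
    """Hypothetical helper: look up the torque for a discrete action index."""
    if action not in range(len(torques)):
        raise ValueError(f"action must be in [0, {len(torques) - 1}], got {action}")
    return torques[action]
```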
Reward
-1.0 on every non-terminal step, 0.0 on the terminal step. Computed as done.astype(float) - 1.0.
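The expression above can be mirrored in plain Python to see what it implies for the episode return. This is a sketch only: `step_reward` is a hypothetical name, and `float(done)` stands in for the array cast `done.astype(float)`.

```python
def step_reward(done):
    """Plain-Python analogue of done.astype(float) - 1.0."""
    return float(done) - 1.0


def episode_return(num_steps):
    """Return of an episode that terminates on its last step."""
    flags = [False] * (num_steps - 1) + [True]
    return sum(step_reward(done) for done in flags)
```

An episode that terminates on step T therefore has a return of -(T - 1), so shorter episodes score higher.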
Termination
Terminates when the tip of the second link rises above 1.0, i.e. -cos(theta1) - cos(theta1 + theta2) > 1.0. No built-in truncation.
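The termination test can be checked by hand with a few lines of plain Python. The function names below are hypothetical, and the sketch assumes unit link lengths, which is what the documented inequality implies.

```python
import math


def tip_height(theta1, theta2):
    """Height of the second link's tip relative to the base, for unit-length links."""
    return -math.cos(theta1) - math.cos(theta1 + theta2)


def is_terminal(theta1, theta2, threshold=1.0):
    """True when the tip is strictly above the threshold height."""
    return tip_height(theta1, theta2) > threshold
```

With both links hanging straight down (theta1 = theta2 = 0) the tip height is -2, and with both links pointing straight up (theta1 = π, theta2 = 0) it is +2, so only the second configuration is terminal.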
Deviations from Gymnasium
Dynamics are integrated with Diffrax (default Tsit5); pass solver=diffrax.Euler() for Gymnasium-identical dynamics. No built-in 500-step truncation.
lerax.env.classic_control.Acrobot
Bases: AbstractClassicControlEnv[AcrobotState, Int[Array, ''], Float[Array, '4']]
Acrobot environment matching the Gymnasium Acrobot environment.
Note
To achieve dynamics identical to Gymnasium's, set solver=diffrax.Euler().
Action Space
The action space is discrete with three actions:
- 0: Apply -1 torque to the joint between the two links
- 1: Apply 0 torque
- 2: Apply +1 torque to the joint between the two links
The action applies a fixed-magnitude torque to the joint for the duration of the time step.
Observation Space
The observation space is a 6-dimensional continuous space representing the state of the two links:
| Index | Observation | Min Value | Max Value |
|---|---|---|---|
| 0 | Cosine of Joint Angle 1 | -1.0 | 1.0 |
| 1 | Sine of Joint Angle 1 | -1.0 | 1.0 |
| 2 | Cosine of Joint Angle 2 | -1.0 | 1.0 |
| 3 | Sine of Joint Angle 2 | -1.0 | 1.0 |
| 4 | Joint Velocity 1 | -4π | 4π |
| 5 | Joint Velocity 2 | -9π | 9π |
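Because the angles are exposed only through their cosines and sines, code that needs the raw joint angles has to reconstruct them. A minimal sketch using `math.atan2` follows. `angles_from_observation` is a hypothetical helper, not a Lerax function, and the recovered angles lie in (-π, π].

```python
import math


def angles_from_observation(obs):
    """Hypothetical helper: recover (theta1, theta2) from the first four entries."""
    cos1, sin1, cos2, sin2 = obs[:4]
    return math.atan2(sin1, cos1), math.atan2(sin2, cos2)
```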
Reward
Non-terminal steps yield a reward of -1.0 and the terminal step yields a reward of 0.0.
Termination
The episode terminates when the tip of the second link reaches a height greater than 1.0 unit above the base. This corresponds to the condition: -cos(theta1) - cos(theta1 + theta2) > 1.0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `gravity` | `Float[ArrayLike, '']` | Gravitational acceleration. | `9.8` |
| `link_length_1` | `Float[ArrayLike, '']` | Length of the first link. | `1.0` |
| `link_length_2` | `Float[ArrayLike, '']` | Length of the second link. | `1.0` |
| `link_mass_1` | `Float[ArrayLike, '']` | Mass of the first link. | `1.0` |
| `link_mass_2` | `Float[ArrayLike, '']` | Mass of the second link. | `1.0` |
| `link_com_pos_1` | `Float[ArrayLike, '']` | Center of mass position of the first link. | `0.5` |
| `link_com_pos_2` | `Float[ArrayLike, '']` | Center of mass position of the second link. | `0.5` |
| `link_moi` | `Float[ArrayLike, '']` | Moment of inertia of the links. | `1.0` |
| `max_vel_1` | `Float[ArrayLike, '']` | Maximum angular velocity for the first joint. | `4 * jnp.pi` |
| `max_vel_2` | `Float[ArrayLike, '']` | Maximum angular velocity for the second joint. | `9 * jnp.pi` |
| `torque_max_noise` | `Float[ArrayLike, '']` | Maximum noise to add to the applied torque. | `0.0` |
| `torques` | `Float[ArrayLike, '3']` | Array of possible torques corresponding to each action. | `jnp.array([-1.0, 0.0, 1.0])` |
| `dt` | `Float[ArrayLike, '']` | Time step for integration. | `0.2` |
| `solver` | `diffrax.AbstractSolver \| None` | Diffrax solver to use for integration. | `None` |
| `stepsize_controller` | `diffrax.AbstractStepSizeController \| None` | Step size controller for adaptive solvers. | `None` |
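The torque_max_noise parameter is described only as the maximum noise added to the applied torque. One plausible reading, sketched below in plain Python, is uniform noise drawn from [-torque_max_noise, +torque_max_noise]. The uniform distribution and the helper name `noisy_torque` are assumptions made for illustration, not confirmed Lerax behavior.

```python
import random


def noisy_torque(torque, max_noise=0.0, rng=None):
    """Hypothetical helper: perturb a torque with bounded uniform noise (assumed)."""
    rng = rng or random.Random()
    return torque + rng.uniform(-max_noise, max_noise)
```

With the documented default of 0.0 the torque is returned unchanged, which matches a noise-free environment.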
render_states
render_states(
states: Sequence[StateType],
renderer: AbstractRenderer | Literal["auto"] = "auto",
dt: float = 0.0,
)
Render a sequence of frames from multiple states.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `states` | `Sequence[StateType]` | A sequence of environment states to render. | required |
| `renderer` | `AbstractRenderer \| Literal['auto']` | The renderer to use for rendering. If "auto", uses the default renderer. | `'auto'` |
| `dt` | `float` | The time delay between rendering each frame, in seconds. | `0.0` |
render_stacked
render_stacked(
states: StateType,
renderer: AbstractRenderer | Literal["auto"] = "auto",
dt: float = 0.0,
)
Render multiple frames from stacked states.
Stacked states are typically batched states stored in a pytree structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `states` | `StateType` | A pytree of stacked environment states to render. | required |
| `renderer` | `AbstractRenderer \| Literal['auto']` | The renderer to use for rendering. If "auto", uses the default renderer. | `'auto'` |
| `dt` | `float` | The time delay between rendering each frame, in seconds. | `0.0` |
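The difference between a sequence of states and a stacked state can be illustrated without any array library. In the plain-Python sketch below, `ToyState`, `stack_states`, and `unstack_states` are hypothetical stand-ins and do not reflect the real fields of AcrobotState. In JAX the same idea is usually expressed by mapping a stacking function over the leaves of the state pytrees.

```python
from typing import NamedTuple


class ToyState(NamedTuple):
    """Hypothetical stand-in for an environment state (not AcrobotState)."""

    theta1: object
    theta2: object


def stack_states(states):
    """Turn a sequence of states into one state whose fields hold sequences."""
    return ToyState(*(list(values) for values in zip(*states)))


def unstack_states(stacked):
    """Inverse of stack_states: recover the per-step states."""
    return [ToyState(*values) for values in zip(*stacked)]
```

Under this reading, the sequence form is what render_states expects, while the stacked form corresponds to the single batched pytree that render_stacked expects.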
reset
Wrap the functional logic into a Gym API reset method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `Key[Array, '']` | A JAX PRNG key for any stochasticity in the reset. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[StateType, ObsType, dict]` | A tuple of the initial state, initial observation, and additional info. |
step
step(
state: StateType,
action: ActType,
*,
key: Key[Array, ""],
) -> tuple[
StateType,
ObsType,
Float[Array, ""],
Bool[Array, ""],
Bool[Array, ""],
dict,
]
Wrap the functional logic into a Gym API step method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `state` | `StateType` | The current environment state. | required |
| `action` | `ActType` | The action to take. | required |
| `key` | `Key[Array, '']` | A JAX PRNG key for any stochasticity in the step. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[StateType, ObsType, Float[Array, ''], Bool[Array, ''], Bool[Array, ''], dict]` | A tuple of the next state, observation, reward, terminal flag, truncate flag, and additional info. |
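Since the environment has no built-in truncation, a caller who wants Gymnasium's 500-step cap has to track the step count separately. The sketch below wraps any step function that follows the signature documented above. `step_with_time_limit` and its `step_count` argument are hypothetical, and the key is passed through unchanged.

```python
def step_with_time_limit(step_fn, state, action, key, step_count, max_steps=500):
    """Hypothetical wrapper: call step_fn and flag truncation after max_steps."""
    next_state, obs, reward, terminated, truncated, info = step_fn(
        state, action, key=key
    )
    step_count = step_count + 1
    # Combine any truncation reported by the environment with the step cap.
    truncated = truncated | (step_count >= max_steps)
    return (next_state, obs, reward, terminated, truncated, info), step_count
```

The caller passes the returned `step_count` back in on the next call and resets it to zero whenever a new episode starts.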