
Acrobot

lerax.env.Acrobot

Bases: AbstractClassicControlEnv[AcrobotState, Int[Array, ''], Float[Array, '4']]

Acrobot environment matching the Gymnasium Acrobot environment.

Note

To reproduce Gymnasium's dynamics exactly, set solver=diffrax.Euler().

Action Space

The action space is discrete with three actions:

  • 0: Apply -1 torque to the joint between the two links
  • 1: Apply 0 torque
  • 2: Apply +1 torque to the joint between the two links

The selected action applies a fixed-magnitude torque (plus optional noise, see torque_max_noise) to the joint for the duration of the time step.
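The following is a minimal sketch of how an action index could be turned into a torque, using the default torques table from the parameter list. The uniform noise in [-torque_max_noise, torque_max_noise] is an assumption borrowed from Gymnasium's Acrobot, and lerax may draw its noise differently. action_to_torque is a hypothetical helper, not part of the lerax API, and plain Python floats stand in for JAX arrays.

```python
import random
from typing import Optional

# Default torque table from the parameter list: index = action, value = torque.
TORQUES = (-1.0, 0.0, 1.0)


def action_to_torque(
    action: int,
    torque_max_noise: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Map a discrete action in {0, 1, 2} to the torque on the joint between the links.

    Assumption (hypothetical helper): noise is uniform in
    [-torque_max_noise, torque_max_noise], as in Gymnasium's Acrobot.
    """
    if action not in (0, 1, 2):
        raise ValueError(f"action must be 0, 1 or 2, got {action}")
    torque = TORQUES[action]
    if torque_max_noise > 0.0:
        rng = rng or random.Random()
        torque += rng.uniform(-torque_max_noise, torque_max_noise)
    return torque


# Usage: action 2 pushes the joint with +1 torque when noise is disabled.
print(action_to_torque(2))
```

With the default torque_max_noise of 0.0 the mapping is deterministic, so the same action always yields the same torque.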

Observation Space

The observation space is a 6-dimensional continuous space representing the state of the two links:

Index | Observation | Min | Max
--- | --- | --- | ---
0 | Cosine of joint angle 1 | -1.0 | 1.0
1 | Sine of joint angle 1 | -1.0 | 1.0
2 | Cosine of joint angle 2 | -1.0 | 1.0
3 | Sine of joint angle 2 | -1.0 | 1.0
4 | Angular velocity of joint 1 | -4π | 4π
5 | Angular velocity of joint 2 | -9π | 9π
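The observation can be sketched as a function of the four state variables. Encoding each angle as a (cosine, sine) pair keeps the observation continuous where the angle wraps around at ±π. This sketch assumes the state is the four floats (theta1, theta2, dtheta1, dtheta2) in Gymnasium's order. The field names of AcrobotState are not given here, and observation, in_bounds and OBS_HIGH are hypothetical names.

```python
import math

# Upper bounds from the observation table; the lower bounds are their negatives.
OBS_HIGH = (1.0, 1.0, 1.0, 1.0, 4 * math.pi, 9 * math.pi)


def observation(theta1: float, theta2: float, dtheta1: float, dtheta2: float) -> tuple:
    """Encode each joint angle as (cos, sin) and append the two angular velocities."""
    return (
        math.cos(theta1),
        math.sin(theta1),
        math.cos(theta2),
        math.sin(theta2),
        dtheta1,
        dtheta2,
    )


def in_bounds(obs: tuple) -> bool:
    """Check every component against the [-high, high] range from the table."""
    return all(-high <= x <= high for x, high in zip(obs, OBS_HIGH))


# Usage: both links hanging straight down and at rest.
print(observation(0.0, 0.0, 0.0, 0.0))
```

The velocity bounds only hold if the environment clips the joint velocities to max_vel_1 and max_vel_2, which is what the table's ±4π and ±9π limits indicate.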

Reward

Non-terminal steps yield a reward of -1.0 and the terminal step yields a reward of 0.0.

Termination

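The episode terminates when the tip of the second link reaches a height greater than 1.0 unit above the base. This corresponds to the condition -cos(theta1) - cos(theta1 + theta2) > 1.0, where theta1 is the angle of the first link from the downward vertical and theta2 is the angle of the second link relative to the first.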
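The reward and termination rules above combine into a few lines. This is a sketch with hypothetical helper names (tip_height, is_terminal, reward) and plain floats in place of JAX arrays. tip_height is the left-hand side of the termination condition, that is, the tip height measured in unit link lengths.

```python
import math


def tip_height(theta1: float, theta2: float) -> float:
    """Height of the second link's tip above the base, for unit link lengths."""
    return -math.cos(theta1) - math.cos(theta1 + theta2)


def is_terminal(theta1: float, theta2: float) -> bool:
    """Terminate once the tip rises more than 1.0 above the base."""
    return tip_height(theta1, theta2) > 1.0


def reward(terminal: bool) -> float:
    """-1.0 for every non-terminal step, 0.0 for the terminal step."""
    return 0.0 if terminal else -1.0


# Hanging straight down: tip height -2.0, not terminal, reward -1.0.
print(tip_height(0.0, 0.0), is_terminal(0.0, 0.0), reward(is_terminal(0.0, 0.0)))
# Pointing straight up: tip height 2.0, terminal, reward 0.0.
print(tip_height(math.pi, 0.0), is_terminal(math.pi, 0.0), reward(is_terminal(math.pi, 0.0)))
```

Because every non-terminal step costs -1.0, the return of an episode is minus the number of steps taken before the tip clears the threshold. Maximising return therefore means swinging up as quickly as possible.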

Parameters:

  • gravity (Float[ArrayLike, ''], default 9.8): Gravitational acceleration.
  • link_length_1 (Float[ArrayLike, ''], default 1.0): Length of the first link.
  • link_length_2 (Float[ArrayLike, ''], default 1.0): Length of the second link.
  • link_mass_1 (Float[ArrayLike, ''], default 1.0): Mass of the first link.
  • link_mass_2 (Float[ArrayLike, ''], default 1.0): Mass of the second link.
  • link_com_pos_1 (Float[ArrayLike, ''], default 0.5): Position of the first link's center of mass, measured along the link from its joint.
  • link_com_pos_2 (Float[ArrayLike, ''], default 0.5): Position of the second link's center of mass, measured along the link from its joint.
  • link_moi (Float[ArrayLike, ''], default 1.0): Moment of inertia of each link; the same value is used for both links.
  • max_vel_1 (Float[ArrayLike, ''], default 4 * jnp.pi): Maximum angular velocity of the first joint.
  • max_vel_2 (Float[ArrayLike, ''], default 9 * jnp.pi): Maximum angular velocity of the second joint.
  • torque_max_noise (Float[ArrayLike, ''], default 0.0): Maximum magnitude of the noise added to the applied torque.
  • torques (Float[ArrayLike, '3'], default jnp.array([-1.0, 0.0, 1.0])): Torque applied for each action, indexed by action.
  • dt (Float[ArrayLike, ''], default 0.2): Time step over which each action is integrated.
  • solver (diffrax.AbstractSolver | None, default None): Diffrax solver used for integration.
  • stepsize_controller (diffrax.AbstractStepSizeController | None, default None): Step size controller, used with adaptive solvers.
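The parameters above enter the Acrobot equations of motion. The sketch below assumes lerax uses the same equations as Gymnasium's Acrobot in its default "book" variant; it is an illustration, not lerax's implementation. A single explicit Euler step of length dt stands in for the configured diffrax solver. After the step, the angles are wrapped to [-π, π) and the velocities are clipped to max_vel_1 and max_vel_2, as Gymnasium does. AcrobotParams, derivatives, wrap_angle and euler_step are hypothetical names. link_length_2 is omitted because only the first link's length appears in these equations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AcrobotParams:
    # Defaults copied from the parameter list (hypothetical container).
    gravity: float = 9.8
    link_length_1: float = 1.0
    link_mass_1: float = 1.0
    link_mass_2: float = 1.0
    link_com_pos_1: float = 0.5
    link_com_pos_2: float = 0.5
    link_moi: float = 1.0
    max_vel_1: float = 4 * math.pi
    max_vel_2: float = 9 * math.pi
    dt: float = 0.2


def derivatives(state, torque, p=AcrobotParams()):
    """Time derivative of (theta1, theta2, dtheta1, dtheta2), 'book' equations."""
    theta1, theta2, dtheta1, dtheta2 = state
    m1, m2 = p.link_mass_1, p.link_mass_2
    l1, lc1, lc2 = p.link_length_1, p.link_com_pos_1, p.link_com_pos_2
    i1 = i2 = p.link_moi  # both links share one moment of inertia
    g = p.gravity
    d1 = (m1 * lc1**2
          + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * math.cos(theta2)) + i1 + i2)
    d2 = m2 * (lc2**2 + l1 * lc2 * math.cos(theta2)) + i2
    phi2 = m2 * lc2 * g * math.cos(theta1 + theta2 - math.pi / 2)
    phi1 = (
        -m2 * l1 * lc2 * dtheta2**2 * math.sin(theta2)
        - 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * math.sin(theta2)
        + (m1 * lc1 + m2 * l1) * g * math.cos(theta1 - math.pi / 2)
        + phi2
    )
    ddtheta2 = (
        torque + d2 / d1 * phi1 - m2 * l1 * lc2 * dtheta1**2 * math.sin(theta2) - phi2
    ) / (m2 * lc2**2 + i2 - d2**2 / d1)
    ddtheta1 = -(d2 * ddtheta2 + phi1) / d1
    return dtheta1, dtheta2, ddtheta1, ddtheta2


def wrap_angle(x):
    """Map an angle into [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def euler_step(state, torque, p=AcrobotParams()):
    """One explicit Euler step of size p.dt, then wrap angles and clip velocities."""
    deriv = derivatives(state, torque, p)
    theta1, theta2, dtheta1, dtheta2 = (s + p.dt * ds for s, ds in zip(state, deriv))
    return (
        wrap_angle(theta1),
        wrap_angle(theta2),
        max(-p.max_vel_1, min(p.max_vel_1, dtheta1)),
        max(-p.max_vel_2, min(p.max_vel_2, dtheta2)),
    )


# Usage: apply +1 torque to an acrobot hanging at rest.
print(euler_step((0.0, 0.0, 0.0, 0.0), 1.0))
```

At rest with the default parameters, a torque of +1 gives ddtheta2 = 72/41 and ddtheta1 = -28/41: the second link accelerates forward while the first swings back. Repeated pumping of this kind is what eventually raises the tip above the termination height.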