
This package provides some reinforcement learning (RL) agents.


Please take a look at the tutorial on how to install, compile, and use this package.

Check out the code at: https://github.com/toddhester/rl-texplore-ros-pkg

This package includes a number of reinforcement learning agents that can be used for learning on robots, or learning with the environments in the accompanying rl_env package.

The package contains the following agents:

In addition to these methods, the package contains a general model-based architecture that can be used with any combination of planners and model learning algorithms. For example, the R-Max implementation is simply the general agent with an R-Max model and Value Iteration for planning, and the TEXPLORE agent is the general agent with a random forest model (Breiman 2001) and UCT (Kocsis and Szepesvari 2006) for planning.

Running the agent

The agent can be run with the following command. It should be initialized before the environment is started:

rosrun rl_agent agent --agent type [options]

where the agent type is one of the following:

qlearner sarsa modelbased rmax texplore dyna savedpolicy

There are a number of options to specify particular parameters of the algorithms:


For example, to run real-time TEXPLORE using 10 continuous (M5) trees at an action rate of 25 Hz with a discount factor of 0.99, you would call:

rosrun rl_agent agent --agent texplore --planner parallel-uct --nmodels 10 --model m5tree --actrate 25 --gamma 0.99

The General Model-Based Agent

Included in this package is a general model-based agent that can use any model learning or planning method that matches the interface defined in the core.hh file of the rl_common package.
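To make the plug-in structure concrete, here is a minimal C++ sketch of an agent that is nothing more than a model wired to a planner. The `Model` and `Planner` interfaces, their method names, and the two toy implementations are hypothetical simplifications written for this example; the actual interfaces in core.hh (rl_common) differ and should be consulted before writing a real model or planner.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using State = int;  // hypothetical: a discrete state index

// Hypothetical model interface: learns from experience, predicts outcomes.
struct Model {
  virtual ~Model() = default;
  virtual void update(State s, int a, double r, State next) = 0;
  virtual double predictReward(State s, int a) const = 0;
  virtual State predictNext(State s, int a) const = 0;
};

// Hypothetical planner interface: chooses an action using a model.
struct Planner {
  virtual ~Planner() = default;
  virtual int bestAction(const Model& m, State s) const = 0;
};

// The agent only wires a model to a planner; swapping either one changes the algorithm.
class ModelBasedAgent {
 public:
  ModelBasedAgent(std::unique_ptr<Model> m, std::unique_ptr<Planner> p)
      : model_(std::move(m)), planner_(std::move(p)) {}

  // Learn from the previous transition, then plan from the current state.
  int step(State prev, int prevAction, double reward, State current) {
    model_->update(prev, prevAction, reward, current);
    return planner_->bestAction(*model_, current);
  }

 private:
  std::unique_ptr<Model> model_;
  std::unique_ptr<Planner> planner_;
};

// Toy model: a deterministic table that remembers the last observed outcome.
class TableModel : public Model {
 public:
  void update(State s, int a, double r, State next) override { table_[{s, a}] = {r, next}; }
  double predictReward(State s, int a) const override {
    auto it = table_.find({s, a});
    return it == table_.end() ? 0.0 : it->second.first;
  }
  State predictNext(State s, int a) const override {
    auto it = table_.find({s, a});
    return it == table_.end() ? s : it->second.second;
  }

 private:
  std::map<std::pair<State, int>, std::pair<double, State>> table_;
};

// Toy planner: one-step lookahead on the predicted immediate reward only.
class OneStepPlanner : public Planner {
 public:
  explicit OneStepPlanner(int numActions) : numActions_(numActions) {}
  int bestAction(const Model& m, State s) const override {
    int best = 0;
    for (int a = 1; a < numActions_; ++a)
      if (m.predictReward(s, a) > m.predictReward(s, best)) best = a;
    return best;
  }

 private:
  int numActions_;
};
```

Replacing `OneStepPlanner` with, say, a value-iteration planner, or `TableModel` with a tree-based model, leaves `ModelBasedAgent` unchanged, which is the point of the general architecture.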

The model learning methods that are available include:

With any of these types of models, multiple models can be combined together using the --nmodels option. For example, a random forest with 10 trees can be created with the options:

--model tree --nmodels 10
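TEXPLORE plans on the average of its models' predictions, so averaging is a natural way to combine such an ensemble. The sketch below shows only that combination step; `Predictor` is a hypothetical stand-in for one learned model (for example, one tree of the forest), not a type from the package.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a learned model: maps (state, action) to a prediction.
using Predictor = std::function<double(const std::vector<float>& state, int action)>;

// Combine an ensemble by averaging the members' predictions.
double averagePrediction(const std::vector<Predictor>& models,
                         const std::vector<float>& state, int action) {
  if (models.empty()) return 0.0;
  double sum = 0.0;
  for (const auto& m : models) sum += m(state, action);
  return sum / models.size();
}
```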

There are also a number of planning methods available:

Any of these model learning methods can be combined with any of the planners. It is also easy to write new model learning and planning methods that match the interface defined in rl_common and use those as well. In addition, there are multiple ways of performing exploration:

How the RL Agent interacts with the Environment

The RL agent can interact with the environment in two ways: it can use the ROS messages defined in the rl_msgs package, or another method can call the agent and environment methods directly, as done in the rl_experiment package.

Using rl_msgs

The rl_msgs package defines a set of ROS messages for the agent and environment to communicate. These are similar to the messages used in RL-Glue (Tanner and White 2009), but simplified and defined in the ROS message format. The environment publishes three types of messages for the agent: RLEnvDescription, RLEnvSeedExperience, and RLStateReward.

The environment subscribes to two types of messages from the agent: RLAction and RLExperimentInfo.

When the environment is created, it sends an RLEnvDescription message to the agent, followed by any experience seeds as a series of RLEnvSeedExperience messages. It then sends an RLStateReward message with the agent's initial state in the domain. Each time it receives an RLAction message, it applies the action to the domain and sends a new RLStateReward message. When an episode has ended, the environment receives an RLExperimentInfo message from the agent; it then resets the domain and sends the agent a new RLStateReward message with its initial state in the new episode.
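The message order described above can be traced without ROS. This sketch only logs message type names in the order the environment sends (`send:`) or receives (`recv:`) them; the message fields and the ROS publish/subscribe machinery are deliberately left out, and the function name `messageOrder` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Environment-side view of the rl_msgs exchange, as an ordered log of type names.
std::vector<std::string> messageOrder(int numSeeds, int stepsPerEpisode, int episodes) {
  std::vector<std::string> log;
  log.push_back("send:RLEnvDescription");         // describe the domain once
  for (int i = 0; i < numSeeds; ++i)
    log.push_back("send:RLEnvSeedExperience");    // optional experience seeds
  log.push_back("send:RLStateReward");            // initial state
  for (int e = 0; e < episodes; ++e) {
    for (int t = 0; t < stepsPerEpisode; ++t) {
      log.push_back("recv:RLAction");             // agent's chosen action
      log.push_back("send:RLStateReward");        // resulting state and reward
    }
    log.push_back("recv:RLExperimentInfo");       // episode over: reset the domain
    log.push_back("send:RLStateReward");          // initial state of the next episode
  }
  return log;
}
```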

Calling methods directly

Experiments can also be run by calling the agent methods directly (as done in the rl_experiment package). The methods that all agents must implement are defined in the Agent interface in the rl_common package (API). Seeds can be given to the agent by calling the seedExp method. After receiving a new state and reward, the agent can be queried for an action by calling next_action(reward, state).
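A minimal sketch of such a direct-call experiment loop, assuming a simplified agent interface with only a `next_action(reward, state)` method. The `Agent` struct here is a hypothetical reduction of the rl_common interface, and `Corridor` and `AlwaysRight` are toy stand-ins for an rl_env domain and a learning agent.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduced interface echoing next_action(reward, state) from the text.
struct Agent {
  virtual ~Agent() = default;
  virtual int next_action(float reward, const std::vector<float>& state) = 0;
};

// Toy 1-D corridor: action 1 moves right, action 0 moves left.
// Reaching `goal` gives reward 1 and ends the episode.
struct Corridor {
  int pos = 0, goal = 4;
  std::vector<float> sensation() const { return {static_cast<float>(pos)}; }
  float apply(int action) {
    pos += (action == 1) ? 1 : -1;
    if (pos < 0) pos = 0;
    return pos == goal ? 1.0f : 0.0f;
  }
  bool terminal() const { return pos == goal; }
};

// Toy agent that always moves right.
struct AlwaysRight : Agent {
  int next_action(float, const std::vector<float>&) override { return 1; }
};

// Runs one episode by calling the agent directly (no ROS messages); returns steps taken.
int runEpisode(Agent& agent, Corridor& env, int maxSteps) {
  float reward = 0.0f;
  int steps = 0;
  while (!env.terminal() && steps < maxSteps) {
    int a = agent.next_action(reward, env.sensation());
    reward = env.apply(a);
    ++steps;
  }
  return steps;
}
```

In this sketch the reward for the final transition is never passed back to the agent; a learning agent needs it, so check the rl_common Agent interface for how episode endings are reported.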

Running the various algorithms

In this section, I provide directions on running each of the algorithms available in the package, as well as the options each one accepts. The package contains 6 algorithms: Q-Learning, Sarsa, Dyna, R-Max, TEXPLORE, and the general model-based agent.

Running Q-Learning

To run the basic Q-Learning (Watkins 1989) agent, type the following:

rosrun rl_agent agent --agent qlearner

By default, Q-Learning will be run with greedy exploration, a learning rate alpha of 0.3, and initial Q-values of 0.0.
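For reference, the tabular Q-Learning update with these defaults looks like the following sketch. It illustrates the algorithm rather than reproducing the package's implementation; the discount factor of 0.99 is an assumption of the example, and ties in the greedy choice go to the lowest-numbered action.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Tabular Q-Learning with alpha = 0.3, initial Q-values 0.0, greedy action selection.
class QLearner {
 public:
  QLearner(int numStates, int numActions, double alpha = 0.3, double gamma = 0.99,
           double initQ = 0.0)
      : q_(numStates, std::vector<double>(numActions, initQ)), alpha_(alpha), gamma_(gamma) {}

  // Greedy action: first action with the highest Q-value in state s.
  int greedy(int s) const {
    return static_cast<int>(std::max_element(q_[s].begin(), q_[s].end()) - q_[s].begin());
  }

  // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); no bootstrap at terminal states.
  void update(int s, int a, double r, int next, bool terminal) {
    double target = r;
    if (!terminal) target += gamma_ * *std::max_element(q_[next].begin(), q_[next].end());
    q_[s][a] += alpha_ * (target - q_[s][a]);
  }

  double q(int s, int a) const { return q_[s][a]; }

 private:
  std::vector<std::vector<double>> q_;
  double alpha_, gamma_;
};
```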

The following options are available for the Q-Learning agent:

Running Sarsa

To run the basic Sarsa (Rummery and Niranjan 1994) agent, type the following:

rosrun rl_agent agent --agent sarsa

By default, Sarsa will be run with greedy exploration, a learning rate alpha of 0.3, initial action-values of 0.0, and lambda set to 0.1.
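A compact sketch of tabular Sarsa(lambda) with the defaults above. This is a generic textbook form written for illustration, not the package's code; the use of accumulating traces and gamma = 0.99 are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tabular Sarsa(lambda) with accumulating eligibility traces.
class SarsaLambda {
 public:
  SarsaLambda(int numStates, int numActions, double alpha = 0.3, double lambda = 0.1,
              double gamma = 0.99, double initQ = 0.0)
      : q_(numStates, std::vector<double>(numActions, initQ)),
        e_(numStates, std::vector<double>(numActions, 0.0)),
        alpha_(alpha), lambda_(lambda), gamma_(gamma) {}

  // On-policy update for (s, a, r, s', a'), where a' is the action actually chosen in s'.
  void update(int s, int a, double r, int next, int nextA, bool terminal) {
    double target = terminal ? r : r + gamma_ * q_[next][nextA];
    double delta = target - q_[s][a];
    e_[s][a] += 1.0;  // accumulating trace for the visited pair
    for (std::size_t i = 0; i < q_.size(); ++i)
      for (std::size_t j = 0; j < q_[i].size(); ++j) {
        q_[i][j] += alpha_ * delta * e_[i][j];
        // decay every trace; clear them all at the end of an episode
        e_[i][j] = terminal ? 0.0 : gamma_ * lambda_ * e_[i][j];
      }
  }

  double q(int s, int a) const { return q_[s][a]; }

 private:
  std::vector<std::vector<double>> q_, e_;
  double alpha_, lambda_, gamma_;
};
```

With lambda = 0.1 each trace shrinks by gamma * lambda = 0.099 per step, so a reward only propagates a few steps back along the trajectory in a single update.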

The following options are available for the Sarsa agent:

Running Dyna

To run the basic Dyna (Sutton 1990) agent, type the following:

rosrun rl_agent agent --agent dyna

By default, Dyna will be run with greedy exploration, a learning rate alpha of 0.3, initial action-values of 0.0, and k set to 1000.
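A minimal Dyna-Q sketch, assuming k is the number of simulated updates performed after each real step as in standard Dyna. The deterministic "last outcome" model, gamma = 0.99, and uniform sampling of previously seen state-action pairs are assumptions of this illustration, not details taken from the package.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <utility>
#include <vector>

class DynaQ {
  using Key = std::pair<int, int>;                             // (state, action)
  struct Outcome { double reward; int next; bool terminal; };  // last observed result

 public:
  DynaQ(int numStates, int numActions, int k = 1000, double alpha = 0.3,
        double gamma = 0.99, unsigned seed = 1)
      : q_(numStates, std::vector<double>(numActions, 0.0)),
        k_(k), alpha_(alpha), gamma_(gamma), rng_(seed) {}

  // One real transition: direct update, model update, then k planning updates.
  // `next` must be a valid state index even when `terminal` is true.
  void observe(int s, int a, double r, int next, bool terminal) {
    qUpdate(s, a, r, next, terminal);
    Key key{s, a};
    if (model_.find(key) == model_.end()) seen_.push_back(key);
    model_[key] = Outcome{r, next, terminal};
    std::uniform_int_distribution<std::size_t> pick(0, seen_.size() - 1);
    for (int i = 0; i < k_; ++i) {  // replay simulated experience from the model
      const Key& sk = seen_[pick(rng_)];
      const Outcome& o = model_.at(sk);
      qUpdate(sk.first, sk.second, o.reward, o.next, o.terminal);
    }
  }

  double q(int s, int a) const { return q_[s][a]; }

 private:
  void qUpdate(int s, int a, double r, int next, bool terminal) {
    double best = *std::max_element(q_[next].begin(), q_[next].end());
    double target = terminal ? r : r + gamma_ * best;
    q_[s][a] += alpha_ * (target - q_[s][a]);
  }

  std::vector<std::vector<double>> q_;
  std::map<Key, Outcome> model_;
  std::vector<Key> seen_;
  int k_;
  double alpha_, gamma_;
  std::mt19937 rng_;
};
```

With a single terminal transition of reward 1 and k = 2, the value after one real step is 1 - 0.7^3 = 0.657, against 0.3 for plain Q-Learning; larger k squeezes more learning out of each real step.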

The following options are available for the Dyna agent:

Running R-Max

To run the basic R-Max (Brafman and Tennenholtz 2001) agent, type the following:

rosrun rl_agent agent --agent rmax

R-Max uses a tabular model and gives exploration bonuses to any state-actions with fewer than M visits. By default, M is set to 5, and R-Max uses value iteration for planning.
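The core of R-Max fits in a short sketch: state-action pairs seen fewer than M times are valued optimistically at Rmax / (1 - gamma), and known pairs are evaluated with the empirical model by value iteration. The maximum reward of 1.0, gamma = 0.95, and the fixed iteration count are assumptions made for the example; this is not the package's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Tabular R-Max: optimistic values for (s, a) pairs with fewer than M visits.
class RMax {
 public:
  RMax(int nS, int nA, int M = 5, double rMax = 1.0, double gamma = 0.95)
      : nS_(nS), nA_(nA), M_(M), rMax_(rMax), gamma_(gamma),
        count_(nS * nA, 0), rewardSum_(nS * nA, 0.0),
        next_(nS * nA, std::vector<int>(nS, 0)) {}

  void observe(int s, int a, double r, int s2) {
    int i = s * nA_ + a;
    ++count_[i];
    rewardSum_[i] += r;
    ++next_[i][s2];
  }

  bool known(int s, int a) const { return count_[s * nA_ + a] >= M_; }

  // Value iteration on the empirical model; returns Q-values indexed [s * nA + a].
  std::vector<double> plan(int iterations = 500) const {
    std::vector<double> v(nS_, 0.0), q(nS_ * nA_, 0.0);
    const double optimistic = rMax_ / (1.0 - gamma_);
    for (int it = 0; it < iterations; ++it) {
      for (int s = 0; s < nS_; ++s)
        for (int a = 0; a < nA_; ++a) {
          int i = s * nA_ + a;
          if (!known(s, a)) { q[i] = optimistic; continue; }  // exploration bonus
          double expV = 0.0;  // sum over s' of count(s, a, s') * V(s')
          for (int s2 = 0; s2 < nS_; ++s2) expV += next_[i][s2] * v[s2];
          q[i] = (rewardSum_[i] + gamma_ * expV) / count_[i];  // mean reward + gamma * E[V]
        }
      for (int s = 0; s < nS_; ++s)
        v[s] = *std::max_element(q.begin() + s * nA_, q.begin() + (s + 1) * nA_);
    }
    return q;
  }

 private:
  int nS_, nA_, M_;
  double rMax_, gamma_;
  std::vector<int> count_;
  std::vector<double> rewardSum_;
  std::vector<std::vector<int>> next_;
};
```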

The following options are available for the R-Max agent:


Running TEXPLORE

To run the basic TEXPLORE or TEXPLORE-VANIR (Hester and Stone 2010, Hester and Stone 2012, Hester et al. 2012) agent, type the following:

rosrun rl_agent agent --agent texplore

TEXPLORE plans greedily with respect to the average of a number of decision tree models of the domain. By default, TEXPLORE uses nmodels = 5 C4.5 discrete decision trees and plans using the RTMBA real-time architecture (Hester et al. 2012) with an action rate of 10 Hz.

For continuous domains, TEXPLORE can use M5 regression trees instead:

--model m5tree

To run TEXPLORE with Variance and Novelty Rewards (TEXPLORE-VANIR) (Hester and Stone 2012), set the coefficients of the variance and novelty exploration rewards:

--n 5
--v 5
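The sketch below illustrates the kind of bonus these two coefficients weight, following the idea behind TEXPLORE-VANIR: a variance term measuring how much the ensemble's predictions disagree, and a novelty term measuring the L1 distance to the nearest previously visited state. It assumes --v scales the variance term and --n the novelty term; the function names are hypothetical and the package's exact bonus may be computed differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Variance term: population variance of the ensemble members' predictions.
double variance(const std::vector<double>& preds) {
  if (preds.empty()) return 0.0;
  double mean = 0.0;
  for (double p : preds) mean += p;
  mean /= preds.size();
  double var = 0.0;
  for (double p : preds) var += (p - mean) * (p - mean);
  return var / preds.size();
}

// Novelty term: L1 distance from `state` to the nearest visited state
// (0 if nothing has been visited yet, a choice made for this sketch).
double novelty(const std::vector<float>& state,
               const std::vector<std::vector<float>>& visited) {
  if (visited.empty()) return 0.0;
  double nearest = std::numeric_limits<double>::infinity();
  for (const auto& v : visited) {
    double d = 0.0;
    for (std::size_t i = 0; i < state.size() && i < v.size(); ++i)
      d += std::fabs(static_cast<double>(state[i]) - v[i]);
    nearest = std::min(nearest, d);
  }
  return nearest;
}

// Bonus added to the model's predicted reward: vCoef * variance + nCoef * novelty.
double explorationBonus(const std::vector<double>& ensemblePreds,
                        const std::vector<float>& state,
                        const std::vector<std::vector<float>>& visited,
                        double vCoef, double nCoef) {
  return vCoef * variance(ensemblePreds) + nCoef * novelty(state, visited);
}
```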

For domains with possible state or actuator delays, TEXPLORE can learn models that also take the previous k actions as input. Set k with:

--history 5
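One straightforward way to give a model access to the previous k actions is to append them to the state it is trained on, as in this sketch. `HistoryAugmenter` is a hypothetical helper; how rl_agent actually encodes the history internally is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Keeps the last k actions and appends them to each state given to the model.
class HistoryAugmenter {
 public:
  explicit HistoryAugmenter(std::size_t k, float fill = 0.0f) : hist_(k, fill) {}

  // Record the action just taken, dropping the oldest one.
  void push(int action) {
    if (hist_.empty()) return;
    hist_.pop_front();
    hist_.push_back(static_cast<float>(action));
  }

  // Model input: current state followed by the last k actions, oldest first.
  std::vector<float> augment(const std::vector<float>& state) const {
    std::vector<float> x(state);
    x.insert(x.end(), hist_.begin(), hist_.end());
    return x;
  }

 private:
  std::deque<float> hist_;
};
```

With this encoding a delayed effect (for example, an actuator that responds two steps late) becomes a function of inputs the model can see, at the cost of a larger input space.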

The following options are available for the TEXPLORE agent:

Running the general Model-Based agent

There is also an option to run a general model-based agent, using any combination of models, planners, and exploration that you wish. To run it, type the following:

rosrun rl_agent agent --agent modelbased

The following options are available for the model-based agent:


2024-07-13 13:20