
[RLlib; docs] Docs do-over (new API stack): Rewrite/enhance "getting started" rst page. #49950

Open

wants to merge 28 commits into base: master

Conversation

@sven1977 (Contributor) commented on Jan 18, 2025

Docs do-over (new API stack): Rewrite/enhance "getting started" rst page.

  • Rename file from rllib-training.html to getting-started.html.
  • Translate everything to the new API stack and simplify a little.
  • Vale cleanup.
  • Move example code into ``.. testcode::`` blocks.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 added labels on Jan 18, 2025: rllib (RLlib related issues), rllib-docs-or-examples (issues related to RLlib documentation or rllib/examples), rllib-newstack, rllib-oldstack-cleanup (issues related to cleaning up classes and utilities on the old API stack)
@simonsays1980 (Collaborator) left a comment

LGTM. Some nits here and there. Great introduction to RLlib for users.

In this tutorial, you learn how to design, customize, and run an end-to-end RLlib learning experiment
from scratch. This includes picking and configuring an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`,
running a couple of training iterations, saving the state of your
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` from time to time, running a separate
Collaborator:

Awesome! This is what most people are looking for.

Python API
~~~~~~~~~~

RLlib's Python API provides all the flexibility required for applying the library to any
Collaborator:

Do we have any other API than the Python one?

Contributor (Author):

Nope :D We got rid of the CLI, b/c of the maintenance burden, its stark limitations, and it being more or less a duplicate of a subset of what the Python API could do.

Contributor (Author):

Well, we are working on the external access protocol for clients to connect to and communicate with RLlib, but that's heavily WIP.
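For context on the Python-API discussion above, a minimal sketch of driving RLlib entirely from Python (the environment name and hyperparameters here are placeholder assumptions, not text from the page under review):

from ray.rllib.algorithms.ppo import PPOConfig

# Build a config, turn it into an Algorithm, train once, then shut down.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(lr=0.0003)
)
algo = config.build_algo()
print(algo.train())
algo.stop()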

)


To scale your setup and define how many EnvRunner actors you want to leverage,
Collaborator:

Shall we put all class names into ``?

Collaborator:

Also, we might want to add that these EnvRunners are used to roll out the policy and collect samples?

Contributor (Author):

done
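For reference, scaling the number of EnvRunner actors that roll out the policy and collect samples is a single config call (a sketch; the environment and the count of 4 are arbitrary assumptions):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Use 4 remote EnvRunner actors to roll out the policy and collect samples in parallel.
    .env_runners(num_env_runners=4)
)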

.. testcode::

    # Build the Algorithm (PPO).
    ppo = config.build_algo()
Collaborator:

Does build still work?

Contributor (Author):

Yup, but you get a warning.
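In code terms (a small sketch):

algo = config.build_algo()  # preferred on the new API stack
# algo = config.build()     # still works, but logs a deprecation warning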

from pprint import pprint

for _ in range(5):
    pprint(ppo.train())
Collaborator:

Nice!

# Define your custom env class by subclassing gymnasium.Env:

class ParrotEnv(gym.Env):
    """Environment in which the agent learns to repeat the seen observations.
Collaborator:

Haha! Awesome!
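For readers following along, a self-contained sketch of what such a "parrot" environment can look like (the spaces, episode length, and reward shown here are illustrative assumptions, not necessarily what the final doc page uses):

import gymnasium as gym
import numpy as np


class ParrotEnv(gym.Env):
    """The agent gets rewarded for repeating the observation it just saw."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (1,), np.float32)
        self.action_space = self.observation_space
        self._cur_obs = None
        self._num_steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._num_steps = 0
        self._cur_obs = self.observation_space.sample()
        return self._cur_obs, {}

    def step(self, action):
        # Reward: negative distance between the action and the observation to repeat.
        reward = -float(np.abs(action - self._cur_obs).sum())
        self._num_steps += 1
        terminated = False
        truncated = self._num_steps >= 10
        self._cur_obs = self.observation_space.sample()
        return self._cur_obs, reward, terminated, truncated, {}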

# Point your config to your custom env class:
config = (
    PPOConfig()
    .environment(ParrotEnv)  # add `env_config=[some Box space] to customize the env
Collaborator:

Maybe a missing " ` "?

Contributor (Author):

done and clarified more. Also fixed the env to accept this suggested setting.
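A hedged sketch of the clarified usage (the `env_config` dict key "obs_space" is a made-up example; the actual doc page may pass the custom space differently):

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

# ParrotEnv as sketched above.
config = (
    PPOConfig()
    .environment(
        ParrotEnv,
        # Hypothetical config key: pass a custom Box space through `env_config`.
        env_config={"obs_space": gym.spaces.Box(-2.0, 2.0, (1,), np.float32)},
    )
)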

class CustomTorchRLModule(TorchRLModule):
    def setup(self):
        # You have access here to the following already set attributes:
        # self.observation_space
Collaborator:

Great description!!
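For orientation, a minimal sketch of such a custom module on the new API stack (the network layout, the Discrete-action assumption, and overriding only `_forward` are illustrative; consult the RLModule docs for the exact hooks a given algorithm needs):

import torch
from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.torch import TorchRLModule


class CustomTorchRLModule(TorchRLModule):
    def setup(self):
        # `self.observation_space`, `self.action_space`, and `self.model_config`
        # are already set when `setup()` runs.
        in_size = self.observation_space.shape[0]
        out_size = self.action_space.n  # assumes a Discrete action space
        self._net = torch.nn.Sequential(
            torch.nn.Linear(in_size, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, out_size),
        )

    def _forward(self, batch, **kwargs):
        # Return action-distribution inputs (logits) for the observations in the batch.
        return {Columns.ACTION_DIST_INPUTS: self._net(batch[Columns.OBS])}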


At the end of your script, RLlib evaluates the trained Algorithm:
algo.stop()
Collaborator:

Haha. Yes that is needed.

Collaborator:

We might, however, show it explicitly, as otherwise users might run into problems.

Collaborator:

... in their own code

Contributor (Author):

Great idea. Will add a one-liner for this API.

Contributor (Author):

done
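The one-liner mentioned above presumably looks something like this (a sketch; `evaluate()` assumes evaluation EnvRunners were configured via `config.evaluation(...)`):

# Run an explicit evaluation round, then release the Algorithm's resources.
eval_results = algo.evaluate()
print(eval_results)
algo.stop()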


The `state` of an instantiated Algorithm can be retrieved by calling its
`get_state` method. It contains all information necessary
to create the Algorithm from scratch. No access to the original code (e.g.
Collaborator:

Does this now also work with algorithms that define new attributes/methods? If the class is available, it should, IMO.

Contributor (Author):

Yeah, I think so. Users can decide to override the get_state/set_state APIs to add more stateful stuff to their state dicts, but the basic functionality (restoring EnvRunners, RLModule, Learner optimizer states, connector pipelines, etc.) works across all algos.
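Roughly, and hedged as a sketch of that state API rather than a verbatim doc excerpt:

# Capture the full state of a running Algorithm ...
state = algo.get_state()

# ... and restore it into a freshly built instance (the original class must be importable).
new_algo = config.build_algo()
new_algo.set_state(state)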

Labels: rllib, rllib-docs-or-examples, rllib-newstack, rllib-oldstack-cleanup

3 participants