update docs

volcengine · Jan 9, 2025 · b5be3bb
1 parent 3da21d2

Showing 4 changed files with 13 additions and 10 deletions.
diff --git a/README.md b/README.md
@@ -176,8 +176,8 @@ Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to l
 - Advance Usage and Extension
   - [Ray API Design Tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
   - [Extend to other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
-  - [Add models to FSDP backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
-  - [Add models to Megatron-LM backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)
+  - [Add models with the FSDP backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
+  - [Add models with the Megatron-LM backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)
 
 
 ## Citation
@@ -201,3 +201,4 @@ Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to l
 ## Publications Using veRL
 - [Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization](https://arxiv.org/abs/2410.09302)
 - [Flaming-hot Initiation with Regular Execution Sampling for Large Language Models](https://arxiv.org/abs/2410.21236)
+- [Process Reinforcement Through Implicit Rewards](https://github.com/PRIME-RL/PRIME/)
diff --git a/docs/advance/fsdp_extension.rst b/docs/advance/fsdp_extension.rst
@@ -1,6 +1,6 @@
 
-Add models to FSDP backend
-===========================
+Add models with the FSDP backend
+==================================
 
 Model
 --------------------------

diff --git a/docs/advance/megatron_extension.rst b/docs/advance/megatron_extension.rst
@@ -1,13 +1,13 @@
-Add models to Megatron-LM backend
-===================================
+Add models with the Megatron-LM backend
+=========================================
 
 Model
 -----------
 
-The most challenging aspect to use Megatron-LM backend is implementing
+The most challenging aspect to use the Megatron-LM backend is implementing
 the models for training. Currently, we implement Llama model that
 support data parallelism, tensor parallelism, pipeline parallelism (also
-vPP) and sequence parallelism. We also implement remove padding on Llama
+vPP) and sequence parallelism. We also implement remove padding (sequence packing) on Llama
 model, which can be found in `modeling_llama_megatron.py <https://github.com/volcengine/verl/blob/main/verl/models/llama/megatron/modeling_llama_megatron.py>`_.
 
 To support other model, users are required to implement:
@@ -22,4 +22,5 @@ To support other model, users are required to implement:
    (vLLM) model. Note that both the actor model and rollout model are
    partitioned during runtime. So, it's advisable to map the model name
    in actor model implementation. Otherwise, you may need an additional
-   name mapping and even weight transformation.
+   name mapping and even weight transformation. The weight loader implementation
+   is in `megatron_weight_loaders.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py>`_.
diff --git a/requirements.txt b/requirements.txt
@@ -9,4 +9,5 @@ pybind11
 ray
 tensordict<0.6
 transformers
-vllm<=0.6.3
+vllm<=0.6.3
+wandb
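
Note on the "remove padding (sequence packing)" wording added in docs/advance/megatron_extension.rst: the following is a minimal, framework-free sketch of what the term usually means, not the code in `modeling_llama_megatron.py`. The function names `remove_padding` and `restore_padding`, the `cu_seqlens` name, and the right-padding assumption are illustrative choices made here and do not come from the commit.

```python
from typing import List, Tuple


def remove_padding(
    input_ids: List[List[int]], attention_mask: List[List[int]]
) -> Tuple[List[int], List[int]]:
    """Pack a padded batch into one flat token list (hypothetical helper).

    Returns the packed tokens and the cumulative sequence lengths, so that
    sequence i occupies packed[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    packed: List[int] = []
    cu_seqlens: List[int] = [0]
    for ids, mask in zip(input_ids, attention_mask):
        # Keep only the positions the mask marks as real tokens.
        kept = [tok for tok, m in zip(ids, mask) if m == 1]
        packed.extend(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return packed, cu_seqlens


def restore_padding(
    packed: List[int], cu_seqlens: List[int], max_len: int, pad_id: int = 0
) -> List[List[int]]:
    """Rebuild a right-padded batch from the packed form (assumes right padding)."""
    batch: List[List[int]] = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        seq = packed[start:end]
        batch.append(seq + [pad_id] * (max_len - len(seq)))
    return batch


if __name__ == "__main__":
    ids = [[5, 6, 7, 0], [8, 9, 0, 0]]
    mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
    packed, cu_seqlens = remove_padding(ids, mask)
    print(packed, cu_seqlens)
    print(restore_padding(packed, cu_seqlens, max_len=4))
```

The point of the packed form is that no compute is spent on pad tokens; the cumulative lengths carry the sequence boundaries that the padded layout previously encoded through the batch dimension.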