Commit

docs: sable caption math
RuanJohn committed Dec 13, 2024
1 parent 76d7eb0 commit 8b9b66f
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions mava/systems/sable/README.md
@@ -17,12 +17,12 @@ For an overview of how the algorithm works, please see the diagram below. For a
</p>

*Sable architecture and execution.* The encoder receives all agent observations
-$ o_t^1, \dots, o_t^N $ from the current timestep $ t $ along with a hidden state
-$ h_{t-1}^{\text{enc}} $ representing past timesteps and produces encoded observations
-$ \hat{o}_t^1, \dots, \hat{o}_t^N $, observation-values $ v(\hat{o}_t^1), \dots, v(\hat{o}_t^N) $,
-and a new hidden state $ h_t^{\text{enc}} $.
-The decoder performs recurrent retention over the current action $ a_t^{m-1} $, followed by cross attention with the encoded observations, producing the next action $ a_t^m $. The initial hidden states for recurrence over agents in the decoder at the current timestep are
-$ (h_{t-1}^{\text{dec}_1}, h_{t-1}^{\text{dec}_2}) $, and by the end of the decoding process, it generates the updated hidden states $ (h_t^{\text{dec}_1}, h_t^{\text{dec}_2}) $.
+$o_t^1,\dots,o_t^N$ from the current timestep $t$ along with a hidden state
+$h_{t-1}^{\text{enc}}$ representing past timesteps and produces encoded observations
+$\hat{o}_t^1,\dots,\hat{o}_t^N$, observation-values $v(\hat{o}_t^1),\dots,v(\hat{o}_t^N)$,
+and a new hidden state $h_t^{\text{enc}}$.
+The decoder performs recurrent retention over the current action $a_t^{m-1}$, followed by cross attention with the encoded observations, producing the next action $a_t^m$. The initial hidden states for recurrence over agents in the decoder at the current timestep are
+$(h_{t-1}^{\text{dec}_1},h_{t-1}^{\text{dec}_2})$, and by the end of the decoding process, it generates the updated hidden states $(h_t^{\text{dec}_1},h_t^{\text{dec}_2})$.

## Relevant paper:
* [Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2410.01706)
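The caption in the edited README describes a concrete dataflow: the encoder consumes all agent observations plus a hidden state and emits encoded observations, values, and a new hidden state; the decoder then sweeps over agents, carrying two hidden states and the previous action. A minimal sketch of that flow is below; all function names, update rules, and shapes are hypothetical stand-ins for illustration, not Mava's actual Sable implementation.

```python
import numpy as np

def encode(obs, h_enc):
    """Toy encoder: map N agent observations and h_{t-1}^enc to
    encoded observations, per-agent values, and h_t^enc."""
    obs_hat = obs + h_enc.mean()                       # \hat{o}_t^1..\hat{o}_t^N
    values = obs_hat.sum(axis=-1)                      # v(\hat{o}_t^m), one per agent
    h_enc_new = 0.9 * h_enc + 0.1 * obs.mean(axis=0)   # h_t^enc
    return obs_hat, values, h_enc_new

def decode(obs_hat, h_dec, a_prev):
    """Toy decoder: sweep over agents m = 1..N, seeding each step with
    the previous action and updating both decoder hidden states."""
    h1, h2 = h_dec                                     # (h^{dec_1}, h^{dec_2})
    actions, a = [], a_prev                            # a_t^{m-1} seeds agent m
    for m in range(obs_hat.shape[0]):
        h1 = 0.9 * h1 + 0.1 * a                        # retention over actions
        h2 = 0.9 * h2 + 0.1 * obs_hat[m]               # cross-attention stand-in
        a = float((h1 + h2).sum())                     # a_t^m
        actions.append(a)
    return np.array(actions), (h1, h2)

obs = np.ones((3, 4))                                  # N=3 agents, obs dim 4
obs_hat, values, h_enc = encode(obs, np.zeros(4))
actions, h_dec = decode(obs_hat, (np.zeros(4), np.zeros(4)), a_prev=0.0)
```

The point of the sketch is only the plumbing: one encoder call per timestep, then one recurrent pass over agents in the decoder, with the updated hidden states carried to timestep `t+1`.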
