
Commit

deploy: fff7fde
puyuan1996 committed Jul 25, 2024
1 parent a198fbd commit 114edcc
Showing 3 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions _modules/lzero/policy/unizero.html
@@ -267,10 +267,10 @@ Source code for lzero.policy.unizero

      # collect data -> update policy -> collect data -> ...
      # For different env, we have different episode_length,
      # we usually set update_per_collect = collector_env_num * episode_length / batch_size * reuse_factor.
-     # If we set update_per_collect=None, we will set update_per_collect = collected_transitions_num * cfg.policy.model_update_ratio automatically.
+     # If we set update_per_collect=None, we will set update_per_collect = collected_transitions_num * cfg.policy.replay_ratio automatically.
      update_per_collect=None,
      # (float) The ratio of the collected data used for training. Only effective when ``update_per_collect`` is not None.
-     model_update_ratio=0.25,
+     replay_ratio=0.25,
      # (int) Minibatch size for one gradient descent.
      batch_size=256,
      # (str) Optimizer for training policy network. ['SGD', 'Adam']
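The comment above describes how the number of gradient updates per collection phase is derived when `update_per_collect` is left as `None`. A minimal sketch of that rule, with hypothetical function and argument names (this is not the LZero API itself):

```python
def compute_update_per_collect(update_per_collect, collected_transitions_num, replay_ratio):
    # Illustrative helper: when update_per_collect is None, derive the number
    # of gradient steps from the newly collected transitions and the replay
    # ratio, per the config comment above. Names here are assumptions.
    if update_per_collect is not None:
        return update_per_collect
    return int(collected_transitions_num * replay_ratio)

compute_update_per_collect(None, 1000, 0.25)  # 1000 * 0.25 -> 250 updates
```

With the default `replay_ratio=0.25`, each collected transition is on average replayed in a quarter of a batch-sized update, so larger collection phases automatically get proportionally more gradient steps.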
2 changes: 1 addition & 1 deletion api_doc/policy/index.html
@@ -9053,7 +9053,7 @@ UniZeroPolicy

UniZeroPolicy.config (class attribute, ``lzero.policy.unizero.UniZeroPolicy.config``)

The deleted and added lines differ only in one key:

- 'model_update_ratio': 0.25,
+ 'replay_ratio': 0.25,

The full default value after the change:

config = {
    'action_type': 'fixed_action_space',
    'analysis_sim_norm': False,
    'augmentation': ['shift', 'intensity'],
    'batch_size': 256,
    'battle_mode': 'play_with_bot_mode',
    'collect_with_pure_policy': False,
    'collector_env_num': 8,
    'cuda': True,
    'discount_factor': 0.997,
    'env_type': 'not_board_games',
    'eps': {'decay': 100000, 'end': 0.05, 'eps_greedy_exploration_in_collect': False, 'start': 1.0, 'type': 'linear'},
    'eval_freq': 2000,
    'evaluator_env_num': 3,
    'fixed_temperature_value': 0.25,
    'game_segment_length': 400,
    'grad_clip_value': 5,
    'gray_scale': False,
    'gumbel_algo': False,
    'ignore_done': False,
    'learning_rate': 0.0001,
    'lr_piecewise_constant_decay': False,
    'manual_temperature_decay': False,
    'mcts_ctree': True,
    'model': {
        'analysis_sim_norm': False,
        'bias': True,
        'categorical_distribution': True,
        'continuous_action_space': False,
        'frame_stack_num': 1,
        'image_channel': 3,
        'learn': {'learner': {'hook': {'save_ckpt_after_iter': 10000}}},
        'model_type': 'conv',
        'norm_type': 'BN',
        'num_channels': 64,
        'num_res_blocks': 1,
        'observation_shape': (3, 64, 64),
        'res_connection_in_dynamics': True,
        'self_supervised_learning_loss': True,
        'support_scale': 50,
        'world_model_cfg': {
            'action_space_size': 6,
            'analysis_dormant_ratio': False,
            'analysis_sim_norm': False,
            'attention': 'causal',
            'attn_pdrop': 0.1,
            'context_length': 8,
            'device': 'cpu',
            'dormant_threshold': 0.025,
            'embed_dim': 768,
            'embed_pdrop': 0.1,
            'env_num': 8,
            'gamma': 1,
            'group_size': 8,
            'gru_gating': False,
            'latent_recon_loss_weight': 0.0,
            'max_blocks': 10,
            'max_cache_size': 5000,
            'max_tokens': 20,
            'num_heads': 8,
            'num_layers': 4,
            'obs_type': 'image',
            'perceptual_loss_weight': 0.0,
            'policy_entropy_weight': 0.0001,
            'predict_latent_loss_type': 'group_kl',
            'resid_pdrop': 0.1,
            'support_size': 101,
            'tokens_per_block': 2,
        },
    },
    'momentum': 0.9,
    'monitor_extra_statistics': True,
    'multi_gpu': False,
    'n_episode': 8,
    'num_simulations': 50,
    'num_unroll_steps': 10,
    'optim_type': 'AdamW',
    'policy_entropy_loss_weight': 0,
    'policy_loss_weight': 1,
    'priority_prob_alpha': 0.6,
    'priority_prob_beta': 0.4,
    'random_collect_episode_num': 0,
    'replay_ratio': 0.25,
    'reward_loss_weight': 1,
    'root_dirichlet_alpha': 0.3,
    'root_noise_weight': 0.25,
    'sample_type': 'transition',
    'sampled_algo': False,
    'ssl_loss_weight': 0,
    'target_update_freq': 100,
    'target_update_freq_for_intrinsic_reward': 1000,
    'target_update_theta': 0.05,
    'td_steps': 5,
    'threshold_training_steps_for_final_lr': 50000,
    'threshold_training_steps_for_final_temperature': 100000,
    'train_start_after_envsteps': 0,
    'transform2string': False,
    'type': 'unizero',
    'update_per_collect': None,
    'use_augmentation': False,
    'use_priority': False,
    'use_rnd_model': False,
    'use_ture_chance_label_in_chance_encoder': False,
    'value_loss_weight': 0.25,
    'weight_decay': 0.0001,
}
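Downstream configs written against the old key name will silently fall back to the default after this rename. One way to keep them working is a translation step when merging user settings over the defaults; a hedged sketch, where `merge_config` and `DEFAULT_CONFIG` are illustrative names and not part of LZero:

```python
import warnings

# Subset of the defaults shown above; illustrative only.
DEFAULT_CONFIG = {'replay_ratio': 0.25, 'update_per_collect': None, 'batch_size': 256}

def merge_config(user_cfg):
    # Hypothetical helper: merge a user config over the defaults, translating
    # the pre-rename key 'model_update_ratio' to 'replay_ratio' with a warning.
    user_cfg = dict(user_cfg)
    if 'model_update_ratio' in user_cfg:
        warnings.warn("'model_update_ratio' was renamed to 'replay_ratio'", DeprecationWarning)
        user_cfg['replay_ratio'] = user_cfg.pop('model_update_ratio')
    merged = dict(DEFAULT_CONFIG)
    merged.update(user_cfg)
    return merged

merge_config({'model_update_ratio': 0.5})['replay_ratio']  # -> 0.5
```

Translating at the merge boundary keeps old experiment configs reproducible while the documentation and defaults move to the new name.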

<dl class="py method">
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.

0 comments on commit 114edcc
