start from beginning of day #137

Open: wants to merge 100 commits into base: dev

Commits (100):
08718c5 first commit (XanderKehoe, Oct 31, 2024)
520473c successfully training env (XanderKehoe, Nov 2, 2024)
df12494 moving towards c implementation of env (XanderKehoe, Nov 5, 2024)
a69c7ea updated main script to handle more run_modes and other refactors (XanderKehoe, Nov 6, 2024)
5a80774 fixed minor bugs with saving models (XanderKehoe, Nov 6, 2024)
c5c7f32 Merge branch 'dev' of https://github.com/pufferai/pufferlib into tras… (XanderKehoe, Nov 7, 2024)
4037e7a Added documentation, but video mode still doesn't work... (XanderKehoe, Nov 7, 2024)
1d722e9 added video rendering support and other minor fixes (XanderKehoe, Nov 7, 2024)
a4c4150 updated env to have optional continuous reward signal for distance to… (XanderKehoe, Nov 10, 2024)
3606465 updated from CNN to just linear layer with one-hot encoding (XanderKehoe, Nov 10, 2024)
29cfa06 pre grid replacement obs update (XanderKehoe, Nov 14, 2024)
1ba058c clean up (XanderKehoe, Nov 16, 2024)
ff682b9 removed .pt files (XanderKehoe, Nov 16, 2024)
7bc4f80 updated README.md with new gif (XanderKehoe, Nov 16, 2024)
595bb3c clean up (XanderKehoe, Nov 16, 2024)
726c169 added validation checks when initializing environment and re-added gr… (XanderKehoe, Nov 18, 2024)
f6c686a Merge branch 'dev' of https://github.com/pufferai/pufferlib into tras… (XanderKehoe, Nov 18, 2024)
96f3459 Addressed feedback from Joseph (XanderKehoe, Nov 19, 2024)
e61c313 fixed some bugs (XanderKehoe, Nov 19, 2024)
71ade21 fixed performance bugs with how num_agents was set (XanderKehoe, Nov 19, 2024)
82d4228 added new .ini file changes (XanderKehoe, Nov 19, 2024)
c01158d Merge branch 'dev' of https://github.com/pufferai/pufferlib into tras… (XanderKehoe, Nov 19, 2024)
1dbc318 undo rware changes (XanderKehoe, Nov 19, 2024)
e983821 removed .gif from readme (XanderKehoe, Nov 19, 2024)
bfe8d1d cleaned up .c & .h files (XanderKehoe, Nov 19, 2024)
8469004 Fixed bugs in env (XanderKehoe, Nov 20, 2024)
3a4571b removed old comment (XanderKehoe, Nov 20, 2024)
67883e0 added user stats tracking and observation space buffer (XanderKehoe, Nov 22, 2024)
16d2821 adding custom network for env (XanderKehoe, Nov 22, 2024)
96dd3d2 added new custom network (XanderKehoe, Nov 22, 2024)
55fc290 simplified env for testing purposes (XanderKehoe, Nov 22, 2024)
03e5784 minor changes (XanderKehoe, Nov 23, 2024)
4864568 made all agents get a reward when trash is picked up or deposited ins… (XanderKehoe, Nov 24, 2024)
caf8f43 addressed feedback but now segfaulting... (XanderKehoe, Nov 26, 2024)
a414bc5 fixed some more issues, but weird bugs with the agents spirit inhabit… (XanderKehoe, Nov 26, 2024)
63733b3 fixed ghost bug (XanderKehoe, Nov 26, 2024)
437f70e fixed more issues (XanderKehoe, Nov 26, 2024)
ffbd13c pulled in latest dev branch and resolved merge conflict (XanderKehoe, Nov 26, 2024)
437d9c6 Updated model code, but getting 'Broken pipe' error now (XanderKehoe, Nov 26, 2024)
2e749a1 Merge branch 'dev' of https://github.com/pufferai/pufferlib into tras… (XanderKehoe, Nov 27, 2024)
3981186 clean up (XanderKehoe, Nov 30, 2024)
2c4ef0f removed unused macro definition (XanderKehoe, Nov 30, 2024)
8619993 fixed allocation of obs space (XanderKehoe, Nov 30, 2024)
2b0d2fc start from beginning of day (Dec 6, 2024)
d13c4cf Pedantic compiler (Dec 7, 2024)
c9a4e0e Version (Dec 8, 2024)
958dbac Start day randomizer (Dec 8, 2024)
a544f5a remove old pokemon stuff (Dec 8, 2024)
f3ab711 Update readme (Dec 8, 2024)
f2485ad Trailer (Dec 8, 2024)
ccef676 release (Dec 9, 2024)
0147384 Fix sneaky github (Dec 9, 2024)
af5d828 bump version (Dec 9, 2024)
aa29f27 Update pong.py (thedch, Dec 12, 2024)
1885902 Merge pull request #138 from thedch/patch-1 (jsuarez5341, Dec 21, 2024)
d6935bb Initial dev on vec infra refactor (Dec 21, 2024)
d06946e Makes PufferLib compatible with Python3.12. Probably doesn't break an… (Dec 21, 2024)
5a0a22e Merge branch '2.0' of https://github.com/pufferai/pufferlib into 2.0 (Dec 21, 2024)
952e935 Fix bug (Dec 21, 2024)
6039a11 addressed some feedback, pulling in 2.0-dev (XanderKehoe, Dec 21, 2024)
f2778b4 Merge branch '2.0-dev' of https://github.com/pufferai/pufferlib into … (XanderKehoe, Dec 21, 2024)
a5a054b Revert numpy (Dec 21, 2024)
91f04e4 Serial single-agent working, multiagent in progress (Dec 21, 2024)
a61a0a2 Fix multiagent serial with new buffers (Dec 21, 2024)
719bb9b Initial fixes for multiprocessing (Dec 21, 2024)
b69a670 addressed some feedback, but getting segfault when trying to train... (XanderKehoe, Dec 22, 2024)
e78202b Serial support for Native PufferEnvs. Up to 1M sps train in pure pyth… (Dec 22, 2024)
48ac6c2 Merge pull request #140 from PufferAI/2.0-dev (jsuarez5341, Dec 22, 2024)
28e6dd4 Reimplemented 2D Local Crop Obs Space (XanderKehoe, Dec 23, 2024)
b035dcc Merge pull request #126 from XanderKehoe/trash_pickup (jsuarez5341, Dec 23, 2024)
7b2ba16 Trash pickup fixes. Policy learns now (Dec 23, 2024)
eae70c7 reskin (Dec 23, 2024)
efac335 Trash pickup (Dec 23, 2024)
0193b26 Merge pull request #141 from PufferAI/2.0-dev (jsuarez5341, Dec 23, 2024)
e5168f9 add a barebones github action and status badge (thatguy11325, Dec 24, 2024)
f9a5688 Integrate Daniel's Mac build fixes (Dec 24, 2024)
debb0ef Merge pull request #144 from PufferAI/2.0-dev (jsuarez5341, Dec 24, 2024)
7fa25a8 add macos (thatguy11325, Dec 24, 2024)
4918ccb Merge pull request #143 from thatguy11325/github-action (jsuarez5341, Dec 24, 2024)
4e793cf Update install.yml (jsuarez5341, Dec 24, 2024)
8c4c0f0 Small enduro refactor (Dec 27, 2024)
e185fed Merge pull request #145 from PufferAI/2.0-dev (jsuarez5341, Dec 27, 2024)
eb77c95 Fix trash pickup obs: 1m sps train (Dec 27, 2024)
c543e3d Merge pull request #146 from PufferAI/2.0-dev (jsuarez5341, Dec 27, 2024)
4e3db19 Merge remote-tracking branch 'pufferai/2.0' into j_enduro_pr (Jan 1, 2025)
6b5135c removed fn prototypes, removed verbose infos logging in cython, some … (Jan 1, 2025)
986b3fa cleaned up some things (Jan 2, 2025)
4933556 some cleanup (Jan 3, 2025)
734acd8 1744 lines enduro.h. removed client struct; replaced with gamestate. … (Jan 3, 2025)
82257fc significant line reduction and refactor (Jan 4, 2025)
bae3af2 refactor; logging is broken a little maybe. still trains and beats 9 … (Jan 5, 2025)
1bf5702 final changes, removed some gamestate struc vars that weren't used. s… (Jan 6, 2025)
821018f logging is mostly fixed. still intermittent initially, but much bette… (Jan 7, 2025)
fc3dee3 cleaned up a few things (Jan 11, 2025)
791bfe5 refactor to remove extraneous code and consolidate init() and reset()… (Jan 12, 2025)
fcba83c final edit, finishing touches. Python perf test: SPS: 2,045,564.37560… (Jan 12, 2025)
8bee0f7 4 space (Jan 17, 2025)
d2916b5 formatting fixed (Jan 17, 2025)
e310000 further refactors: put logs with logs, removed unused variables from … (Jan 17, 2025)
9ffddf4 minor fixes. TODO: init() reset() swap (Jan 17, 2025)
29 changes: 29 additions & 0 deletions .github/workflows/install.yml
@@ -0,0 +1,29 @@
name: install
on:
  push:
  pull_request:

jobs:
  test:
    name: test ${{ matrix.py }} - ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
          - macos-latest
        py:
          - "3.11"
          - "3.10"
          - "3.9"
    steps:
      - name: Setup python for test ${{ matrix.py }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.py }}
      - uses: actions/checkout@v3
      - name: Upgrade pip
        run: python -m pip install -U pip
      - name: Install pufferlib
        run: pip3 install -e .
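The `strategy.matrix` above fans out into one job per combination of `os` and `py`, and `fail-fast: false` lets the other jobs keep running when one of them fails. A minimal Python sketch of that expansion, assuming only the cross-product behavior (it does not model how GitHub Actions actually schedules or names runs beyond the `name:` template):

```python
from itertools import product

# Axis values copied from the matrix in install.yml
MATRIX = {
    "os": ["ubuntu-latest", "macos-latest"],
    "py": ["3.11", "3.10", "3.9"],
}


def expand_matrix(matrix):
    """Return one dict per job: the cross-product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand_matrix(MATRIX)
for job in jobs:
    # Mirrors the job name template: test ${{ matrix.py }} - ${{ matrix.os }}
    print(f"test {job['py']} - {job['os']}")
```

Two operating systems times three Python versions gives six jobs. The quotes around `"3.10"` in the YAML matter: unquoted, YAML reads it as the float `3.1`.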
10 changes: 10 additions & 0 deletions .gitignore
@@ -1,3 +1,13 @@
# Annoying temp files generated by Cython
c_gae.c
pufferlib/extensions.c
pufferlib/ocean/grid/c_grid.c
pufferlib/ocean/tactical/c_tactical.c
pufferlib/puffernet.c

# Raylib
raylib_wasm/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -1,2 +1,6 @@
global-include *.pyx
global-include *.pxd
global-include *.h
global-include *.py
recursive-include pufferlib/resources *
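`global-include` matches a filename pattern anywhere in the source tree, while `recursive-include pufferlib/resources *` pulls in every file under that one directory. The sketch below approximates those two rules with `pathlib` to preview which files would land in the sdist. It is an assumption-laden approximation, not setuptools' own file-list logic, which also handles excludes, pruning, and default inclusions.

```python
from pathlib import Path

# Patterns copied from MANIFEST.in
GLOBAL_PATTERNS = ["*.pyx", "*.pxd", "*.h", "*.py"]
RECURSIVE_DIR = "pufferlib/resources"


def preview_manifest(root):
    """Approximate the file set selected by the MANIFEST.in rules above."""
    root = Path(root)
    selected = set()
    for pattern in GLOBAL_PATTERNS:
        # global-include: match the pattern at any depth, including the root
        selected.update(p for p in root.rglob(pattern) if p.is_file())
    resources = root / RECURSIVE_DIR
    if resources.is_dir():
        # recursive-include <dir> *: every file below the directory
        selected.update(p for p in resources.rglob("*") if p.is_file())
    return sorted(p.relative_to(root).as_posix() for p in selected)
```

Running `preview_manifest(".")` from the repository root lists the relative paths this approximation would package; the `recursive-include` line is what brings in non-code assets such as images under `pufferlib/resources`.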

33 changes: 5 additions & 28 deletions README.md
@@ -1,39 +1,16 @@
![figure](https://pufferai.github.io/source/resource/header.png)

[![PyPI version](https://badge.fury.io/py/pufferlib.svg)](https://badge.fury.io/py/pufferlib)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pufferlib)
![Github Actions](https://github.com/PufferAI/PufferLib/actions/workflows/install.yml/badge.svg)
[![](https://dcbadge.vercel.app/api/server/spT4huaGYV?style=plastic)](https://discord.gg/spT4huaGYV)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40jsuarez5341)](https://twitter.com/jsuarez5341)

You have an environment, a PyTorch model, and a reinforcement learning framework that are designed to work together but don’t. PufferLib is a wrapper layer that makes RL on complex game environments as simple as RL on Atari. You write a native PyTorch network and a short binding for your environment; PufferLib takes care of the rest.
PufferLib is the reinforcement learning library I wish existed during my PhD. It started as a compatibility layer to make working with complex environments a breeze. Now, it's a high-performance toolkit for research and industry with optimized parallel simulation, environments that run and train at 1M+ steps/second, and tons of quality of life improvements for practitioners. All our tools are free and open source. We also offer priority service for companies, startups, and labs!

All of our [Documentation](https://pufferai.github.io "PufferLib Documentation") is hosted by github.io. @jsuarez5341 on [Discord](https://discord.gg/spT4huaGYV) for support -- post here before opening issues. I am also looking for contributors interested in adding bindings for other environments and RL frameworks.
![Trailer](https://github.com/PufferAI/puffer.ai/blob/main/docs/assets/puffer_2.gif?raw=true)

## Demo

The current `demo.py` is a souped-up version of CleanRL PPO with optimized LSTM support, detailed performance metrics, a local dashboard, async envpool sampling, checkpointing, wandb sweeps, and more. It has a powerful `--help` that generates options based on the specified environment and policy. Hyperparams are in `config.yaml`. A few examples:

```
# Train minigrid with multiprocessing. Save it as a baseline.
python demo.py --env minigrid --mode train --vec multiprocessing
```

![figure](https://raw.githubusercontent.com/PufferAI/pufferai.github.io/1.0/docs/source/resource/puffer-dash.png)

```
# Load the current minigrid baseline and render it locally
python demo.py --env minigrid --mode eval --baseline

# Train squared with serial vectorization and save it as a wandb baseline
# Then, load the current squared baseline and render it locally
python demo.py --env squared --mode train --baseline
python demo.py --env squared --mode eval --baseline

# Render NMMO locally with a random policy
python demo.py --env nmmo --mode eval

# Autotune vectorization settings for your machine
python demo.py --env breakout --mode autotune
```
All of our documentation is hosted at [puffer.ai](https://puffer.ai "PufferLib Documentation"). @jsuarez5341 on [Discord](https://discord.gg/puffer) for support -- post here before opening issues. We're always looking for new contributors, too!

## Star to puff up the project!

17 changes: 1 addition & 16 deletions clean_pufferl.py
@@ -126,21 +126,6 @@ def evaluate(data):
            data.vecenv.send(actions)

    with profile.eval_misc:
        # Moves into models... maybe. Definitely moves.
        # You could also just return infos and have it in demo
        if 'pokemon_exploration_map' in infos:
            for pmap in infos['pokemon_exploration_map']:
                if not hasattr(data, 'pokemon_map'):
                    import pokemon_red_eval
                    data.map_updater = pokemon_red_eval.map_updater()
                    data.pokemon_map = pmap

                data.pokemon_map = np.maximum(data.pokemon_map, pmap)

            if len(infos['pokemon_exploration_map']) > 0:
                rendered = data.map_updater(data.pokemon_map)
                data.stats['Media/exploration_map'] = data.wandb.Image(rendered)

        for k, v in infos.items():
            if '_map' in k and data.wandb is not None:
                data.stats[f'Media/{k}'] = data.wandb.Image(v[0])
@@ -703,7 +688,7 @@ def print_dashboard(env_name, utilization, global_step, epoch,
    table.add_column(justify="center", width=13)
    table.add_column(justify="right", width=13)
    table.add_row(
        f':blowfish: {c1}PufferLib {b2}1.0.0',
        f':blowfish: {c1}PufferLib {b2}2.0.0',
        f'{c1}CPU: {c3}{cpu_percent:.1f}%',
        f'{c1}GPU: {c3}{gpu_percent:.1f}%',
        f'{c1}DRAM: {c3}{dram_percent:.1f}%',
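With the Pokémon-specific block removed from `evaluate`, map logging falls back to the generic loop: any info key containing `_map` is logged as an image built from its first entry, and nothing is logged when `data.wandb` is `None`. A minimal, self-contained sketch of that rule, where `collect_map_media` and the `image_fn` callback are hypothetical stand-ins for the `data.stats` and `data.wandb.Image` plumbing:

```python
def collect_map_media(infos, image_fn=None):
    """Mimic the generic '_map' logging loop left in evaluate().

    infos: dict mapping info keys to lists of collected values.
    image_fn: hypothetical stand-in for wandb.Image; passing None plays
        the role of data.wandb being None.
    """
    stats = {}
    if image_fn is None:
        return stats  # no wandb, so no media entries
    for k, v in infos.items():
        if '_map' in k:
            # Only the first collected value of each map key becomes an image
            stats[f'Media/{k}'] = image_fn(v[0])
    return stats
```

Any environment can therefore surface a map by emitting an info key such as `exploration_map`, without a dedicated branch in the trainer.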
21 changes: 21 additions & 0 deletions config/ocean/pysquared.ini
@@ -0,0 +1,21 @@
[base]
package = ocean
env_name = puffer_pysquared
policy_name = Policy
rnn_name = Recurrent

[env]
num_envs = 1

[train]
total_timesteps = 40_000_000
checkpoint_interval = 50
num_envs = 12288
num_workers = 12
env_batch_size = 4096
batch_size = 131072
update_epochs = 1
minibatch_size = 8192
learning_rate = 0.0017
anneal_lr = False
device = cuda
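The `[train]` values in `pysquared.ini` are chosen to divide evenly. A quick Python check of the derived ratios, assuming the usual PPO reading that `batch_size` is the number of transitions collected per update and `minibatch_size` the number per gradient step. Reading `env_batch_size` as the number of observations returned per vectorized receive is an assumption, as is the helper name `batch_layout`.

```python
# [train] values copied from config/ocean/pysquared.ini
train = {
    "total_timesteps": 40_000_000,
    "num_envs": 12288,
    "num_workers": 12,
    "env_batch_size": 4096,
    "batch_size": 131072,
    "update_epochs": 1,
    "minibatch_size": 8192,
}


def batch_layout(cfg):
    """Derive a few ratios from a [train] section, checking they divide evenly."""
    for num, den in [("num_envs", "num_workers"),
                     ("batch_size", "minibatch_size"),
                     ("batch_size", "env_batch_size")]:
        if cfg[num] % cfg[den]:
            raise ValueError(f"{num} is not a multiple of {den}")
    minibatches = cfg["batch_size"] // cfg["minibatch_size"]
    return {
        "envs_per_worker": cfg["num_envs"] // cfg["num_workers"],
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * cfg["update_epochs"],
        "env_batches_per_update": cfg["batch_size"] // cfg["env_batch_size"],
        "full_updates": cfg["total_timesteps"] // cfg["batch_size"],
    }


print(batch_layout(train))
```

Under these assumptions each of the 12 workers hosts 1024 environments, and one update is 16 minibatches of 8192. `total_timesteps` is not a multiple of `batch_size`, so 305 full updates fit in the 40M-step budget.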
64 changes: 64 additions & 0 deletions config/ocean/trash_pickup.ini
@@ -0,0 +1,64 @@
[base]
package = ocean
env_name = trash_pickup puffer_trash_pickup
policy_name = TrashPickup
rnn_name = Recurrent

[env]
num_envs = 1024 # Recommended: 4096 (recommended start value) / num_agents
grid_size = 10
num_agents = 4
num_trash = 20
num_bins = 1
max_steps = 150
report_interval = 32
agent_sight_range = 5 # only used with 2D local crop obs space

[train]
total_timesteps = 100_000_000
checkpoint_interval = 200
num_envs = 2
num_workers = 2
env_batch_size = 1
batch_size = 131072
update_epochs = 1
minibatch_size = 16384
bptt_horizon = 8
anneal_lr = False
device = cuda
learning_rate = 0.001
gamma = 0.95
gae_lambda = 0.85
vf_coef = 0.4
clip_coef = 0.1
vf_clip_coef = 0.1
ent_coef = 0.01

[sweep.metric]
goal = maximize
name = environment/episode_return

[sweep.parameters.train.parameters.learning_rate]
distribution = log_uniform_values
min = 0.000001
max = 0.01

[sweep.parameters.train.parameters.gamma]
distribution = uniform
min = 0
max = 1

[sweep.parameters.train.parameters.gae_lambda]
distribution = uniform
min = 0
max = 1

[sweep.parameters.train.parameters.update_epochs]
distribution = int_uniform
min = 1
max = 4

[sweep.parameters.train.parameters.ent_coef]
distribution = log_uniform_values
min = 1e-5
max = 1e-1
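Two details of `trash_pickup.ini` are easy to trip over when reading it with Python's standard `configparser`. First, the trailing `# ...` comments on `num_envs` and `agent_sight_range` are only stripped if `inline_comment_prefixes` is set. With `num_agents = 4`, the recommended `4096 / num_agents` gives the configured 1024. Second, the `[sweep.*]` sections describe sampling distributions. The sketch below is not PufferLib's own loader. It assumes the section names shown above and W&B-style distribution semantics: for `log_uniform_values` the log of the value is uniform between `log(min)` and `log(max)`, and `int_uniform` draws an integer from `min` to `max` inclusive.

```python
import configparser
import math
import random

SWEEP_PREFIX = "sweep.parameters.train.parameters."


def load_config(text):
    """Parse INI text, stripping trailing ' # ...' comments from values."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def sample_param(section):
    """Draw one value from a single [sweep.parameters...] section."""
    dist = section["distribution"]
    lo, hi = float(section["min"]), float(section["max"])
    if dist == "log_uniform_values":
        # Uniform in log space; clamp to guard against round-off at the ends
        value = math.exp(random.uniform(math.log(lo), math.log(hi)))
        return min(max(value, lo), hi)
    if dist == "uniform":
        return random.uniform(lo, hi)
    if dist == "int_uniform":
        return random.randint(int(lo), int(hi))  # inclusive on both ends
    raise ValueError(f"unsupported distribution: {dist}")


def sample_sweep(parser, prefix=SWEEP_PREFIX):
    """Map each swept [train] key to one sampled value."""
    return {
        name[len(prefix):]: sample_param(parser[name])
        for name in parser.sections()
        if name.startswith(prefix)
    }
```

Without `inline_comment_prefixes`, `parser.getint("env", "num_envs")` raises `ValueError`, because the raw value is the whole string `1024 # Recommended: ...`. Sampling the learning rate in log space spreads trials evenly across the four decades between `1e-6` and `1e-2` instead of clustering them near the upper bound.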
84 changes: 0 additions & 84 deletions pokemon_red_eval.py

This file was deleted.
