GitHub - bonetblai/gpt-rewards: Automatically exported from code.google.com/p/gpt-rewards

Distribution of RTDP-BEL for reward-based POMDPs, built on the
transformation to stochastic shortest-path (SSP) problems described in:

  Solving POMDPs: RTDP-Bel vs. Point-based Algorithms.
  Blai Bonet and Hector Geffner.
  Proc. of 21st Int. Joint Conf. on Artificial Intelligence (IJCAI).
  Pasadena, California. 2009. AAAI Press. Pages 1641-1646.

Available at http://ldc.usb.ve/~bonet/reports/IJCAI09-rtdpbel.pdf

RTDP-BEL is implemented inside an extract of the GPT software.
GPT is a shell-based system for solving POMDPs. Here, we only
provide an example session showing its usage. For more information,
contact Blai Bonet <[email protected]>.


REQUIREMENTS
------------

You will need a proper GNU readline library. The one that ships
with Mac OS X is not suitable. On my Mac, I use "brew install readline"
to obtain a working library.


INSTALLATION
------------

Unpack the distribution and run 'make all'. By default, gpt is
installed in the bin directory. (In fact, bin/gpt is a symbolic link
to src/rtdp-bel/gpt.)
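
A quick way to confirm that the build produced a usable binary is to
check the symbolic link mentioned above. The helper below is only a
sketch: check_gpt is a hypothetical name, not something shipped with
this distribution, and it only tests that bin/gpt is a symbolic link
that resolves to an executable file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gpt distribution).
# Succeeds only if DIR/bin/gpt is a symbolic link whose target
# exists and is executable. DIR defaults to the current directory.
check_gpt() {
  dir=${1:-.}
  [ -L "$dir/bin/gpt" ] && [ -x "$dir/bin/gpt" ]
}
```

From the top-level directory, it could be used as
'make all && check_gpt . && echo "gpt is ready"'.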


EXAMPLE
-------

Suppose that you are at the top-level directory and the
problem RockSample_7_8.POMDP is in the same directory.

  $ bin/gpt
  Welcome to GPT, Version 2.00-cassandra.
  gpt> set pims [8,2000,500]
  gpt> solve RockSample_7_8.POMDP -
  [some output]
  gpt> quit

The set command sets the value of a parameter. Among the most
important are 'qlevels', which specifies the number of discretization
levels used for quantization (a higher qlevels means a finer
discretization; default 15), and 'cutoff', which specifies the
maximum length of a trial (default 250).
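
To give a feel for what the discretization does, the sketch below
quantizes a belief vector with a ceiling-based rule, mapping each
probability b(s) to ceil(D * b(s)). That rule, and the reading of
qlevels as D, are assumptions made for illustration and are not taken
from the gpt source. quantize_belief is a hypothetical helper, not a
gpt command.

```shell
#!/bin/sh
# Hypothetical illustration (not part of gpt): quantize a belief vector
# with D levels, ASSUMING the rule d(b)(s) = ceil(D * b(s)).
# Usage: quantize_belief D p1 p2 ...
quantize_belief() {
  levels=$1
  shift
  printf '%s\n' "$@" | awk -v d="$levels" '
    {
      x = d * $1
      c = int(x)
      if (c < x) c++          # ceiling for non-negative x
      printf "%s%d", (NR > 1 ? " " : ""), c
    }
    END { print "" }'
}

quantize_belief 15 0.5 0.25 0.25   # prints: 8 4 4
```

Under this assumed rule, beliefs that quantize to the same integer
vector are treated as the same state, so a larger D distinguishes more
beliefs at the cost of a larger table.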

The pims parameter tells gpt how to solve the POMDP and how to
evaluate the computed policy. In the example, pims tells gpt to
perform 16000 trials of RTDP-BEL (8 rounds of 2000 trials) and,
after every 2000 trials, to evaluate the current policy over 500
trials. Such a specification yields a quality profile that shows
how the policy improves as more RTDP-BEL trials are performed.
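
The arithmetic behind the example can be spelled out with a small
script. It assumes that pims [8,2000,500] reads as [rounds, trials per
round, evaluation trials], which is an interpretation consistent with
the 16000 total but not confirmed here. pims_schedule is a hypothetical
helper and not part of gpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of gpt): print the evaluation checkpoints
# implied by pims, ASSUMING the triple [rounds, trials_per_round, eval_trials].
pims_schedule() {
  rounds=$1
  per_round=$2
  eval_trials=$3
  i=1
  while [ "$i" -le "$rounds" ]; do
    echo "after $((i * per_round)) RTDP-BEL trials: evaluate over $eval_trials trials"
    i=$((i + 1))
  done
}

pims_schedule 8 2000 500
```

With these values the script prints eight checkpoints, the last one at
16000 trials, which corresponds to the points of the quality profile.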

The solve command takes two parameters: the path to a POMDP
file (in Cassandra's format) and the name of a file to which the
output is written. If the latter is '-', the output goes to
standard output.

For any questions or comments, please email me <[email protected]>.

Blai Bonet