How to sample from the posterior #112
Hi,
I was wondering how to sample posterior partitionings of the rows/columns (cluster assignments). Is this feature supported? If not, is there any way to do that? Theoretically it should be possible, but I couldn't find anything in the documentation. It seems it was possible in the MATLAB version, for example in crosscat/legacy/crosscat_matlab/runModel.m, line 274 (commit 2de0192).
Thanks,
Comments
Hi there, there are a few ways to access the posterior row and column partitions. The easiest way is to use the engine API; see https://github.com/probcomp/crosscat/blob/master/src/EngineTemplate.py#L31
The MATLAB implementation is quite deprecated. Please let us know if you have more questions.
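For concreteness, here is a minimal sketch of how the row and column partitions can be read out of an (X_L, X_D) state pair returned by the engine. The key names ('column_partition', 'assignments') and the per-view layout of X_D are assumptions based on the usual CrossCat state format; verify them against EngineTemplate.py and the state documentation for your version.

```python
# Sketch: reading the latent partitions out of an (X_L, X_D) pair returned by
# the engine.  The key names below are assumptions about the state layout;
# check them against EngineTemplate.py in your checkout.

def summarize_state(X_L, X_D):
    # Column partition: which view each column is assigned to.
    column_assignments = X_L['column_partition']['assignments']
    # Row partitions: X_D is a list with one entry per view; each entry lists,
    # for every row, the category (cluster) that row belongs to in that view.
    print('number of views:', len(X_D))
    print('column -> view assignments:', column_assignments)
    for view_idx, row_assignment in enumerate(X_D):
        n_clusters = len(set(row_assignment))
        print('view %d: %d row clusters over %d rows'
              % (view_idx, n_clusters, len(row_assignment)))
```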
Thank you for your reply. I have a few questions:
Thanks,
Each chain is initialized by forward sampling all the latent variables that comprise the CrossCat prior. There are many latent variables specified by CrossCat: the partition of columns into views (and its CRP concentration), the partition of rows into categories within each view (and the per-view CRP concentrations), and the component-model hyperparameters for each column.
Each chain begins as an independent realization of all the latent variables from the prior; that is what happens when you initialize a state. You can then sample from the posterior by following the example at https://github.com/probcomp/crosscat/blob/master/src/tests/crosscat_client_runner.py#L41-L44
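Along the lines of that runner, here is a minimal single-chain sketch of the initialize-then-analyze pattern. It assumes M_c, M_r, and T have already been built from your data (e.g. with the helpers in crosscat.utils.data_utils), that LocalEngine accepts a seed keyword at construction, and that analyze takes an n_steps argument; all of these should be checked against the installed version.

```python
from crosscat.LocalEngine import LocalEngine

def one_posterior_sample(M_c, M_r, T, seed=0, n_steps=200):
    """Initialize one chain from the prior and run MCMC to obtain one
    approximate posterior sample of the CrossCat latent state."""
    engine = LocalEngine(seed=seed)      # seed kwarg: check your version
    # Forward-sample every latent variable from the prior (independent chain).
    X_L, X_D = engine.initialize(M_c, M_r, T)
    # Run n_steps of MCMC transitions; the returned state is one approximate
    # posterior sample of the column partition, row partitions, and hypers.
    X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=n_steps)
    return X_L, X_D
```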
Thanks for the quick reply.
Do you see different values of the column hyperparameters across the different chains?
Thank you,
Yes, each chain will typically, but not necessarily, have different values for the column hyperparameters. I do not quite understand the second question, in particular the phrase "from distribution on the top of the …".
There are at least two mechanisms for obtaining, say, K samples from MCMC inference algorithms:
1. Run K independent chains, each from its own random initialization, and take the final state of each chain as one sample.
2. Run a single long chain and thin it, keeping every n-th state after burn-in so that the retained states are approximately uncorrelated.
Your question seems to suggest a hybrid of these two: you wish to select one of the K independent chains and then thin it to get a collection of uncorrelated samples. You can simply take the hyperparameters from each chain and treat those as your bundle of posterior samples. If you really wish to select a single chain, then within the current framework of independent MCMC inference I am not aware of any formally correct technique for doing so. One theoretically sound approach would be to implement an SMC scheme for CrossCat using, say, K particles, anneal the data in one observation at a time, and re-sample the chains (with replacement) at each step after some intermediate amount of Gibbs sweeps. However, this version of CrossCat does not implement that inference algorithm, and it could be challenging to add. As for selecting a single chain using informal techniques, there are heuristics galore: consider the log score; keep a held-out dataset and study the predictive likelihood; or inspect the dependencies found by each chain and select the one that best matches your domain knowledge or expectations.
The example that I linked to is purely illustrative, to show how to use the software API. It makes no assumptions or guarantees about inference quality or about the number of steps you need to run; typically that will be dictated by the data analysis application at hand.
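To make the two mechanisms listed above concrete, here is a hedged sketch of both: K independent chains, and a single thinned chain. The function names are ad hoc for illustration, and the same caveats apply about the LocalEngine seed keyword and the n_steps argument being assumptions to verify against your installed version.

```python
from crosscat.LocalEngine import LocalEngine

def k_independent_samples(M_c, M_r, T, K=10, n_steps=200):
    """Mechanism 1: K independent chains; keep the final state of each."""
    samples = []
    for k in range(K):
        engine = LocalEngine(seed=k)     # seed kwarg is an assumption
        X_L, X_D = engine.initialize(M_c, M_r, T)
        X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=n_steps)
        samples.append((X_L, X_D))
    return samples

def thinned_samples(M_c, M_r, T, seed=0, n_samples=10, steps_between=50):
    """Mechanism 2: one long chain, snapshotted every steps_between
    transitions (successive snapshots remain somewhat correlated)."""
    engine = LocalEngine(seed=seed)      # seed kwarg is an assumption
    X_L, X_D = engine.initialize(M_c, M_r, T)
    samples = []
    for _ in range(n_samples):
        X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=steps_between)
        samples.append((X_L, X_D))
    return samples
```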
@fsaad Thanks. I am trying to sample 10K partitions from the posterior distribution. To give a rough idea of how long it takes for my problem, it takes about 10 hours for 200 transitions, so running 10K chains is infeasible. This is the strategy I am considering:
I am still struggling to generate even one sample after 1K transitions: the job has been running for almost 3 days and has not finished yet. Two questions:
1. There is also max_iterations; I couldn't find an explanation of this option.
2. Is there any way to see the trace plot? Perhaps 10K transitions is too much.
Thanks,
My sense is that you might be using a dataset size which is larger than the target of this software --- how many rows and columns are you trying to learn?
I believe the max_iterations option is not used by the system. You can try max_seconds instead, which will terminate the run after the given number of seconds.
If you run LocalEngine.analyze with do_diagnostics=True, then the latent variables at each step will be saved in a dictionary. The return value of analyze will then be a 3-tuple of the form X_L_new, X_D_new, diagnostics_new.
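Combining max_seconds and do_diagnostics, a rough sketch of a diagnostics run and a log-score trace plot might look like the following. The 'logscore' key inside the diagnostics dictionary is an assumption about what your version records, so inspect diagnostics.keys() first; the seed and n_steps keywords are likewise assumptions to check.

```python
import matplotlib.pyplot as plt
from crosscat.LocalEngine import LocalEngine

def trace_plot(M_c, M_r, T, seed=0, n_steps=1000, max_seconds=3600):
    engine = LocalEngine(seed=seed)      # seed kwarg is an assumption
    X_L, X_D = engine.initialize(M_c, M_r, T)
    # With do_diagnostics=True, analyze returns a 3-tuple; the third element
    # is a dictionary of per-step latent-variable summaries.  max_seconds
    # bounds the wall-clock time of the run.
    X_L, X_D, diagnostics = engine.analyze(
        M_c, T, X_L, X_D, n_steps=n_steps,
        max_seconds=max_seconds, do_diagnostics=True)
    print('available diagnostics:', sorted(diagnostics.keys()))
    # Assumed key: 'logscore'; adjust to whatever your version records.
    if 'logscore' in diagnostics:
        plt.plot(diagnostics['logscore'])
        plt.xlabel('MCMC step')
        plt.ylabel('log score')
        plt.title('Trace plot for one chain')
        plt.show()
    return X_L, X_D, diagnostics
```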
Hi @fsaad,
Thank you for your reply. Actually, the matrix is not that large: it is 19992 x 105. Based on what I read in the documentation, it is roughly within the range that CrossCat can handle, but if you think it is not appropriate or too big, please let me know.
Best,
Thanks @fsaad,
Hi @fsaad, it seems that MultiprocessingEngine is not respecting …
@kayhan-batmanghelich I'm not sure immediately why MultiprocessingEngine is not respecting it. Could you open a separate issue for that?
Sure, I will open a new ticket. Please close this one. Thanks |