How to sample from the posterior #112

Open
kayhan-batmanghelich opened this issue Jun 22, 2017 · 12 comments

Comments

@kayhan-batmanghelich

Hi,

I was wondering how to sample the posterior partitioning of the rows/columns (the cluster assignments). Is this feature supported? If not, is there any way to do it? Theoretically it should be possible, but I couldn't find anything in the documentation. It seems it was possible in the MATLAB version, for example here:

state.o(newK,:) = sample_partition(state.O, state.crpPriorC);

Thanks,

@fsaad
Collaborator

fsaad commented Jun 22, 2017

Hi there,

There are a few ways to access the posterior row and column partitions. The easiest way is to use the X_L and X_D dictionaries that summarize a particular CrossCat latent state (which is one posterior sample). These two dictionaries are returned by Engine.initialize and Engine.analyze, shown below for the EngineTemplate base class:

https://github.com/probcomp/crosscat/blob/master/src/EngineTemplate.py#L31
https://github.com/probcomp/crosscat/blob/master/src/EngineTemplate.py#L39

  • For the column partition, use X_L['column_partition']['assignments'].
  • For the row partitions, X_D[view_idx] will contain the row partition in the given view.
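For example, a minimal sketch of reading those fields from a single latent state (assuming X_L and X_D came from one chain returned by Engine.initialize or Engine.analyze):

# One latent state (one posterior sample) is described by the pair (X_L, X_D).
column_assignments = X_L['column_partition']['assignments']  # view index for each column
for view_idx, row_partition in enumerate(X_D):
    # Columns assigned to this view.
    cols_in_view = [c for c, v in enumerate(column_assignments) if v == view_idx]
    # row_partition[i] is the row-cluster index of row i within this view.
    print('view %d: columns %s, row clusters %s'
          % (view_idx, cols_in_view, sorted(set(row_partition))))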

The MATLAB implementation is quite deprecated.

Please let us know if you have more questions.

@kayhan-batmanghelich
Author

@fsaad

Thank you for your reply. I have a few questions:

  • The lengths of X_L and X_D equal the number of chains, but the chains are not drawn from the same model because the hyper-parameters are different, correct? How do you specify the hyper-parameters?
  • As you said, each X_L, X_D pair contains one posterior sample. To keep sampling from the posterior, do we need to rerun the model from the prior? I don't see any from_the_posterior option in engine.initialize; it only has a from_the_prior option. Sampling from scratch would be time-consuming. I would like to use CrossCat inside another model. Is there any way to do this?

Thanks,

@fsaad
Collaborator

fsaad commented Jun 22, 2017

Length of the X_L and X_D equal the number of chains but the chains are not drawn from the same models because the hyper-parameters are different, correct? How do you specify the hyper-parameters?

Each chain is initialized by forward sampling all the latent variables that comprise the CrossCat prior. These latent variables are:

  • column CRP concentration
  • column partition
  • distributional (hyper)parameters for each column
  • row CRP concentrations (one for each block of the column partition)
  • row partitions (one for each block of the column partition)
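As a rough map of where these live in a single latent state: the column partition, column hypers, and row partitions are the fields discussed above, while the exact key names used below for the CRP concentrations are assumptions about the X_L layout that you should verify against your version.

# Sketch only; the 'hypers'/'alpha' key names are assumptions.
col_crp_alpha = X_L['column_partition']['hypers']['alpha']   # column CRP concentration
col_assignments = X_L['column_partition']['assignments']     # column partition
col_hypers = X_L['column_hypers']                            # per-column distributional hypers
for view_idx, view in enumerate(X_L['view_state']):
    row_crp_alpha = view['row_partition_model']['hypers']['alpha']  # row CRP concentration (per view)
    row_partition = X_D[view_idx]                                   # row partition for this view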

To keep sampling from the posterior, do we need to rerun the model from the prior?

Each chain begins as an independent realization of all the latent variables from the prior. That is what happens when you call X_L, X_D = engine.initialize(M_c, M_r, T, seed).

You can sample from the posterior by using engine.analyze, which takes in X_L and X_D, runs posterior inference, and returns X_L_new and X_D_new. You can then pick up inference from where it left off by calling engine.analyze again, providing it with X_L_new and X_D_new (which are not prior samples, so analysis does not start from scratch). You can find an example of this workflow here:

https://github.com/probcomp/crosscat/blob/master/src/tests/crosscat_client_runner.py#L41-L44
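Schematically, the pattern looks like the following sketch, where engine is a LocalEngine and M_c, M_r, T, and seed are the column metadata, row metadata, data table, and RNG seed from your own setup (the exact analyze argument order should be checked against the linked example):

# Draw an initial latent state from the CrossCat prior.
X_L, X_D = engine.initialize(M_c, M_r, T, seed)
# Run posterior inference (MCMC transitions) starting from that state.
X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=100)
# Continue inference from where it left off, rather than from the prior.
X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=100)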

@kayhan-batmanghelich
Author

kayhan-batmanghelich commented Jun 22, 2017

@fsaad

Thanks for the quick reply.

  • Regarding the hyper-parameters, I was referring to this:
print "size of the chain : ",  len(X_L_list2)
print "X_L_list2[0]['column_hypers'] :" , len(X_L_list2[0]['column_hypers'])
print "len(X_L_list2[0]['column_hypers']) :" , len(X_L_list2[0]['column_hypers'])
print "X_L_list2[0]['column_hypers'][-1] :" , X_L_list2[0]['column_hypers'][-1]
print "X_L_list2[1]['column_hypers'][-1] :" , X_L_list2[1]['column_hypers'][-1]
size of the chain :  25
X_L_list2[0]['column_hypers'] : 64
len(X_L_list2[0]['column_hypers']) : 64
X_L_list2[0]['column_hypers'][-1] : {'mu': 78.60333333333332, 's': 401.091946721303, 'r': 1.0, 'fixed': 0.0, 'nu': 17.521415467935228}
X_L_list2[1]['column_hypers'][-1] : {'mu': 78.60333333333332, 's': 117.4655749680299, 'r': 0.6826384947222036, 'fixed': 0.0, 'nu': 8.164897511055578}

Do you see that s, r, and nu are different for chain 0 and chain 1? Are those samples from a distribution on top of the column_hypers? If so, where do you specify the hyper-parameters of the column_hypers? I don't see that in the API.

  • How do you choose one of those chains to draw more posterior samples from? In variational Bayes we use the Evidence Lower Bound (ELBO) and usually choose the run with the highest ELBO. For MCMC we usually eyeball the trace plot of the chain to see whether it has converged. How do you suggest selecting one of the chains?

  • If I understand correctly, the example shows that to sample a new X_L we don't need to make n_steps high, right? Unless there is some internal mixing process.

Thank you,

@fsaad
Collaborator

fsaad commented Jun 24, 2017

Do you see that s, r, and nu are different for chain 0 and chain 1? Are those samples from a distribution on top of the column_hypers?

Yes, each chain will typically, but not necessarily, have different values for the column hyperparameters. I do not quite understand the second question, in particular the phrase "from a distribution on top of the column_hypers".

How do you choose one of those chains to draw more posterior samples from?

There are at least two mechanisms for obtaining, say, K samples from MCMC inference algorithms:

  • Simulate a single MCMC chain until "burn-in", and then thin the chain for K steps with an appropriate thinning rate to get a collection of "uncorrelated" samples.
  • Initialize K independent chains, run them in parallel, and transition each until "burn-in". You are left with one approximate posterior sample per chain, and the K chains are fully independent.
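For the first mechanism, a rough sketch using the same initialize/analyze calls as above (the burn-in and thinning values are purely illustrative, not recommendations):

# Burn-in, then thin: keep one latent state every few transitions.
K = 100  # number of samples you want (illustrative)
X_L, X_D = engine.initialize(M_c, M_r, T, seed)
X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=500)     # burn-in
samples = []
for _ in range(K):
    X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=10)  # thinning interval
    samples.append((X_L, X_D))                               # approximate posterior sample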

Your question seems to suggest a hybrid of these two; that is, you wish to select one of the K independent chains and then thin it to get a collection of uncorrelated samples. As things stand, you can simply take the hyperparameters from each chain and treat those as your bundle of posterior samples.

If you really wish to select a single chain, I am not aware of any formally correct technique to do so in the current framework of independent MCMC inference. One theoretically sound approach would be to implement an SMC scheme for CrossCat with, say, K particles: anneal in the data one observation at a time, and resample the chains (with replacement) at each step after some intermediate amount of Gibbs sweeps. However, this version of CrossCat does not have this inference algorithm, and it could be challenging to implement.

As for selecting a single chain using non-formal techniques, there are heuristics galore. Some approaches might be to consider the log-score; to keep a held-out dataset and study the predictive likelihood on it; or to investigate the dependencies found by each chain and select the one which best matches your domain knowledge or expectations.

If I understand correctly, the example shows that to sample a new X_L we don't need to make n_steps high, right? Unless there is some internal mixing process.

The example that I linked to is purely illustrative, to show how to use the software API. It makes no assumptions or guarantees about inference quality or the number of steps you need to run. Typically that choice will be driven by the data analysis application at hand.

@kayhan-batmanghelich
Author

@fsaad Thanks.

I am trying to sample 10K partitions from the posterior distribution. To give a rough idea of the cost for my problem: it takes about 10 hours for 200 transitions, so running 10K chains is infeasible. This is the strategy I am considering:

  • Run 160 parallel chains, each for 1000 transitions. Of course, there is no way to make sure the chains have converged to the stationary distribution (the posterior), but I hope 1000 is enough.
  • Starting from each of the 160 chains, run 50 parallel chains, each with a small number of transitions (say 5 to 10), to generate quasi-independent samples.

I am still struggling to generate one sample after 1K transitions because the job has been running for almost 3 days and has not finished yet. Two questions:

  • There is also max_iterations; I couldn't find an explanation of this option.
  • Is there any way to see a trace plot? Perhaps 10K transitions is too much.

Thanks,

@fsaad
Collaborator

fsaad commented Jun 29, 2017

it takes about 10hr for 200 transitions

My sense is that your dataset may be larger than what this software targets. How many rows and columns are you trying to learn?

There is also max_iterations, I couldn't find an explanation of this option.

I believe max_iterations is not used by the system. You can try max_time instead, which will terminate inference after the given number of seconds.
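For example (the time budget here is illustrative):

# Stop inference after roughly one hour of wall-clock time, whichever of the
# step count or the time limit is reached first.
X_L, X_D = engine.analyze(M_c, T, X_L, X_D, n_steps=1000, max_time=3600)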

Is there any way to see the trace plot, perhaps 10K transition is too much.

If you run LocalEngine.analyze with do_diagnostics=True then the latent variables at each step will be saved in a dictionary. The return value of analyze will then be a 3-tuple of the form:
X_L_new, X_D_new, diagnostics_new.
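A sketch of pulling the logscore trace out of that dictionary and plotting it (the key name and the array shape, e.g. one column per chain, are assumptions to check against your diagnostics output):

import matplotlib.pyplot as plt

X_L_new, X_D_new, diagnostics = engine.analyze(
    M_c, T, X_L, X_D, n_steps=200, do_diagnostics=True)

# 'logscore' is among the recorded diagnostics; plot its trajectory over transitions.
plt.plot(diagnostics['logscore'])
plt.xlabel('transition')
plt.ylabel('logscore')
plt.show()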

@kayhan-batmanghelich
Author

kayhan-batmanghelich commented Jun 29, 2017 via email

@kayhan-batmanghelich
Author

Thanks @fsaad, diagnostics_new is exactly what I want. The logscore can be used to monitor the quality of the chain (like the ELBO).

@kayhan-batmanghelich
Author

Hi @fsaad

It seems that max_time is respected in LocalEngine but not in MultiprocessingEngine. Am I missing something?

@fsaad
Collaborator

fsaad commented Jul 10, 2017

@kayhan-batmanghelich I'm not immediately sure why MultiprocessingEngine is not respecting max_time. Please open a separate ticket so we can track the issue, and let me know if it is OK to close this one.

@kayhan-batmanghelich
Author

@fsaad

Sure, I will open a new ticket. Please close this one.

Thanks
