Generative Adversarial Networks #55
In case you guys haven't seen it, this paper came out recently and looks kind of interesting: https://arxiv.org/abs/1701.01329
My first couple of read-throughs leave me with some questions. The paper triggers a couple of my first-order heuristics (explaining basic material like RNNs, seemingly magical performance at generating long valid SMILES, which suggests overfitting) and has kind of a weird application of fine-tuning as transfer learning, among other things. I'm planning on working up some parts of this paper, like the stacked LSTMs as a SMILES generator for transfer to a property-prediction network, over this weekend. Anyone else have any comments on this paper or things to try?
@pechersky @dribnet @dakoner
Comments
I'd be glad to work on this with you. I agree, it looks like they overfit
their model to generate similar strings. This is especially evident in the
fact that they supposedly got clean adamantyl strings. The t-SNE plot tells
us nothing because we don't know what perplexity they ran it with.
Additionally, the epoch graph hints at overfitting. I like their
fine-tuning idea of taking a generically trained network and optimizing it
for a subset of the space. Some metric of specific-vs-general generation on
the fine-tuned network would be useful. Since they canonicalize the
SMILES, I don't understand why they'd use edit distance: small changes in
the chemical topology can cause large changes in the edit distance.
This 3-LSTM/Dropout topology looks pretty simple. I wonder what results it
would give if the symbol table were made not of characters but of SMILES
tokens.
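Here's roughly what I mean by SMILES tokens, as a sketch — the regex and the exact token set are assumptions on my part, not anything from the paper:
```python
import re

# Assumed token set: bracket atoms, two-letter elements, ring-closure digits,
# and bonds/branches each become a single symbol instead of raw characters.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|%\d{2}|[BCNOPSFIbcnops]|[=#$/\\\-+()]|\d)"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens instead of characters."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # Sanity check: the tokens should reassemble into the original string.
    assert "".join(tokens) == smiles, f"untokenized characters in {smiles!r}"
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1',
#  'C', '(', '=', 'O', ')', 'O']
```
With a table like this, "Cl" and "[nH]" become single symbols, so the network no longer has to learn to emit them character by character.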
So I let a 3-LSTM run overnight on Saturday and the loss fell to near zero, but it was definitely overfitting; it clearly wasn't extracting any interesting information about the underlying chemistry. At that point I got distracted by the idea of using a GAN instead, which is what I've been doing since then. It's pretty difficult to get it to train well, since the discriminator's job is much easier to learn than the generator's (discrimination is pretty trivial while the generator is weak), so I haven't figured out yet how to keep the two in reasonable balance. I'm planning on asking a couple of friends at OpenAI for advice later today. I'll post my ipynb file once I have it working a little better!
What are you using as the discriminator? SMILES validity, or just absence of
the (possibly invalid) string from the training set?
I'm using presence/absence from the set. SMILES validity is an arguably even easier metric, as CCCCCcccccccccccccccccccccccccccccccccc is valid SMILES but not representative of the distribution we want to learn.
One could require SMILES validity AND number of heavy atoms <=
tunable_parameter * max([number of heavy atoms in ligand | ligand in test
set]); a sketch of that filter is below. I think a GAN approach is awesome
for this; perhaps we swap the GRUs from the VAE for deconv layers.
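Something like this, as a rough sketch using RDKit; `tunable_parameter` and `test_set_smiles` are placeholder names of mine:
```python
from rdkit import Chem

def passes_filter(smiles, test_set_smiles, tunable_parameter=1.2):
    """Accept a generated string only if it is valid SMILES and not
    implausibly large relative to the heaviest molecule in the test set."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES
        return False
    max_heavy = max(
        Chem.MolFromSmiles(s).GetNumHeavyAtoms() for s in test_set_smiles
    )
    return mol.GetNumHeavyAtoms() <= tunable_parameter * max_heavy
```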
On pretraining: worth noting that if I don't pretrain the generator, no interesting training happens at all when I try to train the GAN. The discriminator loss just goes to 0 and the generator loss goes to ~16. It's not clear whether pretraining the discriminator matters, or even makes things worse. Some posts suggest changing learning parameters at runtime depending on which side is "advantaged"; see https://github.com/torch/torch.github.io/blob/master/blog/_posts/2015-11-13-gan.md. Some more ideas here that I haven't worked through yet: https://github.com/soumith/ganhacks
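For what it's worth, here's a rough sketch of what "change learning parameters depending on which side is advantaged" could look like in a training loop. The function and argument names (`generator`, `discriminator`, `gan`, `make_real_batch`, `sample_noise`, `margin`) are placeholders I'm assuming, and the loss-ratio rule is just one possible heuristic, not exactly what the linked posts prescribe:
```python
import numpy as np

def train_gan_balanced(generator, discriminator, gan, make_real_batch,
                       sample_noise, n_steps=1000, batch_size=64, margin=2.0):
    """Alternate discriminator/generator updates, throttling whichever side
    is currently ahead according to a crude loss-ratio heuristic."""
    d_loss = g_loss = 1.0  # running loss estimates
    for _ in range(n_steps):
        # Update the discriminator only when it isn't already far ahead;
        # otherwise its loss collapses to ~0 and the generator gets no signal.
        if margin * d_loss > g_loss:
            real = make_real_batch(batch_size)                  # one-hot SMILES
            fake = generator.predict(sample_noise(batch_size))
            x = np.concatenate([real, fake])
            y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
            d_loss = discriminator.train_on_batch(x, y)
        # Give the generator extra steps while it is the weaker side.
        for _ in range(2 if g_loss > margin * d_loss else 1):
            z = sample_noise(batch_size)
            g_loss = gan.train_on_batch(z, np.ones(batch_size))
    return d_loss, g_loss
```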
You're generating noise using uniform sampling between 0 and 1 at every
position in the vector. Our true data is some weird subset of the space of
all possible one-hot vectors. What if you started by skipping training the
generator, and just tried to train a discriminator with true data vs random
strings sampled from the alphabet?
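A quick sketch of that baseline experiment — real SMILES as positives vs. strings sampled uniformly from the alphabet as negatives. The one-hot helper and the `alphabet` / `real_smiles` inputs are my own placeholders, not code from the notebook:
```python
import numpy as np

def random_strings(alphabet, length, n):
    """Negatives: strings sampled uniformly from the alphabet, at the same
    fixed length as the (padded) real SMILES."""
    idx = np.random.randint(0, len(alphabet), size=(n, length))
    return ["".join(alphabet[i] for i in row) for row in idx]

def one_hot(strings, alphabet, length):
    char_to_idx = {c: i for i, c in enumerate(alphabet)}
    x = np.zeros((len(strings), length, len(alphabet)), dtype=np.float32)
    for i, s in enumerate(strings):
        for j, c in enumerate(s[:length]):
            x[i, j, char_to_idx[c]] = 1.0
    return x

def make_discriminator_data(real_smiles, alphabet, length):
    """Labeled batch: real SMILES -> 1, random alphabet strings -> 0."""
    fakes = random_strings(alphabet, length, len(real_smiles))
    x = np.concatenate([one_hot(real_smiles, alphabet, length),
                        one_hot(fakes, alphabet, length)])
    y = np.concatenate([np.ones(len(real_smiles)), np.zeros(len(fakes))])
    return x, y
```
If the discriminator can't even separate these two classes cleanly, the problem is upstream of the generator.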
I'm not sure that matters... this isn't an autoencoder; the input is just a source of entropy. The nonlinearities in the generator network should mean the distribution of the input need not resemble the distribution of the output, unless I've misunderstood something.
I've got something looking much better now after working in a bunch of the tricks linked above, though it still has a lot of room for improvement:
[Image: screen shot 2017-01-17 at 3 36 26 pm — https://cloud.githubusercontent.com/assets/83726/22044697/c699b172-dcca-11e6-968e-163c80e3d41b.png]
After 200 iterations the generator samples out stuff like:
CCcccC
CCcccN
CCcccc
CCcccC
CCcCc
CCCcccccc
CCcccC
Updated notebook at https://github.com/maxhodak/keras-molecules/blob/gan/SMILES_GAN.ipynb
Nice! Do you think the simple strings are due to just insufficient
training, or is it converging to a simple part of the string space? Perhaps
including the KL divergence in the loss might help; a rough sketch of what
I mean is below. If I understand correctly, the generator is like the
"decoder" part of the VAE. Do you think the topology and layer identity
(LSTM / GRU / something else) make a qualitative difference in the richness
of the generated molecules? I was thinking something like a Grid LSTM
<https://arxiv.org/pdf/1507.01526v1.pdf> might be appropriate for our
system because of the local and distant correlations in the string.
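To make the KL suggestion concrete, here's a small sketch comparing character distributions of generated strings against the training set; treating it as a diagnostic (or as an extra penalty added to the generator loss) is just my reading of the idea, not something worked out here:
```python
from collections import Counter

import numpy as np
from scipy.stats import entropy

def char_distribution(strings, alphabet):
    """Empirical per-character frequency over a set of strings (smoothed)."""
    counts = Counter(c for s in strings for c in s)
    freqs = np.array([counts.get(c, 0) for c in alphabet], dtype=np.float64)
    return (freqs + 1e-8) / (freqs.sum() + 1e-8 * len(alphabet))

def char_kl(generated, training, alphabet):
    """KL(training || generated): large values suggest the generator is stuck
    in a narrow, unrepresentative part of the string space."""
    p = char_distribution(training, alphabet)
    q = char_distribution(generated, alphabet)
    return entropy(p, q)  # scipy computes sum(p * log(p / q))
```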
Hey guys, glad I found this thread. I am also working in this field: I've been trying to use a seq2seq model to produce an unsupervised fingerprint for each molecule, and I'm also planning to try a GAN as future work. Does anyone have any updates on this GAN idea?