Generative Adversarial Networks #55
In case you guys haven't seen it, this paper came out recently and looks kind of interesting: https://arxiv.org/abs/1701.01329
My first couple of read-throughs leave me with some questions. The paper triggers a couple of my first-order heuristics (explaining basic material like RNNs, seemingly magical performance at generating long valid SMILES, which suggests overfitting) and has kind of a weird application of fine-tuning as transfer learning, among other things. I'm planning on working up some parts of this paper, like the stacked LSTMs as a SMILES generator for transfer to a property-prediction network, over this weekend. Anyone else have any comments on this paper or things to try?
@pechersky @dribnet @dakoner
Comments
I'd be glad to work on this with you. I agree, it looks like they overfit
their model to generate similar strings. This is especially evident in the
fact that they supposedly got clean adamantyl strings. The t-SNE plot tells
us nothing because we don't know what perplexity they ran it with.
Additionally, the epoch graph hints at overfitting. I like their
fine-tuning idea of taking a generically trained network and optimizing it
for a subset of the space. Some metric of specific-vs-general generation on
the fine-tuned network would be useful. Since they canonicalize the
SMILES, I don't understand why they'd use edit distance: small changes in
the chemical topology can cause large changes in the edit distance.
This 3-LSTM/Dropout topology looks pretty simple. I wonder what results it
would give if the symbol table were made not of characters but of SMILES
tokens.
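Here's roughly what I mean by SMILES tokens, as a sketch — the regex and the exact token set are assumptions on my part, not anything from the paper:
```python
import re

# Assumed token set: bracket atoms, two-letter elements, ring-closure digits,
# and bonds/branches each become a single symbol instead of raw characters.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|%\d{2}|[BCNOPSFIbcnops]|[=#$/\\\-+()]|\d)"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens instead of characters."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # Sanity check: the tokens should reassemble into the original string.
    assert "".join(tokens) == smiles, f"untokenized characters in {smiles!r}"
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1',
#  'C', '(', '=', 'O', ')', 'O']
```
With a table like this, "Cl" and "[nH]" become single symbols, so the network no longer has to learn to emit them character by character.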
So I let a 3-LSTM run overnight on Saturday and the loss fell to near zero, but it was definitely overfitting; it clearly wasn't extracting any interesting information about the underlying chemistry. At that point I got distracted by the idea of using a GAN instead, which is what I've been doing since then. It's pretty difficult to get it to train well, since the discriminator's job is much easier to learn than the generator's (discrimination is pretty trivial while the generator is weak), so I haven't figured out yet how to keep the two in reasonable balance. I'm planning on asking a couple of friends at OpenAI for advice later today. I'll post my ipynb file once I have it working a little better!
What are you using as the discriminator? SMILES validity, or just absence of
the (possibly invalid) string from the training set?
I'm using presence/absence from the set. SMILES validity is an arguably even easier metric, as CCCCCcccccccccccccccccccccccccccccccccc is valid SMILES but not representative of the distribution we want to learn.
One could require SMILES validity AND number of heavy atoms <=
tunable_parameter * max([number of heavy atoms in ligand | ligand in test
set]); a sketch of that filter is below. I think a GAN approach is awesome
for this; perhaps we swap the GRUs from the VAE for deconv layers.
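Something like this, as a rough sketch using RDKit; `tunable_parameter` and `test_set_smiles` are placeholder names of mine:
```python
from rdkit import Chem

def passes_filter(smiles, test_set_smiles, tunable_parameter=1.2):
    """Accept a generated string only if it is valid SMILES and not
    implausibly large relative to the heaviest molecule in the test set."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES
        return False
    max_heavy = max(
        Chem.MolFromSmiles(s).GetNumHeavyAtoms() for s in test_set_smiles
    )
    return mol.GetNumHeavyAtoms() <= tunable_parameter * max_heavy
```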
On pretraining: worth noting that if I don't pretrain the generator, no interesting training happens at all when I try to train the GAN. The discriminator loss just goes to 0 and the generator loss goes to ~16. It's not clear whether pretraining the discriminator matters, or even makes things worse. Some posts suggest changing learning parameters at runtime depending on which side is "advantaged"; see https://github.com/torch/torch.github.io/blob/master/blog/_posts/2015-11-13-gan.md. Some more ideas here that I haven't worked through yet: https://github.com/soumith/ganhacks
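For what it's worth, here's a rough sketch of what "change learning parameters depending on which side is advantaged" could look like in a training loop. The function and argument names (`generator`, `discriminator`, `gan`, `make_real_batch`, `sample_noise`, `margin`) are placeholders I'm assuming, and the loss-ratio rule is just one possible heuristic, not exactly what the linked posts prescribe:
```python
import numpy as np

def train_gan_balanced(generator, discriminator, gan, make_real_batch,
                       sample_noise, n_steps=1000, batch_size=64, margin=2.0):
    """Alternate discriminator/generator updates, throttling whichever side
    is currently ahead according to a crude loss-ratio heuristic."""
    d_loss = g_loss = 1.0  # running loss estimates
    for _ in range(n_steps):
        # Update the discriminator only when it isn't already far ahead;
        # otherwise its loss collapses to ~0 and the generator gets no signal.
        if margin * d_loss > g_loss:
            real = make_real_batch(batch_size)                  # one-hot SMILES
            fake = generator.predict(sample_noise(batch_size))
            x = np.concatenate([real, fake])
            y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
            d_loss = discriminator.train_on_batch(x, y)
        # Give the generator extra steps while it is the weaker side.
        for _ in range(2 if g_loss > margin * d_loss else 1):
            z = sample_noise(batch_size)
            g_loss = gan.train_on_batch(z, np.ones(batch_size))
    return d_loss, g_loss
```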
You're generating noise using uniform sampling between 0 and 1 at every
position in the vector. Our true data is some weird subset of the space of
all possible one-hot vectors. What if you started by skipping training the
generator, and just tried to train a discriminator with true data vs random
strings sampled from the alphabet?
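A quick sketch of that baseline experiment — real SMILES as positives vs. strings sampled uniformly from the alphabet as negatives. The one-hot helper and the `alphabet` / `real_smiles` inputs are my own placeholders, not code from the notebook:
```python
import numpy as np

def random_strings(alphabet, length, n):
    """Negatives: strings sampled uniformly from the alphabet, at the same
    fixed length as the (padded) real SMILES."""
    idx = np.random.randint(0, len(alphabet), size=(n, length))
    return ["".join(alphabet[i] for i in row) for row in idx]

def one_hot(strings, alphabet, length):
    char_to_idx = {c: i for i, c in enumerate(alphabet)}
    x = np.zeros((len(strings), length, len(alphabet)), dtype=np.float32)
    for i, s in enumerate(strings):
        for j, c in enumerate(s[:length]):
            x[i, j, char_to_idx[c]] = 1.0
    return x

def make_discriminator_data(real_smiles, alphabet, length):
    """Labeled batch: real SMILES -> 1, random alphabet strings -> 0."""
    fakes = random_strings(alphabet, length, len(real_smiles))
    x = np.concatenate([one_hot(real_smiles, alphabet, length),
                        one_hot(fakes, alphabet, length)])
    y = np.concatenate([np.ones(len(real_smiles)), np.zeros(len(fakes))])
    return x, y
```
If the discriminator can't even separate these two classes cleanly, the problem is upstream of the generator.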
I'm not sure that matters... this isn't an autoencoder; the input is just a source of entropy. The nonlinearities in the generator network should mean the distribution of the input need not resemble the distribution of the output, unless I've misunderstood something.
I've got something looking much better now after working in a bunch of the tricks linked above, though it still has a lot of room for improvement:
[Image: screen shot 2017-01-17 at 3 36 26 pm — https://cloud.githubusercontent.com/assets/83726/22044697/c699b172-dcca-11e6-968e-163c80e3d41b.png]
After 200 iterations the generator samples out stuff like:
CCcccC
CCcccN
CCcccc
CCcccC
CCcCc
CCCcccccc
CCcccC
Updated notebook at https://github.com/maxhodak/keras-molecules/blob/gan/SMILES_GAN.ipynb
Nice! Do you think the simple strings are due to just insufficient
training, or is it converging to a simple part of the string space? Perhaps
including the KL divergence in the loss might help; a rough sketch of what
I mean is below. If I understand correctly, the generator is like the
"decoder" part of the VAE. Do you think the topology and layer identity
(LSTM / GRU / something else) make a qualitative difference in the richness
of the generated molecules? I was thinking something like a Grid LSTM
<https://arxiv.org/pdf/1507.01526v1.pdf> might be appropriate for our
system because of the local and distant correlations in the string.
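To make the KL suggestion concrete, here's a small sketch comparing character distributions of generated strings against the training set; treating it as a diagnostic (or as an extra penalty added to the generator loss) is just my reading of the idea, not something worked out here:
```python
from collections import Counter

import numpy as np
from scipy.stats import entropy

def char_distribution(strings, alphabet):
    """Empirical per-character frequency over a set of strings (smoothed)."""
    counts = Counter(c for s in strings for c in s)
    freqs = np.array([counts.get(c, 0) for c in alphabet], dtype=np.float64)
    return (freqs + 1e-8) / (freqs.sum() + 1e-8 * len(alphabet))

def char_kl(generated, training, alphabet):
    """KL(training || generated): large values suggest the generator is stuck
    in a narrow, unrepresentative part of the string space."""
    p = char_distribution(training, alphabet)
    q = char_distribution(generated, alphabet)
    return entropy(p, q)  # scipy computes sum(p * log(p / q))
```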
Hey guys, glad I found this thread. I am also working in this field: I've been trying to use a seq2seq model to produce an unsupervised fingerprint for each molecule, and I'm also planning to try a GAN as future work. Does anyone have any updates on this GAN idea?