The unnatural encoding of current implementation #54

hsiaoyi0504 · 2017-01-04T13:58:18Z

After testing, I found that the procedure for building the list of unique characters used in the dataset (the "charset") is weird. The current encoding makes the resulting output quite fragile, because it does not prevent "Cl" from being interpreted as "C" followed by "l". We should treat 'Cl' as a single, independent token rather than as 'C' and 'l'. It is chemically unreasonable to see 'l' alone.
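To make the difference concrete, here is a minimal sketch that contrasts a per-character charset with one built from tokens that keep 'Cl' and 'Br' whole. The regex and the function names (`naive_charset`, `tokenize`, `token_charset`) are hypothetical and are not the repository's code. Only the two-letter organic-subset halogens are handled; bracket atoms such as [Na+] would need additional rules.

```python
import re

# Hypothetical tokenizer: try the two-letter halogens "Cl" and "Br" first,
# then fall back to a single character. Bracket atoms are not handled.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def naive_charset(smiles_list):
    """Charset of single characters: 'Cl' ends up as 'C' plus a stray 'l'."""
    return sorted(set("".join(smiles_list)))


def tokenize(smiles):
    """Split a SMILES string so that 'Cl' and 'Br' stay whole."""
    return TOKEN_PATTERN.findall(smiles)


def token_charset(smiles_list):
    """Charset of tokens, so 'l' and 'r' never appear on their own."""
    return sorted({tok for s in smiles_list for tok in tokenize(s)})


data = ["CCl", "ClCCBr", "c1ccccc1"]
print(naive_charset(data))  # ['1', 'B', 'C', 'c', 'l', 'r']
print(token_charset(data))  # ['1', 'Br', 'C', 'Cl', 'c']
```

Trying the multi-character alternatives before `.` is what keeps 'Cl' intact. Because Python's `re` alternation takes the first branch that matches, the order of the branches matters.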

grayfall · 2017-03-28T13:46:08Z

Has this problem ever been addressed? Apart from this, the charsets are not stable between different training datasets, which yields incompatible models.
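One common way to keep models compatible is to define the vocabulary once, up front, instead of deriving the charset from each training set. The following is a minimal sketch under that assumption. `FIXED_VOCAB`, `encode` and the use of index 0 for padding are hypothetical choices for illustration. The token list is deliberately incomplete and has no bracket atoms, charges or stereo markers.

```python
import re

# Hypothetical tokenizer: keep "Cl" and "Br" whole, otherwise one character per token.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")

# Hypothetical fixed vocabulary, defined once and independent of any dataset,
# so every model trained with it shares the same token-to-index mapping.
# Index 0 is reserved for padding (an assumption of this sketch).
# Deliberately incomplete: no bracket atoms, charges or stereo markers.
FIXED_VOCAB = ["<pad>", "C", "Cl", "Br", "N", "O", "S", "F",
               "c", "n", "o", "s", "1", "2", "3", "(", ")", "=", "#"]
TOKEN_TO_INDEX = {tok: i for i, tok in enumerate(FIXED_VOCAB)}


def encode(smiles, max_len):
    """Map a SMILES string to a fixed-length list of vocabulary indices."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if len(tokens) > max_len:
        raise ValueError(f"{smiles!r} has {len(tokens)} tokens, max_len is {max_len}")
    unknown = sorted({t for t in tokens if t not in TOKEN_TO_INDEX})
    if unknown:
        # Fail loudly instead of silently growing the vocabulary per dataset.
        raise KeyError(f"tokens not in fixed vocabulary: {unknown}")
    ids = [TOKEN_TO_INDEX[t] for t in tokens]
    return ids + [0] * (max_len - len(ids))  # right-pad with the <pad> index


# Molecules from different "datasets" still share one index for "Cl".
print(encode("CCl", 5))     # [1, 2, 0, 0, 0]
print(encode("ClCCBr", 6))  # [2, 1, 1, 3, 0, 0]
```

This sketch raises an error on unknown tokens. Mapping them to a reserved "unknown" index would also keep the indices stable, but it would silently discard information.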

pechersky · 2017-03-28T14:02:43Z

I suggest checking out the paper and repo I cite in #62. It also has pretrained models if you need that.

grayfall · 2017-03-30T12:48:24Z

@pechersky do you accept pull requests? I've made some improvements to your preprocessing routine and the CLI. Most importantly, I changed the parsing scheme to address the issues mentioned here.

pechersky · 2017-03-30T13:02:36Z

Yeah, go ahead and make a PR.


This was referenced Feb 26, 2017:

Problems with the model_500k.h5 #58 (Open)
The charset HIPS/molecule-autoencoder#1 (Open)

pechersky mentioned this issue Mar 28, 2017:

Incorporate Grammar VAE #62 (Open)
