
Dropout causes significant performance change between each training #24

Open
lichuanx opened this issue Apr 22, 2019 · 3 comments

@lichuanx

Using Dropout in child_model works well for preventing overfitting; however, it also causes the final model performance to change significantly between trainings with the same hyper-parameters. The results are so random that we need more sampling runs to estimate the final performance of a single hyper-parameter set, which is very time consuming. Any ideas for solving this problem?
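For illustration, a minimal sketch (hypothetical helper names, not from this repo) of estimating the score of one hyper-parameter set by averaging several independent trainings:

```python
import numpy as np

def score_hyperparams(build_child_model, train_and_score, n_trials=3):
    """Train a fresh child model n_trials times and average the final
    validation accuracy, since Dropout makes a single run too noisy."""
    scores = []
    for _ in range(n_trials):
        model = build_child_model()            # fresh weights each trial
        scores.append(train_and_score(model))  # returns final val accuracy
    return float(np.mean(scores)), float(np.std(scores))
```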

@barisozmen
Owner

This is a very good point! It already uses three samples for each hyper-parameter set in order to average the final performance. One idea to combat overfitting without causing variation in the final performance is to use Batch Normalization instead of Dropout. I should try it and see if it's better. Do you have any ideas on that?
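For reference, a minimal Keras sketch (illustrative layer sizes and names, not the actual child_model) of a convolutional block that uses Batch Normalization in place of Dropout:

```python
from tensorflow.keras import layers, models

def conv_block(x, filters):
    # BatchNormalization takes the place a Dropout layer would have had
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = layers.Input(shape=(32, 32, 3))
x = conv_block(inputs, 16)
x = conv_block(x, 32)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = models.Model(inputs, outputs)
```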

@yrg23

yrg23 commented May 4, 2019

I am using both batch norm and dropout on my custom dataset.

training images = 226
validation images = 40

The model trains with 1356 images and validates on 40 images. However, it gives a 0.1 validation score on every epoch. Is this normal?

This is my model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 16)        1216      
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 16)        64        
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 32)        12832     
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        18496     
_________________________________________________________________
batch_normalization_3 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
activation_3 (Activation)    (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
batch_normalization_4 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
activation_4 (Activation)    (None, 16, 16, 64)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               1048832   
_________________________________________________________________
activation_5 (Activation)    (None, 256)               0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                8224      
_________________________________________________________________
activation_6 (Activation)    (None, 32)                0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 165       
_________________________________________________________________
activation_7 (Activation)    (None, 5)                 0         
=================================================================
Total params: 1,127,397
Trainable params: 1,127,045
Non-trainable params: 352

@lichuanx
Author

lichuanx commented May 5, 2019

Well, in my case, I will use SGD instead of an adaptive optimizer, because SGD tends to converge to a flat minimum, which gives better generalization. SGD is much slower than adaptive optimizers, so I will also change the learning rate schedule to a cosine-cyclical learning rate, which should give a steadier outcome. After all, we only need relatively better hyper-parameters, not the "best" ones.
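A minimal sketch of that idea, assuming the TensorFlow 2 Keras API (model, x_train, y_train, x_val, y_val are placeholders defined elsewhere): SGD with momentum combined with a cosine-cyclical learning rate schedule via LearningRateScheduler:

```python
import math
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD

CYCLE_LEN = 10            # epochs per cosine cycle (illustrative value)
MAX_LR, MIN_LR = 0.1, 0.001

def cosine_cyclical_lr(epoch, lr=None):
    # Anneal from MAX_LR down to MIN_LR within each cycle, then restart.
    t = (epoch % CYCLE_LEN) / CYCLE_LEN
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * t))

# model and the data arrays are assumed to be defined elsewhere
model.compile(optimizer=SGD(learning_rate=MAX_LR, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3 * CYCLE_LEN,
          validation_data=(x_val, y_val),
          callbacks=[LearningRateScheduler(cosine_cyclical_lr)])
```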
