Hello, thanks for your contribution. I ran into a problem: after one epoch, the loss becomes NaN, like this:
Epoch 1: 28%|#################7 | 47/167 [10:03<25:40, 12.84s/it, acc=24.5, loss=260, step=47]I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 69102 get requests, put_count=76956 evicted_count=7000 eviction_rate=0.0909611 and unsatisfied allocation rate=0
Epoch 1: 29%|##################4 | 49/167 [10:24<25:04, 12.75s/it, acc=25.5, loss=237, step=49]I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 16280 get requests, put_count=18313 evicted_count=1000 eviction_rate=0.054606 and unsatisfied allocation rate=0
Epoch 1: 30%|##################8 | 50/167 [10:35<24:47, 12.71s/it, acc=24.3, loss=262, step=50]I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 43740 get requests, put_count=48876 evicted_count=4000 eviction_rate=0.0818398 and unsatisfied allocation rate=0
Epoch 1: 99%|##############################################################6| 166/167 [32:06<00:11, 11.61s/it, acc=32, loss=nan, step=166]wait!
Epoch 1: 100%|###############################################################| 167/167 [32:17<00:00, 11.60s/it, acc=32, loss=nan, step=167]
Epoch 2: 13%|########4 | 22/167 [03:51<25:23, 10.51s/it, acc=32, loss=nan, step=189]I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 28810067 get requests, put_count=28811000 evicted_count=2000 eviction_rate=6.94179e-05 and unsatisfied allocation rate=9.47585e-05
Epoch 2: 99%|##############################################################6| 166/167 [29:29<00:10, 10.66s/it, acc=32, loss=nan, step=333]wait!
Epoch 2: 100%|###############################################################| 167/167 [29:40<00:00, 10.66s/it, acc=32, loss=nan, step=334
Did you encounter this problem?
Reduce the learning rate, then fine-tune the batch size and the learning rate together.
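To illustrate why a smaller learning rate can help, here is a minimal sketch in plain Python. It is not this project's training code: the toy loss `f(w) = w**4`, the function name `train`, and the two learning rates are assumptions chosen only to show the effect. With a large step size the updates overshoot and the loss stops being finite after a few steps; with a small step size the same loop stays finite and the loss decreases.

```python
import math


def train(lr, steps=200, w=2.0):
    """Plain gradient descent on the toy loss f(w) = w**4 (hypothetical example).

    Returns (first_bad_step, last_finite_loss). first_bad_step is None when
    the loss stayed finite for every step.
    """
    last_loss = None
    for step in range(1, steps + 1):
        loss = w * w * w * w
        # Stop as soon as the loss is inf or nan, so the failing step is known.
        if not math.isfinite(loss):
            return step, last_loss
        last_loss = loss
        grad = 4.0 * w * w * w  # derivative of w**4
        w -= lr * grad
    return None, last_loss


if __name__ == "__main__":
    for lr in (0.5, 0.01):
        bad_step, loss = train(lr)
        if bad_step is None:
            print(f"lr={lr}: finite for all steps, last loss {loss:.6f}")
        else:
            print(f"lr={lr}: loss became non-finite at step {bad_step}")
```

In a real run, the same kind of finiteness check on the loss after each step shows where the divergence starts, which makes it easier to judge whether a lower learning rate or a different batch size is having any effect.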