Weird Type error when running meta-dataset.train to run the meta_dataset/learn/gin/default/baseline_cosine_imagenet.gin configuration file #66

Open
brenowca opened this issue May 11, 2021 · 4 comments

@brenowca

Hi, I am trying to run the baseline cosine method on ImageNet, but I got the following mysterious error:

`TypeError: 'int' object is not subscriptable
  In call to configurable 'four_layer_convnet' (<function four_layer_convnet at 0x7f55794b99d0>)
  In call to configurable 'Trainer' (<class 'meta_dataset.trainer.Trainer'>)`

I downloaded the ImageNet dataset and converted it to records as described in this instruction file.

Could you please help me find what is going wrong?

My script call:

python -m meta_dataset.train  \
    --train_checkpoint_dir=brenow/bench \
    --summary_dir=brenow/bench \
    --records_root_dir=brenow/multipletasklearning/datasets/meta_dataset/records/ \
    --alsologtostderr --gin_config=meta_dataset/learn/gin/default/baseline_cosine_imagenet.gin \
    --gin_bindings="Trainer.experiment_name='baseline_cosine_imagenet'"

The entire error log:

2021-05-11 18:14:11.298088: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:From brenow/multipletasklearning/meta-dataset/meta_dataset/models/experimental/reparameterizable_backbones.py:39: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.

I0511 18:14:24.676703 140008958826304 trainer.py:893] Adding dataset ilsvrc_2012
I0511 18:14:24.677727 140008958826304 trainer.py:918] Episodes for split valid will be created from ['ilsvrc_2012']
I0511 18:14:24.677808 140008958826304 trainer.py:918] Episodes for split train will be created from ['ilsvrc_2012']
I0511 18:14:33.676376 140008958826304 api.py:461] batch augmentations:
I0511 18:14:34.871964 140008958826304 api.py:461] enable_jitter: True
I0511 18:14:34.879687 140008958826304 api.py:461] jitter_amount: 0
I0511 18:14:34.887233 140008958826304 api.py:461] enable_gaussian_noise: True
I0511 18:14:34.895139 140008958826304 api.py:461] gaussian_noise_std: 0.0
WARNING:tensorflow:From brenow/multipletasklearning/meta-dataset/meta_dataset/data/pipeline.py:355: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0511 18:14:34.901691 140008958826304 deprecation.py:531] From brenow/multipletasklearning/meta-dataset/meta_dataset/data/pipeline.py:355: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
2021-05-11 18:14:35.180736: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-11 18:14:35.184715: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-05-11 18:14:35.236975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:1b:00.0 name: Tesla K40m computeCapability: 3.5
coreClock: 0.745GHz coreCount: 15 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 268.58GiB/s
2021-05-11 18:14:35.237754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: 
pciBusID: 0000:86:00.0 name: Tesla K40m computeCapability: 3.5
coreClock: 0.745GHz coreCount: 15 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 268.58GiB/s
2021-05-11 18:14:35.237800: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-11 18:14:35.242303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-11 18:14:35.242358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-11 18:14:35.243427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-11 18:14:35.244122: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-11 18:14:35.244313: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ibm/lsfsuite/ext/ppm/10.2/linux2.6-glibc2.3-x86_64/lib:/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/lib
2021-05-11 18:14:35.245100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-11 18:14:35.245258: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ibm/lsfsuite/ext/ppm/10.2/linux2.6-glibc2.3-x86_64/lib:/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/lib
2021-05-11 18:14:35.245285: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-11 18:14:35.246996: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-11 18:14:35.247044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-11 18:14:35.247056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
WARNING:tensorflow:From brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py:2560: calling DatasetV2.from_generator (from tensorflow.python.data.ops.dataset_ops) with output_types is deprecated and will be removed in a future version.
Instructions for updating:
Use output_signature instead
W0511 18:14:37.684212 140008958826304 deprecation.py:531] From brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py:2560: calling DatasetV2.from_generator (from tensorflow.python.data.ops.dataset_ops) with output_types is deprecated and will be removed in a future version.
Instructions for updating:
Use output_signature instead
WARNING:tensorflow:From brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py:2560: calling DatasetV2.from_generator (from tensorflow.python.data.ops.dataset_ops) with output_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Use output_signature instead
W0511 18:14:37.684439 140008958826304 deprecation.py:531] From brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py:2560: calling DatasetV2.from_generator (from tensorflow.python.data.ops.dataset_ops) with output_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Use output_signature instead
I0511 18:14:37.938055 140008958826304 api.py:461] support augmentations:
I0511 18:14:37.945658 140008958826304 api.py:461] enable_jitter: True
I0511 18:14:37.953278 140008958826304 api.py:461] jitter_amount: 0
I0511 18:14:37.961421 140008958826304 api.py:461] enable_gaussian_noise: True
I0511 18:14:37.969164 140008958826304 api.py:461] gaussian_noise_std: 0.0
I0511 18:14:37.977154 140008958826304 api.py:461] query augmentations:
I0511 18:14:37.984944 140008958826304 api.py:461] enable_jitter: False
I0511 18:14:37.992632 140008958826304 api.py:461] jitter_amount: 0
I0511 18:14:38.000089 140008958826304 api.py:461] enable_gaussian_noise: False
I0511 18:14:38.007490 140008958826304 api.py:461] gaussian_noise_std: 0.0
Traceback (most recent call last):
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/train.py", line 273, in <module>
    app.run(program)
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/train.py", line 210, in main
    trainer_instance = trainer.Trainer(
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/trainer.py", line 562, in __init__
    output = self.run_fns[split](data_tensors)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/trainer.py", line 1325, in run_fn_with_train_op
    res = run_fn(data)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/trainer.py", line 695, in run
    predictions_dist = self.learners[split].forward_pass(data_local)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/learners/baseline_learners.py", line 73, in forward_pass
    embeddings_params_moments = self.embedding_fn(images, self.is_training)
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "brenow/miniconda3/envs/multipletasklearning/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/models/functional_backbones.py", line 953, in four_layer_convnet
    return _four_layer_convnet(
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/models/functional_backbones.py", line 880, in _four_layer_convnet
    layer, conv_bn_params, conv_bn_moments = conv_bn(
  File "brenow/multipletasklearning/meta-dataset/meta_dataset/models/functional_backbones.py", line 369, in conv_bn
    depth[0],
TypeError: 'int' object is not subscriptable
  In call to configurable 'four_layer_convnet' (<function four_layer_convnet at 0x7f55794b99d0>)
  In call to configurable 'Trainer' (<class 'meta_dataset.trainer.Trainer'>)
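
The last frame of the traceback shows `conv_bn` evaluating `depth[0]` while `depth` holds a plain `int`. The sketch below only illustrates that failure mode in isolation; it is not the meta-dataset code. The helper names `reproduce` and `first_depth` are hypothetical, and the thread does not establish why an integer reaches that line.

```python
from typing import Sequence, Union


def reproduce(depth):
    """Index into `depth` the way the failing line does (illustrative only)."""
    return depth[0]


def first_depth(depth: Union[int, Sequence[int]]) -> int:
    """Hypothetical defensive variant: accept a bare int or a sequence of ints."""
    if isinstance(depth, int):
        # A scalar depth is returned unchanged instead of being indexed.
        return depth
    return depth[0]


# Passing an int where a sequence is expected raises the TypeError from the log.
try:
    reproduce(64)
except TypeError as err:
    message = str(err)

print(message)
print(first_depth(64), first_depth([64, 128]))
```

Printing `type(depth)` just before the failing line would confirm whether the value comes from the gin configuration or from the calling code.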
@brenowca
Author

Note: I got the same error message when running on either GPU or CPU.

@brenowca
Author

My $DATASRC/records/ilsvrc_2012/dataset_spec.json file matches most of the original ilsvrc_2012_dataset_spec.json file in the repository.

The only difference is the training split, which seems to have been generated at random:

image

@brenowca
Author

brenowca commented May 11, 2021

ilsvrc_json_diff_ilsvrc_original.txt
This is the complete diff file, just for reference.

Oh, and I had the same issue when using the crosstransformer_simclreps_imagenet.gin configuration file.

@brenowca
Author

Hi team, just a quick update on this issue:

I checked out an older commit, 0c8a9bb, and this error disappeared.

This is the git checkout command that I ran:
git checkout 0c8a9bb

I chose this one because it was the last commit that I knew for sure was able to run the CTX code, since another person was able to create a checkpoint for CTX using it. See issue #58 for the mentioned checkpoint.
