Timeout when using era5grib #9

Open
bschroeter opened this issue Oct 31, 2021 · 5 comments

@bschroeter

Hi there, so this has happened to me a few times and I've been struggling to find a workaround.

When using the era5grib utility to acquire data for WRF, I end up with the following error:

....
Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/deploy/cluster.py", line 99, in _sync_cluster_info
    await self.scheduler_comm.set_metadata(
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/core.py", line 796, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/core.py", line 753, in live_comm
    comm = await connect(
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/comm/core.py", line 307, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:37637 after 30 s

It seems that the distributed scheduler dies while trying to write out the file. This happens with both netcdf and grib output.
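One way to check whether the scheduler process has really gone away, rather than just being slow to respond, is to probe the address from the traceback directly. The snippet below is a minimal sketch using only the Python standard library; the helper name `scheduler_reachable` is hypothetical and is not part of era5grib or distributed, and the port will differ on every run.

```python
import socket
from urllib.parse import urlparse


def scheduler_reachable(address: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to `address` succeeds within `timeout` seconds.

    `address` is expected in the form printed in the traceback,
    e.g. "tcp://127.0.0.1:37637". This only checks that something is
    listening on the port, not that it is a healthy scheduler.
    """
    parsed = urlparse(address)
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Address copied from the traceback above; substitute the one from your own run.
    print(scheduler_reachable("tcp://127.0.0.1:37637"))
```

If this returns False while the job is still running, nothing is listening on that port any more, which would be consistent with the scheduler having died rather than the 30 s connect timeout simply being too short.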

Any thoughts?

@ScottWales

Are you running on the login or a compute node?

@bschroeter
Author

Compute node, interactively. Heaps of resources.

@bschroeter
Author

As per the Slack chat, disabling era5land with --no-era5land appears to make something happen.

@ScottWales

Appears to be a bug in xesmf's most recent version, which we installed when Conda got updated. I will try downgrading xesmf in Conda to see if that improves performance.

pangeo-data/xESMF#127
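To confirm which xesmf version an environment has actually picked up before and after a downgrade, the installed package metadata can be queried from Python. This is a small sketch using the standard library; the helper name `installed_version` is hypothetical, and the list of packages is only a guess at what is relevant here. It does not say which versions are affected, so check the linked xESMF issue for that.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages that appear in this thread; adjust as needed.
    for name in ("xesmf", "distributed", "dask"):
        print(f"{name}: {installed_version(name)}")
```

Running this inside the same Conda environment that era5grib uses shows whether the downgrade has taken effect there.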

@ScottWales

Appears to now be working for me with your inputs.
