Problems interacting with 'moto' package to mock S3 filesystems #1074

Open
jameswilburlewis opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working Data Servers Issues with remote data servers (SPDF, MMS, MAVEN, JAXA, etc) heliocloud MAVEN python Issues involving Python and Python-related tools outside of pyspedas QA/Testing

One of the MAVEN uri_tests is consistently failing, and I'm not sure why.

    def test_load_mag_byorbit_data(self):
        config.CONFIG["local_data_dir"] = f"s3://{bucket_name}"
        #saved_logging_level = pyspedas.logger.getEffectiveLevel()
        #pyspedas.logger.setLevel(logging.DEBUG)
        data = maven.mag(trange=[500, 501], datatype="ss1s")
        self.assertTrue(len(tplot_names("OB_B*"))>0)
        #pyspedas.logger.setLevel(saved_logging_level)
        time.sleep(sleep_time)

This test loads a bunch of MAVEN orbit files from NAIF, merges them into a single file, then reads the merged file to map orbit numbers to dates (eventually requesting mag data for a time range specified as orbit numbers).

I'm seeing different failures on GitHub (Ubuntu, Python 3.12) and on my laptop (Mac M2, Python 3.9).

On GitHub, the merge_orbit_files routine seems to stall partway through:

    pattern = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"
    orb_dates = []
    orb_files = []
    for f in fl:
        x = re.match(pattern, f)
        if x is not None:
            orb_file = os.path.join(orbit_files_path, f) if not is_fsspec_uri(toolkit_path) else "/".join([orbit_files_path, f])
            orb_files.append(orb_file)
            if x.group(2) != "":
                orb_dates.append(x.group(2))
            else:
                orb_dates.append("999999")

    sorted_files = [x for (y, x) in sorted(zip(orb_dates, orb_files))]

    with fo as code:
        skip_2_lines = False
        for o_file in sorted_files:
            logging.info("merge_orbit_files processing file %s", o_file)
            if is_fsspec_uri(toolkit_path):
                # assumes fsspec filesystem triggered above
                fo_file = fs.open(o_file)
            else:
                fo_file = open(o_file)
            with fo_file as f:
                if skip_2_lines:
                    f.readline()
                    f.readline()
                skip_2_lines = True
                content=f.read()
                logging.info("writing %d bytes to output file %s",len(content),output_filename)
                if type(content) is bytes:
                    code.write(str(content))
                else:
                    code.write(content)
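
One detail in the snippet above may be worth checking, though it may not be the cause of either failure. As far as I know, fsspec's `open` defaults to binary mode, so `content` would be `bytes` on the S3 path, and `str(content)` produces the printable repr of the bytes (with a `b'...'` wrapper and escaped newlines), not the decoded text. A minimal sketch of the difference, using made-up content and assuming the orbit files are UTF-8 or ASCII:

```python
# Made-up stand-in for what a binary-mode read of an orbit file returns.
content = b"header 1\nheader 2\n    1  2014 SEP 22\n"

as_repr = str(content)             # repr of the bytes object, not its text
as_text = content.decode("utf-8")  # assumption: the files are UTF-8/ASCII

# The repr keeps the b'...' wrapper and has no real newline characters,
# so a later readline() would see the whole thing as a single line.
print(as_repr.startswith("b'"), as_repr.count("\n"))
print(as_text.count("\n"))
```

If that applies here, decoding the bytes (or opening the source files in text mode) before writing would keep the line structure that the reader in orbit_time.py relies on.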

From the test log:

16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_201001_210101_v1.orb
16-Jan-25 08:00:21: writing 82008 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210101_210401_v1.orb
16-Jan-25 08:00:21: writing 80266 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210401_210701_v1.orb
16-Jan-25 08:00:21: writing 81204 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210701_211001_v1.orb
ERROR

It seems to hang while reading the last listed orbit file (which is probably about halfway through the list). Nothing else happens after "ERROR" is printed, until GitHub kills the action after 6 hours of elapsed time.
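
The two logs in this report name different output files: maven_orb_rec_merged.orb on GitHub and merged_maven_orbits.orb on the Mac. The sketch below runs the filename pattern from merge_orbit_files against a few names taken from those logs. It suggests that the GitHub output name itself matches the input pattern, so a merged file left in the directory listing could be picked up as an input. Whether that actually happens in the failing run is an assumption, not something the logs show. The helper name `sort_key` is hypothetical.

```python
import re

# Pattern copied verbatim from merge_orbit_files.
pattern = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"

def sort_key(name):
    """Return the date string used for sorting, or None if the name is skipped."""
    m = re.match(pattern, name)
    if m is None:
        return None
    return m.group(2) if m.group(2) != "" else "999999"

names = [
    "maven_orb_rec.orb",
    "maven_orb_rec_210101_210401_v1.orb",
    "maven_orb_rec_201001_210101_v1.orb",
    "maven_orb_rec_merged.orb",   # output name in the GitHub log
    "merged_maven_orbits.orb",    # output name in the Mac log
]

keys = {n: sort_key(n) for n in names}
matched = sorted((k, n) for n, k in keys.items() if k is not None)
for key, name in matched:
    print(key, name)
```

The undated maven_orb_rec.orb sorts after the dated files because of the "999999" placeholder, which agrees with the Mac log, where it is processed last.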

On my Mac, the symptom is slightly different. It appears to make it through the complete list of orbit files and write the merged file, but when it then tries to read the merged file, the file is apparently empty.

16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_240701_241001_v1.orb
16-Jan-25 00:35:36: writing 80936 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec.orb
16-Jan-25 00:35:36: writing 85894 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
16-Jan-25 00:35:36: Getting orbit info from file s3://test-bucket/orbitfiles/merged_maven_orbits.orb

Error
Traceback (most recent call last):
  File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/tests/uri_tests.py", line 439, in test_load_mag_byorbit_data
    data = maven.mag(trange=[500, 501], datatype="ss1s")
  File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/mag.py", line 76, in mag
    return maven_load(
  File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 350, in load_data
    maven_files = maven_filenames(
  File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 80, in maven_filenames
    start_date = parse(start_date)
  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 1368, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 640, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 719, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens
  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 201, in split
    return list(cls(s))
  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 69, in __init__
    raise TypeError('Parser must be a string or character stream, not '
TypeError: Parser must be a string or character stream, not NoneType

As far as I can tell from stepping through with a debugger, the merged_maven_orbits.orb file appears to be empty when it's opened in orbit_time.py:

    logging.info("Getting orbit info from file %s", orb_file)
    if is_fsspec_uri(toolkit_path):
        protocol, path = toolkit_path.split("://")
        fs = fsspec.filesystem(protocol)

        fileobj = fs.open(orb_file, "r")
    else:
        fileobj = open(orb_file, "r")

    with fileobj as f:
        if end_orbit is None:
            end_orbit = begin_orbit
        orbit_num = []
        time = []
        f.readline()
        f.readline()
        for line in f:
            line = line[0:28]
            line = line.split(" ")
            line = [x for x in line if x != ""]

If I set a breakpoint just after it enters the "with fileobj as f" block, and do

    content = f.read()

at the console, I get an empty string.
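
An empty merged file would be consistent with the `NoneType` error in the traceback, since the loop would find no orbit rows to convert to dates. As a possible way to make that failure easier to spot, here is a sketch of the parsing loop with an explicit check for the empty case. `read_orbit_rows` is a hypothetical helper, not existing pyspedas code, and the sample line is made up rather than real NAIF data.

```python
import io

def read_orbit_rows(fileobj, source="<merged orbit file>"):
    """Return the tokens found in the first 28 columns of each data line."""
    # Skip the two header lines, as the original loop does.
    fileobj.readline()
    fileobj.readline()
    rows = []
    for line in fileobj:
        # split() with no argument drops empty tokens and trailing newlines.
        tokens = line[0:28].split()
        if tokens:
            rows.append(tokens)
    if not rows:
        # Fail here with a clear message instead of passing None downstream.
        raise ValueError(f"no orbit rows found in {source}; is the file empty?")
    return rows

# Made-up sample in roughly the layout the loop expects.
sample = (
    "header line 1\n"
    "header line 2\n"
    "    1  2014 SEP 22 02:56:42  remaining columns ignored\n"
)
print(read_orbit_rows(io.StringIO(sample)))
```

Because it takes any text-mode file object, the same function can be exercised with `io.StringIO("")` to reproduce the empty-file case without S3 or moto.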

This all works fine when local_data_dir is a regular local path rather than an S3 URL.
