Problems interacting with 'moto' package to mock S3 filesystems #1074

jameswilburlewis · 2025-01-16T21:16:35Z

One of the MAVEN uri_tests is consistently failing, and I'm not sure why.

    def test_load_mag_byorbit_data(self):
        config.CONFIG["local_data_dir"] = f"s3://{bucket_name}"
        #saved_logging_level = pyspedas.logger.getEffectiveLevel()
        #pyspedas.logger.setLevel(logging.DEBUG)
        data = maven.mag(trange=[500, 501], datatype="ss1s")
        self.assertTrue(len(tplot_names("OB_B*"))>0)
        #pyspedas.logger.setLevel(saved_logging_level)
        time.sleep(sleep_time)

This test loads a bunch of MAVEN orbit files from NAIF, merges them into a single file, then reads the merged file to map orbit numbers to dates (eventually requesting mag data for a time range specified as orbit numbers).

I'm seeing different sorts of failures on GitHub (Ubuntu, Python 3.12) and my laptop (Mac M2, Python 3.9).

On GitHub, the merge_orbit_files routine seems to stall partway through:

    pattern = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"
    orb_dates = []
    orb_files = []
    for f in fl:
        x = re.match(pattern, f)
        if x is not None:
            orb_file = os.path.join(orbit_files_path, f) if not is_fsspec_uri(toolkit_path) else "/".join([orbit_files_path, f])
            orb_files.append(orb_file)
            if x.group(2) != "":
                orb_dates.append(x.group(2))
            else:
                orb_dates.append("999999")

    sorted_files = [x for (y, x) in sorted(zip(orb_dates, orb_files))]

    with fo as code:
        skip_2_lines = False
        for o_file in sorted_files:
            logging.info("merge_orbit_files processing file %s", o_file)
            if is_fsspec_uri(toolkit_path):
                # assumes fsspec filesystem triggered above
                fo_file = fs.open(o_file)
            else:
                fo_file = open(o_file)
            with fo_file as f:
                if skip_2_lines:
                    f.readline()
                    f.readline()
                skip_2_lines = True
                content=f.read()
                logging.info("writing %d bytes to output file %s",len(content),output_filename)
                if type(content) is bytes:
                    code.write(str(content))
                else:
                    code.write(content)
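One detail in the block above that may be worth checking separately from the hang: as far as I know, fsspec's `open()` defaults to binary mode, so on the S3 path `f.read()` would return `bytes` and the `str(content)` branch would run. In Python, `str()` on a bytes object gives its repr (a string starting with `b'` with newlines escaped), not the decoded text. The sketch below only illustrates that standard-library behavior; the sample content and the UTF-8 encoding are assumptions, and whether this is related to either failure is unverified.

```python
# Made-up stand-in for the bytes returned by a binary-mode read.
content = b"header one\nheader two\n    1  data\n"

# str() on bytes yields the repr: it starts with b' and has no real newlines.
as_repr = str(content)

# decode() yields the actual text (UTF-8 is an assumption here).
as_text = content.decode("utf-8")

print(as_repr.startswith("b'"))
print(len(as_repr.splitlines()), "line(s) via str()")
print(len(as_text.splitlines()), "line(s) via decode()")
```

If the merged file is later parsed line by line, text written via `str(content)` would look like a single long line to the reader.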

From the test log:

    16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_201001_210101_v1.orb
    16-Jan-25 08:00:21: writing 82008 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
    16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210101_210401_v1.orb
    16-Jan-25 08:00:21: writing 80266 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
    16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210401_210701_v1.orb
    16-Jan-25 08:00:21: writing 81204 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
    16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210701_211001_v1.orb
    ERROR

It seems to hang while reading the last listed orbit file (which is probably about halfway through the list). Nothing else happens after "ERROR" is printed, until GitHub kills the action after 6 hours of elapsed time.
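To see where a given file falls in the processing order, the name matching and sorting from merge_orbit_files can be reproduced in isolation. This is a minimal sketch: the pattern and the "999999" fallback are copied from the code above, while the helper name `sort_orbit_files` and the sample file list are made up for illustration.

```python
import re

# Pattern copied from merge_orbit_files.
pattern = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"


def sort_orbit_files(names):
    """Hypothetical helper: order matching names by their 6-character date group."""
    keyed = []
    for name in names:
        m = re.match(pattern, name)
        if m is not None:
            # Names without a date group sort last via the "999999" fallback.
            key = m.group(2) if m.group(2) != "" else "999999"
            keyed.append((key, name))
    return [name for (_, name) in sorted(keyed)]


# Made-up directory listing for illustration.
names = [
    "maven_orb_rec.orb",
    "maven_orb_rec_210101_210401_v1.orb",
    "maven_orb_rec_201001_210101_v1.orb",
    "maven_orb_rec_merged.orb",
    "readme.txt",
]
for name in sort_orbit_files(names):
    print(name)
```

One side effect of the pattern as written: a name such as `maven_orb_rec_merged.orb` (the output file name in the GitHub log) also matches, with "merged" captured as the date group. Whether that file is ever present in `fl` when the merge runs is not something the log shows.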

On my Mac, the symptom is slightly different. It appears to make it through the complete list of orbit files and write the merged file, but when the merged file is then read, it's apparently empty.

    16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_240701_241001_v1.orb
    16-Jan-25 00:35:36: writing 80936 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
    16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec.orb
    16-Jan-25 00:35:36: writing 85894 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
    16-Jan-25 00:35:36: Getting orbit info from file s3://test-bucket/orbitfiles/merged_maven_orbits.orb

    Error
    Traceback (most recent call last):
      File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/tests/uri_tests.py", line 439, in test_load_mag_byorbit_data
        data = maven.mag(trange=[500, 501], datatype="ss1s")
      File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/mag.py", line 76, in mag
        return maven_load(
      File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 350, in load_data
        maven_files = maven_filenames(
      File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 80, in maven_filenames
        start_date = parse(start_date)
      File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 1368, in parse
        return DEFAULTPARSER.parse(timestr, **kwargs)
      File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 640, in parse
        res, skipped_tokens = self._parse(timestr, **kwargs)
      File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 719, in _parse
        l = _timelex.split(timestr)         # Splits the timestr into tokens
      File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 201, in split
        return list(cls(s))
      File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 69, in __init__
        raise TypeError('Parser must be a string or character stream, not '
    TypeError: Parser must be a string or character stream, not NoneType

As far as I can tell stepping through with a debugger, the merged_maven_orbits.orb file appears to be empty when it's opened in orbit_time.py:

    logging.info("Getting orbit info from file %s", orb_file)
    if is_fsspec_uri(toolkit_path):
        protocol, path = toolkit_path.split("://")
        fs = fsspec.filesystem(protocol)

        fileobj = fs.open(orb_file, "r")
    else:
        fileobj = open(orb_file, "r")

    with fileobj as f:
        if end_orbit is None:
            end_orbit = begin_orbit
        orbit_num = []
        time = []
        f.readline()
        f.readline()
        for line in f:
            line = line[0:28]
            line = line.split(" ")
            line = [x for x in line if x != ""]

If I set a breakpoint just after it enters the "with fileobj as f" block, and do

    content = f.read()

at the console, I get an empty string.
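An empty merged file would plausibly explain the `NoneType` in the traceback: if the parsing loop never sees a data line, no orbit numbers or times are collected, and a later lookup presumably has nothing to return. The sketch below mirrors only the loop shown above (skip two header lines, take the first 28 characters, split on spaces); the function name `read_orbit_rows` and the sample data line are made up and are not real orbit-file content.

```python
import io


def read_orbit_rows(f):
    """Hypothetical helper mirroring the loop above: skip two header lines,
    then split the first 28 characters of every remaining line on spaces."""
    f.readline()
    f.readline()
    rows = []
    for line in f:
        fields = [x for x in line[0:28].split(" ") if x != ""]
        rows.append(fields)
    return rows


# An empty file yields no rows at all, so nothing maps orbit numbers to dates.
empty_rows = read_orbit_rows(io.StringIO(""))

# A made-up file with two header lines and one illustrative data line.
sample = "header 1\nheader 2\n    1  2014 SEP 22 03:33:05  more columns\n"
sample_rows = read_orbit_rows(io.StringIO(sample))

print(empty_rows)
print(sample_rows)
```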

This all works fine when not using an S3 URL for local_data_dir.
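One generic way a freshly written file can read back as empty is that the writer still holds the data in a buffer at the time of the read. The sketch below shows that effect with a local temporary file on CPython with default buffering. It is purely illustrative: it does not involve S3, s3fs, or moto, the file name is made up, and whether anything similar happens in the failing test is an open question.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.orb")  # made-up file name

    writer = open(path, "w")
    writer.write("orbit data\n")  # small write stays in the writer's buffer

    # A second handle opened before the writer is closed sees no data yet.
    with open(path) as reader:
        before_close = reader.read()

    writer.close()  # closing flushes the buffered data to the file

    with open(path) as reader:
        after_close = reader.read()

print(repr(before_close))
print(repr(after_close))
```

A quick check along these lines in the test would be to confirm that the output file object has been closed, and that the object has a non-zero size, before orbit_time.py opens it.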

jameswilburlewis added the bug, QA/Testing, python, heliocloud, MAVEN, and Data Servers labels Jan 16, 2025

jameswilburlewis assigned jameswilburlewis and edmondb Jan 16, 2025
