
Could not read schema from 'https://data.lsdb.io/hats/gaia_dr3/gaia/' #489

Closed
OliviaLynn opened this issue Nov 1, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@OliviaLynn (Member)

Bug report
I encounter unexpected output in the second code cell of the Topic: Manual catalog verification demo.
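
For reference, the failing cell boils down to the call below (a minimal sketch; the variable name and import path are inferred from the traceback further down):

```python
from hats.io.validation import is_valid_catalog

# Remote HATS catalog used in the demo notebook.
gaia_catalog_path = "https://data.lsdb.io/hats/gaia_dr3/gaia/"

# strict=True runs the full set of checks; verbose=True prints the
# partition count and sky-coverage summary shown in the docs.
is_valid_catalog(gaia_catalog_path, verbose=True, fail_fast=True, strict=True)
```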

I would expect to get (as in the doc):

Validating catalog at path https://data.lsdb.io/hats/gaia_dr3/gaia/ ...
Found 3933 partitions.
Approximate coverage is 100.00 % of the sky.
True

Instead, I get:

{
	"name": "ArrowInvalid",
	"message": "Error creating dataset. Could not read schema from 'https://data.lsdb.io/hats/gaia_dr3/gaia/'. Is this a 'parquet' file?: Could not open Parquet input source 'https://data.lsdb.io/hats/gaia_dr3/gaia/': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.",
	"stack": "---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 is_valid_catalog(gaia_catalog_path, verbose=True, fail_fast=True, strict=True)

File ~/.local/lib/python3.11/site-packages/hats/io/validation.py:125, in is_valid_catalog(pointer, strict, fail_fast, verbose)
    113 ignore_prefixes = [
    114     \"_common_metadata\",
    115     \"_metadata\",
   (...)
    121     \"README\",
    122 ]
    124 # As a side effect, this confirms that we can load the directory as a valid dataset.
--> 125 (dataset_path, dataset) = read_parquet_dataset(
    126     pointer,
    127     ignore_prefixes=ignore_prefixes,
    128     exclude_invalid_files=False,
    129 )
    131 parquet_path_pixels = []
    132 for hats_file in dataset.files:

File ~/.local/lib/python3.11/site-packages/hats/io/file_io/file_io.py:190, in read_parquet_dataset(source, **kwargs)
    187     file_system = source.fs
    188     source = source.path
--> 190 dataset = pds.dataset(
    191     source,
    192     filesystem=file_system,
    193     format="parquet",
    194     **kwargs,
    195 )
    196 return (str(source), dataset)

File ~/.conda/envs/lsdb/lib/python3.11/site-packages/pyarrow/dataset.py:794, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
    783 kwargs = dict(
    784     schema=schema,
    785     filesystem=filesystem,
   (...)
    790     selector_ignore_prefixes=ignore_prefixes
    791 )
    793 if _is_path_like(source):
--> 794     return _filesystem_dataset(source, **kwargs)
    795 elif isinstance(source, (tuple, list)):
    796     if all(_is_path_like(elem) or isinstance(elem, FileInfo) for elem in source):

File ~/.conda/envs/lsdb/lib/python3.11/site-packages/pyarrow/dataset.py:486, in _filesystem_dataset(source, schema, filesystem, partitioning, format, partition_base_dir, exclude_invalid_files, selector_ignore_prefixes)
    478 options = FileSystemFactoryOptions(
    479     partitioning=partitioning,
    480     partition_base_dir=partition_base_dir,
    481     exclude_invalid_files=exclude_invalid_files,
    482     selector_ignore_prefixes=selector_ignore_prefixes
    483 )
    484 factory = FileSystemDatasetFactory(fs, paths_or_selector, format, options)
--> 486 return factory.finish(schema)

File ~/.conda/envs/lsdb/lib/python3.11/site-packages/pyarrow/_dataset.pyx:3126, in pyarrow._dataset.DatasetFactory.finish()

File ~/.conda/envs/lsdb/lib/python3.11/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.conda/envs/lsdb/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowInvalid: Error creating dataset. Could not read schema from 'https://data.lsdb.io/hats/gaia_dr3/gaia/'. Is this a 'parquet' file?: Could not open Parquet input source 'https://data.lsdb.io/hats/gaia_dr3/gaia/': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file."
}

I am running this in a Python notebook on USDF. I'm using lsdb via pip install 'lsdb[full]'.

Maybe I've missed an installation step? The previous cell runs fine, as do all of the earlier tutorial notebooks.

OliviaLynn added the bug label Nov 1, 2024
@delucchi-cmu (Contributor)

Yes, this has been addressed in astronomy-commons/hats#404, but the fix has not been released yet. Does this still occur if you install hats from the current main branch?
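
One way to try that (a sketch; the repository URL is assumed from the astronomy-commons/hats reference above) is to install directly from GitHub with `pip install --upgrade git+https://github.com/astronomy-commons/hats.git`.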

@nevencaplar (Member)

@OliviaLynn Can you confirm if this is solved?

nevencaplar moved this to In Progress in HATS / LSDB Nov 8, 2024
@OliviaLynn (Member, Author) commented Nov 14, 2024

Apologies, I've had GitHub notifications off. I just checked, and it works!

github-project-automation (bot) moved this from In Progress to Done in HATS / LSDB Nov 14, 2024