Improve catalog validation and column statistics #404

Merged: 8 commits, Oct 31, 2024

delucchi-cmu
Contributor

While implementing astronomy-commons/lsdb#392, I ran into several issues with the current implementation of validation and column statistics. This PR addresses them:

  • The parquet dataset was not being loaded for HTTP catalogs.
  • Adds a column statistics method on the catalog object.
  • Adds more extensive testing of validation.
  • Handles non-HEALPix datasets (e.g. index tables) in the validation logic.
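
For readers unfamiliar with the idea behind column statistics, here is a minimal sketch of how per-row-group minimum, maximum, and null-count values (the kind of values a parquet footer records) can be folded into one summary per column. It is an illustration only, not the hats implementation: the function name `aggregate_column_statistics`, the input layout, and the sample `ra`/`dec` values are all hypothetical.

```python
from typing import Any

# Hypothetical layout: one dict per row group, mapping column name to its
# recorded statistics. This is NOT the hats or pyarrow data model.
RowGroupStats = dict[str, dict[str, Any]]


def aggregate_column_statistics(row_groups: list[RowGroupStats]) -> dict[str, dict[str, Any]]:
    """Fold per-row-group min/max/null_count values into one record per column."""
    totals: dict[str, dict[str, Any]] = {}
    for stats in row_groups:
        for column, values in stats.items():
            if column not in totals:
                # Copy so the caller's input is never mutated.
                totals[column] = dict(values)
                continue
            merged = totals[column]
            merged["min"] = min(merged["min"], values["min"])
            merged["max"] = max(merged["max"], values["max"])
            merged["null_count"] += values["null_count"]
    return totals


# Made-up sample values for two row groups.
row_groups = [
    {
        "ra": {"min": 280.5, "max": 300.1, "null_count": 0},
        "dec": {"min": -69.5, "max": -60.2, "null_count": 1},
    },
    {
        "ra": {"min": 300.2, "max": 327.9, "null_count": 2},
        "dec": {"min": -65.0, "max": -25.5, "null_count": 0},
    },
]
summary = aggregate_column_statistics(row_groups)
```

Because the summary is built from footer-style statistics alone, a sketch like this never has to read the data pages themselves, which is what makes this kind of inspection cheap for remote catalogs.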

github-actions bot commented Oct 31, 2024

Before [7783fbf]   After [907752c]   Ratio   Benchmark (Parameter)
90.7±0.7ms         92.8±0.9ms        1.02    benchmarks.MetadataSuite.time_load_partition_join_info
13.1±0.6ms         13.3±0.9ms        1.02    benchmarks.Suite.time_inner_pixel_alignment
22.4±0.2ms         22.7±0.2ms        1.01    benchmarks.MetadataSuite.time_load_partition_info_order6
1.01±0.04ms        1.02±0.03ms       1.01    benchmarks.time_test_cone_filter_multiple_order
381±3ms            381±2ms           1       benchmarks.Suite.time_outer_pixel_alignment
45.3±0.7ms         45.4±0.7ms        1       benchmarks.Suite.time_pixel_tree_creation
93.3±0.8ms         92.4±1ms          0.99    benchmarks.MetadataSuite.time_load_partition_info_order7
110±1ms            109±1ms           0.99    benchmarks.Suite.time_paths_creation
124±1ms            122±3ms           0.98    benchmarks.time_test_alignment_even_sky

delucchi-cmu requested a review from hombit on October 31, 2024, 16:54
codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 97.61905% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.69%. Comparing base (7783fbf) to head (00fdca9).
Report is 1 commit behind head on main.

Files with missing lines     Patch %   Lines
src/hats/io/validation.py    97.29%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
+ Coverage   92.63%   92.69%   +0.05%     
==========================================
  Files          49       49              
  Lines        2092     2095       +3     
==========================================
+ Hits         1938     1942       +4     
+ Misses        154      153       -1     

Contributor

hombit left a comment

Looks good!

src/hats/catalog/dataset/dataset.py (resolved)
src/hats/catalog/dataset/dataset.py (resolved)
src/hats/catalog/dataset/dataset.py (resolved)
src/hats/io/validation.py (outdated, resolved)
src/hats/io/validation.py (outdated, resolved)
delucchi-cmu merged commit 8e1d456 into main on Oct 31, 2024
11 checks passed
delucchi-cmu deleted the delucchi/inspection_notebook branch on October 31, 2024, 18:08