Add pygmt.read to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray #3673

Open
wants to merge 26 commits into
base: main

Conversation

seisman
Member

@seisman seisman commented Dec 4, 2024

Description of proposed changes

This PR adds the pygmt.read function to read any recognized data files (currently dataset, grid, or image) into a pandas.DataFrame/xarray.DataArray object.

The new read function can replace most load_dataarray/xr.open_dataarray/xr.load_dataarray calls.

Related to #3643 (comment).

Preview: https://pygmt-dev--3673.org.readthedocs.build/en/3673/api/generated/pygmt.read.html

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. The only supported slash command is:

  • /format: automatically format and lint the code

@seisman seisman added the feature (Brand new feature) and needs review (This PR has higher priority and needs review.) labels and removed the needs review label Dec 4, 2024
@seisman seisman force-pushed the feature/read branch 2 times, most recently from cac7d74 to c50232e Compare December 4, 2024 10:18
@seisman seisman marked this pull request as draft December 5, 2024 03:23
@seisman seisman added this to the 0.14.0 milestone Dec 9, 2024
@seisman seisman marked this pull request as ready for review December 9, 2024 09:47
@seisman seisman added the needs review (This PR has higher priority and needs review.) label Dec 9, 2024
        raise ValueError(msg)

    kwdict = {
        "R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region,  # type: ignore[union-attr]
Member Author

This line is written out explicitly so that we can avoid using the kwargs_to_string and use_alias decorators:

"R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region
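
A minimal sketch of what that expression does, runnable on its own. The is_nonstr_iter defined here is a stand-in for the PyGMT helper (the real one may differ), and region_to_arg is a hypothetical name used only for illustration:

```python
from collections.abc import Iterable


def is_nonstr_iter(value):
    """Stand-in for the PyGMT helper: True for iterables that are not strings."""
    return isinstance(value, Iterable) and not isinstance(value, str)


def region_to_arg(region):
    """Join a sequence into a slash-separated string; pass other values through."""
    return "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region


print(region_to_arg([0, 10, -5, 5]))  # 0/10/-5/5
print(region_to_arg("g"))  # g
print(region_to_arg(None))  # None
```

Strings and None pass through unchanged, so a region that is already in string form is left as-is.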

@seisman
Member Author

seisman commented Dec 9, 2024

In the _load_remote_dataset function, we can't replace the following code with the new read function, because read calls which to get the full path of the source grid, and that doesn't work well for tiled grids.

fname = f"@{prefix}_{resolution}_{reg}"
kind = "image" if name in {"earth_day", "earth_night"} else "grid"
kwdict = {"R": region, "T": {"grid": "g", "image": "i"}[kind]}
with Session() as lib:
    with lib.virtualfile_out(kind=kind) as voutgrd:
        lib.call_module(
            module="read",
            args=[fname, voutgrd, *build_arg_list(kwdict)],
        )
        grid = lib.virtualfile_to_raster(kind=kind, outgrid=None, vfname=voutgrd)
# Full path to the grid if not tiled grids.
source = which(fname, download="a") if not resinfo.tiled else None
# Manually add source to xarray.DataArray encoding to make the GMT accessors work.
if source:
    grid.encoding["source"] = source
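
The tiled-grid special case can be sketched in isolation as below. This is an illustration only: resolve_source and fake_which are hypothetical names, fake_which stands in for pygmt.which, and the /cache/ path is made up.

```python
def resolve_source(fname, tiled, which):
    """Return the full path for single-file grids; skip the lookup for tiled grids."""
    return None if tiled else which(fname)


calls = []


def fake_which(fname):
    """Hypothetical stand-in for pygmt.which that records each lookup."""
    calls.append(fname)
    return f"/cache/{fname.lstrip('@')}.grd"  # made-up path, for illustration only


encoding = {}
source = resolve_source("@earth_relief_01d_g", tiled=False, which=fake_which)
if source:
    encoding["source"] = source  # only set when a single source file exists

print(encoding)  # {'source': '/cache/earth_relief_01d_g.grd'}
print(resolve_source("@earth_relief_05m_g", tiled=True, which=fake_which))  # None
print(calls)  # ['@earth_relief_01d_g']
```

For tiled grids the lookup is never attempted, which is the behavior an unconditional which call inside read would lose.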

@seisman
Member Author

seisman commented Dec 9, 2024

The load_dataarray function is now used only in pygmt/src/grdcut.py (related to #3115).

xr.open_dataarray is used in test_accessors.py.

    A list of column names.
header
    Row number containing column names. ``header=None`` means not to parse the
    column names from table header. Ignored if the row number is larger than the
Member

Suggested change
- column names from table header. Ignored if the row number is larger than the
+ column names from the table header. Ignored if the row number is larger than the

header
    Row number containing column names. ``header=None`` means not to parse the
    column names from table header. Ignored if the row number is larger than the
    number of headers in the table.
Member

Suggested change
- number of headers in the table.
+ number of header lines in the table.
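
For reference, the documented behavior of header could be sketched as below. This is not the PR's implementation: column_names_from_header is a hypothetical helper, and it assumes a 0-based row number and whitespace-separated names.

```python
def column_names_from_header(header_lines, header=None):
    """Pick column names from one header line, or return None if not applicable.

    Assumptions (not confirmed by the PR): ``header`` is a 0-based row number
    and the names on that line are separated by whitespace.
    """
    if header is None:
        return None  # header=None: do not parse names from the table header
    if header >= len(header_lines):
        return None  # row number beyond the available header lines: ignored
    return header_lines[header].split()


lines = ["# sample table", "lon lat depth"]
print(column_names_from_header(lines, header=1))  # ['lon', 'lat', 'depth']
print(column_names_from_header(lines, header=None))  # None
print(column_names_from_header(lines, header=5))  # None
```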

@seisman seisman mentioned this pull request Dec 19, 2024
49 tasks
@seisman seisman mentioned this pull request Dec 19, 2024
7 tasks
Comment on lines +102 to +111
case "dataset":
    return lib.virtualfile_to_dataset(
        vfname=voutfile,
        column_names=column_names,
        header=header,
        dtype=dtype,
        index_col=index_col,
    )
case "grid" | "image":
    raster = lib.virtualfile_to_raster(vfname=voutfile, kind=kind)
Member

@weiji14 weiji14 Dec 19, 2024

I'm debating whether we should have a low-level clib read that reads into a GMT virtualfile, and a high-level read that wraps it to both read the file and convert the virtualfile to a pandas.DataFrame or xarray.DataArray.
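
A rough sketch of that two-layer split, with placeholders throughout. RawData, clib_read and read are hypothetical names, nothing here calls GMT, and the tagged tuples stand in for the real pandas/xarray conversions:

```python
from dataclasses import dataclass


@dataclass
class RawData:
    """Placeholder for a GMT virtualfile holding data that was read."""

    file: str
    kind: str


def clib_read(file, kind):
    """Low-level layer (hypothetical): read into a raw container, no conversion."""
    return RawData(file=file, kind=kind)


def read(file, kind):
    """High-level layer (hypothetical): read, then convert according to kind."""
    raw = clib_read(file, kind)
    if raw.kind == "dataset":
        return ("DataFrame", raw.file)  # placeholder for the table conversion
    if raw.kind in {"grid", "image"}:
        return ("DataArray", raw.file)  # placeholder for the raster conversion
    msg = f"Invalid kind {kind!r}: must be one of 'dataset', 'grid', or 'image'."
    raise ValueError(msg)


print(read("table.txt", kind="dataset"))  # ('DataFrame', 'table.txt')
print(read("relief.nc", kind="grid"))  # ('DataArray', 'relief.nc')
```

With this split, callers that only need the virtualfile could stop at the low-level layer and skip the conversion step.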

Comment on lines 174 to +175
load_dataarray
read
Member

The load_dataarray function was put under the pygmt.io namespace. Should we consider putting read under pygmt.io too? (See my other comment about whether we need a low-level pygmt.clib.read and a high-level pygmt.io.read.)

@seisman seisman removed this from the 0.14.0 milestone Dec 20, 2024
@seisman seisman removed the needs review (This PR has higher priority and needs review.) label Dec 20, 2024
Labels
feature Brand new feature

3 participants