Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LineCollection plot #7173

Open
wants to merge 128 commits into
base: main
Choose a base branch
from
Open

Add LineCollection plot #7173

wants to merge 128 commits into from

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Oct 16, 2022

This adds a line plotter based on LineCollections, called .lines.

I wanted to replace darray.plot() with using LineCollection instead. But unfortunately due to how many cases are supported (and tested in xarray) darray.plot() will continue using plt.plot.

Examples:

https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Got this working with xarray:

# https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/#lightbox
# https://github.com/afinemax/climate_change_bot/blob/main/bot.py


import requests

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr


def download_json(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for unsuccessful responses (4xx, 5xx)
        json_data = response.json()
        return json_data
    except requests.exceptions.RequestException as e:
        print(f"Error downloading JSON: {e}")
        return None


url = r"https://climatereanalyzer.org/clim/sst_daily/json/oisst2.1_natlan1_sst_day.json"
sst_daily_data = download_json(url)


data = []
years = []
for i in range(len(sst_daily_data) - 3):
    years.append((sst_daily_data[i]["name"]))
    data.append(sst_daily_data[i]["data"])

# convert data from list into nd arr
data = np.asarray(data, dtype=float)  # first index is year, 2nd index is day

# Datetimes not working for some unrelated reason:
# ds = xr.Dataset(
#     data_vars=dict(temperature=(("year", "day"), data)),
#     coords=dict(
#         year=np.array(dict_names, dtype="datetime64[D]"),
#         day=np.arange(data.shape[-1], dtype="timedelta64[D]"),
#     ),
# )
ds = xr.Dataset(
    data_vars=dict(
        temperature=(
            ("year", "day"),
            data,
            dict(
                long_name="Daily North Atlantic (0-60N) Sea Surface Temperature (SST)",
                units="degC",
            ),
        )
    ),
    coords=dict(
        year=np.array(years, dtype=int),
        day=np.arange(data.shape[-1], dtype=int),
    ),
)

plt.figure()
ds.plot.lines(x="day", y="temperature", hue="year")

image

Calling it lines since scatter is not called path_collection:

  • PathCollection (Very low-level, e.g. no datetime handling) -> plt.scatter (Handles datetime etc.)-> xr.plot.scatter
  • LineCollection (Very low-level, e.g. no datetime handling) -> _line (Handles datetime etc. Pretty much a copy of plt.scatter) -> xr.plot.lines

Seaborns new object-based plotting uses lines as well with similar argument:
https://seaborn.pydata.org/generated/seaborn.objects.Line.html
https://seaborn.pydata.org/generated/seaborn.objects.Lines.html

xref:
#4820
#5622

@Illviljan
Copy link
Contributor Author

Illviljan commented Oct 17, 2022

Scatter vs. Lines:

image

ds = xr.tutorial.scatter_example_dataset(seed=42)
hue_ = "y"
x_ = "y"
size_="y"
z_ = "z"

fig = plt.figure()
ax = fig.add_subplot(1, 2, 1, projection='3d')
ds.A.sel(w="one").plot.lines(x=x_, z=z_, hue=hue_, linewidth=size_, ax=ax)
ax = fig.add_subplot(1, 2, 2, projection='3d')
ds.A.sel(w="one").plot.scatter(x=x_, z=z_, hue=hue_, markersize=size_, ax=ax)

@Illviljan
Copy link
Contributor Author

For my taste it uses a bit too much private matplotlib API, hopefully that doesn't make things complicated in the long run. Maybe you can ask over at matplotlib to make some of these things public API?

I agree, but it should work fine to reference plt.scatter if anything breaks. The code follows scatter very closely besides for 3d, having to make mypy happy and the black code style.

In my dream world matplotlib would add a similar function. plt.scatter is preferred over PathCollection and the same arguments can be used for adding a plt.lines over using LineCollection.

@Illviljan
Copy link
Contributor Author

Illviljan commented May 30, 2024

Here's an interesting plot we should be easily be able to do with Lines:
image
https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Would probably require fixing #8831 to get that nice looking datetime colorbar.

Haven't found the data yet, noting it for the future.

@headtr1ck
Copy link
Collaborator

Add an entry in whats-new and this is good to go! Thanks!

@headtr1ck headtr1ck added the plan to merge Final call for comments label Jun 18, 2024
@dcherian
Copy link
Contributor

dcherian commented Jun 18, 2024

This looks nice. I have two requests:

  1. Can we add some info to the docstring about the difference between lines and line and how to choose between the two?
  2. Can we please consider calling it line_collection instead? The one character difference between two very similar but slightly different functions is not friendly to users.

On second thought, there is indeed a lot of private API here. Are you sure this is a good idea? It feels like major technical debt. Perhaps this is best suited to an external MPL extension?

@Illviljan
Copy link
Contributor Author

Can we add some info to the docstring about the difference between lines and line and how to choose between the two?

Done, I think?

Can we please consider calling it line_collection instead? The one character difference between two very similar but slightly different functions is not friendly to users.

Sure I don't mind a different name, but line_collection is also misleading, mpl's LineCollection and PathCollection is as basic as it gets and there's a lot of functionality xarray must have (datetime support for example) that they don't support. Therefore I was thinking to follow mpl's direction with a more distinct name, as scatter/_line do quite a bit more than just call the Collection:

PathCollection -> plt.scatter -> xr.plot.scatter
LineCollection -> _line -> xr.plot.?

On second thought, there is indeed a lot of private API here. Are you sure this is a good idea? It feels like major technical debt. Perhaps this is best suited to an external MPL extension?

_line is as close to a copy/paste of plt.scatter as it can be right now. Sure maybe I can rewrite it in a more public way, but then it'll be harder to use plt.scatter as a reference, which probably will be needed whenever someone finds a bug.

I think 3d plotting is the sort of thing that xarray should be good at, with scatter and lines a lot of the base needs should be covered.

@headtr1ck
Copy link
Collaborator

I'm fine with leaving it like this because currently there is really no nice way of rewriting it in a more "public" way.

@dcherian you can also discuss this in the bi-weekly if you feel like it.

@Illviljan
Copy link
Contributor Author

Illviljan commented Jul 9, 2024

https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Got this working with xarray:

# https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/#lightbox
# https://github.com/afinemax/climate_change_bot/blob/main/bot.py


import requests

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr


def download_json(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for unsuccessful responses (4xx, 5xx)
        json_data = response.json()
        return json_data
    except requests.exceptions.RequestException as e:
        print(f"Error downloading JSON: {e}")
        return None


url = r"https://climatereanalyzer.org/clim/sst_daily/json/oisst2.1_natlan1_sst_day.json"
sst_daily_data = download_json(url)


data = []
years = []
for i in range(len(sst_daily_data) - 3):
    years.append((sst_daily_data[i]["name"]))
    data.append(sst_daily_data[i]["data"])

# convert data from list into nd arr
data = np.asarray(data, dtype=float)  # first index is year, 2nd index is day

# ds = xr.Dataset(
#     data_vars=dict(temperature=(("year", "day"), data)),
#     coords=dict(
#         year=np.array(dict_names, dtype="datetime64[D]"),
#         day=np.arange(data.shape[-1], dtype="timedelta64[D]"),
#     ),
# )
ds = xr.Dataset(
    data_vars=dict(
        temperature=(
            ("year", "day"),
            data,
            dict(
                long_name="Daily North Atlantic (0-60N) Sea Surface Temperature (SST)",
                units="degC",
            ),
        )
    ),
    coords=dict(
        year=np.array(years, dtype=int),
        day=np.arange(data.shape[-1], dtype=int),
    ),
)

plt.figure()
ds.plot.lines(x="day", y="temperature", hue="year")

image

I couldn't get it working with datetimes though, but is unrelated to this pr.

@headtr1ck
Copy link
Collaborator

Some want to merge this finally?
Or is there some discussion still required?

@Illviljan
Copy link
Contributor Author

Felt the need for this one again. I'm going to self-merge next week unless someone has any further thoughts.

@headtr1ck
Copy link
Collaborator

Felt the need for this one again. I'm going to self-merge next week unless someone has any further thoughts.

Let's go

@zachjweiner
Copy link

I tried out this PR, as it implements something similar to (but not quite) what I was looking for. Is this interface meant only to support variable sizes and colors based on a coordinate/variable, like scatter? I noticed that float values for linewidth are not handled/excepted on properly (and linewidths and lw are both ignored). I personally would never use variable-based linewidths - usually for such plots with many lines I just want to set them to be thin. (I would also likely make use of manually setting colors by passing a list thereof.)

My (very frequent) use case here is just that of DataArray.line.plot(hue=...), but adding a colorbar rather than a legend (which I currently do manually). Is it feasible to refactor any of the implementation here to enable something like this? (I also frequently create discrete colorbars, so that they're really just legends but in colorbar form. The levels argument gets close to this, except they aren't centered on the actual coordinate values.) Can move this to a new issue if this would be better as a separate feature request.

@dcherian
Copy link
Contributor

dcherian commented Jan 2, 2025

my only concern is the one character difference between .line and .lines. Shall we call it line_collection instead? Ideally we would have only one of these but I can see its hard to do that now, given how complex the plotting module is.

@Illviljan
Copy link
Contributor Author

Illviljan commented Jan 7, 2025

I tried out this PR, as it implements something similar to (but not quite) what I was looking for. Is this interface meant only to support variable sizes and colors based on a coordinate/variable, like scatter? I noticed that float values for linewidth are not handled/excepted on properly (and linewidths and lw are both ignored). I personally would never use variable-based linewidths - usually for such plots with many lines I just want to set them to be thin. (I would also likely make use of manually setting colors by passing a list thereof.)

It's intended to be similar to scatter, yes. I've found the same arguments for using size on scatterplots can be made for lineplots.
I'll look into linewidth, linewidths and lw, during this week. Does scatter have similar issues?
But don't expect this to work exactly like plt.plot and xr.plot.line, they simply have too many options to replicate with LineCollection (I've tried, it got very messy).

My (very frequent) use case here is just that of DataArray.line.plot(hue=...), but adding a colorbar rather than a legend (which I currently do manually). Is it feasible to refactor any of the implementation here to enable something like this? (I also frequently create discrete colorbars, so that they're really just legends but in colorbar form. The levels argument gets close to this, except they aren't centered on the actual coordinate values.) Can move this to a new issue if this would be better as a separate feature request.

My idea for quick discrete colorbars is to simply convert the hue variable to string.

@Illviljan
Copy link
Contributor Author

my only concern is the one character difference between .line and .lines. Shall we call it line_collection instead? Ideally we would have only one of these but I can see its hard to do that now, given how complex the plotting module is.

line_collection implies a more basic version in my mind:

  • PathCollection (Very low-level, e.g. no datetime handling) -> plt.scatter (Handles datetime etc.)-> xr.plot.scatter
  • LineCollection (Very low-level, e.g. no datetime handling) -> _line (Handles datetime etc. Pretty much a copy of plt.scatter) -> xr.plot.lines

Seaborns new object-based plotting uses lines as well with similar argument, so I don't feel too worried anymore:
https://seaborn.pydata.org/generated/seaborn.objects.Line.html
https://seaborn.pydata.org/generated/seaborn.objects.Lines.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants