Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: PDF Scaling Bug #1382

Merged
merged 3 commits into from
Jan 7, 2025
Merged

Fix: PDF Scaling Bug #1382

merged 3 commits into from
Jan 7, 2025

Conversation

thomas-reimonn
Copy link
Contributor

Datashader currently has a known bug where, upon saving figures to PDF, the rasterized portion of the Datashader output is scaled differently from the vector-based elements (such as axes, labels, etc.). This PR fixes the issue and demonstrates it by ensuring the Datashader images render correctly within a single PDF figure.

@nvictus
Copy link
Contributor

nvictus commented Jan 3, 2025

This fixes the dsshow issue #1082

@amaloney
Copy link
Contributor

amaloney commented Jan 3, 2025

There are two problems discussed in the linking issue #1082.

  1. Data being transformed without explicit user settings
  2. Color variations due to the type of image being saved (bitmap vs vector)

1. Data being transformed without explicit user settings

I was able to reproduce the unexpected data scaling based on the type of image being saved. The environment and modified MRE are shown below in the code section. To get the expected behavior, uncomment out the code below the note.

Unexpected data scaling (no explicit plot dimensions given for vector images)

Expected behavior (explicit plot dimensions given for vector images)

datashader is applying an affine transformation when it outputs to a pdf or svg (vector image). It does not apply a transformation when a png (bitmap) is the final output. The transformation behavior is left to the end user to suss out on their own based on the image being drawn, which I think is bad user experience. The docstring for the make_image method discusses an affine transformation being applied if the kwarg unsampled is True.

def make_image(self, renderer, magnification=1.0, unsampled=True):
"""
Normalize, rescale, and colormap this image's data for rendering using
*renderer*, with the given *magnification*.
If *unsampled* is True, the image will not be scaled, but an
appropriate affine transformation will be returned instead.
Returns
-------
image : (M, N, 4) uint8 array
The RGBA image, resampled unless *unsampled* is True.
x, y : float
The upper left corner where the image should be drawn, in pixel
space.
trans : Affine2D
The affine transformation from image to pixel space.
"""

datashader will switch this value and apply a transformation if the final image size is not explicitly given, and the user is saving as a vector image.

if (
self.plot_width is not None
or self.plot_height is not None
or self.width_scale != 1.0
or self.height_scale != 1.0
):
unsampled = False

I need to investigate further, but I think the call to make_image is happening when a user saves the image to disk. Unfortunately when the user is saving to a vector image, the resulting image size is unbounded and the unsampled kwarg is being switched, thus a transformation on the data occurs.

2. Color variations due to the type of image being saved (bitmap vs vector)

Pixel color values are computed based on a combination of

  1. type of glyph used;
  2. the glyph geometry;
  3. the glyph size;
  4. the space available for the final image; and
  5. the overlap of glyphs within the final image's area.

You can see this behavior in the following movie where I modify the final image's total area by making it progressively smaller using the supplied notebook, but not using the given PR changes. This behavior is expected, but may not be documented well enough.

dpi-changes.webm

@nvictus and @thomas-reimonn the PR modifies default datashader behavior, and I am still fairly new to the codebase so I do not think I can approve it without more discussions with @jbednar.


Code

Install the environment and run the MRE.

mamba env create environment.yaml && mamba activate datashader-issue1082
python example.py
# environment.yaml
name: datashader-issue1082
channels:
    - conda-forge
dependencies:
    - python ==3.8.12

    # Package managers
    - pip

    # Dependencies
    - datashader ==0.14.0
    - matplotlib ==3.2.2
    - numpy ==1.21.6
    - pandas ==1.1.3
# example.py
import datashader as ds
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datashader.mpl_ext import dsshow

EXTENSIONS = ["pdf", "png", "svg"]
SIZE = 100_000
DPIS = ["300", "default"]


def main():
    # Fake data for testing
    x = np.random.normal(size=SIZE)
    y = x * 3 + np.random.normal(size=SIZE)
    df = pd.DataFrame(columns=["xs", "ys"], data=np.array([x, y]).T)

    for dpi in DPIS:
        for extension in EXTENSIONS:
            fig, ax = plt.subplots()
            dsartist = dsshow(
                df=df,
                glyph=ds.Point("xs", "ys"),
                aggregator=ds.count(),
                vmin=0,
                vmax=100,
                norm="linear",
                aspect="auto",
                ax=ax,
                # NOTE: Uncomment without the PR changes to get similarly scaled images.
                # plot_width=1920,
                # plot_height=1440,
            )
            plt.title(f"{dpi} dpi {extension}")
            if dpi == "300":
                fig.savefig(f"test_{dpi}_dpi.{extension}", dpi=300)
            else:
                fig.savefig(f"test_{dpi}_dpi.{extension}")
            plt.close()


if __name__ == "__main__":
    main()

@nvictus
Copy link
Contributor

nvictus commented Jan 3, 2025

@amaloney not sure yet if this addresses your concerns, but here is an explanation of the change:

The DSArtist.make_image is a matplotlib artist method that is called during rendering. If you look at what matplotlib does, it depends on the rendering backend:

  1. If a renderer supports affine transforms (e.g. in a notebook), matplotlib passes unsampled=True to make_image and passes an affine transform trans to draw_image to scale the image.

  2. If a renderer does NOT support affine transforms (like PDF), matplotlib calls make_image without passing unsampled (assuming it is False), and then calls draw_image without passing in an affine transform but instead passes a magnification (default is 1.0) to make_image. In this case, the underlying _make_image implementation rescales the image according to the given magnification by resampling.

Notably, matplotlib has the expectation that the default value of unsampled is False. It was my mistake to set the default value to True in the original implementation of the DSArtist (I was the main implementer), as I wanted to avoid additional manipulation by matplotlib. This didn't matter in scenario 1, so the scaling bug only surfaces when using a matplotlib renderer that does not support affine transformation.

Copy link
Contributor

@amaloney amaloney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this LGTM. I don't think the datashader-image.pdf file, nor the notebook are necessary since they are both there as examples showing the bug and the fix.

@thomas-reimonn
Copy link
Contributor Author

Thank you for the feedback on this! I removed the demo notebook and image.

Copy link

codecov bot commented Jan 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.42%. Comparing base (6802220) to head (e3136f3).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1382   +/-   ##
=======================================
  Coverage   88.42%   88.42%           
=======================================
  Files          93       93           
  Lines       18707    18705    -2     
=======================================
- Hits        16541    16540    -1     
+ Misses       2166     2165    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jbednar jbednar merged commit a128e67 into holoviz:main Jan 7, 2025
12 checks passed
@jbednar
Copy link
Member

jbednar commented Jan 7, 2025

Thanks for catching that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants