Fix: PDF Scaling Bug #1382

thomas-reimonn · 2025-01-03T15:54:06Z

Datashader currently has a known bug where, upon saving figures to PDF, the rasterized portion of the Datashader output is scaled differently from the vector-based elements (such as axes, labels, etc.). This PR fixes the issue and demonstrates it by ensuring the Datashader images render correctly within a single PDF figure.

nvictus · 2025-01-03T19:01:22Z

This fixes the dsshow issue #1082

amaloney · 2025-01-03T23:02:51Z

There are two problems discussed in the linking issue #1082.

Data being transformed without explicit user settings
Color variations due to the type of image being saved (bitmap vs vector)

1. Data being transformed without explicit user settings

I was able to reproduce the unexpected data scaling based on the type of image being saved. The environment and modified MRE are shown below in the code section. To get the expected behavior, uncomment out the code below the note.

Unexpected data scaling (no explicit plot dimensions given for vector images)

Expected behavior (explicit plot dimensions given for vector images)

datashader is applying an affine transformation when it outputs to a pdf or svg (vector image). It does not apply a transformation when a png (bitmap) is the final output. The transformation behavior is left to the end user to suss out on their own based on the image being drawn, which I think is bad user experience. The docstring for the make_image method discusses an affine transformation being applied if the kwarg unsampled is True.

datashader/datashader/mpl_ext.py

Lines 255 to 272 in 6802220

    
               def make_image(self, renderer, magnification=1.0, unsampled=True): 
        
                   """ 
        
                   Normalize, rescale, and colormap this image's data for rendering using 
        
                   *renderer*, with the given *magnification*. 
        
                   If *unsampled* is True, the image will not be scaled, but an 
        
                   appropriate affine transformation will be returned instead. 
        
                   Returns 
        
                   ------- 
        
                   image : (M, N, 4) uint8 array 
        
                       The RGBA image, resampled unless *unsampled* is True. 
        
                   x, y : float 
        
                       The upper left corner where the image should be drawn, in pixel 
        
                       space. 
        
                   trans : Affine2D 
        
                       The affine transformation from image to pixel space. 
        
                   """

datashader will switch this value and apply a transformation if the final image size is not explicitly given, and the user is saving as a vector image.

datashader/datashader/mpl_ext.py

Lines 282 to 288 in 6802220

    
           if ( 
        
               self.plot_width is not None 
        
               or self.plot_height is not None 
        
               or self.width_scale != 1.0 
        
               or self.height_scale != 1.0 
        
           ): 
        
               unsampled = False

I need to investigate further, but I think the call to make_image is happening when a user saves the image to disk. Unfortunately when the user is saving to a vector image, the resulting image size is unbounded and the unsampled kwarg is being switched, thus a transformation on the data occurs.

2. Color variations due to the type of image being saved (bitmap vs vector)

Pixel color values are computed based on a combination of

type of glyph used;
the glyph geometry;
the glyph size;
the space available for the final image; and
the overlap of glyphs within the final image's area.

You can see this behavior in the following movie where I modify the final image's total area by making it progressively smaller using the supplied notebook, but not using the given PR changes. This behavior is expected, but may not be documented well enough.

dpi-changes.webm

@nvictus and @thomas-reimonn the PR modifies default datashader behavior, and I am still fairly new to the codebase so I do not think I can approve it without more discussions with @jbednar.

Code

Install the environment and run the MRE.

mamba env create environment.yaml && mamba activate datashader-issue1082
python example.py

# environment.yaml
name: datashader-issue1082
channels:
    - conda-forge
dependencies:
    - python ==3.8.12

    # Package managers
    - pip

    # Dependencies
    - datashader ==0.14.0
    - matplotlib ==3.2.2
    - numpy ==1.21.6
    - pandas ==1.1.3

# example.py
import datashader as ds
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datashader.mpl_ext import dsshow

EXTENSIONS = ["pdf", "png", "svg"]
SIZE = 100_000
DPIS = ["300", "default"]


def main():
    # Fake data for testing
    x = np.random.normal(size=SIZE)
    y = x * 3 + np.random.normal(size=SIZE)
    df = pd.DataFrame(columns=["xs", "ys"], data=np.array([x, y]).T)

    for dpi in DPIS:
        for extension in EXTENSIONS:
            fig, ax = plt.subplots()
            dsartist = dsshow(
                df=df,
                glyph=ds.Point("xs", "ys"),
                aggregator=ds.count(),
                vmin=0,
                vmax=100,
                norm="linear",
                aspect="auto",
                ax=ax,
                # NOTE: Uncomment without the PR changes to get similarly scaled images.
                # plot_width=1920,
                # plot_height=1440,
            )
            plt.title(f"{dpi} dpi {extension}")
            if dpi == "300":
                fig.savefig(f"test_{dpi}_dpi.{extension}", dpi=300)
            else:
                fig.savefig(f"test_{dpi}_dpi.{extension}")
            plt.close()


if __name__ == "__main__":
    main()

nvictus · 2025-01-03T23:20:21Z

@amaloney not sure yet if this addresses your concerns, but here is an explanation of the change:

The DSArtist.make_image is a matplotlib artist method that is called during rendering. If you look at what matplotlib does, it depends on the rendering backend:

If a renderer supports affine transforms (e.g. in a notebook), matplotlib passes unsampled=True to make_image and passes an affine transform trans to draw_image to scale the image.
If a renderer does NOT support affine transforms (like PDF), matplotlib calls make_image without passing unsampled (assuming it is False), and then calls draw_image without passing in an affine transform but instead passes a magnification (default is 1.0) to make_image. In this case, the underlying _make_image implementation rescales the image according to the given magnification by resampling.

Notably, matplotlib has the expectation that the default value of unsampled is False. It was my mistake to set the default value to True in the original implementation of the DSArtist (I was the main implementer), as I wanted to avoid additional manipulation by matplotlib. This didn't matter in scenario 1, so the scaling bug only surfaces when using a matplotlib renderer that does not support affine transformation.

amaloney

Overall, this LGTM. I don't think the datashader-image.pdf file, nor the notebook are necessary since they are both there as examples showing the bug and the fix.

thomas-reimonn · 2025-01-06T22:55:33Z

Thank you for the feedback on this! I removed the demo notebook and image.

codecov · 2025-01-07T15:07:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.42%. Comparing base (6802220) to head (e3136f3).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1382   +/-   ##
=======================================
  Coverage   88.42%   88.42%           
=======================================
  Files          93       93           
  Lines       18707    18705    -2     
=======================================
- Hits        16541    16540    -1     
+ Misses       2166     2165    -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jbednar · 2025-01-07T20:04:33Z

Thanks for catching that!

thomas-reimonn added 2 commits January 3, 2025 10:23

fix: mpl_ext::make_image default unsampled=False to fix PDF scaling bug

3304061

demo fix

6e54085

amaloney requested changes Jan 6, 2025

View reviewed changes

remove bug demo notebook & image

e3136f3

amaloney approved these changes Jan 7, 2025

View reviewed changes

jbednar merged commit a128e67 into holoviz:main Jan 7, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: PDF Scaling Bug #1382

Fix: PDF Scaling Bug #1382

thomas-reimonn commented Jan 3, 2025

nvictus commented Jan 3, 2025

amaloney commented Jan 3, 2025

nvictus commented Jan 3, 2025 •

edited

Loading

amaloney left a comment

thomas-reimonn commented Jan 6, 2025

codecov bot commented Jan 7, 2025 •

edited

Loading

jbednar commented Jan 7, 2025

Fix: PDF Scaling Bug #1382

Fix: PDF Scaling Bug #1382

Conversation

thomas-reimonn commented Jan 3, 2025

nvictus commented Jan 3, 2025

amaloney commented Jan 3, 2025

1. Data being transformed without explicit user settings

Unexpected data scaling (no explicit plot dimensions given for vector images)

Expected behavior (explicit plot dimensions given for vector images)

2. Color variations due to the type of image being saved (bitmap vs vector)

Code

nvictus commented Jan 3, 2025 • edited Loading

amaloney left a comment

Choose a reason for hiding this comment

thomas-reimonn commented Jan 6, 2025

codecov bot commented Jan 7, 2025 • edited Loading

Codecov Report

jbednar commented Jan 7, 2025

nvictus commented Jan 3, 2025 •

edited

Loading

codecov bot commented Jan 7, 2025 •

edited

Loading