When Pages are all displayed inside a <Document>, pages after first can have a scaled up scaleX text layer (only found on some documents) #1848
Comments
I too am experiencing a similar issue. Say for a given page, when looking through the spans constituting the text layer, a vast majority of them do not contain a transform and are seemingly positioned correctly (no disjoint overlapping). However, there are some instances where the span will have a transformation applied to it along the x-axis. Something along the lines of
As a temporary solution, I am following the advice mentioned here: #332 (comment). However, I found that the transformation is within the nested line spans and not in
And then use the function here
This removes all the transformations and for the most part yields good results. However, as mentioned before, some of the lines already had a small transformation applied to them, so when we remove that, the text does not overlap perfectly (though the difference is not nearly as drastic as with the worst offenders). This solution seemingly works well enough for all the PDFs that I have tried it with so far, but I don't believe it to be a satisfying solution. pdfjs is clearly doing some calculation under the hood to determine these transformations based on font size, screen width, etc., so just removing them is definitely a workaround. See https://github.com/mozilla/pdf.js/blob/300e806efe7e6438e0b37d8eeb1a97d9e5d27daa/src/display/text_layer.js#L419 for how this transformation is calculated. My best guess is that
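The function referenced in the comment above is not preserved in the thread. As a rough illustration only, here is a minimal sketch of a narrower variant of that workaround: it removes just the `scaleX(...)` part of each span's transform instead of the whole transform. `StyledNode`, `stripScaleX`, and `removeScaleXFromSpans` are hypothetical names, and the sketch assumes the transform string contains plain numeric `scaleX(...)` calls like the ones pdf.js emits.

```typescript
// Minimal structural type so the sketch runs without a DOM;
// a real HTMLElement span also satisfies it.
interface StyledNode {
  style: { transform: string };
}

/** Removes every scaleX(...) call from a CSS transform string, keeping the rest. */
function stripScaleX(transform: string): string {
  return transform
    .replace(/scaleX\([^)]*\)/g, "") // drop the scaleX calls
    .replace(/\s+/g, " ") // collapse leftover whitespace
    .trim();
}

/** Hypothetical helper: clears scaleX on each span of an already rendered text layer. */
function removeScaleXFromSpans(spans: Iterable<StyledNode>): void {
  for (const span of spans) {
    span.style.transform = stripScaleX(span.style.transform);
  }
}
```

In a browser this could be called with the spans found inside the page's text layer container once the layer has rendered. As the comment notes, spans that legitimately needed a small `scaleX` will then sit slightly off, so this remains a workaround.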
I don't believe this is an issue with pdfjs. If I uploaded my pdf to the pdfjs demo
I believe there needs to be a fix to react-pdf here
I found the reason.
  if (prevFontSize !== fontSize || prevFontFamily !== fontFamily) {
    console.log("---------ctx font--------");
    console.log("textContent:", div.textContent);
    console.log("pageIndex", this);
    console.log("prevFontSize:", prevFontSize);
    console.log("fontSize:", fontSize);
    console.log("this.#scale", this.#scale);
    console.log("fontSize * this.#scale:", fontSize * this.#scale);
    console.log("-------------------------");
    ctx.font = `${fontSize * this.#scale}px ${fontFamily}`;
    params.prevFontSize = fontSize;
    params.prevFontFamily = fontFamily;
  }
  // Only measure the width for multi-char text divs, see `appendText`.
  const { width } = ctx.measureText(div.textContent);
  if (width > 0) {
    transform = `scaleX(${(canvasWidth * this.#scale) / width}) ${transform}`;
  }
  if (div.textContent === "scale x show error text") {
    console.log("----------measureText------------");
    console.log("transform:", transform);
    console.log("width:", width);
    console.log("ctx.font:", ctx.font);
    console.log("fontSize:", fontSize);
    console.log("fontFamily:", fontFamily);
    console.log("oldPrevFontSize:", oldPrevFontSize);
    console.log("oldPrevFontFamily:", oldPrevFontFamily);
    console.log("this.#scale:", this.#scale);
    console.log("canvasWidth:", canvasWidth);
  }
}
So the reason is
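The commenter's explanation is cut off in this thread, so the following is only an illustration of the arithmetic in the snippet, not a statement of what they found. The snippet computes `scaleX` as `(canvasWidth * scale) / width`, where `width` is measured under whatever `ctx.font` currently holds, and `ctx.font` is only updated when the cached `prevFontSize` or `prevFontFamily` differs. A minimal sketch of how a stale `ctx.font` would inflate `scaleX`, assuming measured width grows linearly with font size; `FakeContext`, `scaleXFor`, and the 0.5 em per character figure are hypothetical stand-ins for a real canvas context:

```typescript
// Hypothetical stand-in for CanvasRenderingContext2D:
// measured width is proportional to the font size in `font`.
class FakeContext {
  font = "10px sans-serif";

  measureText(text: string): { width: number } {
    const px = parseFloat(this.font); // "20px serif" -> 20
    return { width: text.length * px * 0.5 }; // assumed 0.5 em per character
  }
}

// Same formula as in the snippet: (canvasWidth * scale) / measured width.
function scaleXFor(ctx: FakeContext, text: string, canvasWidth: number, scale: number): number {
  const { width } = ctx.measureText(text);
  return width > 0 ? (canvasWidth * scale) / width : 1;
}

const sampleText = "scale x show error text"; // 23 characters
const sampleCanvasWidth = 230; // width this run should occupy at 20px in the fake model
const sampleScale = 1;

// ctx.font matches the run's real font size: no horizontal stretch needed.
const freshCtx = new FakeContext();
freshCtx.font = "20px serif";
const expectedScaleX = scaleXFor(freshCtx, sampleText, sampleCanvasWidth, sampleScale);

// ctx.font was never updated and still holds 10px: the text measures half as
// wide, so the computed stretch doubles.
const staleCtx = new FakeContext();
const inflatedScaleX = scaleXFor(staleCtx, sampleText, sampleCanvasWidth, sampleScale);
```

Under these assumptions `expectedScaleX` is 1 and `inflatedScaleX` is 2, which matches the shape of the symptom in the issue: a text layer that is too wide rather than too narrow.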
I implemented a Mutex approach like this and it solved all of the width issues
Wanted to flag this to @wojtekmaj: can we include a fix for this race condition in the react-pdf lib?
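The mutex code from the comment above is not preserved in the thread. A minimal sketch of the general idea, assuming the goal is to let only one page's text layer render at a time; `Mutex`, `runExclusive`, and `renderPage` are hypothetical names and are not part of react-pdf or pdf.js:

```typescript
/** Promise-chain mutex: each task starts only after the previous one has settled. */
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const renderMutex = new Mutex();
const renderOrder: string[] = [];

// Hypothetical stand-in for rendering one page's text layer.
async function renderPage(pageNumber: number): Promise<void> {
  renderOrder.push(`start ${pageNumber}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  renderOrder.push(`end ${pageNumber}`);
}

// All three renders are requested at once but run strictly one after another.
const allRendered = Promise.all(
  [1, 2, 3].map((n) => renderMutex.runExclusive(() => renderPage(n))),
);
```

Without the mutex the three `start` entries would be recorded before any `end`. With it, each page finishes before the next begins, which is the sequential behaviour the commenters describe. How to hook this into react-pdf is left open here; one possibility is to mount each next `<Page>` only after the previous page's text layer has reported success.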
Would this also cause the wrong scaleX transform on bolded text? I'm seeing that happen on 9.1: the text is scaled up far too much.
Any solution for this?
Manually rendering all of the pages sequentially with a mutex worked for me. I don't think this is a good solution though, so my ultimate solution was to just use pdfjs. This package was not built to show a full-length PDF, so I wouldn't use it to do that.
Before you start - checklist
Description
I am using react-pdf and need the text layer for highlighting purposes. I wanted to swap over from the "single page" to "all page" recipe shown here
https://github.com/wojtekmaj/react-pdf/wiki/Recipes
This code seems to work fine on the surface, but I found that for some PDFs, displaying in all-page mode caused the text layer past the first page to have some spans with very wrong scaleX transforms applied (way bigger than intended). When displayed in single-page format, all of the spans on the offending pages have the expected width.
I tested this with versions 9.0 and 9.1 and with different workers, and the bug always appears.
Steps to reproduce
I made a minimal reproducible example below:
As stated above this doesn't happen to every pdf, and while the best examples are on non-public PDFs, I found a public document where you can see this bug on pages 3 and onwards (attached).
apl_23_003.pdf
Expected behavior
I expect the text layer to fit over the text exactly, like it does when displayed like this
Actual behavior
The text on pages after page 1 is displayed with text layer having a larger scaleX transform when displayed like this
Additional information
This bug will not occur on the first page displayed.
For example, if page 4 is the one that gets stretched, and I display page 4 10 times in a row, the first page will be normal and subsequent pages will be stretched.
This bug does not affect all PDFs, only a few that I have found.
In debugging I noticed that the PDF uses some encoding that is not supported by my VSCode. I can still open the file, and it says "this document contains many invisible unicode characters".
This may contribute to some parsing error, but I don't know why that would occur only past the first page.
Environment