Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: inlineImages not working #1218

Open
1 task done
econte-sprig opened this issue May 2, 2023 · 13 comments
Open
1 task done

[Bug]: inlineImages not working #1218

econte-sprig opened this issue May 2, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@econte-sprig
Copy link

econte-sprig commented May 2, 2023

Preflight Checklist

  • I have searched the issue tracker for a bug report that matches the one I want to file, without success.

What package is this bug report for?

rrweb

Version

2.0.0-alpha.6

Expected Behavior

I expect the record function to emit events with the images as base64 rr_dataURL attributes.

Actual Behavior

Events are emitted with the normal img src attribute unchanged and no rr_dataURL attribute. No warnings or errors in the console.

Steps to Reproduce

Using Chrome Version 112.0.5615.137 (Official Build) (arm64)
Begin recording with the setting inlineImages: true, and checkoutEveryNms of some value (say 10000).
Look at the events being emitted, you will notice that they do not have the rr_dataURL attribute

Testcase Gist URL

No response

Additional Information

You will notice in the Chrome network tab that every 10 seconds, the browser refetches the images on the page. See the following screenshot that traces the image fetch to rrweb-snapshot.
image

That line is the following:
image

Setting the crossOrigin attribute on the img element is causing it to re-load the image. Which then leads to a bug. The condition on line 676 if (image_1.complete && image_1.naturalWidth !== 0) is always false, because setting crossOrigin causes the image to be reloaded.

Furthermore, I don't understand the point of setting the onload event handler. The function serializeElementNode is not asynchronous, by the time the handler is called, the function has already returned the attributes object.

@econte-sprig econte-sprig added the bug Something isn't working label May 2, 2023
@econte-sprig
Copy link
Author

econte-sprig commented May 2, 2023

Building with a locally modified version of rrweb I have tried deleting the following lines:

  1. image_1.crossOrigin = 'anonymous';
  2. oldValue_1 ? (attributes.crossOrigin = oldValue_1) : image_1.removeAttribute('crossorigin');
  3. else image_1.onload = recordInlineImage;

And then inline images works as expected, generating outputs with the rr_dataURL attribute as expected

This also stops the periodic refetching of every image on the page.

econte-sprig added a commit to UserLeap/rrweb that referenced this issue May 2, 2023
@wfk007
Copy link
Contributor

wfk007 commented May 8, 2023

@econte-sprig Thanks for your PR and detailed explanation, but your change is not working with the following case, and it explains why we need to set the onload event handler.

<html>
    <body>
        <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/rrweb.min.js"></script>
        <script>
            rrwebRecord({
                inlineImages: true,
                emit(event) {
                    //
                },
            });
            setTimeout(() => {
                const img = document.createElement('img');
                img.setAttribute('crossorigin', 'anonymous');
                img.setAttribute(
                    'src',
                    'xxx',
                );
                document.body.append(img);
            }, 1000);
        </script>
    </body>
</html>

Additionally crossorigin is necessary for recording inline images, as the image base64 string is transformed by canvas.toDataURL, which will throw an error when the domain is not clean.

If the image is without crossorigin attribute initially, after adding the crossorigin anonymous attribute to the image, the browser will force the use of CORS to obtain resources, and you will see a new network request.

If you want to read rr_dataURL attributes from record emit:

  1. If your image is loaded before the record start: you can read it from the full snapshot(with type: 2)
  2. If your image is loaded after the record start: you can read it from the increment snapshot(with type : 3 and source: 0)

@econte-sprig
Copy link
Author

@econte-sprig Thanks for your PR and detailed explanation, but your change is not working with the following case, and it explains why we need to set the onload event handler.

<html>
    <body>
        <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/rrweb.min.js"></script>
        <script>
            rrwebRecord({
                inlineImages: true,
                emit(event) {
                    //
                },
            });
            setTimeout(() => {
                const img = document.createElement('img');
                img.setAttribute('crossorigin', 'anonymous');
                img.setAttribute(
                    'src',
                    'xxx',
                );
                document.body.append(img);
            }, 1000);
        </script>
    </body>
</html>

Additionally crossorigin is necessary for recording inline images, as the image base64 string is transformed by canvas.toDataURL, which will throw an error when the domain is not clean.

If the image is without crossorigin attribute initially, after adding the crossorigin anonymous attribute to the image, the browser will force the use of CORS to obtain resources, and you will see a new network request.

If you want to read rr_dataURL attributes from record emit:

  1. If your image is loaded before the record start: you can read it from the full snapshot(with type: 2)
  2. If your image is loaded after the record start: you can read it from the increment snapshot(with type : 3 and source: 0)

But the code just fundamentally isn't working. Setting 'crossorigin' on Chrome causes it to always reload and never set rr_dataURL. Are you not seeing this behavior?

@wfk007
Copy link
Contributor

wfk007 commented May 9, 2023

@econte-sprig Thanks for your PR and detailed explanation, but your change is not working with the following case, and it explains why we need to set the onload event handler.

<html>
    <body>
        <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/rrweb.min.js"></script>
        <script>
            rrwebRecord({
                inlineImages: true,
                emit(event) {
                    //
                },
            });
            setTimeout(() => {
                const img = document.createElement('img');
                img.setAttribute('crossorigin', 'anonymous');
                img.setAttribute(
                    'src',
                    'xxx',
                );
                document.body.append(img);
            }, 1000);
        </script>
    </body>
</html>

Additionally crossorigin is necessary for recording inline images, as the image base64 string is transformed by canvas.toDataURL, which will throw an error when the domain is not clean.
If the image is without crossorigin attribute initially, after adding the crossorigin anonymous attribute to the image, the browser will force the use of CORS to obtain resources, and you will see a new network request.
If you want to read rr_dataURL attributes from record emit:

  1. If your image is loaded before the record start: you can read it from the full snapshot(with type: 2)
  2. If your image is loaded after the record start: you can read it from the increment snapshot(with type : 3 and source: 0)

But the code just fundamentally isn't working. Setting 'crossorigin' on Chrome causes it to always reload and never set rr_dataURL. Are you not seeing this behavior?

can you provide me with a demo to show your case?

@econte-sprig
Copy link
Author

Sure:
https://github.com/econte-sprig/rrweb-demo1

  1. Clone the repo
  2. Run npm i
  3. Run npm run start
  4. Press the 'Record' button to start recording with rrweb
  5. You will see in the network tab that every 10 seconds the images on the page are reloaded
  6. When you press 'Stop' you will see the JSON stringified event array logged to the console, as well as a line 'Includes Data URL attribute true/false'. That line is determined by just checking if the JSON string includes the substring 'rr_dataURL'.

At least on my Chrome 112.0.5615.137 (Official Build) (arm64) you will see that it logs false, the dataURL attribute is not present in the JSON despite setting it to inline images.

econte-sprig added a commit to UserLeap/rrweb that referenced this issue May 9, 2023
@wfk007
Copy link
Contributor

wfk007 commented May 9, 2023

@econte-sprig Hi, thanks for your demo. It should work with the following change. And rr_dataURL attribute is in the full snapshot with a lazy way attached to the attribute object.

const record = () => {
    stopRecord.current = rrweb.record({
      checkoutEveryNms: 10000,
      inlineImages: true,
      emit: (ev) => {
-        events.current.push(JSON.stringify(ev));
+        events.current.push(ev);
      },
    })
  };

@wfk007
Copy link
Contributor

wfk007 commented May 9, 2023

The reload of the image is caused by the cross-origin attribute add and remove, and this attribute is necessary when transforming the image with Canvas. toDataURL will throw an error when the canvas's bitmap is not origin clean

@econte-sprig
Copy link
Author

econte-sprig commented May 9, 2023

@econte-sprig Hi, thanks for your demo. It should work with the following change. And rr_dataURL attribute is in the full snapshot with a lazy way attached to the attribute object.

const record = () => {
    stopRecord.current = rrweb.record({
      checkoutEveryNms: 10000,
      inlineImages: true,
      emit: (ev) => {
-        events.current.push(JSON.stringify(ev));
+        events.current.push(ev);
      },
    })
  };

This isn't a viable fix. RRweb is changing the attributes on the event after it is emitted. Rrweb's own pack function stringifies the events on emit and so this breaks with rrweb's provided pack function. And what if we have already sent the event to the backend by the time the event is changed? Clearly rrweb should not be changing events after emitting them.

@econte-sprig
Copy link
Author

The reload of the image is caused by the cross-origin attribute add and remove, and this attribute is necessary when transforming the image with Canvas. toDataURL will throw an error when the canvas's bitmap is not origin clean

The real fix is to emit a separate attribute mutation event when the reload finishes, then store that so it doesn't reload every X seconds.The event should not be changing after emitting, so when the reload finishes after setting crossOrigin='anonymous' a new event needs to be emitted rather than change an event that was previously emitted.

@econte-sprig
Copy link
Author

econte-sprig commented May 9, 2023

Otherwise, how do I know when an event is finished changing and it is safe to do something with the event (such as store it or send it to the backend)? The event gives no indication that changes are coming, so how long do I have to wait before sending the event to a backend? Am I supposed to just hold every event in memory for X seconds before sending it and just cross my fingers and hope it is done changing despite having no way to know?

This is all not to mention the complete incompatibility with the pack function.

@wfk007
Copy link
Contributor

wfk007 commented May 10, 2023

@econte-sprig Is the following change working for you?

- <img src={logo} className="App-logo" alt="logo" />
+ <img src={logo} className="App-logo" alt="logo" crossOrigin='anonymous' />

Or change the source code of the snapshot with the following change to force a new attribute mutation?
But I don't think it is a good choice to add too many strings into the <img> tag on the user side.

try {
  canvasService!.width = image.naturalWidth;
  canvasService!.height = image.naturalHeight;
  canvasCtx!.drawImage(image, 0, 0);
-  attributes.rr_dataURL = canvasService!.toDataURL(
-   dataURLOptions.type,
-    dataURLOptions.quality,
-  );
+  image.setAttribute('rr_data_url', canvasService!.toDataURL(
+   dataURLOptions.type,
+   dataURLOptions.quality,
+  ))
} catch (err) {
  console.warn(
    `Cannot inline img src=${image.currentSrc}! Error: ${err as string}`,
  );
}

@Juice10
Copy link
Contributor

Juice10 commented Jul 7, 2023

This isn't a viable fix. RRweb is changing the attributes on the event after it is emitted. Rrweb's own pack function stringifies the events on emit and so this breaks with rrweb's provided pack function. And what if we have already sent the event to the backend by the time the event is changed? Clearly rrweb should not be changing events after emitting them.

You are absolutely right, this PR should help with this: #1239

@econte-sprig
Copy link
Author

Just putting together a fix PR for this issue with inline images. You can check it out here.

Let me know if you need anything, such as a more detailed repro or something, but the PR is not too involved. I can explain any of the changes if you have any questions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants