Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JPEG XL decoding #633

Closed
1 task done
veluca93 opened this issue May 12, 2021 · 30 comments
Closed
1 task done

JPEG XL decoding #633

veluca93 opened this issue May 12, 2021 · 30 comments
Assignees
Labels
Resolution: satisfied with concerns The TAG is satisfied with this work overall but requires changes Topic: media

Comments

@veluca93
Copy link

Hi TAG!

I'm requesting a TAG review of JPEG XL decoding, a new image format.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: N/A
  • The group where the work on this specification is currently being done: JPEG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): JPEG

You should also know that it was noted in #495 that "[...] a Tag Review wouldn't be applicable for an image decoder implementation.". As there seems to be some disagreement on this point, we're opening this issue for further discussion.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

💬 leave review feedback as a comment in this issue and @-notify @veluca93 and @jonsneyers

@veluca93
Copy link
Author

Possibly also relevant: annevk/orb#3

@cynthia
Copy link
Member

cynthia commented May 13, 2021

The spec costs 118 CHF to view, we don't really have budget for that.

@plinss
Copy link
Member

plinss commented May 13, 2021

Reviewing ISO standards is also somewhat out of scope for the TAG

@plinss plinss closed this as completed May 13, 2021
@veluca93
Copy link
Author

To be clear, the specification is here mostly for reference/proof that this format is standardized, we're not asking for a review of the JPEG XL specification.

I guess @hober would know better what exactly should be discussed here.

@veluca93
Copy link
Author

I asked @hober, and the point that we'd want to discuss is the following (taken from an email from @annevk):

[...] This is the first new format with a new signature
that I know of so ideally we use this opportunity to not further
increase the bad security behavior of the past. As written there I
think at minimum we should require a MIME type in the img element
processing model (similar to SVG) and at best we would also require
CORS (similar to module scripts).

Also, a freely-accessible, if somewhat outdated specification link is available at https://arxiv.org/abs/1908.03565.

@jonsneyers
Copy link

One other thing that might be relevant is the question of whether or not https://mimesniff.spec.whatwg.org/#image-type-pattern-matching-algorithm should be extended to including the JPEG XL header (and the AVIF one, for that matter).

@cynthia
Copy link
Member

cynthia commented May 13, 2021

Okay, now I see the intention - luckily we have guidelines for this exact case.

https://w3ctag.github.io/design-principles/#new-data-formats

The part that probably is of your interest is this:

For legacy reasons browsers support MIME type sniffing, but we do not recommend extending the pattern matching algorithm, due to security implications, and instead recommend enforcing strict MIME types for newer formats.

So no, please do not extend mimesniff for newer formats. We understand this may cause some inconveniences, but we are trying to move away from magic.

@cynthia cynthia reopened this May 13, 2021
@veluca93
Copy link
Author

Out of curiosity, what are the security implications of extending MIME type sniffing for an image format? The only information I can find relates to executable resources, and to the best of my knowledge no image format will result in execution of client-controllable code.

@cynthia
Copy link
Member

cynthia commented May 13, 2021

None that I know of. The link is pointing to an inappropriate section of the mimesniff spec due to this principle initially triggered by the AVIF review (I should fix that.) but the main reason for media files is to reduce magical behavior for newer formats.

This mildly conflicts with the priority of constituencies we emphasize (as noted by the hint of "inconvenience") but this is the general consensus we came to.

@jonsneyers
Copy link

jonsneyers commented May 13, 2021

Note that afaik, in the case of JPEG 2000, AVIF, and JPEG XL, current behavior of supporting browsers is to do mime sniffing for those signatures too (so not following the current mimesniff spec but de facto extending it).

Would there be any security-related or other advantages if this would be changed? (that is, if those browsers would actually respect the mimesniff spec and refuse to decode j2k/avif/jxl images if the server does not return the correct mime type)

@veluca93
Copy link
Author

My opinion on this is that it's better to favour the "principle of least surprise", and file types that fundamentally do the same thing (such as file types under "image/") ought to be all treated in the same way - I don't see a very good reason to impose additional constraints on an image format for the only reason that it is "new".

Or, to state it in another way - I see making "image/jxl" and "image/avif" behave in the same way as any other "image/*" format to be more in line with the proposed principles than doing otherwise.

@jonsneyers
Copy link

I can see how image/svg+xml might have to receive a different treatment, given that it is an "image format" that can contain arbitrary links and javascript, so it probably poses more security risks than "normal" image formats.

But for j2k, avif and jxl, that does not apply.

@domenic
Copy link
Member

domenic commented May 13, 2021

There are security-related problems with extending sniffing, even for images, due to Spectre. In a Spectre world, it's important that the origin server give an explicit signal (i.e. content-type header) that the resource is an image and thus is OK being read cross-origin. Relying on sniffing to make this crucial security decision runs the risk of inadvertently exposing non-image resources to attackers.

Please follow the guidelines for new formats.

@veluca93
Copy link
Author

Could you elaborate on that?

Also, how is this different from, say, a JPEG decoder?

By the way, why isn't server-side opt-in to disabling cross-site requests for a given resource a better solution to this kind of issues?

@domenic
Copy link
Member

domenic commented May 13, 2021

It is different than JPEG because we can't break all the JPEG-using pages by enforcing better security on them.

A server-side opt-in is not the right solution because opting in to protecting your users is the opposite of how we should be doing security; you should opt in to exposing data, not opt in to protecting data.

@domenic
Copy link
Member

domenic commented May 13, 2021

(To be clear, some of my colleagues are looking into enforcing strict mine type checking even for existing formats like JPEG. So in the future it may in fact be the case that all formats behave this way, not just new ones that follow the guidelines. But last I checked doing so was too likely to break pages, so it'll be some heads before we can move the web to a more secure architecture in this regard. Similar to the still-ongoing transition from non-secure HTTP to HTTPS.)

@veluca93
Copy link
Author

veluca93 commented May 13, 2021

It is different than JPEG because we can't break all the JPEG-using pages by enforcing better security on them.

Sure, but if an user cannot update their webserver to provide the correct mime type for image/jxl and image/avif, then the result will be as insecure as serving image/jpeg, and will not get the benefits of the reduced bandwidth from using a new format - this looks to me like a worse final result.

A server-side opt-in is not the right solution because opting in to protecting your users is the opposite of how we should be doing security; you should opt in to exposing data, not opt in to protecting data.

Still, a solution would then be to encourage web servers to opt in by default, instead of encouraging browsers to "opt in by default". This allows protection of old formats too, and it's not just limited to new formats. Moreover, adding barriers to adoption of new format in name of increased security will just result in the new format not being adopted at all, which doesn't end up improving security.

I cannot find any information of what kind of attacks would be possible due to mime type sniffing on images in a Spectre-based world. Could you elaborate more on this?

@jonsneyers
Copy link

There are security-related problems with extending sniffing, even for images, due to Spectre. In a Spectre world, it's important that the origin server give an explicit signal (i.e. content-type header) that the resource is an image and thus is OK being read cross-origin. Relying on sniffing to make this crucial security decision runs the risk of inadvertently exposing non-image resources to attackers.

Please follow the guidelines for new formats.

The origin markup can specify a content-type in a picture srcset type, but in the case of a simple img tag, the markup does not specify a content type, so it is up to the cross-origin server to provide a response with a correct media type.

I am confused about what exactly is the concern here.

Could you give an example of how a potential attack could look like that is possible when the jxl magic gets added to the mimesniff spec and not possible when it does not get added there?

@domenic
Copy link
Member

domenic commented May 13, 2021

I cannot find any information of what kind of attacks would be possible due to mime type sniffing on images in a Spectre-based world. Could you elaborate more on this?

See https://github.com/annevk/orb/

Could you give an example of how a potential attack could look like that is possible when the jxl magic gets added to the mimesniff spec and not possible when it does not get added there?

Consider a data file which is not JXL, but gets sniffed as it. It has its content type set to something like application/vnd.bank.data, not image/jxl.

By doing <img src="https://bank.example/sensitive-user-data.bin"> on an attacker's site, sensitive-user-data.bin is brought into the attacker's process. They can then use Spectre to read that data.

If instead the browser required that sensitive-user-data.bin had image/jxl before it tried to decode the image, then it would not be brough into the attacker's process.

This is fairly fundamental modern security hygeine, so I'm a bit surprised that you all haven't encountered it before. It's been encoded into the guidelines, which are there for a reason. It's a bit tiring having to educate folks about this; the point of the guidelines is to make it clear how the web is intended to work going forward. I hope you can accept that they were added for good reason, and follow them.

@veluca93
Copy link
Author

I cannot find any information of what kind of attacks would be possible due to mime type sniffing on images in a Spectre-based world. Could you elaborate more on this?

See https://github.com/annevk/orb/

Could you give an example of how a potential attack could look like that is possible when the jxl magic gets added to the mimesniff spec and not possible when it does not get added there?

Consider a data file which is not JXL, but gets sniffed as it. It has its content type set to something like application/vnd.bank.data, not image/jxl.

By doing <img src="https://bank.example/sensitive-user-data.bin"> on an attacker's site, sensitive-user-data.bin is brought into the attacker's process. They can then use Spectre to read that data.

If instead the browser required that sensitive-user-data.bin had image/jxl before it tried to decode the image, then it would not be brough into the attacker's process.

I understand the argument, but in today's world, you can still sniff other mime types, so the start of the data would still end up being brought in the attacker's process (to perform mime sniffing - unless that happens in a separate process, I'm not familiar with the implementation details here), and would end up being brought in the attacker's process if it corresponds to the mime type of another image format.

Most image decoders are able to discard invalid images rather early, so if the concern is about the increased likelyhood of mis-detection, then it's probably not something to be too concerned about (if mime sniffing happens in a separate process, this can be made more secure by performing this initial validation in the separate process - the likelyhood of misdetection can then be made sufficiently small to be likely comparable to errors from cosmic rays).

If the concern is about the attacker being able to control the first few bytes of the private data, then adding image/jxl does not open any new attack avenues.

In both cases, it seems to me that pushing for servers to, by default, include a X-Content-Type-Options = "nosniff" ; would be a better solution to this problem.

To be clear, I understand that the objective is having a safer web for everyone, but I disagree that not having Mime Sniffing for new formats that are not executable is a way to get there (I agree it is a good idea for executable ones).

This is fairly fundamental modern security hygeine, so I'm a bit surprised that you all haven't encountered it before. It's been encoded into the guidelines, which are there for a reason. It's a bit tiring having to educate folks about this; the point of the guidelines is to make it clear how the web is intended to work going forward. I hope you can accept that they were added for good reason, and follow them.

I have personal problems accepting arguments of the form "we have reasons to do this" that don't explain what those reasons are, and I don't believe it is good engineering practice to do so. If having to educate people about the motivations behind a decision is tiring, I suggest documenting these reasons explicitly in a place that is accessible from the decision, and not using an opaque "due to security implications" as a reason for something ;)

@jonsneyers
Copy link

Thanks for the educational example and excuses for my ignorance.

When we define new standards, obviously we avoid defining the file signature in a way that could be ambiguous or overlapping with already-existing standards, exactly to avoid such things. Of course sensitive-user-data.bin could be arbitrary raw data that by coincidence happens to look like a JXL header (or like a JPEG header for that matter). And obviously, the more patterns are added to the mimesniff spec, the more likely it becomes (in principle) that such coincidences happen.

But wouldn't it be a better approach then to check/enforce that the returned media type is either image/* or unknown, but not something known that is not an image? That way also coincidental matches with a jpeg/png/webp/... header could be caught in the example you described, while magic sniffing can still solve the problem of servers not yet knowing about avif/jxl and returning them as unknown content type.

@domenic
Copy link
Member

domenic commented May 13, 2021

I apologize, but I've reached the limits of my ability to continue this conversation.

@jonsneyers
Copy link

Is there anything else besides potential mimesniff extensions where TAG review is relevant?

Note that the signatures of JPEG 2000 and AVIF are not mentioned in the mimesniff spec, but browsers supporting these formats (Safari, Chrome, Firefox if the avif flag is enabled) do appear to sniff these signatures. This implies that currently I am not aware of any major browser that actually strictly applies the mimesniff spec at this point, which might indicate that an update of that spec might be appropriate regardless of image/jxl.

@veluca93
Copy link
Author

I also would like to know if anything other than mimesniff is relevant.

@cynthia
Copy link
Member

cynthia commented May 17, 2021

Aside from mimesniff, the only thing that seems within our scope would be this: https://bugs.chromium.org/p/chromium/issues/detail?id=1178058#c5

In particular, whether or not there has been any advancements on how this problem would be solved in a codec-agnostic way, while ensuring backwards compatibility with existing web content. If there is a proposal, we'd like to hear since that would be a pretty significant change in loading. (although this should probably be in a separate review)

We (at least last time I checked) really would like mimesniff to not be a part of this - and if so, please talk to the spec authors. Forcing the spec to update after shipping without discussing the standardization story is how we ended up with most of our past mistakes.

@veluca93
Copy link
Author

veluca93 commented May 17, 2021

Aside from mimesniff, the only thing that seems within our scope would be this: https://bugs.chromium.org/p/chromium/issues/detail?id=1178058#c5

In particular, whether or not there has been any advancements on how this problem would be solved in a codec-agnostic way, while ensuring backwards compatibility with existing web content. If there is a proposal, we'd like to hear since that would be a pretty significant change in loading. (although this should probably be in a separate review)

I agree it is an interesting question, but I think this can be done in a subsequent review. I think that at least in principle this can be done for JPEG too, but it would be a change at a different level than adding a new image format, I believe.

We (at least last time I checked) really would like mimesniff to not be a part of this - and if so, please talk to the spec authors. Forcing the spec to update after shipping without discussing the standardization story is how we ended up with most of our past mistakes.

As I said multiple times, I don't really agree with this opinion, and I would like to understand more the reasons why you believe that this is the best path forward (for JPEG XL and AVIF both), but perhaps this is not the best forum for such a discussion. To be clear, I don't want to say that this opinion is incorrect, but, as everything, it has tradeoffs, and I don't think I understand the negative side of enabling mimesniff well enough yet.

@annevk
Copy link
Member

annevk commented May 18, 2021

Luca, I completely agree that AVIF should either be covered by the MIME Sniffing standard or require a MIME type and that we should have had that discussion before shipping. I filed AOMediaCodec/av1-avif#149 to get the conversation on that started and be somewhat centralized. If you could get the relevant Googlers involved I will ping the relevant Mozillians.

@jonsneyers
Copy link

Regarding the MIME Sniffing standard: I suspect that besides AVIF, there might be two more image formats missing from the list, where I suspect (though haven't checked) that the relevant browsers in practice do magic sniffing:

  • JPEG 2000, supported in all versions of Safari
  • JPEG XR (aka WDP, HD Photo, and Windows Media Photo), supported in Internet Explorer and in old versions of Edge

I am not sure what the goal of the MIME Sniffing standard is: is it to document exhaustively what kind of bitstreams might be mistaken for images (in some browsers, not necessarily all browsers), or is it supposed to document only the bitstreams that will be mistaken for images in all browsers? (from a security perspective, I suppose the first one makes more sense).

Maybe this discussion is better moved to the MIME Sniffing standard. I just opened an issue there: whatwg/mimesniff#143

@cynthia
Copy link
Member

cynthia commented Aug 17, 2021

I suspect that besides AVIF, there might be two more image formats missing from the list

I believe the two formats mentioned here are not in the spec as there was no buy in from other vendors aside from the single implementations shipping.

Re: the other comments, our call today did not have enough attendees to provide further formal feedback. Apologies for the long delay.

@kenchris kenchris added Progress: propose closing we think it should be closed but are waiting on some feedback or consensus and removed Progress: in progress labels Aug 17, 2021
@ylafon
Copy link
Member

ylafon commented Aug 31, 2021

As we noted already our reluctance to add new entries to be mime sniffed (see https://w3ctag.github.io/design-principles/#new-data-formats) and as the discussion moved to whatwg/mimesniff#143 we are closing this issue here.
Please look at the result of this issue for final resolution.

@ylafon ylafon closed this as completed Aug 31, 2021
@ylafon ylafon added Resolution: satisfied with concerns The TAG is satisfied with this work overall but requires changes and removed Progress: propose closing we think it should be closed but are waiting on some feedback or consensus labels Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: satisfied with concerns The TAG is satisfied with this work overall but requires changes Topic: media
Projects
None yet
Development

No branches or pull requests

9 participants