[BUG] tool.setuptools.license-files results in invalid metadata #4759
Comments
This is a well-known case of an early implementation of a PEP 639 draft. Right now tools are equipped to accept this variation. In time it will be fixed, but not immediately, due to effort constraints and release scheduling (we are in the process of implementing previous versions of metadata first). We can probably close this as a "kind of duplicate" of the request to implement PEP 639.
What I am trying to say is that tools are not equipped to accept this variation.
@dnicolodi is right here. The What I'm uncertain about is the solution. Sure, we can remove the
I was thinking that twine/PyPI/pip have been accepting Just to emphasize that we are going to tackle this problem in time (btw, thanks @cdce8p for the PRs). But before that we will implement metadata version 2.3 (I recently got a review on one of the PRs necessary for 2.3, but this week I don't have the time to delve into it). Meanwhile, my strong preference is not to change anything that has been in place for the last 2 or 3 years. If you need to process core metadata using
I am not very familiar with all the consumers of package metadata. I have the impression that there are several tools using different approaches for metadata parsing and validation. Only relatively recently Indeed, I bumped into the issue with I was under the impression that PyPI validates the metadata of the distribution files, but if it does, the validation is not very strict. What is validated strictly is the form data that is sent alongside the distribution files. That can be tweaked as needed to support the metadata emitted by existing setuptools releases. pip is most likely the most permissive consumer of metadata, so I don't expect it to do any validation. I don't see any reason to change setuptools to fix this issue straight away. I filed this issue to make sure that you are aware of it: as I wrote above, metadata parsing has not been very strict so far, so it was possible that you were not aware of it.
Thank you very much @dnicolodi. One thing that would help a lot is if
meson-python and scikit-build-core (and possibly other build backends) use pyproject-metadata https://github.com/pypa/pyproject-metadata/ for translating There is some work toward incorporating pyproject-metadata into packaging (pypa/packaging#846, pypa/packaging#847). On the other hand, I like that there is more than one implementation of the standard: that makes it easier to ensure that a single implementation does not diverge from the standard, and it prevents such an implementation, bugs included, from becoming the de facto standard. We were in that situation before PEP 517 and PEP 621, and we are still cleaning up the mess... 🙂
PyPI uses Saw the commit from @dnicolodi on the packaging PR: dnicolodi/twine@ab3bf7d. All considered, that's probably the most practical solution here.
Sorry in advance if I'm a bit annoying here. I can understand your position, but wouldn't fully agree with it. We can do more even before we implement metadata version
Why am I pushing for these changes? For Home Assistant we use a script to try to validate the licenses of all requirements and tbh it's just a mess. Some packages use the outdated classifiers, some use custom license strings, and others put the full license text in the metadata. I'm prepared to open PRs for some of these dependencies, but it only really makes sense if I can use the final
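A license-checking script of this kind has to cope with the three places where license information can currently live. A minimal sketch, assuming the metadata has already been parsed into a plain dict keyed by core metadata field name; the function `describe_license` and the dict layout are hypothetical and not Home Assistant's actual script. It prefers a PEP 639 `License-Expression`, then falls back to the legacy free-text `License` field, then to license classifiers.

```python
from __future__ import annotations

LICENSE_CLASSIFIER_PREFIX = "License :: "


def describe_license(fields: dict[str, str | list[str]]) -> tuple[str, list[str]]:
    """Return (source, values) describing where license info was found.

    `fields` maps core metadata field names ("License-Expression",
    "License", "Classifier") to a string or a list of strings.
    """
    # PEP 639 field: an SPDX license expression, the most precise source.
    expression = fields.get("License-Expression")
    if isinstance(expression, str) and expression.strip():
        return "License-Expression", [expression.strip()]

    # Legacy free-text field: may be a license name or a full license text.
    legacy = fields.get("License")
    if isinstance(legacy, str) and legacy.strip():
        return "License", [legacy.strip()]

    # Last resort: trove classifiers such as "License :: OSI Approved :: MIT License".
    classifiers = fields.get("Classifier", [])
    if isinstance(classifiers, str):
        classifiers = [classifiers]
    license_classifiers = [c for c in classifiers if c.startswith(LICENSE_CLASSIFIER_PREFIX)]
    if license_classifiers:
        return "Classifier", license_classifiers

    return "unknown", []
```

The ordering reflects precision: an SPDX expression can be checked mechanically, while the other two sources usually need a hand-maintained mapping.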
AFAIU However,
Can you point me to more information regarding this? I was under the impression that
You're right. I only checked the packaging call
For
AFAIK you can overwrite the The last comment on discuss I saw was
Hi guys, I understand the urgency of the topic, but I am also very conscious that we need to move very carefully to avoid breaking the ecosystem all at once. In the past we had a lot of bad experiences with botched releases, so I am trying to take the most conservative approach possible (and even with that, it is possible there will be problems). The good thing is that we have most of the pieces already in place (thanks again for the PRs). We now need to coordinate to release them, collect feedback, and fix things if they break. My preference is to do things step by step and wait one week or so between steps to receive feedback from early adopters on edge cases (of course respecting the upcoming holiday season and the time of all collaborators, so probably longer than that). I suggest the following:
I think that step 2 is going to be trivial to support, once we concede that we don't have to worry too much about pypa/packaging#845 and treat it as a temporary bug.
One last post from me and then I'll shut up and respect your decision :)
Absolutely! The only thing our opinions differ on, I believe, is the risk these PRs actually pose. What I'm trying to say is that it's quite small and we can safely do it now. Let me explain:
Doing these now would provide valuable feedback way before we'd consider moving to
I don't want to influence the pace of introduction of new features in any way (egoistically, I would like support for metadata 2.4 to land as soon as possible so I can use it in my projects, but it would only be something nice to have), so this should not be read as supporting one position or the other. However, I would like to point out that pyproject-metadata, and thus meson-python, have taken the approach of emitting metadata where the metadata version is set to what is required to faithfully represent the user input, namely the content of meson-python does not support any dynamic metadata fields, thus for it this means either metadata version 2.2 or 2.4. The latter is used only when I think this is the only way to proceed, as pre-PEP 639 and post-PEP 639 license declaration formats are not compatible with each other, thus emitting metadata version 2.4 with a These PRs have more details: pypa/pyproject-metadata#132, pypa/pyproject-metadata#206.
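The idea of emitting the lowest metadata version that can faithfully represent the input can be sketched as follows. This is a simplified, hypothetical function (`required_metadata_version` is not pyproject-metadata's API), assuming that, with no dynamic fields, only the license-related keys of the `[project]` table matter: a PEP 639 `license` string (an SPDX expression) or a `license-files` key needs 2.4, while the legacy table form (`{text = ...}` or `{file = ...}`) fits in 2.2.

```python
from __future__ import annotations

from typing import Any


def required_metadata_version(project: dict[str, Any]) -> str:
    """Return the lowest Metadata-Version able to represent a [project] table.

    Hypothetical sketch: assumes no dynamic fields, so 2.2 is the baseline
    and only the license-related keys can push it up to 2.4.
    """
    # PEP 639: a plain string is an SPDX license expression (License-Expression).
    uses_license_expression = isinstance(project.get("license"), str)
    # PEP 639: the license-files key maps to License-File entries.
    uses_license_files = "license-files" in project
    if uses_license_expression or uses_license_files:
        return "2.4"
    # Legacy table form ({text = ...} or {file = ...}) or no license at all.
    return "2.2"


print(required_metadata_version({"name": "demo", "license": "MIT"}))  # prints 2.4
```

Keeping 2.2 as the default means projects that still use the legacy table form produce metadata that older consumers already understand.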
I'm proposing the reverse: emitting the current core metadata That approach works fine for
I don't understand what the advantage of doing this is. IIUC, your goal is to move as fast as possible to have packages with PEP 639 metadata. However, with this approach, having packages with PEP 639 license metadata will require two releases of the packages involved: one that updates the metadata fields in The cost of doing this is not only the two releases, but also that between the two releases the involved packages will not have clear license information displayed on PyPI. IIRC, PEP 639 forbids having classifiers indicating the package license while using the
Correct.
Updating packages always takes time. What can be optimized, though, is developer time. I can either fix / convert all wrong licenses to SPDX expressions now and do a second pass to convert To give an example, just for Home Assistant, I currently track over 650 packages with inaccurate license data, no SPDX expression in either the
Not entirely. PyPI can handle all cases. This is an example with both classifier and (old) license metadata [1], and here is an example with only the (old) license metadata, but still a valid SPDX expression [2].
No, build tools MAY raise an error if a license classifier is present [3]. PyPI must only reject uploads with both
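The rule discussed here can be expressed as a small check. A minimal sketch, assuming the metadata is available as (field, value) pairs as they appear in a METADATA file; the function `license_classifier_conflict` is hypothetical. It flags only the combination of a `License-Expression` field with one or more `License ::` classifiers; whether a tool turns that into a warning or an error is its own policy choice.

```python
from __future__ import annotations

from collections.abc import Iterable


def license_classifier_conflict(pairs: Iterable[tuple[str, str]]) -> list[str]:
    """Return license classifiers that conflict with a License-Expression.

    An empty list means there is nothing to flag: either no
    License-Expression, or no license classifier alongside it.
    """
    has_expression = False
    license_classifiers: list[str] = []
    for name, value in pairs:
        # Core metadata field names are case-insensitive.
        key = name.lower()
        if key == "license-expression":
            has_expression = True
        elif key == "classifier" and value.startswith("License ::"):
            license_classifiers.append(value)
    return license_classifiers if has_expression else []
```

A package that only has the legacy `License` field plus classifiers is not flagged, which matches the point that such uploads remain acceptable.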
That seems to be the cleanest approach to pypa/setuptools#4759
If we need to be really conservative about a new metadata version, can we revert whatever it is that is putting License-File into 2.1 metadata, since PyPI hard rejects that? Currently this metadata is hard broken, so we can't really break it further ;-)
My user story:
This is correct, and it is correctly handled by The error you encounter is caused by the default value of the
[tool.setuptools]
license-files = []
to the I speculate that you are using
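To see whether a project is affected without setting the field explicitly, one can check which files match the default patterns quoted in the issue, `['LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*']`. A minimal sketch using `fnmatch`; the helper `matching_license_files` is hypothetical, it only looks at files in the project root, and it matches case-sensitively, which may not mirror exactly how setuptools expands these patterns.

```python
from __future__ import annotations

import fnmatch
from collections.abc import Sequence
from pathlib import Path

# Default value of tool.setuptools.license-files, as quoted in the issue.
DEFAULT_LICENSE_GLOBS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def matching_license_files(project_root: Path, patterns: Sequence[str] = DEFAULT_LICENSE_GLOBS) -> list[str]:
    """Return the names of files in project_root matching any pattern."""
    names = sorted(p.name for p in project_root.iterdir() if p.is_file())
    # fnmatchcase matches exactly as written, independent of OS case rules.
    return [n for n in names if any(fnmatch.fnmatchcase(n, pat) for pat in patterns)]
```

A non-empty result suggests that, with the affected setuptools versions, `License-File` entries would end up in the metadata unless `license-files = []` is set.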
Thanks for explaining that! Yeah, I ended up just using twine (before I saw your post and konstin's post here explaining that PyPI validates form data, not the uploaded METADATA)... so I guess I just uploaded a wheel with invalid METADATA that will fail
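To find out whether an already-built wheel carries this pre-PEP 639 combination, its METADATA can be inspected directly. A minimal sketch using only the standard library (`zipfile` and `email`); the function `pre_pep639_license_files` is hypothetical, and it parses the version into a tuple of integers, which is enough for the metadata versions defined so far.

```python
from __future__ import annotations

import email.parser
import zipfile


def pre_pep639_license_files(wheel_path: str) -> list[str]:
    """Return License-File values if the wheel declares Metadata-Version < 2.4.

    An empty list means the wheel does not have the problem described here.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # The core metadata lives in the top-level *.dist-info/METADATA file.
        candidates = [
            name for name in wheel.namelist()
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError(f"expected one .dist-info/METADATA, found {candidates!r}")
        text = wheel.read(candidates[0]).decode("utf-8")

    # METADATA uses email-style headers; only the headers matter here.
    headers = email.parser.Parser().parsestr(text, headersonly=True)
    raw_version = headers.get("Metadata-Version", "1.0").strip()
    version = tuple(int(part) for part in raw_version.split("."))
    license_files = headers.get_all("License-File") or []
    return license_files if version < (2, 4) else []
```

This checks the uploaded file itself, which is what stricter consumers parse, rather than the form data that PyPI validates at upload time.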
* Remove "content" from the set of specially handled metadata fields
  The "content" field is always added to the form data after the package metadata has been flattened, so it does not need to be handled in the flattening method. Remove the associated test. This will allow tightening the typing in a subsequent commit.
* Remove "attestations" from the set of specially handled metadata fields
  The "attestations" field is a string: strings do not need flattening.
* Refactor code a tiny bit
  Avoid looking a key up in a set of one element and remove an indirection through a module global variable. This will make it a bit easier to extend the flattening logic in subsequent commits.
* Switch from pkginfo to packaging for parsing distribution metadata
  The packaging package is maintained by the PyPA and it is the de facto reference implementation for the packaging standards. Using packaging for parsing metadata guarantees support for the latest metadata versions. warehouse, the Python package index implementation used by PyPI, also uses packaging for parsing metadata. This guarantees that metadata parsing is the same on the client and server side, for the most prominent index.
* Enable some more mypy checks
* Move monkeypatching of metadata 2.0 support to a more proper place
  It was done in the support code for the wheel file format, but it affects metadata loading from all supported distribution types. Move it to generic code.
* Accommodate invalid metadata produced by setuptools
  See pypa/setuptools#4759.
Set license-files to an empty array to work around issues releasing to PyPI. See: - astral-sh/uv#9513 - pypa/setuptools#4759
Yeah, there is a growing pain that we discovered in the community once PEP 639 was finally accepted... Probably because of the very long time the community took to finalise it, and the way the core metadata version follows a strictly monotonic model. An earlier version of Since all the tools available were lenient in validating that field and happily accepted it, it kind of became a "de facto" standard, with widespread usage. I think it would be an error to start strictly validating old versions of metadata regarding In my opinion, the best way forward is to be backward compatible with the existing tools' behaviour, and be lenient when validating It should not be too difficult to implement, as one can simply delete the The implementation plan for PEP 639 in setuptools was previously mentioned in #4759 (comment). It is:
I believe that keeping backwards compatibility is fundamental in the ecosystem because:
The strictly monotonic approach could be revisited. It certainly has its downsides. But as usual, someone would need to put together a proposal, write a PEP, and push it through to approval. I'm not sure if there's the community bandwidth for that - apart from this situation, there haven't been that many problems with the current scheme.
I wasn't involved in that decision, either, but I believe it was ill-advised. Even if the situation with license information was a problem, adding fields not defined in the spec is not allowed. The core metadata spec says that it should be considered "complete" - which I view as meaning "only these fields are allowed". I'd be happy to have the wording clarified if people don't think it's sufficiently obvious that this is the case. But debating the history isn't productive. Setuptools produced bad data in the past, and this is something we need to deal with now. One thing I'm not clear on - is setuptools still producing bad metadata at the moment, and if so, do you intend to fix that before starting on the plan to migrate to metadata 2.4? Because I'd be less inclined to support loosening the validation if it means that setuptools will continue to publish bad data...
I think this is a bit strong. I think we probably have to allow invalid license-files metadata, just because not doing so would cause too much disruption. But I don't think it's an "error" to do so. It's not possible to change every place where tools validate metadata. Nor is it possible to do anything about tools that ignore license-file if the metadata version is below 2.4. The best we can hope for is to ask commonly-used validation libraries ( There's no good answer here, unfortunately.
Absolutely. But part of the approach to backward compatibility is to prohibit arbitrary fields being added. Unfortunately bugs happen, and I agree that we have to handle them, but we shouldn't make life impossible for ourselves by assuming that we can't rely on the rules we set. I'm not 100% sure what the actual next step is here. Medium term, setuptools will move to metadata 2.4. But in the short term, I see various possible actions:
Did I miss any?
You are correct, apologies for the bad choice of words.
Yesterday I released This preference is motivated because
Currently, I believe PyPI,
OK. It may be worth checking back on this if there are unforeseen delays.
I'm not a specialist either. My concern was that tools could ignore License-File if the metadata version is < 2.4, resulting in silent errors. But we simply don't know.
The OP's issue was with packaging. Maybe the answer is simply to point out that If we do need a spec change, I'd propose that it could be something like the following:
This would need to be brought up for discussion on Discourse. As it affects software behaviour, it may require a PEP (although if the community agrees, it could be approved as a text-only change).
I agree. I want to bump the version step by step and give a couple of days between them so that we can receive feedback, but we can revisit this.
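I agree, the design in
import warnings

from packaging.metadata import Metadata, parse_email
from packaging.version import Version

example = """\
Metadata-Version: 2.1
Name: hello-world
Version: 0.42
License-File: MIT.txt
License-File: BSD.txt
"""

# Split the text into recognized fields (raw) and fields packaging could not parse (unparsed).
raw, unparsed = parse_email(example)
if unparsed:
    raise ValueError(f"Invalid metadata fields: {unparsed!r}")

# License-File is only valid from Metadata-Version 2.4 (PEP 639): be lenient and
# drop it with a warning for older versions instead of failing validation.
if Version(raw.get("metadata_version", "0")) < Version("2.4") and raw.pop("license_files", None):
    warnings.warn("License-File is not supported for Metadata-Version < 2.4")

print(Metadata.from_raw(raw))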
I would like to point out that this is not exclusive to the current and old versions of setuptools. If we consider the whole spectrum of already existing and published wheels, old versions of other build tools also exhibit similar behaviour, for example: https://inspector.pypi.io/project/hatch/1.10.0/packages/24/cc/d4ff74c07e7aa12525aabe96dcb3e78068483f17423ef610894808aca9b0/hatch-1.10.0-py3-none-any.whl/hatch-1.10.0.dist-info/METADATA#line.12
Yeah, technically you should convert
Hmm, has anyone flagged this up to hatch? I assumed it was purely a setuptools issue - my apologies. I do want to be clear it's a bug, though, so would you be OK with "Due to a bug in some build backends..."? If you're amenable, I can see advantages to adding "(including setuptools)" as the number of packages that use setuptools is what makes this such a significant problem, but I'd understand if you didn't want to see setuptools called out explicitly like that.
In terms of actionable items, Setuptools is lagging behind in this aspect because it was complicated to implement version 2.2 of the metadata spec.
I originally added If all had gone to plan, the PEP would have been finished shortly thereafter and the implementation finalized, so nobody would have cared, but alas.
Yes, validating the current metadata with packaging would fail. That's unlikely to be an actual issue for end users, though. It has likely been this way since the change was added here, and there weren't any reports. The exception here is the latest twine change. As they rewrote which fields they submit to PyPI (basically all of them, instead of only a whitelist), this became an issue. Therefore a workaround was added, basically dropping the Maybe
This sounds reasonable to me.
The current We could simply remove the field, although, as tools already need to handle it, I don't see much reason to. A better approach IMO would be to move forward towards full PEP 639 support. A first step could be #4728, which would at least make the
setuptools version
setuptools==74.1.2
Python version
Python 3.13
OS
any
Additional environment information
No response
Description
If any of the glob patterns specified in tool.setuptools.license-files matches a file in the package, setuptools generates invalid metadata: it includes a License-File field while specifying Metadata-Version to be 2.1. This is invalid, and packaging raises an exception while parsing the metadata. As a result, the distributions are likely not accepted by PyPI.
Because tool.setuptools.license-files has a default value of ['LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'], the problem can also be encountered in packages that do not explicitly set this field in pyproject.toml but happen to have a file matching one of the default glob patterns.
Expected behavior
Do not emit the License-File field, or emit it and specify Metadata-Version: 2.4 as per PEP 639.
How to Reproduce
Here is a short reproducer:
Output