-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to packaging for parsing metadata and support metadata 2.4 #1180
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
60ba024
to
adc9f2e
Compare
This comment was marked as resolved.
This comment was marked as resolved.
3bc839b
to
d727449
Compare
I've reviewed what I could imagine adding some more metadata validation on top, similar to what |
Is there any plan to merge this anytime soon or support metadata 2.4 in other ways? Our product uses |
Once #1123 lands in a release, metadata 2.4 will be supported whenever a newer (> 1.10) version of I would (personally) like to merge this soon as a fully general solution to the above. To whit: @dnicolodi would you be able to deconflict here? Once this is deconflicted I can do a full review pass. |
This is not entirely true: Furthermore, metadata 2.4 strongly discourages keeping license information in the package classifiers. If package authors follow this recommendation and upload to PyPI with twine their packages will not snow any licensing information on PyPI. This would be a considered a regression by all the package authors that were considering to be early adopters of metadata 2.4 (or that are forced to metadata 2.4 by build backend authors switching their backends to emit metadata 2.4 unconditionally, |
f81db94
to
0ef7cf6
Compare
Rebased, twice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is pretty good.
I have a few comments above, mostly with a theme of "can we make this change less invasive?" I imagine this change could have been broken down into support for metadata 2.4, switching from pkginfo to packaging, and other unrelated changes.
Nitpicks are my preference, but not required.
Additionally:
- Add a changelog.
The changes are broken down into independent commits with the reason for each change explained in the commit message. I am unfamiliar with the development practices of this package. If the practice here is to squash the commits in each PR into a single commit, I can split the commits into separate PRs, if you prefer. |
cd7db27
to
c647408
Compare
9de60eb
to
26a0fac
Compare
That's fair. Sorry if I was too nitpicky. I didn't think to read the history of commits, but I probably could have for more context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for your hard work here @dnicolodi! One non-blocking comment.
525cbde
to
4b9ba34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintainers can probably comment on whether they think this too verbose, but I think this explains it very well.
Only looked at the project metadata and changelog. From outside perspective and an interested passer-by that LGTM! Thanks for picking this up on behalf of many within the ecosystem waiting on this :)
The packaging package is maintained by the PyPA and it is the de-facto reference implementation for the packaging standards. Using packaging for parsing metadata guarantees support for the latest metadata versions. warehouse, the Python package index implementation used by PyPI, also uses packaging for parsing metadata. This guarantees that metadata parsing is the same on the client and server side, for the most prominent index.
It was done in the support code for the wheel file format but it affects metadata loading from all supported distribution types. Move it to generic code.
comment: Optional[str] | ||
pyversion: str | ||
filetype: str | ||
gpg_signature: Tuple[str, bytes] | ||
attestations: str | ||
md5_digest: str | ||
sha256_digest: Optional[str] | ||
blake2_256_digest: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't actually metadata in the way the rest of these fields are. These are internal-ish data about the package file in question. In other words, there's a real and valuable distinction between "I parsed this from the package metadata" and "I generated this from the package file to be uploaded and/or its metadata". Blurring those lines may be convenient-ish, but long term will be confusing for new contributors, maintainers, etc. I don't believe these should live here. As I said, we can use packaging's typed dict as a collaborator here, but I won't agree to inheriting from it or doing something like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sigmavirus24 for the review. However, I think there is a misunderstanding.
The type definition you are commenting on is defining the return type of the Package.metadata_dictionary()
:
Lines 178 to 182 in 1703ae7
def metadata_dictionary(self) -> Dict[str, MetadataValue]: | |
"""Merge multiple sources of metadata into a single dictionary. | |
Includes values from filename, PKG-INFO, hashers, and signature. | |
""" |
or in the proposed patch:
def metadata_dictionary(self) -> PackageMetadata:
"""Merge multiple sources of metadata into a single dictionary.
Includes values from filename, PKG-INFO, hashers, and signature.
"""
As you can see, the goal of the function is indeed to merge multiple sources of "metadata". This is existing code. The only thing that my patch does it to type the returned value as a TypedDict. I thought that using a TypedDict made things clearer and resolved some ambiguities regarding the data types stored in the dictionary. If adding typing is more confusing than helpful, I can remove it. I agree that the naming of the method is not the most clear: what is returned is the package submission data, not the package metadata, but, again, that is the existing name of the method. I named the TypedDict to match the method name. I can change that to something else, if we can find a more fitting name.
As I said, we can use packaging's typed dict as a collaborator here, but I won't agree to inheriting from it or doing something like this
Things are already structured like this: the Package
class has a metadata
member that holds the parsed metadata as returned by packaging
and the additional fields required for submission:
Lines 82 to 96 in 1703ae7
class PackageFile: | |
def __init__( | |
self, | |
filename: str, | |
comment: Optional[str], | |
metadata: CheckedDistribution, | |
python_version: Optional[str], | |
filetype: Optional[str], | |
) -> None: | |
self.filename = filename | |
self.basefilename = os.path.basename(filename) | |
self.comment = comment | |
self.metadata = metadata | |
self.python_version = python_version | |
self.filetype = filetype |
or in the proposed patch:
class PackageFile:
def __init__(
self,
filename: str,
comment: Optional[str],
metadata: metadata.RawMetadata,
python_version: str,
filetype: str,
) -> None:
self.filename = filename
self.basefilename = os.path.basename(filename)
self.comment = comment
self.metadata = metadata
self.python_version = python_version
self.filetype = filetype
However, at some point the different sources of data for the submission form need to be combined into a single data structure. The existing code uses a dictionary for that.
I hope that this clarification resolves the issue you have with the code structure. If not, I may be misunderstanding your comment. In this case I would like to point out that all the proposed patch does is to change the source of the metadata values. The combining into a single dictionary of all the value required for submission is existing code. If you prefer to structure this code differently, maybe you can change the existing code to a structure you like more, and I'll rebase my changes on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that does clarify things. Sorry for the confusion. I saw our new maintainers approval and could only review from my phone which makes it harder to follow some things. Sorry for the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion.
No problem. Thank you for the review and for merging!
The
packaging
package is maintained by the PyPA and it is the de-facto reference implementation for the packaging standards. Usingpackaging
for parsing metadata guarantees support for the latest metadata versions.warehouse
, the Python package index implementation used by PyPI, also usespackaging
for parsing metadata. This guarantees that metadata parsing is the same on the client and server side, for the most prominent index.