Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to packaging for parsing metadata and support metadata 2.4 #1180

Merged
merged 7 commits into from
Dec 17, 2024

Conversation

dnicolodi
Copy link
Contributor

@dnicolodi dnicolodi commented Nov 22, 2024

The packaging package is maintained by the PyPA and it is the de-facto reference implementation for the packaging standards. Using packaging for parsing metadata guarantees support for the latest metadata versions.

warehouse, the Python package index implementation used by PyPI, also uses packaging for parsing metadata. This guarantees that metadata parsing is the same on the client and server side, for the most prominent index.

@dnicolodi

This comment was marked as outdated.

@dnicolodi dnicolodi force-pushed the packaging branch 3 times, most recently from 60ba024 to adc9f2e Compare November 24, 2024 01:02
@dnicolodi

This comment was marked as resolved.

@dnicolodi dnicolodi force-pushed the packaging branch 3 times, most recently from 3bc839b to d727449 Compare November 25, 2024 23:21
@dnicolodi dnicolodi changed the title Switch from pkginfo to packaging for parsing distribution metadata Switch to packaging for parsing metadata and support metadata 2.4 Nov 25, 2024
@dnicolodi
Copy link
Contributor Author

I've reviewed what warehouse expects in the form data and verified that the code here sends the right thing. This should be complete now. Can someone take a look to check that I did not miss anything? Having the CI jobs running on this PR would also be nice. Thanks!

I could imagine adding some more metadata validation on top, similar to what warehouse does, but it would require adding some dependencies.

@konn
Copy link

konn commented Nov 29, 2024

Is there any plan to merge this anytime soon or support metadata 2.4 in other ways?

Our product uses maturin as the build system and its new version 1.7.6 released today uses metadata version 2.4, and hence wheel generated by it doesn't check with the currently released version of twine 5.1.1.

@woodruffw
Copy link
Member

Is there any plan to merge this anytime soon or support metadata 2.4 in other ways?

Once #1123 lands in a release, metadata 2.4 will be supported whenever a newer (> 1.10) version of pkginfo is resolved. This isn't fully general unfortunately, however, since existing twine installations that are pip upgrade'd won't get their deps upgraded at the same time. This is mentioned in #1070.

I would (personally) like to merge this soon as a fully general solution to the above.

To whit: @dnicolodi would you be able to deconflict here? Once this is deconflicted I can do a full review pass.

@dnicolodi
Copy link
Contributor Author

Once #1123 lands in a release, metadata 2.4 will be supported whenever a newer (> 1.10) version of pkginfo is resolved.

This is not entirely true: twine will not error out anymore on metadata version 2.4, but it will not transmit the new license fields to the package index. In the case of PyPI this will result in the license information not shown on the package page on PyPI. As the goal of PEP 639 and metadata version 2.4 is to make licensing information more clear, not making the license information available from the package index IMHO does not count as supporting metadata 2.4.

Furthermore, metadata 2.4 strongly discourages keeping license information in the package classifiers. If package authors follow this recommendation and upload to PyPI with twine their packages will not snow any licensing information on PyPI. This would be a considered a regression by all the package authors that were considering to be early adopters of metadata 2.4 (or that are forced to metadata 2.4 by build backend authors switching their backends to emit metadata 2.4 unconditionally, maturin and pdm being two widely used examples).

@dnicolodi dnicolodi force-pushed the packaging branch 3 times, most recently from f81db94 to 0ef7cf6 Compare November 30, 2024 17:38
@dnicolodi
Copy link
Contributor Author

Rebased, twice.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is pretty good.

I have a few comments above, mostly with a theme of "can we make this change less invasive?" I imagine this change could have been broken down into support for metadata 2.4, switching from pkginfo to packaging, and other unrelated changes.

Nitpicks are my preference, but not required.

Additionally:

  • Add a changelog.

pyproject.toml Outdated Show resolved Hide resolved
tests/test_package.py Outdated Show resolved Hide resolved
tests/test_package.py Show resolved Hide resolved
tests/test_package.py Outdated Show resolved Hide resolved
tests/test_package.py Outdated Show resolved Hide resolved
twine/repository.py Outdated Show resolved Hide resolved
twine/repository.py Outdated Show resolved Hide resolved
twine/repository.py Outdated Show resolved Hide resolved
twine/repository.py Outdated Show resolved Hide resolved
twine/wheel.py Outdated Show resolved Hide resolved
@dnicolodi
Copy link
Contributor Author

I imagine this change could have been broken down into support for metadata 2.4, switching from pkginfo to packaging, and other unrelated changes.

The changes are broken down into independent commits with the reason for each change explained in the commit message. I am unfamiliar with the development practices of this package. If the practice here is to squash the commits in each PR into a single commit, I can split the commits into separate PRs, if you prefer.

@dnicolodi dnicolodi force-pushed the packaging branch 7 times, most recently from cd7db27 to c647408 Compare December 1, 2024 00:47
@dnicolodi dnicolodi force-pushed the packaging branch 6 times, most recently from 9de60eb to 26a0fac Compare December 14, 2024 22:47
@jaraco
Copy link
Member

jaraco commented Dec 15, 2024

I imagine this change could have been broken down into support for metadata 2.4, switching from pkginfo to packaging, and other unrelated changes.

The changes are broken down into independent commits with the reason for each change explained in the commit message. I am unfamiliar with the development practices of this package. If the practice here is to squash the commits in each PR into a single commit, I can split the commits into separate PRs, if you prefer.

That's fair. Sorry if I was too nitpicky. I didn't think to read the history of commits, but I probably could have for more context.

changelog/1180.misc.txt Outdated Show resolved Hide resolved
@jwodder jwodder mentioned this pull request Dec 16, 2024
2 tasks
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for your hard work here @dnicolodi! One non-blocking comment.

twine/distribution.py Show resolved Hide resolved
changelog/1180.misc.txt Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
@dnicolodi dnicolodi force-pushed the packaging branch 2 times, most recently from 525cbde to 4b9ba34 Compare December 16, 2024 20:39
Copy link

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintainers can probably comment on whether they think this too verbose, but I think this explains it very well.

Only looked at the project metadata and changelog. From outside perspective and an interested passer-by that LGTM! Thanks for picking this up on behalf of many within the ecosystem waiting on this :)

The packaging package is maintained by the PyPA and it is the de-facto
reference implementation for the packaging standards. Using packaging
for parsing metadata guarantees support for the latest metadata
versions.

warehouse, the Python package index implementation used by PyPI, also
uses packaging for parsing metadata. This guarantees that metadata
parsing is the same on the client and server side, for the most
prominent index.
It was done in the support code for the wheel file format but it
affects metadata loading from all supported distribution types. Move
it to generic code.
Comment on lines +155 to +162
comment: Optional[str]
pyversion: str
filetype: str
gpg_signature: Tuple[str, bytes]
attestations: str
md5_digest: str
sha256_digest: Optional[str]
blake2_256_digest: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't actually metadata in the way the rest of these fields are. These are internal-ish data about the package file in question. In other words, there's a real and valuable distinction between "I parsed this from the package metadata" and "I generated this from the package file to be uploaded and/or its metadata". Blurring those lines may be convenient-ish, but long term will be confusing for new contributors, maintainers, etc. I don't believe these should live here. As I said, we can use packaging's typed dict as a collaborator here, but I won't agree to inheriting from it or doing something like this

Copy link
Contributor Author

@dnicolodi dnicolodi Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sigmavirus24 for the review. However, I think there is a misunderstanding.

The type definition you are commenting on is defining the return type of the Package.metadata_dictionary():

twine/twine/package.py

Lines 178 to 182 in 1703ae7

def metadata_dictionary(self) -> Dict[str, MetadataValue]:
"""Merge multiple sources of metadata into a single dictionary.
Includes values from filename, PKG-INFO, hashers, and signature.
"""

or in the proposed patch:

    def metadata_dictionary(self) -> PackageMetadata:
        """Merge multiple sources of metadata into a single dictionary.

        Includes values from filename, PKG-INFO, hashers, and signature.
        """

As you can see, the goal of the function is indeed to merge multiple sources of "metadata". This is existing code. The only thing that my patch does it to type the returned value as a TypedDict. I thought that using a TypedDict made things clearer and resolved some ambiguities regarding the data types stored in the dictionary. If adding typing is more confusing than helpful, I can remove it. I agree that the naming of the method is not the most clear: what is returned is the package submission data, not the package metadata, but, again, that is the existing name of the method. I named the TypedDict to match the method name. I can change that to something else, if we can find a more fitting name.

As I said, we can use packaging's typed dict as a collaborator here, but I won't agree to inheriting from it or doing something like this

Things are already structured like this: the Package class has a metadata member that holds the parsed metadata as returned by packaging and the additional fields required for submission:

twine/twine/package.py

Lines 82 to 96 in 1703ae7

class PackageFile:
def __init__(
self,
filename: str,
comment: Optional[str],
metadata: CheckedDistribution,
python_version: Optional[str],
filetype: Optional[str],
) -> None:
self.filename = filename
self.basefilename = os.path.basename(filename)
self.comment = comment
self.metadata = metadata
self.python_version = python_version
self.filetype = filetype

or in the proposed patch:

class PackageFile:
    def __init__(
        self,
        filename: str,
        comment: Optional[str],
        metadata: metadata.RawMetadata,
        python_version: str,
        filetype: str,
    ) -> None:
        self.filename = filename
        self.basefilename = os.path.basename(filename)
        self.comment = comment
        self.metadata = metadata
        self.python_version = python_version
        self.filetype = filetype

However, at some point the different sources of data for the submission form need to be combined into a single data structure. The existing code uses a dictionary for that.

I hope that this clarification resolves the issue you have with the code structure. If not, I may be misunderstanding your comment. In this case I would like to point out that all the proposed patch does is to change the source of the metadata values. The combining into a single dictionary of all the value required for submission is existing code. If you prefer to structure this code differently, maybe you can change the existing code to a structure you like more, and I'll rebase my changes on top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that does clarify things. Sorry for the confusion. I saw our new maintainers approval and could only review from my phone which makes it harder to follow some things. Sorry for the confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion.

No problem. Thank you for the review and for merging!

@sigmavirus24 sigmavirus24 merged commit 0605ef0 into pypa:main Dec 17, 2024
26 checks passed
dnicolodi added a commit to dnicolodi/twine that referenced this pull request Dec 19, 2024
This test file was added in 0605ef0 (Switch to packaging for
parsing metadata and support metadata 2.4, pypa#1180) but the tests added
in that commit used minimal zip archives generated on the fly to test
the same code paths.
dnicolodi added a commit to dnicolodi/twine that referenced this pull request Dec 19, 2024
This test file was added in 0605ef0 (Switch to packaging for
parsing metadata and support metadata 2.4, pypa#1180) but the tests added
in that commit used minimal zip archives generated on the fly to test
the same code paths.
dnicolodi added a commit to dnicolodi/twine that referenced this pull request Dec 19, 2024
This test file was added in 0605ef0 (Switch to packaging for
parsing metadata and support metadata 2.4, pypa#1180) but the tests added
in that commit used minimal zip archives generated on the fly to test
the same code paths.
dnicolodi added a commit to dnicolodi/twine that referenced this pull request Dec 26, 2024
This test file was added in 0605ef0 (Switch to packaging for
parsing metadata and support metadata 2.4, pypa#1180) but the tests added
in that commit used minimal zip archives generated on the fly to test
the same code paths.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants