Proposal: Additional Metadata Attributes #12174

Open
wants to merge 1 commit into
base: dev
76 changes: 76 additions & 0 deletions proposed/2022/Additional-Metadata-Attributes.md
@@ -0,0 +1,76 @@
# Additional Metadata Attributes

- Author Name [@redth](https://github.com/redth)
- Start Date (2022-10-19)
- GitHub Issue (GitHub Issue link)
- GitHub PR (GitHub PR link)

## Summary

Allow the inclusion of additional metadata properties in package authoring and allow them to be used in search queries.

## Motivation

MAUI (and previously Xamarin) has an ecosystem of 'binding' library NuGet packages which surface native platform library usage within apps through C# API projections. These native platform libraries are often available through their native package management systems (e.g., Maven, CocoaPods). Typically, binding NuGet packages redistribute the native libraries within them.

It's currently challenging/impossible to discern which NuGet packages map to a given native platform library package identity.

Moving the ecosystem forward, it would be very helpful to be able to programmatically determine the native platform library identities to:

1. Deduplicate inclusion of packages through transitive dependencies (which often result in native toolchain build errors if not resolved)
2. Discover eligible packages to fulfill dependencies of known native platform libraries (e.g., integrating the output of a Gradle build and automatically matching up / suggesting existing packages for dependencies)

Identity can consist of multiple attributes, for example a Maven package has:

- Group Id (eg: `com.company.product`)
- Artifact Id (eg: `ProductSdk`)
Member

Just wondering, is this spec suggesting the same model as Maven? I'm wondering if extensions or next steps for the Maven model ever came up. For example: what about values with non-string types allowing range queries ("os_version:10" and querying "os_version>=10") or allowing multiple attributes with the same key but different values.

Author

I'm not sure I fully understand the question here, could you elaborate? It might be nice to support additional types and query operators if that's the basic question? I guess if something is being considered, may as well consider all possible useful cases.

Member

Apologies for the confusion.

From your spec, it sounds like Maven already has a feature like what you're suggesting. Given that, can we learn from any growing pains their ecosystem had? For example, did they originally ship with the string KVP model with unique keys, then run into problems that necessitated a richer data model?

Said another way, if we can learn from another ecosystem's implementation in this area, we can maybe skip some intermediate steps or painful migrations. Or we can know that what we're proposing here is actually enough.

I'm not clued into the Maven ecosystem so I can't provide that perspective.

As a side note, if there are "prior art" design spec/docs about this feature in other ecosystems that would be cool to link in here.

Author

Oh sorry, I'm not really that familiar with how they built up their model.

Having the ability to query on the version as a version number would be useful in an "AND" query scenario, I had just considered the MVP implementation of this not needing to do that since the result set matching only the GroupId and ArtifactId would presumably be small enough to iterate over versions of the results (also if it wasn't clear, different NuGet package versions would potentially have different attr_MavenVersion:1.2.3 values). The one potential gotcha here is that Maven's versioning rules might be a bit different than NuGet (though I think semver is adopted there too).

One other consideration, though, is that the Maven version might be used to assert whether it satisfies a Maven version range. Again, this is similar to NuGet's version range semantics, but not necessarily identical in rules/implementation. For the binding helpers project/experiment I linked in the proposal, this is part of the process, so in this case the matching GroupId/ArtifactId results would still need to be iterated over, asserting each version's Maven range compatibility. A long way of saying that there may be too many operators to consider for querying by version for a simple `>=` to be worth the effort. This is just one example though, and maybe that would be valuable for other scenarios.
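To illustrate the "iterate and assert range compatibility" step described above, here is a minimal sketch. It is deliberately simplified: it only understands purely numeric dotted versions and a single bracketed range such as `[1.2,2.0)`, and it does not implement Maven's real version ordering (qualifiers, snapshots, multiple range sets). The function names are hypothetical.

```python
def parse_version(text):
    """Parse a purely numeric dotted version such as '1.6.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def _padded(a, b):
    """Pad the shorter tuple with zeros so that 1.6 and 1.6.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies_range(version, range_spec):
    """Check a version against one bracketed range like '[1.2,2.0)' or '[1.6]'."""
    spec = range_spec.strip()
    if len(spec) < 3 or spec[0] not in "[(" or spec[-1] not in "])":
        raise ValueError(f"unsupported range: {range_spec!r}")
    body = spec[1:-1]
    current = parse_version(version)
    if "," not in body:
        # '[1.6]' pins exactly one version.
        left, right = _padded(current, parse_version(body))
        return left == right
    low_text, high_text = (part.strip() for part in body.split(",", 1))
    if low_text:
        value, low = _padded(current, parse_version(low_text))
        # '(' excludes the lower bound itself, '[' includes it.
        if value < low or (value == low and spec[0] == "("):
            return False
    if high_text:
        value, high = _padded(current, parse_version(high_text))
        # ')' excludes the upper bound itself, ']' includes it.
        if value > high or (value == high and spec[-1] == ")"):
            return False
    return True
```

A caller would run `satisfies_range` over the Maven version attribute of each package version returned for a GroupId/ArtifactId match.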


Sorry for coming late to the show.

I maintain:

And when I get some air I work on "bindings improvements" which should improve productivity and more.

So, I will express my opinion only for Android (.NET for Android - formerly Xamarin.Android), though
IMO this should be extended to .NET for iOS and maybe other platforms.

"Bindings improvements" include

I am already using some of the utilities in our repos for

Until recently we added the fully qualified Maven metadata for an artifact in 2 forms:

  • artifact=androidx.compose.material:material-ripple
  • artifact_versioned=androidx.compose.material:material-ripple:1.0.5

to the NuGet fields

  • Description
  • Summary (sometimes)
  • Tags

Visible here:

https://api.nuget.org/v3/registration5-gz-semver2/xamarin.androidx.compose.material.ripple/index.json

This worked well: I am able to use the server-side NuGet protocol (HttpClient + JSON/XML parsing) to make maintenance of both repos more productive.

In recent updates, the .NET for Android team decided to keep this information only in the Tags node.
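As a rough sketch of how a maintenance tool might read the `artifact=` and `artifact_versioned=` tag forms shown above, assuming the tags have already been obtained as a list of strings (for example from the registration JSON). The `MavenArtifact` type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional


class MavenArtifact(NamedTuple):
    group_id: str
    artifact_id: str
    version: Optional[str]


def parse_artifact_tags(tags):
    """Extract Maven coordinates from 'artifact=' / 'artifact_versioned=' tags.

    The versioned form wins when both are present; returns None when neither is.
    """
    unversioned = None
    for tag in tags:
        key, separator, value = tag.partition("=")
        if not separator:
            continue  # an ordinary tag without key=value structure
        parts = value.split(":")
        if key == "artifact_versioned" and len(parts) == 3:
            return MavenArtifact(parts[0], parts[1], parts[2])
        if key == "artifact" and len(parts) == 2:
            unversioned = MavenArtifact(parts[0], parts[1], None)
    return unversioned
```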

  • identifying binaries used (either distributed or downloaded by the package during the build)

  • dependency identification

    • type
      • maven
      • native
    • identity
    • version

With this there would be a 1:1 mapping from a (versioned) NuGet package to a (versioned) Maven/native package.
This would help maintainers with:

  1. keeping track of published (bound) Maven or native libraries,

    Getting the data for the latest NuGet package and mapping it to the fully qualified, versioned Maven ID
    would make it easier to discover what needs updating.

  2. updates and

    see 1.

  3. troubleshooting

    Primarily checking dependency graphs

    • for duplicate transitive dependencies (possibly with different versions)

  4. security checks (component governance)

  5. curation (curated package publishing)

    With the bar for bindings lowered by "bindings improvements", a flood of bindings packages
    is to be expected.
    The NuGet publishing process could add a step to verify whether a given Maven/native artifact is already
    published in some other NuGet package.
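The troubleshooting and curation points above both come down to grouping packages by the native artifact they declare. A minimal sketch, assuming the Maven identity of each NuGet package is already known as a tuple (all names here are hypothetical):

```python
from collections import defaultdict


def find_duplicate_artifacts(packages):
    """Group NuGet packages by the Maven artifact they declare.

    `packages` is an iterable of (nuget_id, group_id, artifact_id, maven_version).
    Returns {(group_id, artifact_id): sorted [(nuget_id, maven_version), ...]}
    for every artifact that appears under more than one package or version.
    """
    by_artifact = defaultdict(set)
    for nuget_id, group_id, artifact_id, maven_version in packages:
        by_artifact[(group_id, artifact_id)].add((nuget_id, maven_version))
    return {
        artifact: sorted(owners)
        for artifact, owners in by_artifact.items()
        if len(owners) > 1
    }
```

The same grouping could back a publish-time check: look up the incoming package's artifact and see whether another package already claims it.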

  • Maven

    • project

      <ItemGroup>
          <PackageAttribute Include="maven.GroupId" Value="androidx.activity" />
          <PackageAttribute Include="maven.ArtifactId" Value="activity" />
          <PackageAttribute Include="maven.Version" Value="1.6.0" />
      </ItemGroup>

      NOTE: this could be derived from current (and future) .NET for Android (Xamarin.Android)
      BuildActions for binding artifacts (Embedd)

    • nuspec

      <!--
          ... snip
      -->
      <package>
          <metadata>
              <attributes>
                  <attribute key="maven.GroupId">androidx.activity</attribute>
                  <attribute key="maven.ArtifactId">activity</attribute>
                  <attribute key="maven.Version">1.6.0</attribute>
              </attributes>
          </metadata>
      <!--
          ... snip
      -->
      </package>
  • Native

    • project (packaging)

      <ItemGroup>
          <AndroidNativeLibrary Include="path/to/libfoo.so">
              <Abi>armeabi</Abi>
          </AndroidNativeLibrary>
      </ItemGroup>
    • nuspec

      <package>
          <metadata>
              <attributes>
                  <attribute key="native.LibraryName">libfoo</attribute>
                  <attribute key="native.Version">1.6.0</attribute>
              </attributes>
      </package>

- Version (eg: `1.3.0`)

A consumer of the search service might query by (GroupId='com.company.product' && ArtifactId='ProductSdk') to get a list of packages, then inspect their version attributes to determine whether any package identity satisfies its requirements.
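The two-step flow described here (match on identity, then inspect versions) could look roughly like the sketch below. It assumes search results are already available as dictionaries with an `attributes` mapping; that shape, the attribute key names, and the package ids are all made up for illustration.

```python
def find_candidates(packages, group_id, artifact_id):
    """Return (package id, Version attribute) pairs whose Maven identity matches."""
    wanted = (group_id.casefold(), artifact_id.casefold())
    candidates = []
    for package in packages:
        attributes = package["attributes"]
        identity = (
            attributes.get("GroupId", "").casefold(),
            attributes.get("ArtifactId", "").casefold(),
        )
        if identity == wanted:
            candidates.append((package["id"], attributes.get("Version")))
    return candidates


# Hypothetical search results; package ids and values are invented.
RESULTS = [
    {"id": "Contoso.ProductSdk",
     "attributes": {"GroupId": "com.company.product", "ArtifactId": "ProductSdk", "Version": "1.3.0"}},
    {"id": "Contoso.Other",
     "attributes": {"GroupId": "com.company.product", "ArtifactId": "OtherSdk", "Version": "2.0.0"}},
]
```

The caller is then left with a short list whose version attributes it can check against its own requirement.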

## Explanation

The NuGet search API already allows the specification of various package [metadata fields to search by in the query parameter](https://learn.microsoft.com/en-us/nuget/consume-packages/finding-and-choosing-packages#search-syntax). This proposal is simply an extension of that existing query syntax to include additional, potentially arbitrary attributes both in the .nuspec format as well as the search query.

### Functional explanation
Member

Do we have any considerations beyond discoverability?
Maybe something specific to managing some of these related packages within your project, or is that not a big concern?


> Do we have any considerations beyond discoverability?

A 1:1 mapping of native artifacts (Maven or native lib) to NuGet would make

  • security checks easier
  • curation (optional) easier

> Maybe something specific to managing some of these related packages within your project, or is that not a big concern?

We do that, but a formal, standardized, central method would help both us and (IMO) the NuGet team.


Thinking about it a bit more, 1:1 is an oversimplification. In a cross-platform scenario there will be an artifact per platform (Android, iOS) and sometimes multiple artifacts per platform.


Package authors could choose to include arbitrary attribute key/value pairs within the NuGet packages they publish which would be contained within the .nuspec.

These attribute key/value pairs would be searchable within the NuGet search service, via the query property, similar to how search by `owner` or `packageid` is available currently.

### Technical explanation
Member

If someone wants to filter by multiple key-value pairs, is that possible? AND or OR behavior?

Author

Yeah I hadn't realized, there does not seem to be a way to "AND" query terms... This would definitely be helpful, though I guess narrowing search results to a potentially reasonable number (ie: GroupId, or having a concatenated MavenId field) and inspecting the details of the results might be reasonable.


nuget.org search is less of a concern (for me)

https://www.nuget.org/packages?q=artifact

Metadata plus an API would be a huge help for maintaining 450+ artifacts.


Inside the .nuspec file's `<package>` and then `<metadata>` elements, create a new `<attributes>` element which can contain zero or more `<attribute key="[string]" value="[string]" />` elements. Attribute keys must be unique: a package cannot contain more than one attribute with the same key.
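A minimal sketch of reading the proposed `<attributes>` element and enforcing the unique-key rule. The sample .nuspec is invented and omits the XML namespace that real .nuspec files declare, and keys are compared exactly (the proposal does not say whether key uniqueness is case-sensitive), so treat both as assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical .nuspec fragment using the element shape proposed above.
SAMPLE_NUSPEC = """
<package>
  <metadata>
    <id>Contoso.ProductSdk</id>
    <version>1.0.0</version>
    <attributes>
      <attribute key="GroupId" value="com.company.product" />
      <attribute key="ArtifactId" value="ProductSdk" />
    </attributes>
  </metadata>
</package>
"""


def read_attributes(nuspec_xml):
    """Return the attribute key/value pairs, rejecting duplicate keys."""
    root = ET.fromstring(nuspec_xml)  # root is the <package> element
    attributes = {}
    for element in root.findall("./metadata/attributes/attribute"):
        key = element.get("key")
        value = element.get("value")
        if key is None or value is None:
            raise ValueError("each <attribute> needs both key and value")
        if key in attributes:
            raise ValueError(f"duplicate attribute key: {key}")
        attributes[key] = value
    return attributes
```

A package with no `<attributes>` element simply yields an empty dictionary, which matches the "zero or more" wording.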

In the NuGet search query (`q`) parameter, allow attributes to be specified as a query filter just like `owner`. That is, for example: `q=attr_[keyValue]:[attribute_value]` where the `attr_` prefix denotes matching a particular attribute key by its `[attribute_value]`. The search should look for exact, case-insensitive matches.
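For illustration, a client could build the proposed query string and a server could apply the exact, case-insensitive comparison roughly as follows. The `attr_` filter is only what this proposal describes, not an existing NuGet feature, and matching the key exactly (rather than case-insensitively) is an assumption.

```python
from urllib.parse import urlencode


def attribute_query(key, value):
    """Build the proposed `q=attr_[key]:[value]` query string (URL-encoded)."""
    return urlencode({"q": f"attr_{key}:{value}"})


def attribute_matches(attributes, key, value):
    """Exact, case-insensitive comparison of one attribute value."""
    stored = attributes.get(key)
    return stored is not None and stored.casefold() == value.casefold()
```

For example, `attribute_query("GroupId", "com.company.product")` produces `q=attr_GroupId%3Acom.company.product`.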
Member (@joelverhagen, Oct 20, 2022)

The contents of the `q` (search text) parameter are not spec'd at the protocol level, so different package source implementations interpret it differently. AND/OR logic, ranking, quote behavior, and supported field-scoped terms are all package source specific. In general the `q` parameter is for search relevance and less about strict package filtering. It's certainly a grey area since it is unspec'd, but I think a safer approach is to introduce a new query parameter for these attribute filters.


## Drawbacks

- Attribute key/value pair sprawl

## Rationale and alternatives

There are no known alternatives that cover search. We previously considered embedding custom files containing this metadata in the package. That would be of some benefit, but supporting search queries is ultimately necessary to achieve the full benefit of the proposal for the scenarios described.
Member

Would it be possible to create well-known tags for this purpose? We've had extension points in the past based on tags. Here's a prototype I tried on our DEV environment:
https://dev.nugettest.org/packages?q=Tags%3A%22attr_fruit%3Alemon%22

Miraculously, our quotes actually work properly here 😂.

Prior art is "AzureSiteExtension" used for finding Azure site extension packages, before the package type filtering was enabled:
https://www.nuget.org/packages?q=Tags%3A%22AzureSiteExtension%22

Author

Well-known tags could potentially work... I suppose there's not really any more enforced convention with arbitrary metadata key/value pairs... Though it doesn't seem like there is currently a way to search with multiple tag combinations with "AND" (eg: `Tags:"attr_fruit:lemon" Tags:"ArtifactId:NONE"` returns results matching just one of the tags, not only results matching both).

Member

Yeah, the AND/OR combination on NuGet.org search is ... not great :)

For non-field-scoped terms `foo bar` it performs an AND. For multiple field-scoped terms `tags:foo tags:bar` it performs an OR. For a mixture, at least one of the field-scoped terms per field name must exist in the doc.
https://www.nuget.org/packages?q=owner%3Amicrosoft+owner%3Ajver+tags%3Aentity+tags%3Afoo+design
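The matching rules described in this comment can be modeled in a few lines. This is only a toy model of the behavior as stated above, not NuGet.org's implementation: it ignores ranking and quoting, and the document shape (a `text` word set plus one set per field) is an assumption.

```python
from collections import defaultdict


def parse_query(query):
    """Split a query into free terms and field-scoped terms grouped by field."""
    free_terms = []
    field_terms = defaultdict(set)
    for token in query.split():
        field, separator, value = token.partition(":")
        if separator and field and value:
            field_terms[field.lower()].add(value.lower())
        else:
            free_terms.append(token.lower())
    return free_terms, dict(field_terms)


def matches(doc, query):
    """AND across free terms; OR within each field; every named field must hit."""
    free_terms, field_terms = parse_query(query)
    if not all(term in doc["text"] for term in free_terms):
        return False
    return all(
        doc.get(field, set()) & wanted  # non-empty intersection means a hit
        for field, wanted in field_terms.items()
    )
```

Under this model `tags:foo tags:bar` matches a document tagged with either value, which is why two attribute-style tags cannot currently be combined with "AND".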

I think a reasonable step here could be to change the interaction of field-scoped terms to "AND" to unblock this scenario. I think it would be a net win for general usage of field-scoped terms anyway since it would align with the non-field scoped term behavior.

The history here is that we have invested heavily in relevance on non-field scoped queries since they are the 99% case. We have not done the same investment for field-scoped queries or other advanced syntax like `+` (not supported), `-` (not supported), `"` (acts weird).

Member

How valuable/important would the combination of attributes/tags be?

How many scenarios would a single attribute/tag solve?


## Prior Art

### Component Governance / SBOM
Currently CG build tasks in Azure DevOps do not know how to link a NuGet package binding library to the native artifacts they project / redistribute. We typically provide a [cgmanifest.json](https://github.com/xamarin/XamarinComponents/blob/main/Android/Guava/cgmanifest.json) file to track this relationship, however if there was metadata in NuGet packages, the CG tasks and SBOM generation could be enhanced to automatically pick this up.

### Xamarin.Binding.Helpers
There are a [couple](https://github.com/Redth/Xamarin.Binding.Helpers) of [experiments](https://github.com/Redth/Microsoft.Maui.Platform.Channels) around making it easier to create bindings and integrations with platform native libraries. One of the challenges in creating tooling and experiences around this is acquiring native artifacts and linking them into .NET apps/builds while resolving conflicts between native toolchain dependencies and NuGet package dependencies in existing apps.

Being able to cross-link native dependency identities against existing NuGet package references would help in creating experiences that automatically resolve and link in the correct set of build time dependencies across native and NuGet assets.

Example: Maintaining a [list of popular known packages that map to maven artifacts](https://github.com/Redth/Xamarin.Binding.Helpers/blob/main/Xamarin.Binding.Helpers/NuGetResolvers/KnownMavenNugetResolver.cs#L12-L99) is not a scalable solution.
Member

It is possible for NuGet.org or a community member to build their own index based on NuGet.org packages using the V3 catalog. There is a guide on using the V3 catalog here: https://learn.microsoft.com/en-us/nuget/guides/api/query-for-all-published-packages

So the "productionized" version of this map would be to write a catalog reader that looks at each published package, checks if there is a cgmanifest.json, then adds it to an index. Surface the index on an independent web service. This allows custom projects/views of NuGet.org without the need to block on official service or client support.
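A rough sketch of such a catalog reader is below. It walks the V3 catalog index and pages and keeps the packages for which a caller-supplied check returns true. The `has_native_manifest` callback is hypothetical (it would need to download and inspect the package), the catalog URL is hard-coded here rather than discovered from the service index, and cursor handling by commit timestamp is omitted for brevity.

```python
import json
from urllib.request import urlopen

CATALOG_INDEX = "https://api.nuget.org/v3/catalog0/index.json"


def fetch_json(url):
    """Download and decode one JSON document."""
    with urlopen(url) as response:
        return json.load(response)


def iter_package_details(fetch, index_url=CATALOG_INDEX):
    """Yield (id, version) for every PackageDetails leaf in the catalog."""
    index = fetch(index_url)
    for page_ref in index["items"]:
        page = fetch(page_ref["@id"])
        for leaf in page["items"]:
            # Skip nuget:PackageDelete entries; only published details matter here.
            if leaf["@type"] == "nuget:PackageDetails":
                yield leaf["nuget:id"], leaf["nuget:version"]


def build_index(fetch, has_native_manifest, index_url=CATALOG_INDEX):
    """Map lowercased package id to the versions that pass the manifest check."""
    index = {}
    for package_id, version in iter_package_details(fetch, index_url):
        if has_native_manifest(package_id, version):
            index.setdefault(package_id.lower(), []).append(version)
    return index
```

Calling `build_index(fetch_json, my_check)` would crawl the whole catalog, so a real reader would persist a cursor and process only new commits.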

Author

Oh interesting, I didn't realize this was available... We don't currently publish the manifest in the packages (I don't think anyway), but looking at this sort of approach might mean we could create our own conventions - the problem is that unless they are 'officially' supported, conventions can be hard to gain traction with.



## Unresolved Questions

- How many different attribute values are reasonable for a package to contain, and for the NuGet service to index?
- Should attribute keys require some sort of approval before they can be used?

## Future Possibilities

- Better interoperability of native Apple and Android projects with .NET ecosystem
- Component Governance / SBOM build task automation