Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeTable as a substitute for packing whole jars into META-INF/rewrite/classpath #4932

Merged
merged 9 commits into from
Jan 22, 2025

Conversation

jkschneider
Copy link
Member

@jkschneider jkschneider commented Jan 21, 2025

What's changed?

Introducing a new mechanism for recipes to have resource level dependencies on JARs when they need them for JavaTemplate classpaths. Type tables provide an inventory of essentially the ABI of whole classes and JARs, from which we can reconstitute enough bytecodes to be used in a later compilation step (such as a JavaTemplate use).

A TypeTable can be materialized into a classes directory containing sufficient bytecodes for a compiler to recognize the types contained in a JAR without having the actual JAR:

image

What's your motivation?

In rewrite-spring we've been gradually increasing the size of the recipe JAR over time by including more and more versions . Additionally, we see security scanners sometimes blocking our recipe JARs because they have inlined "vulnerable" JARs into the META-INF directory, even though these JARs would never be executed (the scanners can't tell the difference).

Anything in particular you'd like reviewers to focus on?

JavaParser#dependenciesFromResources will, for a time, still look for whole JARs in the /META-INF/rewrite/classpath folder in addition to these type tables.

I don't really love the name TypeTable, but haven't come up with a better one yet.

@jkschneider jkschneider changed the title TypeTable as a substitute for packing whole jars into META-INF/rewrite/classpath TypeTable as a substitute for packing whole jars into META-INF/rewrite/classpath Jan 21, 2025
}

@Test
void writeReadMicrometer() throws IOException {
Copy link
Member Author

@jkschneider jkschneider Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To best understand how TypeTable is used, start your review here. In the test we are grabbing micrometer from JavaParser.runtimeClasspath() as a quick way of getting to a JAR path, but in the rewrite-build-gradle-plugin we would be shoveling the JARs in the rewrite { parserClasspath(..) } extension into TypeTable.Writer.

github-actions[bot]

This comment was marked as off-topic.

jkschneider and others added 2 commits January 21, 2025 15:53
…r/TypeTableTest.java

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…r/TypeTableTest.java

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions[bot]

This comment was marked as off-topic.

github-actions[bot]

This comment was marked as off-topic.

…r/TypeTableTest.java

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions[bot]

This comment was marked as off-topic.

github-actions[bot]

This comment was marked as off-topic.

@shanman190
Copy link
Contributor

shanman190 commented Jan 22, 2025

This is honestly very similar to the ct.sym file that supports the Java compiler's --release flag without needing a bootstrap classpath anymore.
See https://www.morling.dev/blog/the-anatomy-of-ct-sym-how-javac-ensures-backwards-compatibility/

A question that comes to mind is if the sig file and directory structure illustrated from the JDK may be a little bit more scalable for the solution space here?

github-actions[bot]

This comment was marked as off-topic.

@jkschneider
Copy link
Member Author

jkschneider commented Jan 22, 2025

@shanman190 Interesting to learn about the ct.sym file. If I understand the content correctly, I think it actually may be less scalable for the solution space.

I see the type table eventually transitioning from a simple deflate TSV to a columnar file format like parquet, with the goal of building a type table for all of maven central, etc. For this same reason, I don't worry too much about column duplication because columnar file formats often expect high levels of duplication in a column.

I think our potential for large scale analytics on this type information is greater if we can pack this data into a tabular structure.

Materializing portions of a type table to bytecodes on disk is but one application of that dataset.

@shanman190
Copy link
Contributor

@jkschneider, that's certainly fair. The retained content could be simpler, but for a more true to form columnar table definitely works equally as well. I figured that I'd at least bring it up as an established pattern since the goal is very similar to each other.

@jkschneider
Copy link
Member Author

Absolutely! And thanks for making me aware of ct.sym. I didn't know :)

@jkschneider
Copy link
Member Author

jkschneider commented Jan 22, 2025

Just for curiosity, I "type tabled" the first 500 JARs in my m2 local cache:

  • Total size of table 18.092219 MiB
  • Total size of jars 519.862794 MiB

@shanman190
Copy link
Contributor

Thats not too bad! The JDK signature table is 7.2 MB when Gunnar wrote that blog post (JDK 16 at the time).

I know you mentioned TypeTable was a kind of odd name, but the only other thing that I could think of was SignatureTable or SymbolTable (this one's not semantically accurate though).

@jkschneider
Copy link
Member Author

Yeah I have a hard time getting the JVMS 4.3 definition of "signature" out of my mind now. Signature is only non-null when there are generic types involved. "Descriptor" always exists but doesn't contain the generic info. Doesn't seem like there's an overarching term that encompasses both.

github-actions[bot]

This comment was marked as off-topic.

@jkschneider jkschneider merged commit 7661418 into main Jan 22, 2025
2 checks passed
@jkschneider jkschneider deleted the type-tables branch January 22, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants