-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CALCITE-6465: Rework code generator to use Flink code splitter #3901
base: main
Are you sure you want to change the base?
Conversation
This builds but introduces a dependency on flink-core. Looking for a way around this. |
linq4j/src/main/java/org/apache/calcite/linq4j/tree/MethodDeclaration.java
Outdated
Show resolved
Hide resolved
Integrate Flink Java code splitter and run it on all final Expressions when converting to Java code. Next step is to run it based on configuration options
JavaCodeSplitter needs it for Preconditions
@@ -51,7 +51,7 @@ public Type getType() { | |||
} | |||
|
|||
@Override public String toString() { | |||
ExpressionWriter writer = new ExpressionWriter(true); | |||
ExpressionWriter writer = new ExpressionWriter(true, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would read from the CalciteSystemProperty but calcite-core depends on linq4j so it can't be looked up here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this have any practical impact on the actual code generation? (like splitting in some circumstances where the CalciteSystemProperty has not been set to true). If not, I'd say this is acceptable.
It's true that it is a bit unfortunate that we cannot check the CalciteSystemProperty here, but I can't think of a better solution at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would read from the CalciteSystemProperty but calcite-core depends on linq4j so it can't be looked up here.
I do not know this code, but wonder if the split configuration values be passed by value into the linq4j layer. If possible , this would avoid the dependancy recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this toString() method is used in code generation, just logging/debugging.
This option is passed in to Expressions static functions, which is really where ExpressionWriters are instantiated most of the time.
linq4j/src/main/java/org/apache/calcite/linq4j/tree/Expressions.java
Outdated
Show resolved
Hide resolved
@@ -57,20 +60,39 @@ public MethodDeclaration(int modifier, String name, Type resultType, | |||
} | |||
|
|||
@Override public void accept(ExpressionWriter writer) { | |||
String modifiers = Modifier.toString(modifier); | |||
writer.append(modifiers); | |||
final ExpressionWriter writerForUnsplitMethod = writer.usesMethodSplitting() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the impression the intent of this line is not very obvious, perhaps a comment would help to clarify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in dc1c66d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jduo , just a small comment, because I find the comment a bit confusing (maybe I'm getting it wrong), where it says:
... serialize the method declaration directly to the supplied writer if method splitting is enabled or ...
shouldn't it say "disabled" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's correct. I've updated this comment, but note that it is now in ClassDeclaration instead of MethodDeclaration. The method splitter requires an enclosing class to write some of its output too (it can extract variable names to instance variables and move blocks of code to generated private methods to reduce the size of a method).
linq4j/src/main/java/org/apache/calcite/linq4j/tree/Expressions.java
Outdated
Show resolved
Hide resolved
- Add comments about method splitting - Add overloads with default values - Clarify condition about using a temporary writer
Method splitting has to happen at the class level so that there is a place to write fields and methods generated by reducing the size of a method
Change the property ENABLE_METHOD_SPLITTING to METHOD_SPLITTING_THRESHOLD. It is now an integer specifying the method length limit before splitting instead of a boolean to turn on a fixed limit. A value of zero indicates that method splitting is off. Change tests unrelated to method splitting to set method splitting off.
* Use this setting to automatically detect and split methods larger than the specified | ||
* limit. | ||
*/ | ||
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: proposal (to make the new property robust in case of negative values)
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
intProperty("calcite.linq.method_splitting_threshold", 0, v -> v >= 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split".
I think that any negative value should mean "don't split", and the default should be -1.
@@ -111,7 +111,10 @@ public static Bindable toBindable(Map<String, Object> parameters, | |||
parameters); | |||
|
|||
final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer); | |||
String s = Expressions.toString(expr.memberDeclarations, "\n", false); | |||
String s = | |||
Expressions.toString(expr.memberDeclarations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put "\n", false,
on previous line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Use this setting to automatically detect and split methods larger than the specified | ||
* limit. | ||
*/ | ||
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split".
I think that any negative value should mean "don't split", and the default should be -1.
@@ -420,6 +420,18 @@ public final class CalciteSystemProperty<T> { | |||
public static final CalciteSystemProperty<Integer> FUNCTION_LEVEL_CACHE_MAX_SIZE = | |||
intProperty("calcite.function.cache.maxSize", 0, v -> v >= 0); | |||
|
|||
/** | |||
* The length of code before automatic method splitting is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name/description of the property isn't great. It doesn't include the units (characters, lines, number of statements). The javadoc doesn't say how the threshold is used - whether splitting occurs if a method is greater than or greater than or equal .
Maybe a better name can be found, I don't know.
But the doc should say something like 'A method is split if its size in characters is greater than or equal to the threshold and the threshold is non-negative.'
The field in ExpressionWriter needs better doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed this to be MAX_METHOD_LENGTH_IN_CHARS_BEFORE_SPLITTING and clarified that splitting starts happening if the code is longer than the limit and not when equal to the limit. This is a long name though so I'm open to better names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the topic of the constant to indicate no limit, I've changed this to be -1 instead of 0. I don't see a good place to hold the magic number though -- CalciteSystemProperty would be a reasonable user-facing location, but ExpressionWriter can't reference it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the -1 magic number a constant in Expressions
@@ -51,7 +51,8 @@ public Type getType() { | |||
} | |||
|
|||
@Override public String toString() { | |||
ExpressionWriter writer = new ExpressionWriter(true); | |||
// Use the JVM limit for method size (4K) as the method splitting threshold. | |||
ExpressionWriter writer = new ExpressionWriter(true, 4000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that you can't use the system property. But why should the default behavior be to split? I think it should be to not split.
If people want to split they should invoke by other means than toString()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1719,6 +1724,60 @@ public void checkBlockBuilder(boolean optimizing, String expected) { | |||
Expressions.toString(Expressions.constant(set))); | |||
} | |||
|
|||
@Test void testLargeMethod() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test needs good javadoc.
What is MyClass and why is it commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the meaning of the MyClass comment block and added to the header public javadoc about the test + JIRA.
@@ -111,6 +111,7 @@ dependencies { | |||
apiv("org.apache.commons:commons-pool2") | |||
apiv("org.apache.commons:commons-collections4") | |||
apiv("org.apache.commons:commons-text") | |||
apiv("org.apache.flink:flink-table-code-splitter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note above this list of declarations that we import flink-table-code-splitter but we must not import core flink modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
System.out.println("Innodb: " + Expressions.toString(enumerable)); | ||
System.out.println( | ||
"Innodb: " + Expressions.toString(enumerable, | ||
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
ExpressionWriter(boolean generics) { | ||
ExpressionWriter(boolean generics, int methodSplittingThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need javadoc for this constructor.
You don't say the units of the threshold (bytes, lines), that it applies to source code length, what values disable splitting, and whether it's >
or >=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded comments and renamed this parameter.
this.methodSplittingThreshold = methodSplittingThreshold; | ||
} | ||
|
||
public ExpressionWriter duplicateState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public method needs javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this package-private rather than adding javadoc -- it really seems like most of the methods in ExpressionWriter can be package-private.
|
||
/** | ||
* Converts an expression to Java source code. | ||
* Optionally splits the generated method if the method is too large. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"too large" is vague. can do better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to be more specific about relating to the limit.
Quality Gate passedIssues Measures |
@rubenada @julianhyde , just wondering if there was some more feedback on this and would like some clarification on where system tests reside within Calcite. Thanks. |
@jduo , not sure what you mean by "system tests". If you want to add a test with a large dynamic code that fails when this feature is turn off / passes when it is turned on; maybe you could check and adapt the test that was done for the slightly related CALCITE-3094 (and then refined in CALCITE-6593 ): |
@@ -111,7 +111,9 @@ public static Bindable toBindable(Map<String, Object> parameters, | |||
parameters); | |||
|
|||
final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer); | |||
String s = Expressions.toString(expr.memberDeclarations, "\n", false); | |||
String s = | |||
Expressions.toString(expr.memberDeclarations, "\n", false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that code split is not effective here. This is because the code splitting logic in this pr is implemented in ClassDeclaration, and the expr.memberDeclarations here is usually MethodDeclaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asdfgh19 , thanks for your comments. Would you be able to add tests for your scenarios to the PR?
For this one, it seems like we a can pass expr to Expressions.toString(), but I'm not sure what effect that has on the calls that depend on s ( Hook.JAVA_PLAN.run(s);
)
@@ -186,7 +186,9 @@ static Scalar.Producer baz(ParameterExpression context_, | |||
final ClassDeclaration classDeclaration = | |||
Expressions.classDecl(Modifier.PUBLIC, "Buzz", null, | |||
ImmutableList.of(Scalar.Producer.class), declarations); | |||
String s = Expressions.toString(declarations, "\n", false); | |||
String s = | |||
Expressions.toString(declarations, "\n", false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The declarations here usually do not look like ClassDeclarations either.
@@ -100,7 +100,9 @@ private static String compile(RexBuilder rexBuilder, List<RexNode> constExps, | |||
Expressions.methodDecl(Modifier.PUBLIC, Object[].class, | |||
BuiltInMethod.FUNCTION1_APPLY.method.getName(), | |||
ImmutableList.of(root0_), blockBuilder.toBlock()); | |||
String code = Expressions.toString(methodDecl); | |||
String code = | |||
Expressions.toString(methodDecl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methodDecl here is a MethodDeclaration, not a ClassDeclaration, so it seems that the code splitting logic implemented in the ClassDeclaration does not work here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this code relates to RexExecutable. The method body is generated here and written to String, then gets attached as the body of a class in RexExecutable that's run directly in Janino's IClassBodyEvaluator. This will be quite complicated to rework.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
No description provided.