-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CALCITE-6465: Rework code generator to use Flink code splitter #3901
base: main
Are you sure you want to change the base?
Changes from 26 commits
6ac2d2e
c63821e
5556360
d3cf335
739c747
e9e20e1
dc783a9
b7553e4
f63b665
4216e80
2077d1e
fd2de3d
8f04243
dd5b995
8f62b88
1766b35
cd33092
dc1c66d
2ce8cac
6d545db
6200d82
39e14a4
d969ee4
232eb8e
2887c0d
fdd0d18
80c8587
4a258d8
c28f962
4f95aa0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,7 +111,10 @@ public static Bindable toBindable(Map<String, Object> parameters, | |
parameters); | ||
|
||
final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer); | ||
String s = Expressions.toString(expr.memberDeclarations, "\n", false); | ||
String s = | ||
Expressions.toString(expr.memberDeclarations, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. put There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
"\n", false, | ||
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value()); | ||
|
||
if (CalciteSystemProperty.DEBUG.value()) { | ||
Util.debugCode(System.out, s); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -420,6 +420,18 @@ public final class CalciteSystemProperty<T> { | |
public static final CalciteSystemProperty<Integer> FUNCTION_LEVEL_CACHE_MAX_SIZE = | ||
intProperty("calcite.function.cache.maxSize", 0, v -> v >= 0); | ||
|
||
/** | ||
* The length of code before automatic method splitting is enabled | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name/description of the property isn't great. It doesn't include the units (characters, lines, number of statements). The javadoc doesn't say how the threshold is used - whether splitting occurs if a method is greater than or greater than or equal . Maybe a better name can be found, I don't know. But the doc should say something like 'A method is split if its size in characters is greater than or equal to the threshold and the threshold is non-negative.' The field in ExpressionWriter needs better doc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've renamed this to be MAX_METHOD_LENGTH_IN_CHARS_BEFORE_SPLITTING and clarified that splitting starts happening if the code is longer than the limit and not when equal to the limit. This is a long name though so I'm open to better names. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the topic of the constant to indicate no limit, I've changed this to be -1 instead of 0. I don't see a good place to hold the magic number though -- CalciteSystemProperty would be a reasonable user-facing location, but ExpressionWriter can't reference it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made the -1 magic number a constant in Expressions |
||
* | ||
* <p>A value of <code>0</code>, the default, disables automatic method splitting. | ||
* | ||
* <p>Some queries can generate methods exceeding the JVM limit of 4000 characters per method. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you include a citation for your statement "JVM limit of 4000 characters per method"? Java 7 had a limit 65534 bytes (bytecode size). When did that change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got this 4K from Flink's documentation on the configuration option relating to this: So it's not exactly that the JVM has a limit of 4K, it's that the default JIT allows 8K only. Let me find a source for this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's specified here in OpenJDK (other vendors use the same limit): https://github.com/openjdk/jdk11u-dev/blob/a754a3d8972abc67aee768d31a2dc2e8214274cf/src/hotspot/share/runtime/globals.hpp#L2391 It's considered a developer option though, rather than an end-user option, so documentation is limited. |
||
* Use this setting to automatically detect and split methods larger than the specified | ||
* limit. | ||
*/ | ||
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD = | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: proposal (to make the new property robust in case of negative values)
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split". I think that any negative value should mean "don't split", and the default should be -1. |
||
intProperty("calcite.linq.method_splitting_threshold", 0); | ||
|
||
private static CalciteSystemProperty<Boolean> booleanProperty(String key, | ||
boolean defaultValue) { | ||
// Note that "" -> true (convenient for command-lines flags like '-Dflag') | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -186,7 +186,9 @@ static Scalar.Producer baz(ParameterExpression context_, | |
final ClassDeclaration classDeclaration = | ||
Expressions.classDecl(Modifier.PUBLIC, "Buzz", null, | ||
ImmutableList.of(Scalar.Producer.class), declarations); | ||
String s = Expressions.toString(declarations, "\n", false); | ||
String s = | ||
Expressions.toString(declarations, "\n", false, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The declarations here usually do not look like ClassDeclarations either. |
||
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value()); | ||
if (CalciteSystemProperty.DEBUG.value()) { | ||
Util.debugCode(System.out, s); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -150,7 +150,9 @@ static List<String> innodbFieldNames(final RelDataType rowType) { | |
InnodbMethod.INNODB_QUERYABLE_QUERY.method, fields, | ||
selectFields, cond, ascOrder)); | ||
if (CalciteSystemProperty.DEBUG.value()) { | ||
System.out.println("Innodb: " + Expressions.toString(enumerable)); | ||
System.out.println( | ||
"Innodb: " + Expressions.toString(enumerable, | ||
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value())); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. indentation There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
} | ||
list.add(Expressions.return_(null, enumerable)); | ||
return implementor.result(physType, list.toBlock()); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,7 +51,8 @@ public Type getType() { | |
} | ||
|
||
@Override public String toString() { | ||
ExpressionWriter writer = new ExpressionWriter(true); | ||
// Use the JVM limit for method size (4K) as the method splitting threshold. | ||
ExpressionWriter writer = new ExpressionWriter(true, 4000); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand that you can't use the system property. But why should the default behavior be to split? I think it should be to not split. If people want to split they should invoke by other means than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
accept(writer, 0, 0); | ||
return writer.toString(); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,15 +32,25 @@ class ExpressionWriter { | |
|
||
private final Spacer spacer = new Spacer(0); | ||
private final StringBuilder buf = new StringBuilder(); | ||
private boolean indentPending; | ||
private final boolean generics; | ||
private final int methodSplittingThreshold; | ||
private boolean indentPending; | ||
|
||
ExpressionWriter() { | ||
this(true); | ||
this(true, 0); | ||
} | ||
|
||
ExpressionWriter(boolean generics) { | ||
ExpressionWriter(boolean generics, int methodSplittingThreshold) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need javadoc for this constructor. You don't say the units of the threshold (bytes, lines), that it applies to source code length, what values disable splitting, and whether it's There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Expanded comments and renamed this parameter. |
||
this.generics = generics; | ||
this.methodSplittingThreshold = methodSplittingThreshold; | ||
} | ||
|
||
public ExpressionWriter duplicateState() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. public method needs javadoc There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made this package-private rather than adding javadoc -- it really seems like most of the methods in ExpressionWriter can be package-private. |
||
final ExpressionWriter writer = | ||
new ExpressionWriter(this.generics, this.methodSplittingThreshold); | ||
writer.indentPending = this.indentPending; | ||
writer.spacer.add(this.spacer.get()); | ||
return writer; | ||
} | ||
|
||
public void write(Node expression) { | ||
|
@@ -73,6 +83,14 @@ public boolean requireParentheses(Expression expression, int lprec, | |
return true; | ||
} | ||
|
||
public boolean usesMethodSplitting() { | ||
return methodSplittingThreshold != 0; | ||
} | ||
|
||
public int getMethodSplittingThreshold() { | ||
return methodSplittingThreshold; | ||
} | ||
|
||
/** | ||
* Increases the indentation level. | ||
*/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,7 +59,17 @@ private Expressions() {} | |
*/ | ||
public static String toString(List<? extends Node> expressions, String sep, | ||
boolean generics) { | ||
final ExpressionWriter writer = new ExpressionWriter(generics); | ||
return toString(expressions, sep, generics, 0); | ||
} | ||
|
||
/** | ||
* Converts a list of expressions to Java source code, optionally emitting | ||
* extra type information in generics. Optionally splits the generated method if | ||
* the method is too large. | ||
*/ | ||
public static String toString(List<? extends Node> expressions, String sep, | ||
boolean generics, int methodSplittingThreshold) { | ||
final ExpressionWriter writer = new ExpressionWriter(generics, methodSplittingThreshold); | ||
for (Node expression : expressions) { | ||
writer.write(expression); | ||
writer.append(sep); | ||
|
@@ -71,7 +81,15 @@ public static String toString(List<? extends Node> expressions, String sep, | |
* Converts an expression to Java source code. | ||
*/ | ||
public static String toString(Node expression) { | ||
return toString(Collections.singletonList(expression), "", true); | ||
return toString(expression, 0); | ||
} | ||
|
||
/** | ||
* Converts an expression to Java source code. | ||
* Optionally splits the generated method if the method is too large. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "too large" is vague. can do better. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed this to be more specific about relating to the limit. |
||
*/ | ||
public static String toString(Node expression, int methodSplittingTheshold) { | ||
return toString(Collections.singletonList(expression), "", true, methodSplittingTheshold); | ||
} | ||
|
||
/** | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,24 +21,29 @@ | |
import org.apache.calcite.linq4j.tree.BlockStatement; | ||
import org.apache.calcite.linq4j.tree.Blocks; | ||
import org.apache.calcite.linq4j.tree.ClassDeclaration; | ||
import org.apache.calcite.linq4j.tree.ConditionalStatement; | ||
import org.apache.calcite.linq4j.tree.DeclarationStatement; | ||
import org.apache.calcite.linq4j.tree.Expression; | ||
import org.apache.calcite.linq4j.tree.Expressions; | ||
import org.apache.calcite.linq4j.tree.FieldDeclaration; | ||
import org.apache.calcite.linq4j.tree.FunctionExpression; | ||
import org.apache.calcite.linq4j.tree.MethodCallExpression; | ||
import org.apache.calcite.linq4j.tree.MethodDeclaration; | ||
import org.apache.calcite.linq4j.tree.NewExpression; | ||
import org.apache.calcite.linq4j.tree.Node; | ||
import org.apache.calcite.linq4j.tree.ParameterExpression; | ||
import org.apache.calcite.linq4j.tree.Shuttle; | ||
import org.apache.calcite.linq4j.tree.Types; | ||
|
||
import org.apache.commons.lang3.StringUtils; | ||
|
||
import com.google.common.collect.ImmutableList; | ||
import com.google.common.collect.ImmutableMap; | ||
import com.google.common.collect.ImmutableSet; | ||
import com.google.common.collect.Sets; | ||
|
||
import org.checkerframework.checker.nullness.qual.Nullable; | ||
import org.junit.jupiter.api.Assertions; | ||
import org.junit.jupiter.api.Test; | ||
import org.junit.jupiter.api.io.TempDir; | ||
|
||
|
@@ -1719,6 +1724,60 @@ public void checkBlockBuilder(boolean optimizing, String expected) { | |
Expressions.toString(Expressions.constant(set))); | ||
} | ||
|
||
@Test void testLargeMethod() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. test needs good javadoc. What is MyClass and why is it commented out? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarified the meaning of the MyClass comment block and added to the header public javadoc about the test + JIRA. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that there is only one test in the whole of calcite that sues method-splitting. Need a test where splitting is set to a low positive value, e.g. 3. Just to make sure that the splitter handles gracefully the case where it's impossible to split. Is it possible to also use method-splitting in one or two of Calcite's system tests? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added the test described above and also added a test to ensure method splitting does not happen when methods remain under the limit. I'm not familiar with Calcite's system tests. Would they be under the core module? |
||
// public class MyClass { | ||
// public void hugeMethod() { | ||
// if (true) { | ||
// // Long variable name declaration 1; | ||
// } else if (false) { | ||
// // Long variable name declaration 2; | ||
// } else if (false) { | ||
// // Long variable name declaration 3; | ||
// } else if (false) { | ||
// // Long variable name declaration 4; | ||
// } else | ||
// // Long variable name declaration 4; | ||
// } | ||
// } | ||
|
||
// Generate long variable names. | ||
final String longName = StringUtils.repeat('a', 5000); | ||
|
||
DeclarationStatement longDecl = | ||
Expressions.declare(0, longName, Expressions.constant(10)); | ||
final ConditionalStatement ifThenElse = | ||
Expressions.ifThenElse( | ||
Expressions.constant(true), | ||
longDecl, | ||
Expressions.constant(false), | ||
longDecl, | ||
Expressions.constant(false), | ||
longDecl, | ||
longDecl); | ||
|
||
final MethodDeclaration hugeMethod = | ||
Expressions.methodDecl(Modifier.PUBLIC, | ||
Void.TYPE, | ||
"hugeMethod", | ||
Collections.emptyList(), | ||
Blocks.toFunctionBlock(ifThenElse)); | ||
|
||
final ClassDeclaration classDecl = | ||
new ClassDeclaration(Modifier.PUBLIC, | ||
"MyClass", | ||
null, | ||
ImmutableList.of(), | ||
ImmutableList.of(hugeMethod)); | ||
|
||
final String originalClass = Expressions.toString(classDecl); | ||
final String classWithMethodSplitting = Expressions.toString(classDecl, 4000); | ||
|
||
// Some local variables in hugeMethod have been extracted to be fields named local$<number> in | ||
// the enclosing class definition. | ||
Assertions.assertFalse(originalClass.contains("local$")); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
Assertions.assertTrue(classWithMethodSplitting.contains("local$")); | ||
} | ||
|
||
/** An enum. */ | ||
enum MyEnum { | ||
X, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note above this list of declarations that we import flink-table-code-splitter but we must not import core flink modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.