CALCITE-6465: Rework code generator to use Flink code splitter #3901

Open: wants to merge 30 commits into main

Commits
6ac2d2e
WIP: Integrate Flink code splitter
jduo Aug 2, 2024
c63821e
Remove exclusion of flink-core
jduo Aug 2, 2024
5556360
Fix format violations
jduo Aug 21, 2024
d3cf335
Refactor so that only methods get code-split
jduo Aug 21, 2024
739c747
Remove extra newline
jduo Aug 21, 2024
e9e20e1
Update calls to Expressions.toString()
jduo Sep 10, 2024
dc783a9
Merge fixes
jduo Sep 18, 2024
b7553e4
Fix whitespace differences in generated code
jduo Sep 19, 2024
f63b665
Checker framework
jduo Sep 19, 2024
4216e80
Autostyle
jduo Sep 19, 2024
2077d1e
Parameterize use of method splitting
jduo Sep 19, 2024
fd2de3d
Enable method splitting in linq4j tests
jduo Sep 20, 2024
8f04243
Make method-splitting in library code based on a CalciteSystemProperty
jduo Sep 20, 2024
dd5b995
Fix commented-out code
jduo Sep 20, 2024
8f62b88
Revert unintended change around assert
jduo Sep 20, 2024
1766b35
Revert unintended change
jduo Sep 20, 2024
cd33092
Fix merge issues
jduo Sep 20, 2024
dc1c66d
PR feedback
jduo Sep 24, 2024
2ce8cac
Turn off method-splitting for EnumUtils tests
jduo Sep 25, 2024
6d545db
Make method splitting happen on class declarations
jduo Sep 27, 2024
6200d82
Simplify the test case
jduo Sep 27, 2024
39e14a4
Fix typo in comment
jduo Sep 28, 2024
d969ee4
Make configuration of method splitting a threshold instead of a toggle
jduo Sep 28, 2024
232eb8e
Revert accidental change
jduo Sep 28, 2024
2887c0d
Fix some bugs introduced during refactoring
jduo Sep 28, 2024
fdd0d18
Fix Javadoc linter errors
jduo Sep 30, 2024
80c8587
PR feedback
jduo Oct 3, 2024
4a258d8
Clarify testLargeMethod
jduo Oct 4, 2024
c28f962
Add comments about circular dependency concerns
jduo Oct 4, 2024
4f95aa0
Expand on method-splitting tests
jduo Oct 4, 2024
1 change: 1 addition & 0 deletions bom/build.gradle.kts
@@ -111,6 +111,7 @@ dependencies {
apiv("org.apache.commons:commons-pool2")
apiv("org.apache.commons:commons-collections4")
apiv("org.apache.commons:commons-text")
apiv("org.apache.flink:flink-table-code-splitter")
Contributor

Add a note above this list of declarations saying that we import flink-table-code-splitter but must not import core Flink modules.

Member Author

Done.

apiv("org.apache.geode:geode-core")
apiv("org.apache.hadoop:hadoop-client", "hadoop")
apiv("org.apache.hadoop:hadoop-common", "hadoop")
@@ -111,7 +111,10 @@ public static Bindable toBindable(Map<String, Object> parameters,
parameters);

final ClassDeclaration expr = relImplementor.implementRoot(rel, prefer);
String s = Expressions.toString(expr.memberDeclarations, "\n", false);
String s =
Expressions.toString(expr.memberDeclarations,
Contributor

Put "\n", false, on the previous line.

Member Author

Done.

"\n", false,
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value());

if (CalciteSystemProperty.DEBUG.value()) {
Util.debugCode(System.out, s);
@@ -420,6 +420,18 @@ public final class CalciteSystemProperty<T> {
public static final CalciteSystemProperty<Integer> FUNCTION_LEVEL_CACHE_MAX_SIZE =
intProperty("calcite.function.cache.maxSize", 0, v -> v >= 0);

/**
* The length of code before automatic method splitting is enabled
Contributor

The name/description of the property isn't great. It doesn't include the units (characters, lines, number of statements). The javadoc doesn't say how the threshold is used: whether splitting occurs if a method is greater than, or greater than or equal to, the threshold.

Maybe a better name can be found; I don't know.

But the doc should say something like 'A method is split if its size in characters is greater than or equal to the threshold and the threshold is non-negative.'

The field in ExpressionWriter needs better doc.

Member Author

I've renamed this to be MAX_METHOD_LENGTH_IN_CHARS_BEFORE_SPLITTING and clarified that splitting starts happening if the code is longer than the limit and not when equal to the limit. This is a long name though so I'm open to better names.

Member Author

On the topic of the constant to indicate no limit, I've changed this to be -1 instead of 0. I don't see a good place to hold the magic number though -- CalciteSystemProperty would be a reasonable user-facing location, but ExpressionWriter can't reference it.

Member Author

I made the -1 magic number a constant in Expressions
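The semantics settled in this thread (threshold measured in characters of generated source, splitting only when the code is strictly longer than the limit, and -1 meaning "no limit") fit in a small predicate. This is a sketch only: MethodSplitting, NO_LIMIT and shouldSplit are hypothetical names, not the ones in the PR, and treating every negative value as "no limit" follows the reviewer's suggestion rather than confirmed PR behavior.

```java
/** Hypothetical helper illustrating the method-splitting threshold semantics discussed above. */
final class MethodSplitting {
  /** Sentinel meaning "never split"; this sketch treats any negative value the same way. */
  static final int NO_LIMIT = -1;

  private MethodSplitting() {}

  /**
   * Returns whether generated code should be split.
   *
   * @param sourceLengthInChars length of the generated method source, in characters
   * @param maxLengthInChars    limit; splitting happens only when the source is strictly
   *                            longer than this, and never when the limit is negative
   */
  static boolean shouldSplit(int sourceLengthInChars, int maxLengthInChars) {
    return maxLengthInChars >= 0 && sourceLengthInChars > maxLengthInChars;
  }

  public static void main(String[] args) {
    System.out.println(shouldSplit(4001, 4000));          // true: longer than the limit
    System.out.println(shouldSplit(4000, 4000));          // false: equal to the limit
    System.out.println(shouldSplit(1_000_000, NO_LIMIT)); // false: splitting disabled
    System.out.println(shouldSplit(1, 0));                // true: 0 is a real (if extreme) limit
  }
}
```

Using "strictly longer than" keeps 0 usable as a genuine limit, which is the reviewer's argument for moving the "don't split" sentinel to a negative value.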

*
* <p>A value of <code>0</code>, the default, disables automatic method splitting.
*
* <p>Some queries can generate methods exceeding the JVM limit of 4000 characters per method.
Contributor

Can you include a citation for your statement "JVM limit of 4000 characters per method"?

Java 7 had a limit of 65534 bytes (bytecode size). When did that change?

Member Author

I got this 4K from Flink's documentation on the configuration option relating to this:
https://github.com/apache/flink/blob/4101316ef37d3a7082bff57cb732f4e9e0349d09/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/config/TableConfigOptions.java#L225

So it's not exactly that the JVM has a limit of 4K; it's that by default the HotSpot JIT only compiles methods of up to 8K (8000 bytes of bytecode). Let me find a source for this.

Member Author

It's specified here in OpenJDK (other vendors use the same limit): https://github.com/openjdk/jdk11u-dev/blob/a754a3d8972abc67aee768d31a2dc2e8214274cf/src/hotspot/share/runtime/globals.hpp#L2391

It's considered a developer option though, rather than an end-user option, so documentation is limited.
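To make the distinction in this thread concrete: the 8000 figure in the linked globals.hpp is HotSpot's HugeMethodLimit, measured in bytes of bytecode, not characters of source. With DontCompileHugeMethods at its default of true, methods whose bytecode exceeds that size are not JIT-compiled and stay interpreted. Separately, the class file format (JVMS 4.7.3) requires code_length to be less than 65536, so a larger method cannot be emitted at all (javac reports "code too large"). The sketch below only encodes those two defaults; the class and method names are hypothetical, and the 4000-character threshold used in this PR is a source-level heuristic for staying under the first limit, not a JVM constant.

```java
/** Hypothetical classifier for a method's bytecode size under default HotSpot settings. */
final class MethodSizeLimits {
  /** Default of HotSpot's HugeMethodLimit, in bytes of bytecode. */
  static final int HUGE_METHOD_LIMIT = 8000;
  /** Class file limit: code_length must be less than 65536 (JVMS 4.7.3). */
  static final int MAX_CODE_LENGTH = 65535;

  private MethodSizeLimits() {}

  static String classify(int bytecodeLength) {
    if (bytecodeLength > MAX_CODE_LENGTH) {
      return "too large for a class file";
    }
    if (bytecodeLength > HUGE_METHOD_LIMIT) {
      // With DontCompileHugeMethods=true (the default), HotSpot leaves such methods interpreted.
      return "not JIT-compiled by default";
    }
    // Size alone does not block compilation; other JIT heuristics may still apply.
    return "eligible for JIT compilation";
  }

  public static void main(String[] args) {
    System.out.println(classify(8000));  // eligible for JIT compilation
    System.out.println(classify(8001));  // not JIT-compiled by default
    System.out.println(classify(70000)); // too large for a class file
  }
}
```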

* Use this setting to automatically detect and split methods larger than the specified
* limit.
*/
public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
Contributor

nit: proposal (to make the new property robust in case of negative values)

public static final CalciteSystemProperty<Integer> METHOD_SPLITTING_THRESHOLD =
      intProperty("calcite.linq.method_splitting_threshold", 0, v -> v >= 0);

Contributor

Since 0 is a valid method length, 0 doesn't seem to be a good value to indicate "don't split".

I think that any negative value should mean "don't split", and the default should be -1.
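A JDK-only sketch of this suggestion: read the threshold with Integer.getInteger, default to -1, and map any negative value to "don't split". The key below is the one in this revision of the diff (the Java constant was later renamed, and the key may have changed with it); ThresholdProperty and readThreshold are hypothetical and stand in for Calcite's intProperty. Note the design choice: this clamps negative input instead of rejecting it, whereas the validator in the earlier nit (v -> v >= 0) would reject it.

```java
/** Hypothetical reader for the splitting threshold; not Calcite's CalciteSystemProperty. */
final class ThresholdProperty {
  static final String KEY = "calcite.linq.method_splitting_threshold";
  static final int NO_LIMIT = -1;

  private ThresholdProperty() {}

  /** Reads the threshold; unset, unparsable, or negative values all mean "don't split". */
  static int readThreshold() {
    // Integer.getInteger returns the default when the property is missing or not a valid int.
    int value = Integer.getInteger(KEY, NO_LIMIT);
    return value < 0 ? NO_LIMIT : value;
  }

  public static void main(String[] args) {
    System.clearProperty(KEY);
    System.out.println(readThreshold()); // -1
    System.setProperty(KEY, "4000");
    System.out.println(readThreshold()); // 4000
    System.setProperty(KEY, "-7");
    System.out.println(readThreshold()); // -1
    System.setProperty(KEY, "0");
    System.out.println(readThreshold()); // 0
  }
}
```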

intProperty("calcite.linq.method_splitting_threshold", 0);

private static CalciteSystemProperty<Boolean> booleanProperty(String key,
boolean defaultValue) {
// Note that "" -> true (convenient for command-lines flags like '-Dflag')
@@ -186,7 +186,9 @@ static Scalar.Producer baz(ParameterExpression context_,
final ClassDeclaration classDeclaration =
Expressions.classDecl(Modifier.PUBLIC, "Buzz", null,
ImmutableList.of(Scalar.Producer.class), declarations);
String s = Expressions.toString(declarations, "\n", false);
String s =
Expressions.toString(declarations, "\n", false,


The declarations here usually do not look like ClassDeclarations either.

CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value());
if (CalciteSystemProperty.DEBUG.value()) {
Util.debugCode(System.out, s);
}
@@ -100,7 +100,8 @@ private static String compile(RexBuilder rexBuilder, List<RexNode> constExps,
Expressions.methodDecl(Modifier.PUBLIC, Object[].class,
BuiltInMethod.FUNCTION1_APPLY.method.getName(),
ImmutableList.of(root0_), blockBuilder.toBlock());
String code = Expressions.toString(methodDecl);
String code =
Expressions.toString(methodDecl, CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value());
if (CalciteSystemProperty.DEBUG.value()) {
Util.debugCode(System.out, code);
}
1 change: 1 addition & 0 deletions gradle.properties
@@ -102,6 +102,7 @@ dropwizard-metrics.version=4.0.5
# do not upgrade this, new versions are Category X license.
elasticsearch.version=7.10.2
embedded-redis.version=0.6
flink-table-code-splitter.version=1.20.0
jts-core.version=1.19.0
jts-io-common.version=1.19.0
proj4j.version=1.2.2
@@ -150,7 +150,9 @@ static List<String> innodbFieldNames(final RelDataType rowType) {
InnodbMethod.INNODB_QUERYABLE_QUERY.method, fields,
selectFields, cond, ascOrder));
if (CalciteSystemProperty.DEBUG.value()) {
System.out.println("Innodb: " + Expressions.toString(enumerable));
System.out.println(
"Innodb: " + Expressions.toString(enumerable,
CalciteSystemProperty.METHOD_SPLITTING_THRESHOLD.value()));
Contributor

indentation

Member Author

Done.

}
list.add(Expressions.return_(null, enumerable));
return implementor.result(physType, list.toBlock());
1 change: 1 addition & 0 deletions linq4j/build.gradle.kts
@@ -20,4 +20,5 @@ dependencies {

implementation("com.google.guava:guava")
implementation("org.apache.calcite.avatica:avatica-core")
implementation("org.apache.flink:flink-table-code-splitter")
}
@@ -51,7 +51,8 @@ public Type getType() {
}

@Override public String toString() {
ExpressionWriter writer = new ExpressionWriter(true);
// Use the JVM limit for method size (4K) as the method splitting threshold.
ExpressionWriter writer = new ExpressionWriter(true, 4000);
Contributor

I understand that you can't use the system property. But why should the default behavior be to split? I think it should be to not split.

If people want to split, they should use some means other than toString().

Member Author

Done.

accept(writer, 0, 0);
return writer.toString();
}
@@ -16,6 +16,9 @@
*/
package org.apache.calcite.linq4j.tree;

import org.apache.commons.lang3.StringUtils;
import org.apache.flink.table.codesplit.JavaCodeSplitter;

import org.checkerframework.checker.nullness.qual.Nullable;

import java.lang.reflect.Modifier;
@@ -46,19 +49,35 @@ public ClassDeclaration(int modifier, String name, @Nullable Type extended,
}

@Override public void accept(ExpressionWriter writer) {
// Conditionally serialize the class declaration directly to the supplied writer if method
// splitting is disabled or to a temporary writer that will generate code for the method that
// can be split before being serialized to the supplied writer.
final ExpressionWriter writerForClassWithoutSplitting = writer.usesMethodSplitting()
? writer.duplicateState() : writer;

String modifiers = Modifier.toString(modifier);
writer.append(modifiers);
writerForClassWithoutSplitting.append(modifiers);
if (!modifiers.isEmpty()) {
writer.append(' ');
writerForClassWithoutSplitting.append(' ');
}
writer.append(classClass).append(' ').append(name);
writerForClassWithoutSplitting.append(classClass).append(' ').append(name);
if (extended != null) {
writer.append(" extends ").append(extended);
writerForClassWithoutSplitting.append(" extends ").append(extended);
}
if (!implemented.isEmpty()) {
writer.list(" implements ", ", ", "", implemented);
writerForClassWithoutSplitting.list(" implements ", ", ", "", implemented);
}
writerForClassWithoutSplitting.list(" {\n", "", "}", memberDeclarations);

if (writer.usesMethodSplitting()) {
final int defaultMaxMembersGeneratedCode = 10000;

writer.append(
StringUtils.stripStart(
JavaCodeSplitter.split(writerForClassWithoutSplitting.toString(),
writer.getMethodSplittingThreshold(), defaultMaxMembersGeneratedCode),
" "));
}
writer.list(" {\n", "", "}", memberDeclarations);
writer.newlineAndIndent();
}
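The accept method above renders the whole class into a scratch writer (so the splitter sees complete source) and only then appends either the split or the unsplit text to the real writer. A dependency-free sketch of that pattern, under these assumptions: a StringBuilder stands in for ExpressionWriter, a hypothetical UnaryOperator<String> stands in for JavaCodeSplitter.split, and splitting is enabled for non-negative thresholds (the later -1 sentinel; this revision of the diff tests != 0). The PR's StringUtils.stripStart of the splitter output is omitted.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of "render fully, then optionally post-process" code generation. */
final class SplitThenAppend {
  private SplitThenAppend() {}

  static void writeClass(StringBuilder out, String name, String body,
      int threshold, UnaryOperator<String> splitter) {
    boolean splitting = threshold >= 0;
    // Write straight to the output when not splitting; otherwise to a scratch buffer.
    StringBuilder target = splitting ? new StringBuilder() : out;
    target.append("public class ").append(name).append(" {\n")
        .append(body).append("}\n");
    if (splitting) {
      // The splitter needs the complete class source, so it runs only after rendering.
      out.append(splitter.apply(target.toString()));
    }
  }

  public static void main(String[] args) {
    // Stand-in "splitter" that just tags its input, to show which path ran.
    UnaryOperator<String> fakeSplitter = s -> "// split\n" + s;

    StringBuilder plain = new StringBuilder();
    writeClass(plain, "A", "  void m() {}\n", -1, fakeSplitter);
    System.out.print(plain); // source written directly, untouched

    StringBuilder split = new StringBuilder();
    writeClass(split, "A", "  void m() {}\n", 4000, fakeSplitter);
    System.out.print(split); // same source, passed through the stand-in splitter
  }
}
```

Writing directly to the output when splitting is off avoids an extra copy on the common path, which matches the PR's choice of reusing the supplied writer in that case.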

@@ -32,15 +32,25 @@ class ExpressionWriter {

private final Spacer spacer = new Spacer(0);
private final StringBuilder buf = new StringBuilder();
private boolean indentPending;
private final boolean generics;
private final int methodSplittingThreshold;
private boolean indentPending;

ExpressionWriter() {
this(true);
this(true, 0);
}

ExpressionWriter(boolean generics) {
ExpressionWriter(boolean generics, int methodSplittingThreshold) {
Contributor

Need javadoc for this constructor.

You don't say the units of the threshold (bytes, lines), that it applies to source code length, what values disable splitting, and whether it's > or >=.

Member Author

Expanded comments and renamed this parameter.

this.generics = generics;
this.methodSplittingThreshold = methodSplittingThreshold;
}

public ExpressionWriter duplicateState() {
Contributor

public method needs javadoc

Member Author

I made this package-private rather than adding javadoc -- it really seems like most of the methods in ExpressionWriter can be package-private.

final ExpressionWriter writer =
new ExpressionWriter(this.generics, this.methodSplittingThreshold);
writer.indentPending = this.indentPending;
writer.spacer.add(this.spacer.get());
return writer;
}

public void write(Node expression) {
@@ -73,6 +83,14 @@ public boolean requireParentheses(Expression expression, int lprec,
return true;
}

public boolean usesMethodSplitting() {
return methodSplittingThreshold != 0;
}

public int getMethodSplittingThreshold() {
return methodSplittingThreshold;
}

/**
* Increases the indentation level.
*/
@@ -59,7 +59,17 @@ private Expressions() {}
*/
public static String toString(List<? extends Node> expressions, String sep,
boolean generics) {
final ExpressionWriter writer = new ExpressionWriter(generics);
return toString(expressions, sep, generics, 0);
}

/**
* Converts a list of expressions to Java source code, optionally emitting
* extra type information in generics. Optionally splits the generated method if
* the method is too large.
*/
public static String toString(List<? extends Node> expressions, String sep,
boolean generics, int methodSplittingThreshold) {
final ExpressionWriter writer = new ExpressionWriter(generics, methodSplittingThreshold);
for (Node expression : expressions) {
writer.write(expression);
writer.append(sep);
@@ -71,7 +81,15 @@ public static String toString(List<? extends Node> expressions, String sep,
* Converts an expression to Java source code.
*/
public static String toString(Node expression) {
return toString(Collections.singletonList(expression), "", true);
return toString(expression, 0);
}

/**
* Converts an expression to Java source code.
* Optionally splits the generated method if the method is too large.
Contributor

"Too large" is vague; this can be more specific.

Member Author

Changed this to be more specific about relating to the limit.

*/
public static String toString(Node expression, int methodSplittingTheshold) {
return toString(Collections.singletonList(expression), "", true, methodSplittingTheshold);
}

/**
@@ -21,24 +21,29 @@
import org.apache.calcite.linq4j.tree.BlockStatement;
import org.apache.calcite.linq4j.tree.Blocks;
import org.apache.calcite.linq4j.tree.ClassDeclaration;
import org.apache.calcite.linq4j.tree.ConditionalStatement;
import org.apache.calcite.linq4j.tree.DeclarationStatement;
import org.apache.calcite.linq4j.tree.Expression;
import org.apache.calcite.linq4j.tree.Expressions;
import org.apache.calcite.linq4j.tree.FieldDeclaration;
import org.apache.calcite.linq4j.tree.FunctionExpression;
import org.apache.calcite.linq4j.tree.MethodCallExpression;
import org.apache.calcite.linq4j.tree.MethodDeclaration;
import org.apache.calcite.linq4j.tree.NewExpression;
import org.apache.calcite.linq4j.tree.Node;
import org.apache.calcite.linq4j.tree.ParameterExpression;
import org.apache.calcite.linq4j.tree.Shuttle;
import org.apache.calcite.linq4j.tree.Types;

import org.apache.commons.lang3.StringUtils;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Sets;

import org.checkerframework.checker.nullness.qual.Nullable;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;

@@ -1719,6 +1724,60 @@ public void checkBlockBuilder(boolean optimizing, String expected) {
Expressions.toString(Expressions.constant(set)));
}

@Test void testLargeMethod() {
Contributor

The test needs good javadoc.

What is MyClass and why is it commented out?

Member Author

Clarified the meaning of the MyClass comment block and added public javadoc to the test header describing the test and referencing the JIRA issue.

Contributor

I believe that there is only one test in the whole of Calcite that uses method-splitting.

We need a test where the splitting threshold is set to a low positive value, e.g. 3, to make sure that the splitter gracefully handles the case where splitting is impossible.

Is it possible to also use method-splitting in one or two of Calcite's system tests?

Member Author

I've added the test described above and also added a test to ensure method splitting does not happen when methods remain under the limit.

I'm not familiar with Calcite's system tests. Would they be under the core module?

// public class MyClass {
// public void hugeMethod() {
// if (true) {
// // Long variable name declaration 1;
// } else if (false) {
// // Long variable name declaration 2;
// } else if (false) {
// // Long variable name declaration 3;
// } else if (false) {
// // Long variable name declaration 4;
// } else
// // Long variable name declaration 4;
// }
// }

// Generate long variable names.
final String longName = StringUtils.repeat('a', 5000);

DeclarationStatement longDecl =
Expressions.declare(0, longName, Expressions.constant(10));
final ConditionalStatement ifThenElse =
Expressions.ifThenElse(
Expressions.constant(true),
longDecl,
Expressions.constant(false),
longDecl,
Expressions.constant(false),
longDecl,
longDecl);

final MethodDeclaration hugeMethod =
Expressions.methodDecl(Modifier.PUBLIC,
Void.TYPE,
"hugeMethod",
Collections.emptyList(),
Blocks.toFunctionBlock(ifThenElse));

final ClassDeclaration classDecl =
new ClassDeclaration(Modifier.PUBLIC,
"MyClass",
null,
ImmutableList.of(),
ImmutableList.of(hugeMethod));

final String originalClass = Expressions.toString(classDecl);
final String classWithMethodSplitting = Expressions.toString(classDecl, 4000);

// Some local variables in hugeMethod have been extracted to be fields named local$<number> in
// the enclosing class definition.
Assertions.assertFalse(originalClass.contains("local$"));
Contributor

Use assertThat rather than assertFalse and assertTrue.

Member Author

Done

Assertions.assertTrue(classWithMethodSplitting.contains("local$"));
}

/** An enum. */
enum MyEnum {
X,
@@ -184,8 +184,11 @@ public void setup() {
EnumerableRelImplementor relImplementor =
new EnumerableRelImplementor(plan.getCluster().getRexBuilder(), new HashMap<>());
info.classExpr = relImplementor.implementRoot(plan, EnumerableRel.Prefer.ARRAY);

// Set the method splitting threshold to the JVM limit of 4K.
info.javaCode =
Expressions.toString(info.classExpr.memberDeclarations, "\n", false);
Expressions.toString(info.classExpr.memberDeclarations,
"\n", false, 4000);
info.plan = plan;
planInfos[i] = info;
}