Temp #444

Open: wants to merge 4 commits into base: dev

@@ -7,6 +7,7 @@
import org.eclipse.rdf4j.common.concurrent.locks.Lock;
import org.eclipse.rdf4j.common.iteration.CloseableIteration;
import org.eclipse.rdf4j.common.iteration.ExceptionConvertingIteration;
import org.eclipse.rdf4j.common.transaction.IsolationLevels;
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Namespace;
import org.eclipse.rdf4j.model.Resource;
@@ -142,7 +143,7 @@ protected void notifyStatementRemoved(Statement st) {

@Override
public void begin() throws SailException {
- logger.info("Begin connection transaction");
+ logger.debug("Begin connection transaction");

super.begin();

@@ -24,9 +24,12 @@ private class CombinedCardinalityCalculator extends CardinalityCalculator {
protected double getCardinality(StatementPattern sp) {
double hdtCard = endpointStoreEvaluationStatisticsHDT.getCardinality(sp);
double nativeCard = nativeEvaluationStatistics.getCardinality(sp);
- if (hdtCard == Integer.MAX_VALUE && nativeCard > 0)
+ if (hdtCard == Integer.MAX_VALUE && nativeCard > 0) {
hdtCard = 0;
- return hdtCard + nativeCard;
+ }
+ double cardinality = hdtCard + nativeCard;
+ System.out.println("Cardinality for " + sp.toString().replace("\n", " ") + " is " + cardinality + " (HDT: " + hdtCard + ", Native: " + nativeCard + ")\n");
+ return cardinality;
}
}

@@ -0,0 +1,189 @@
package com.the_qa_company.qendpoint;

import com.the_qa_company.qendpoint.compiler.CompiledSail;
import com.the_qa_company.qendpoint.compiler.SparqlRepository;
import com.the_qa_company.qendpoint.core.hdt.HDT;
import com.the_qa_company.qendpoint.core.hdt.HDTManager;
import com.the_qa_company.qendpoint.core.options.HDTOptions;
import com.the_qa_company.qendpoint.core.options.HDTOptionsKeys;
import com.the_qa_company.qendpoint.core.triples.TripleString;
import com.the_qa_company.qendpoint.store.EndpointFiles;
import com.the_qa_company.qendpoint.store.Utility;
import org.apache.commons.lang3.time.StopWatch;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.query.explanation.Explanation;
import org.eclipse.rdf4j.repository.sail.SailRepositoryConnection;
import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.Objects;
import java.util.stream.Stream;

public class TempTest {

@Rule
public TemporaryFolder tempDir = TemporaryFolder.builder().assureDeletion().build();
private SparqlRepository repository;


@Before
public void setupRepo() throws IOException {
Path root = tempDir.newFolder().toPath();
ClassLoader loader = getClass().getClassLoader();
String filename = "2018_complete.nt";

Path hdtstore = root.resolve("hdt-store");
Path locationNative = root.resolve("native");

Files.createDirectories(hdtstore);
Files.createDirectories(locationNative);

String indexName = "index.hdt";

HDTOptions options = HDTOptions.of(
// disable the default index (to use the custom indexes)
HDTOptionsKeys.BITMAPTRIPLES_INDEX_NO_FOQ, true,
// set the custom indexes we want
HDTOptionsKeys.BITMAPTRIPLES_INDEX_OTHERS, "sop,ops,osp,pso,pos");


try (HDT hdt = HDTManager.generateHDT(new Iterator<>() {
@Override
public boolean hasNext() {
return false;
}

@Override
public TripleString next() {
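// hasNext() is always false, so honour the Iterator contract instead of returning null
throw new java.util.NoSuchElementException();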
}
}, Utility.EXAMPLE_NAMESPACE, options, null)) {
hdt.saveToHDT(hdtstore.resolve(indexName).toAbsolutePath().toString(), null);
} catch (Error | RuntimeException e) {
throw e;
} catch (Exception e) {
throw new RuntimeException(e);
}

repository = CompiledSail.compiler().withEndpointFiles(new EndpointFiles(locationNative, hdtstore, indexName))
.compileToSparqlRepository();
try (InputStream is = new BufferedInputStream(Objects.requireNonNull(loader.getResourceAsStream(filename),
filename + " doesn't exist"))) {
repository.loadFile(is, filename);
}
}
Comment on lines +40 to +84
Contributor Author

@ate47 I'm using this code to set up a repo with the file from #413

The query statistics seem to always return 0. Do you know why?

Contributor

Can you detail this a bit more? I loaded the dataset and queried a single triple pattern, and I see that the cardinality is non-zero:
[Screenshot 2024-01-22 at 22 52 51]

Contributor Author

Try adding a breakpoint on that line and then debugging the TempTest test() I added.

Contributor Author (@hmottestad, Jan 22, 2024)

Don't forget to add the file with the data to this path: qendpoint-store/src/test/resources/2018_complete.nt

Collaborator

It's a race condition: a merge was triggered in the middle of the file loading.

Contributor Author

Do you know how I can wait for the merge to complete?

Contributor Author

Btw, in my PR eclipse-rdf4j/rdf4j#4879 I managed to get the QueryJoinOptimizer to work better when the statistics inaccurately return 0.0, by adding 5.0 to the cost of all operations and by penalising cartesian joins more heavily.
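To make the idea in the comment above concrete, here is a minimal sketch of a cost function with a fixed floor and a cartesian-join penalty. It is not the code from eclipse-rdf4j/rdf4j#4879: the class name, the method signature and the penalty factor of 1000 are assumptions made for illustration. Only the 5.0 offset comes from the comment.

```java
import java.util.Set;

// Hypothetical illustration; this is NOT the actual RDF4J QueryJoinOptimizer code.
final class JoinCostSketch {

    // From the review comment: 5.0 is added to the cost of every operation,
    // so a pattern whose estimated cardinality is 0.0 never looks free.
    static final double COST_FLOOR = 5.0;

    // Assumed value; the comment only says cartesian joins are penalised more heavily.
    static final double CARTESIAN_PENALTY = 1000.0;

    /**
     * @param estimatedCardinality cardinality reported by the statistics (may wrongly be 0.0)
     * @param boundVars            variables already bound by patterns placed earlier in the join order
     * @param patternVars          variables used by the candidate pattern
     */
    static double adjustedCost(double estimatedCardinality, Set<String> boundVars, Set<String> patternVars) {
        double cost = estimatedCardinality + COST_FLOOR;

        // The first pattern in a join order cannot form a cartesian product with anything.
        boolean sharesVariable = boundVars.isEmpty()
                || patternVars.stream().anyMatch(boundVars::contains);

        if (!sharesVariable) {
            // No shared variable with what is already bound: joining here is a cartesian product.
            cost *= CARTESIAN_PENALTY;
        }
        return cost;
    }
}
```

With a floor in place, two patterns that both report 0.0 are still separated by whether they connect to the variables already bound, which is the property the comment relies on.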


@After
public void after() {
if (repository != null) {
repository.shutDown();
}
repository = null;
}

@Test
public void test() {
try (SailRepositoryConnection connection = repository.getConnection()) {
System.out.println();
String query = """
PREFIX epo: <http://data.europa.eu/a4g/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX legal: <https://www.w3.org/ns/legal#>
PREFIX dcterms: <http://purl.org/dc/terms#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?countryID ?year (COUNT(DISTINCT ?lot) AS ?amountLots) (SUM(if(?bidders = 1, 1, 0)) AS ?numSingleBidders) WHERE {

?proc a epo:Procedure .
?proc epo:hasProcedureType ?p .
?proc epo:hasProcurementScopeDividedIntoLot ?lot .

?stat epo:concernsSubmissionsForLot ?lot .

?stat a epo:SubmissionStatisticalInformation .
?stat epo:hasReceivedTenders ?bidders .

?resultnotice epo:refersToProcedure ?proc .
?resultnotice a epo:ResultNotice .
?resultnotice epo:hasDispatchDate ?ddate .

FILTER ( ?p != <http://publications.europa.eu/resource/authority/procurement-procedure-type/neg-wo-call>)
BIND(year(xsd:dateTime(?ddate)) AS ?year) .

# This line should be above the FILTER and BIND, but doing so causes the QueryJoinOptimizer to not optimize the query for some unknown reason
?resultnotice epo:refersToRole ?buyerrole .
Comment on lines +124 to +125
Contributor Author

@D063520 @ate47

This line should be above the FILTER and BIND. When I put it above them, the QueryJoinOptimizer doesn't manage to optimize the query at all. When I move it back down, as I've done now, the QueryJoinOptimizer does seem to be triggered correctly, but it only seems to check the cardinality of this single statement pattern, and when it does it gets 0.0 as the cardinality.

If you run the test you should see the following printed:

Cardinality for StatementPattern    Var (name=resultnotice)    Var (name=_const_6aa9a9c_uri, value=http://data.europa.eu/a4g/ontology#refersToRole, anonymous)    Var (name=buyerrole)  is 0.0 (HDT: 0.0, Native: 0.0)

Contributor Author (@hmottestad, Jan 23, 2024)

I had a look at the QueryJoinOptimizer code and it doesn't seem to recurse all the way down into the query tree. Essentially, it recurses down to the first Join and then optimizes only the join arguments that are instances of Join, LeftJoin or StatementPattern. Removing the inner sub-select from the query "fixes" the QueryJoinOptimizer.
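The traversal behaviour described in this comment can be illustrated with a toy query tree. This sketch does not use RDF4J's TupleExpr classes; the `Node` type, the string node kinds and the method names are all hypothetical. It only contrasts a walk that stops at anything other than Join, LeftJoin or StatementPattern with a walk that descends everywhere.

```java
import java.util.List;

// Toy query tree; NOT RDF4J's TupleExpr classes. All names here are hypothetical.
final class PlanTraversalSketch {

    static final class Node {
        final String kind;
        final List<Node> children;

        Node(String kind, Node... children) {
            this.kind = kind;
            this.children = List.of(children);
        }
    }

    // Depth-first search for the first Join node, or null if the tree has none.
    static Node firstJoin(Node node) {
        if (node.kind.equals("Join")) {
            return node;
        }
        for (Node child : node.children) {
            Node found = firstJoin(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Counts statement patterns seen when only Join, LeftJoin and
    // StatementPattern arguments are followed below the first Join.
    static int shallowReach(Node root) {
        Node join = firstJoin(root);
        return join == null ? 0 : reachFromJoin(join);
    }

    static int reachFromJoin(Node join) {
        int count = 0;
        for (Node arg : join.children) {
            switch (arg.kind) {
                case "StatementPattern":
                    count++;
                    break;
                case "Join":
                case "LeftJoin":
                    count += reachFromJoin(arg);
                    break;
                default:
                    // Any other node (for example the Distinct above a sub-select) is not entered.
                    break;
            }
        }
        return count;
    }

    // Counts every statement pattern in the tree by descending into all nodes.
    static int deepReach(Node node) {
        int count = node.kind.equals("StatementPattern") ? 1 : 0;
        for (Node child : node.children) {
            count += deepReach(child);
        }
        return count;
    }

    // Outer join of two patterns, joined with a sub-select that holds two more.
    static Node sampleTree() {
        Node left = new Node("Join", new Node("StatementPattern"), new Node("StatementPattern"));
        Node subSelect = new Node("Distinct",
                new Node("Projection",
                        new Node("Join", new Node("StatementPattern"), new Node("StatementPattern"))));
        return new Node("Projection", new Node("Group", new Node("Join", left, subSelect)));
    }
}
```

On the sample tree the shallow walk never reaches the two patterns inside the sub-select, while the full walk sees all four.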

Contributor Author

I fixed the QueryJoinOptimizer in eclipse-rdf4j/rdf4j#4879.



{
SELECT DISTINCT ?buyerrole ?countryID WHERE {
?org epo:hasBuyerType ?buytype .
FILTER (?buytype != <http://publications.europa.eu/resource/authority/buyer-legal-type/eu-int-org> )

?buyerrole epo:playedBy ?org .
?org legal:registeredAddress ?orgaddress .
?orgaddress epo:hasCountryCode ?countrycode .
?countrycode dc:identifier ?countryID .

}
}
} GROUP BY ?countryID ?year


""";

System.out.println(query);

Explanation explanation = runQuery(connection, query);
System.out.println();
System.out.println();
System.out.println();
System.out.println();
System.out.println(explanation.toDot());
System.out.println();
System.out.println();
System.out.println();
System.out.println();
System.out.println(explanation);
System.out.println();
System.out.println();
System.out.println();
System.out.println();

}

}




private static Explanation runQuery(SailRepositoryConnection connection, String query) {
StopWatch stopWatch = StopWatch.createStarted();
{
TupleQuery tupleQuery = connection.prepareTupleQuery(query);
try (TupleQueryResult evaluate = tupleQuery.evaluate()) {
long count = evaluate.stream().count();
System.out.println(count);
}
}
TupleQuery tupleQuery = connection.prepareTupleQuery(query);
tupleQuery.setMaxExecutionTime(60*10);
Explanation explain = tupleQuery.explain(Explanation.Level.Timed);
// System.out.println(explain);
// System.out.println();
System.out.println("Took: " + stopWatch.formatTime());

return explain;

}
}
123 changes: 123 additions & 0 deletions qendpoint-store/src/test/resources/queryplan.txt
@@ -0,0 +1,123 @@
Distinct (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.001ms)
Projection (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.0ms)
├── ProjectionElemList
│ ProjectionElem "countryID"
│ ProjectionElem "year"
│ ProjectionElem "amountLots"
│ ProjectionElem "numSingleBidders"
└── Extension (resultSizeActual=0, totalTimeActual=5.0s)
Group (countryID, year) (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=2.45ms)
Join (HashJoinIteration) (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.015ms)
╠══ Extension (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=1.47ms) [left]
║ ├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=0.041ms)
║ │ ╠══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=0.088ms) [left]
║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=0.117ms) [left]
║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=0.096ms) [left]
║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=1.9s) [left]
║ │ ║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=1.9M, totalTimeActual=1.8s, selfTimeActual=144ms) [left]
║ │ ║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=2.24ms, selfTimeActual=0.129ms) [left]
║ │ ║ │ ║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=8, totalTimeActual=1.84ms, selfTimeActual=0.473ms) [left]
║ │ ║ │ ║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=10, totalTimeActual=0.45ms, selfTimeActual=0.203ms) [left]
║ │ ║ │ ║ │ ║ │ ║ │ ╠══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.106ms, selfTimeActual=0.106ms) [left]
║ │ ║ │ ║ │ ║ │ ║ │ ║ s: Var (name=resultnotice)
║ │ ║ │ ║ │ ║ │ ║ │ ║ p: Var (name=_const_6aa9a9c_uri, value=http://data.europa.eu/a4g/ontology#refersToRole, anonymous)
║ │ ║ │ ║ │ ║ │ ║ │ ║ o: Var (name=buyerrole)
║ │ ║ │ ║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=10, totalTimeActual=0.141ms, selfTimeActual=0.141ms) [right]
║ │ ║ │ ║ │ ║ │ ║ │ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ ║ │ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ ║ │ ║ │ ║ │ o: Var (name=_const_be18ee7b_uri, value=http://data.europa.eu/a4g/ontology#Procedure, anonymous)
║ │ ║ │ ║ │ ║ │ ║ └── Filter (resultSizeActual=8, totalTimeActual=0.918ms, selfTimeActual=0.727ms) [right]
║ │ ║ │ ║ │ ║ │ ║ ╠══ Compare (!=)
║ │ ║ │ ║ │ ║ │ ║ ║ Var (name=p)
║ │ ║ │ ║ │ ║ │ ║ ║ ValueConstant (value=http://publications.europa.eu/resource/authority/procurement-procedure-type/neg-wo-call)
║ │ ║ │ ║ │ ║ │ ║ ╚══ StatementPattern [index: SPO] (resultSizeActual=9, totalTimeActual=0.191ms, selfTimeActual=0.191ms)
║ │ ║ │ ║ │ ║ │ ║ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ ║ p: Var (name=_const_9c756f6b_uri, value=http://data.europa.eu/a4g/ontology#hasProcedureType, anonymous)
║ │ ║ │ ║ │ ║ │ ║ o: Var (name=p)
║ │ ║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.268ms, selfTimeActual=0.268ms) [right]
║ │ ║ │ ║ │ ║ │ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ p: Var (name=_const_9c3f1eec_uri, value=http://data.europa.eu/a4g/ontology#hasProcurementScopeDividedIntoLot, anonymous)
║ │ ║ │ ║ │ ║ │ o: Var (name=lot)
║ │ ║ │ ║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=1.9M, totalTimeActual=1.6s, selfTimeActual=1.6s) [right]
║ │ ║ │ ║ │ ║ s: Var (name=stat)
║ │ ║ │ ║ │ ║ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ ║ │ ║ o: Var (name=_const_ea79e75_uri, value=http://data.europa.eu/a4g/ontology#SubmissionStatisticalInformation, anonymous)
║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=1.2s, selfTimeActual=1.2s) [right]
║ │ ║ │ ║ │ s: Var (name=stat)
║ │ ║ │ ║ │ p: Var (name=_const_25686184_uri, value=http://data.europa.eu/a4g/ontology#concernsSubmissionsForLot, anonymous)
║ │ ║ │ ║ │ o: Var (name=lot)
║ │ ║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.242ms, selfTimeActual=0.242ms) [right]
║ │ ║ │ ║ s: Var (name=stat)
║ │ ║ │ ║ p: Var (name=_const_98c73a3c_uri, value=http://data.europa.eu/a4g/ontology#hasReceivedTenders, anonymous)
║ │ ║ │ ║ o: Var (name=bidders)
║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.205ms, selfTimeActual=0.205ms) [right]
║ │ ║ │ s: Var (name=resultnotice)
║ │ ║ │ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ o: Var (name=_const_77e914ad_uri, value=http://data.europa.eu/a4g/ontology#ResultNotice, anonymous)
║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.068ms, selfTimeActual=0.068ms) [right]
║ │ ║ s: Var (name=resultnotice)
║ │ ║ p: Var (name=_const_183bd06d_uri, value=http://data.europa.eu/a4g/ontology#refersToProcedure, anonymous)
║ │ ║ o: Var (name=proc)
║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.061ms, selfTimeActual=0.061ms) [right]
║ │ s: Var (name=resultnotice)
║ │ p: Var (name=_const_1b0b00ca_uri, value=http://data.europa.eu/a4g/ontology#hasDispatchDate, anonymous)
║ │ o: Var (name=ddate)
║ └── ExtensionElem (year)
║ FunctionCall (http://www.w3.org/2005/xpath-functions#year-from-dateTime)
║ FunctionCall (http://www.w3.org/2001/XMLSchema#dateTime)
║ Var (name=ddate)
╚══ Distinct (new scope) (resultSizeActual=1, totalTimeActual=1.02ms, selfTimeActual=0.655ms) [right]
Projection (resultSizeActual=1, totalTimeActual=0.365ms, selfTimeActual=0.025ms)
╠══ ProjectionElemList
║ ProjectionElem "buyerrole"
║ ProjectionElem "countryID"
╚══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.34ms, selfTimeActual=0.083ms)
├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.227ms, selfTimeActual=0.026ms) [left]
│ ╠══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.179ms, selfTimeActual=0.064ms) [left]
│ ║ ├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.074ms, selfTimeActual=0.047ms) [left]
│ ║ │ ╠══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.011ms, selfTimeActual=0.011ms) [left]
│ ║ │ ║ s: Var (name=buyerrole)
│ ║ │ ║ p: Var (name=_const_beb855c2_uri, value=http://data.europa.eu/a4g/ontology#playedBy, anonymous)
│ ║ │ ║ o: Var (name=org)
│ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.016ms, selfTimeActual=0.016ms) [right]
│ ║ │ s: Var (name=org)
│ ║ │ p: Var (name=_const_beb18915_uri, value=https://www.w3.org/ns/legal#registeredAddress, anonymous)
│ ║ │ o: Var (name=orgaddress)
│ ║ └── StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.041ms, selfTimeActual=0.041ms) [right]
│ ║ s: Var (name=orgaddress)
│ ║ p: Var (name=_const_2f7de0e1_uri, value=http://data.europa.eu/a4g/ontology#hasCountryCode, anonymous)
│ ║ o: Var (name=countrycode)
│ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.022ms, selfTimeActual=0.022ms) [right]
│ s: Var (name=countrycode)
│ p: Var (name=_const_a825a5f4_uri, value=http://purl.org/dc/elements/1.1/identifier, anonymous)
│ o: Var (name=countryID)
└── Filter (resultSizeActual=1, totalTimeActual=0.03ms, selfTimeActual=0.009ms) [right]
╠══ Compare (!=)
║ Var (name=buytype)
║ ValueConstant (value=http://publications.europa.eu/resource/authority/buyer-legal-type/eu-int-org)
╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.022ms, selfTimeActual=0.022ms)
s: Var (name=org)
p: Var (name=_const_1abd8d4b_uri, value=http://data.europa.eu/a4g/ontology#hasBuyerType, anonymous)
o: Var (name=buytype)
GroupElem (amountLots)
Count (Distinct)
Var (name=lot)
GroupElem (numSingleBidders)
Sum
If
Compare (=)
Var (name=bidders)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="0"^^<http://www.w3.org/2001/XMLSchema#integer>)
ExtensionElem (amountLots)
Count (Distinct)
Var (name=lot)
ExtensionElem (numSingleBidders)
Sum
If
Compare (=)
Var (name=bidders)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="0"^^<http://www.w3.org/2001/XMLSchema#integer>)