-
Notifications
You must be signed in to change notification settings - Fork 0
QueryTaskService
The query task service provides a task-based REST API to specify and
execute rich queries against indexed documents, both on the node its
running, and across all nodes in a specified node group. The service is
created through the service task factory (i.e., /core/query-tasks
).
The underlying query engine is currently powered by LUCENE. This tutorial gives a good introduction on the types of queries. We also support all kinds of non string queries (e.g., numerical, geospatial data structures, fuzzy searches)
Lucene uses MUST, MUST_NOT, SHOULD, and SHOULD_NOT as operators. This blog post gives some good background on understanding those operators.
/core/query-tasks
/core/query-tasks/<task-id>
/core/local-query-tasks
/core/local-query-tasks/<task-id>
The local query tasks are not load balanced and replicated across nodes. They are the appropriate target for broadcast requests, allowing the concurrent execution of independent queries on a per-node basis.
The query task behavior is entirely driven by a query specification and a set of properties that govern the result set:
public static class QuerySpecification {
public enum QueryOption {
/**
* Query results are updated in real time, by using {@code QueryFilter} instance on the index.
* Any update that satisfies the query filter will cause the results to be updated and a self
* PATCH to be sent on the service.
*/
CONTINUOUS,
/**
* Query results will return the number of documents that satisfy the query and populate the
* the {@link results.documentCount} field. The results will not contain links or documents
*/
COUNT,
/**
* The query will execute on the current view of the index, potentially missing recent updates.
* This improves performance but does not guarantee latest results.
*/
DO_NOT_REFRESH,
/**
* Query results will include the state documents in the {@link results.documents} collection
*/
EXPAND_CONTENT,
/**
* The query will execute over all document versions, not just the latest per self link. Each
* document self link will be annotated with the version
*/
INCLUDE_ALL_VERSIONS,
/**
* Query results will include document versions marked deleted
*/
INCLUDE_DELETED,
/**
* Query results will be sorted by the specified sort field
*/
SORT,
/**
* Infrastructure use only. Query originated from a query task service
*/
TASK,
}
public enum SortOrder {
ASC, DESC
}
/*
* Query definition
*/
public Query query = new Query();
public QueryTerm sortTerm;
public SortOrder sortOrder;
/**
* The optional resultLimit field is used to enable query results pagination. When
* resultLimit is set, the query task will not return any results when finished, but will
* include a nextPageLink field. A client can then issue a GET request on the nextPageLink
* to get the first page of results. A nextPageLink field will be included in GET response
* documents until all query results have been consumed.
*/
public Integer resultLimit;
/**
* The optional expectedResultCount field will enable query retries until
* expectedResultCount is met or the QueryTask expires. taskInfo.stage will remain in the
* STARTED phase until such time.
*/
public Long expectedResultCount;
public EnumSet<QueryOption> options = EnumSet.noneOf(QueryOption.class);
public ServiceOption targetIndex;
The CONTINUOUS option creates a long running query filter that process all updates to the local index. The query specification is compiled into an efficient query filter that evaluates the document updates, and the filter evaluates to true, the query task is PATCHed with a document results reflecting the self link (and document if EXPAND is set) that changed.
The continuous query task acts as a node wide black board, or notification service allowing clients or services to receive notifications without having to subscribe to potentially millions of discrete services
To create a new query task, one needs to send a POST (including a body
with a query specification) to /core/query-tasks
. Of course, one can
use the per-service UI available through /core/query-tasks/ui
and cut
and paste from the template state and click POST.
As a very simple example, let's query all services that include any
content in the message
field, including all previous versions (this
requires having option PERSISTENCE to true
).
TODO: Populate the document store
{
"taskInfo": {
"isDirect": true
},
"querySpec": {
"options": [
"INCLUDE_ALL_VERSIONS"
],
"query": {
"term": {
"matchType": "WILDCARD",
"matchValue": "*",
"propertyName": "message"
}
}
}
}
Setting isDirect
to true
means that the results will be sent back
synchronously -- this is easy for interactive experimentation (we
discuss asynchronous queries later). Instead of wildcard, one can use
TERM (for exact matching) or PHRASE (for proximity matching) for
matchType. In terms of options, one can use:
- EXPAND_CONTENT: the query results will include the entire document description in addition to the default documentLinks to these documents.
- INCLUDE_DELETED: will include the documents that have either expired or have been marked as deleted
- TASK: TODO
- COUNT: will return the total number of documents and no links to them
- INCLUDE_ALL_VERSIONS: will include results from a version that is either created after the query was started, or, is the latest version
- SORT: will sort the results based on querySpec.sortOrder and querySpec.sortTerm.
In the Java Universe, this would be written as follows:
QueryTask.QuerySpecification q = new QueryTask.QuerySpecification();
q.query.setTermPropertyName("message").setTermMatchValue("*");
QueryTask task = QueryTask.create(q).setDirect(true);
// get results using `task.results.documentLinks`
POST body that will attempt to find a service instance of
kind ExampleServiceState
and with a field name
set to value
query-target
. This is achieved using boolean clause: we match a
particular service kind, as well as term content (in this case, a
PHRASE).
{
"querySpec": {
"query": {
"booleanClauses": [
{
"term": {
"matchValue": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"propertyName": "documentKind"
},
"occurance": "MUST_OCCUR"
},
{
"term": {
"matchType": "PHRASE",
"matchValue": "query-target",
"propertyName": "name"
},
"occurance": "MUST_OCCUR"
}
]
}
}
}
This is a booleanClause query (i.e., for instance, not a simple,
top-level query). occurance
can be MUST_OCCUR (equivalent of AND --
default), MUST_NOT_OCCUR (equivalent of NOT), SHOULD_OCCUR (equivalent
of OR).
Note that we are not using isDirect this time (reminder: this defines if your query will return results synchronously or asynchronously -- by default the queries are asynchronous and isDirect = false).
A client can create a new instance of an existing query task, by sending a POST to the query task factory, with a simple body, setting the documentSourceLink to the path of an existing query task. The factory will then get the state of the existing task and create a new query task with it as the initial state. This essentially re-creates the task and runs it.
Returns the contents of task. If the task is finished, or in progressed, the result collection will be populated with the document links that satisfied the query
A GET on the factory, returns all tasks.
On a specific task:
$ curl http://localhost:8000/core/query-tasks/2de69ad0-08a3-48fa-947a-f50b2e006806
{
"documentExpirationTimeMicros": 1413223137730000,
"documentUpdateTimeMicros": 1413223107740000,
"documentSelfLink": "/core/query-tasks/2de69ad0-08a3-48fa-947a-f50b2e006806",
"documentKind": "com:vmware:xenon:services:common:QueryTask",
"documentVersion": 2,
"includeDeletedDocuments": false,
"results": {
"documentExpirationTimeMicros": 0,
"documentUpdateTimeMicros": 0,
"documentVersion": 0,
"documents": null,
"documentLinks": [
"/core/examples/18a2f2eb-6cbb-4ae8-9c15-2cb75d84608b"
]
},
"querySpec": {
"query": {
"booleanClauses": [
{
"term": {
"matchValue": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"propertyName": "documentKind"
},
"occurance": "MUST_OCCUR"
},
{
"term": {
"matchType": "PHRASE",
"matchValue": "query-target",
"propertyName": "name"
},
"occurance": "MUST_OCCUR"
}
],
"occurance": "MUST_OCCUR"
}
},
"taskInfo": {
"isDirect": false,
"stage": "FINISHED"
}
}
{
"querySpec": {
"query": {
"booleanClauses": [
{
"term": {
"matchValue": "com:vmware:xenon:services:common:QueryValidationTestService:QueryValidationServiceState",
"propertyName": "documentKind"
},
"occurance": "MUST_OCCUR"
},
{
"term": {
"matchType": "TERM",
"matchValue": "decentralized",
"propertyName": "stringValue"
},
"occurance": "MUST_OCCUR"
}
]
}
}
}
{
"querySpec": {
"query": {
"booleanClauses": [
{
"booleanClauses": [
{
"term": {
"matchValue": "com:vmware:xenon:services:common:QueryValidationTestService:QueryValidationServiceState",
"propertyName": "documentKind"
},
"occurance": "MUST_OCCUR"
},
{
"term": {
"matchType": "TERM",
"matchValue": "decentralized",
"propertyName": "stringValue"
},
"occurance": "MUST_OCCUR"
}
]
},
{
"booleanClauses": [
{
"term": {
"matchValue": "com:vmware:xenon:services:common:QueryValidationTestService:QueryValidationServiceState",
"propertyName": "documentKind"
},
"occurance": "MUST_OCCUR"
},
{
"term": {
"range": {
"percisionStep": "4",
"isMaxInclusive": "false",
"isMinInclusive": "false",
"max": "123.21",
"min": "123.2",
"type": "DOUBLE"
},
"matchType": "TERM",
"propertyName": "doubleValue"
},
"occurance": "MUST_OCCUR"
}
]
}
]
}
}
}
{
"querySpec": {
"query": {
"occurance": "MUST_OCCUR",
"term": {
"propertyName": "longValue",
"range": {
"type": "LONG",
"min": "10.0",
"max": "990.0",
"isMinInclusive": "false",
"isMaxInclusive": "false",
"percisionStep": "2147483647"
}
}
}
}
}
{
"querySpec": {
"query": {
"occurance": "MUST_OCCUR",
"term": {
"propertyName": "doubleValue",
"range": {
"type": "DOUBLE",
"min": "123.0",
"max": "173.0",
"isMinInclusive": "true",
"isMaxInclusive": "false",
"percisionStep": "2147483647"
}
}
}
}
}
The following query sorts the documents of kind ExampleServiceState based on the property name
.
sortOrder
specifies ASC/DESC order.
{
"taskInfo": {
"isDirect": true
},
"querySpec": {
"options": [
"SORT", "EXPAND_CONTENT"
],
"sortTerm" : {
"propertyType": "STRING",
"propertyName": "name"
},
"sortOrder" : "DESC",
"query": {
"term": {
"matchType": "TERM",
"matchValue": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"propertyName": "documentKind"
}
}
}
}
note: In order for a property to be enabled for sorting, it must have indexingOption SORT
enabled.
please see the programming model page and
example service page for more information.
The document index can also be queried using a subset of the OData $filter
specification. This is to aid in querying interactively without the use of tools like dcpc
to form querySpec
documents.
To use OData $filter
based queries, issue a GET
on /core/odata-queries?$filter(...)
with the appropriate filter parameters. This will result in a direct query task with equivalent an querySpec
. The DIRECT
query has the EXPAND
option so results will be embedded in the results.
The odata-queries
service supports only a subset of the OData URI specification. The following are the only currently supported query verbs.
Operator | Description | Example |
---|---|---|
Logical Operators | ||
eq | Equal | documentSelfLink eq /core/examples/a703f44a-ab33-4451-87b2-6554160ed1e0 |
ne | Not equal | name ne 'London' |
gt | Greater than | counter gt 20 |
ge | Greater than or equal | counter ge 10 |
lt | Less than | price lt 20 |
le | Less than or equal | price le 100 |
and | Logical and | Price le 200 and Price gt 3.5 |
or | Logical or | name eq instance-1 or counter gt 200 |
Grouping Operators | ||
( ) | Precedence grouping | (counter gt 200 and counter lt 100) or name eq instance-150 |
Query for all documents with the property name name
has the value instance-1
.
$ http "http://localhost:8000/core/odata-queries?\$filter(name eq instance-1)"
HTTP/1.1 200 OK
{
...
"indexLink": "/core/document-index",
"querySpec": {
"options": [
"EXPAND_CONTENT"
],
"query": {
"occurance": "MUST_OCCUR",
"term": {
"matchType": "TERM",
"matchValue": "instance-1",
"propertyName": "name"
}
},
"resultLimit": 2147483647
},
"results": {
...
"documentLinks": [
"/core/examples/a703f44a-ab33-4451-87b2-6554160ed1e0"
],
...
"documents": {
"/core/examples/a703f44a-ab33-4451-87b2-6554160ed1e0": {
...
},
},
"taskInfo": {
"isDirect": true,
"stage": "FINISHED"
}
}
The resultLimit
field is used to enable query results pagination. By
default a query will return all results in a single document. When
resultLimit
is set, the query task will not return any results when
finished, but will include a nextPageLink
field. A client can then
issue a GET request on the nextPageLink
to get the first page of
results.
The documentKind
is the same as when creating the task:
com:vmware:xenon:services:common:QueryTask
. In the case of a GET
response, the taskInfo.stage
will either be FINISHED or FAILED, there
is no need to poll even if isDirect
was false
when creating
the query task. A nextPageLink
field will be included in GET
response documents until all query results have been consumed. The
nextPageLink
services inherit the original querySpec
, including
documentExpirationTimeMicros
. The documentCount
field will be set to
the total number of hits and the prevPageLink
field will be set for
navigating to the previous page of results.
For the example, first create a set of example documents:
$ for i in {1..2000}; do dcpc post /core/examples --name "doc-${i}"; done
Next, create a query task with resultLimit
set to 100:
$ dcpc <<EOF
action: post
path: /core/query-tasks
body:
taskInfo:
isDirect: true
querySpec:
resultLimit: 100
options:
- EXPAND_CONTENT
query:
occurance: MUST_OCCUR
booleanClauses:
- occurance: MUST_OCCUR
term:
propertyName: documentKind
matchValue: com:vmware:xenon:services:common:ExampleService:ExampleServiceState
- occurance: MUST_OCCUR
term:
propertyName: name
matchValue: 'doc-*'
matchType: WILDCARD
EOF
The response will have zero documentLinks
and nextPageLink
set to
the URI of the first page:
{
"taskInfo": {
"stage": "FINISHED",
"isDirect": true
},
"querySpec": {
"query": {
"occurance": "MUST_OCCUR",
"booleanClauses": [
{
"occurance": "MUST_OCCUR",
"term": {
"propertyName": "documentKind",
"matchValue": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"matchType": "TERM"
}
},
{
"occurance": "MUST_OCCUR",
"term": {
"propertyName": "name",
"matchValue": "doc-*",
"matchType": "WILDCARD"
}
}
]
},
"resultLimit": 100,
"options": [
"EXPAND_CONTENT"
]
},
"results": {
"documentLinks": [],
"nextPageLink": "/277da2e8-f2e4-4ad5-9a8a-ae173f47d853",
"documentVersion": 0,
"documentUpdateTimeMicros": 0,
"documentExpirationTimeMicros": 0,
"documentOwner": "168c9af7-193d-4b6c-92c5-409df5b27752"
},
"indexLink": "/core/document-index",
"documentVersion": 0,
"documentEpoch": 0,
"documentKind": "com:vmware:xenon:services:common:QueryTask",
"documentSelfLink": "/core/query-tasks/166cd49e-05d8-4e81-ac86-dbca73cc5ad4",
"documentUpdateTimeMicros": 1432747351265007,
"documentExpirationTimeMicros": 1432747951265019,
"documentOwner": "168c9af7-193d-4b6c-92c5-409df5b27752"
}
To see the first page of query results, send a GET request to the nextPageLink
:
$ dcpc get /277da2e8-f2e4-4ad5-9a8a-ae173f47d853
The response will have 100 documentLinks and nextPageLink set to the URI of the second page:
{
"taskInfo": {
"stage": "FINISHED",
"isDirect": true
},
"querySpec": {
"query": {
"occurance": "MUST_OCCUR"
},
"resultLimit": 100,
"options": [
"EXPAND_CONTENT"
]
},
"results": {
"documentLinks": [
"/core/examples/a67c4d98-793b-4cd6-b77b-6bc657e228fa",
"/core/examples/001fcebe-d669-49a8-9685-88b5c3b644cf",
"..."
],
"documents": {
"/core/examples/a67c4d98-793b-4cd6-b77b-6bc657e228fa": {
"keyValues": {},
"name": "doc-13",
"documentVersion": 0,
"documentEpoch": 0,
"documentKind": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"documentSelfLink": "/core/examples/a67c4d98-793b-4cd6-b77b-6bc657e228fa",
"documentSignature": "5e522c487980b647896b6502d8dcaef21e3b6c92",
"documentUpdateTimeMicros": 1432746726377012,
"documentExpirationTimeMicros": 0,
"documentOwner": "168c9af7-193d-4b6c-92c5-409df5b27752"
},
"/core/examples/001fcebe-d669-49a8-9685-88b5c3b644cf": {
"keyValues": {},
"name": "doc-69",
"documentVersion": 0,
"documentEpoch": 0,
"documentKind": "com:vmware:xenon:services:common:ExampleService:ExampleServiceState",
"documentSelfLink": "/core/examples/001fcebe-d669-49a8-9685-88b5c3b644cf",
"documentSignature": "f62ca3a6baadddf8ee0d05c8aaac142f51b2ae33",
"documentUpdateTimeMicros": 1432746727319012,
"documentExpirationTimeMicros": 0,
"documentOwner": "168c9af7-193d-4b6c-92c5-409df5b27752"
},
"...": {
"...": "..."
}
},
"nextPageLink": "/6cd352b5-c5f2-4321-aaf0-1a8d14db6af2",
"documentVersion": 0,
"documentUpdateTimeMicros": 0,
"documentExpirationTimeMicros": 0,
"documentOwner": "168c9af7-193d-4b6c-92c5-409df5b27752"
},
"indexLink": "/core/document-index",
"documentVersion": 0,
"documentKind": "com:vmware:xenon:services:common:QueryTask",
"documentSelfLink": "/277da2e8-f2e4-4ad5-9a8a-ae173f47d853",
"documentUpdateTimeMicros": 0,
"documentExpirationTimeMicros": 1432748187234995
}
The following example will GET the given link, save the results to a file and follow nextPagelink if set. First, paste this function into a bash shell:
$ function getQueryPages() {
set -e
link=$1
nextLink=$(dcpc get $link | jq -r .results | tee $(basename $link).json | jq -r .nextPageLink)
if [ "$nextLink" != "null" ]; then
getQueryPages $nextLink
fi
}
Example run:
$ getQueryPages /277da2e8-f2e4-4ad5-9a8a-ae173f47d853
$ ls -1 *.json
0d5e0c18-fa38-4ccb-9910-75d19e25e87a.json
0e71afb5-8166-423b-822d-e70d6354721f.json
108f66a1-3fb8-40fe-905e-204fa36afec3.json
...
% jq .documentLinks[] < 0d5e0c18-fa38-4ccb-9910-75d19e25e87a.json | wc -l
100
When querying a factory or using a QueryTask that filter on the documentKind the resulting list contains
service documents of the same kind. To process such results with a minimal amount of code you can use QueryResultsProcessor
class. It
offers the same view over a QueryTask
, ServiceDocumentQueryResults
results and even an Operation
whose body is either of those types:
QueryTask task = ...
Operation op = Operation.createPost(...)
.setBody(task)
.setCompletion((o, e) -> {
if (e != null) {
handleError(e);
return;
}
// let the processor convert the body from the operation
QueryResultsProcessor processor = QueryResultsProcessor.create(o);
// processor will convert every result to desired type if needed
for(MyDocumentType doc: processor.documents(MyDocumentType.class)) {
// linked documents will also be converted to the desired type
OtherDocumentType linked = processor.linkedDocument(doc.someLink, OtherDocumentType.class);
handleDocuments(doc, linked);
}
});