- Short Name: prefer
- SEP Number: SEP-0005
- Author: Jerven Bolleman
Jerven Bolleman
RDF and SPARQL have excellent support for multiple languages and this is widely used. However, there is a lack of selecting a preferred language given a set of possibilities.
While technically possible in SPARQL 1.1. We would like to have an easier to write query syntax that is potentially easier to optimize for query engines. Even if the final syntax can be rewritten to SPARQL 1.1 equivalent constructs.
Given the requirement to select labels/properties in languages in an order of user preference we currently need to use the lang() and langmatches() functions either in a series of optional blocks with tests for boundness, or a subselect with an ordering.
SELECT
?thing
?label
WHERE {
OPTIONAL {
?thing rdfs:label ?label .
FILTER(langMatches(lang(?label), “fr”)
}
OPTIONAL {
FILTER(! BOUND(?label))
?thing rdfs:label ?label .
FILTER(langMatches(lang(?label), “en”)
}
FILTER(BOUND(?label) && BOUND(?thing))
}
Gives an example of the use with the optional solution.
SELECT
?thing
?label
WHERE
{ {
SELECT
?thing
?label
?order
WHERE
{ VALUES ( ?lang ?order ) {
( "fr" 1 )
( "en" 2 )}
?thing rdfs:label ?label
FILTER langMatches(lang(?label), ?lang)
}
GROUP BY ?thing ?label ?order
ORDER BY ?order
LIMIT 1
}
}
Neither variety are easy to write, and neither can take into account preferential settings regarding languages from a browser or http header.
The proposal is to introduce a new keyword PREFER
and functionality in the syntax and abstract algebra.
The same query as above would look like this in the proposed syntax.
SELECT ?label
WHERE
{
[] rdfs:label PREFER(?label FROM langMatches(lang(?label), "FR"),
langMatches(lang(?label), "EN"),
langMatches(lang(?label), "NL")) .
}
We will also introduce a second new keyword ACCEPTLANG
, to refer to taking the order from the accept-lang http header provided by the client and a new shorthand for testing if a value has a matching language string hasLang(?label, "EN")
. In situations where there is no http header such as comand lines, the tools should look in places such as the locale settings on Linux, or other fields where a language preference by the user of the system might be indicated.
SELECT
?label
WHERE
{
[] rdfs:label PREFER(?label FROM hasLang(?label, "FR"),
hasLang(?label, "EN"),
hasLang(?label, "NL"),
hasLang(?label, ACCEPTLANG)) .
}
PREFERLANG is a special shorthand to make it easier to find strings with a useable for the user language tag. It follows the same execution rules as PREFER
but accepts only strings for language codes and the ACCEPTLANG
keyword.
SELECT
?label
WHERE
{
[] rdfs:label PREFERLANG(?label FROM "FR", "EN", "NL", ACCEPTLANG)) .
}
The PREFER
keyword introduces a new complexity for query engines to implement. While it does not introduce new complexity in terms of query expressiveness, the practical impact on the query engines is not small. Specifically it introduces a new manner to introduce an ad-hoc sorted join and reduction. On the other hand, PREFER
is a general solution that works on any effective boolean value providing function. For example one can write
SELECT
?label
WHERE
{
[] rdfs:label PREFER(?label FROM "cafe au lait"@FR == ?label, "cappuccino"@EN == ?label, "koffie verkeerd"@NL == ?labe, strlen(?label)<20) .
}
The word FROM
is selected because it indicates that assignment is in the unusual direction.
A not addressed concern is where would PREFER
be allowed? Only in the object position or would we allow it in the subject or predicate of the BGPs.
The use of a PREFERLANG or PREFER operation does not change the data type of the returned values.
SELECT
?label (lang(?label) AS ?languageOfLabel)
WHERE
{
[] rdfs:label PREFERLANG(?label, "FR", "EN", "NL", ACCEPTLANG)) .
}
In this case the ?languageOfLabel
will contain the language of the selected variable.
There is a potential for query engines to optimise this kind of query if they can already push down filters on values lower into the stack.
None
Existing implementations do not understand this logic.