SPARQL String. Unicode escapes exclude surrogates
afs committed Jan 30, 2025
1 parent 16f8a7a commit 18a4c82
Showing 1 changed file with 45 additions and 19 deletions.
64 changes: 45 additions & 19 deletions spec/index.html
@@ -10510,30 +10510,49 @@ <h4>Notes</h4>
<h2>SPARQL Grammar</h2>
<p>The SPARQL grammar covers both SPARQL Query and [[[SPARQL11-UPDATE]]].</p>
<section id="queryString">
<h3>SPARQL Request String</h3>
<h3>SPARQL String</h3>
<p>
A <dfn data-lt="SPARQLRequestString">SPARQL Request String</dfn> is
a <a>SPARQL Query String</a> or <a>SPARQL Update String</a> and is a Unicode character string
(c.f. section 6.1 String concepts of [[CHARMOD]]) in the language defined by the following
grammar.</p>
<span id="defn_SPARQLRequestString"></span>
A <dfn>SPARQL string</dfn> is an
<a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> that
conforms to the grammar given in this section.
</p>
<p class="note">
An <a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> is
a sequence of
<a data-cite="I18N-GLOSSARY#dfn-code-point" class="lint-ignore">Unicode code points</a>
which are <a data-cite="I18N-GLOSSARY#dfn-scalar-value" class="lint-ignore">Unicode scalar values</a>.
Unicode scalar values do not include the
<a data-cite="I18N-GLOSSARY#dfn-surrogate" class="lint-ignore">surrogate code points</a>.
</p>
<p>
A <dfn data-lt="SPARQLQueryString">SPARQL Query String</dfn> starts
at the <a href="#rQueryUnit">QueryUnit</a> production.</p>
<span id="defn_SPARQLQueryString"></span>
A <dfn>SPARQL query string</dfn> is a
<a>SPARQL string</a> that conforms to the grammar starting at
the <a href="#rQueryUnit">QueryUnit</a> production.
</p>
<p>
A <dfn data-lt="SPARQLUpdateString">SPARQL Update String</dfn> starts
at the <a href="#rUpdateUnit">UpdateUnit</a> production.</p>
<p>For compatibility with future versions of Unicode, the characters in this string may
<span id="defn_SPARQLUpdateString"></span>
A <dfn>SPARQL update string</dfn> is a
<a>SPARQL string</a> that conforms to the grammar starting at
the <a href="#rUpdateUnit">UpdateUnit</a> production.
</p>
<p>
For compatibility with future versions of Unicode, the characters in this string may
include Unicode codepoints that are unassigned as of the date of this publication (see
[[[UAX31]]] [[UAX31]] section 4 Pattern Syntax). For productions with excluded character
classes (for example <code>[^&lt;&gt;'{}|^`]</code>), the characters are excluded from the
range <code>#x0 - #x10FFFF</code>.</p>
range <code>#x0 - #x10FFFF</code>.
</p>
</section>
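
The note above restricts an RDF string to Unicode scalar values, that is, code points outside the surrogate range U+D800 to U+DFFF. A minimal sketch of that check follows; the function name `is_rdf_string` is hypothetical and is not taken from the specification. It is written in Python because a Python `str` can hold lone surrogate code points, which makes the distinction visible.

```python
def is_rdf_string(text: str) -> bool:
    """Return True if every code point in text is a Unicode scalar value.

    Hypothetical helper: scalar values are all code points except the
    surrogate code points U+D800 to U+DFFF.
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


if __name__ == "__main__":
    print(is_rdf_string("SELECT * { ?s ?p ?o }"))  # ordinary text
    print(is_rdf_string("a\ud800b"))               # contains a lone surrogate
```

A string that passes this check still has to conform to the grammar before it counts as a SPARQL string; the sketch covers only the code point constraint.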

<section id="codepointEscape">
<h3>Codepoint Escape Sequences</h3>
<p>A SPARQL Query String is processed for codepoint escape sequences before parsing by the
<p>
A <a>SPARQL string</a> is processed for codepoint escape sequences before parsing by the
grammar defined in EBNF below. The codepoint escape sequences for a SPARQL string
are:</p>
are:
</p>
<span class="doc-ref" id="table68"></span>
<table title="Codepoint escapes">
<colgroup>
@@ -10551,15 +10570,19 @@ <h3>Codepoint Escape Sequences</h3>
<a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
</td>
<td>A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the
encoded hexadecimal value.</td>
encoded hexadecimal value, excluding U+D800 to U+DFFF, the
<a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
</td>
</tr>
<tr>
<td>
<span class="token">'\U'</span> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
<a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
</td>
<td>A Unicode code point in the range U+0 to U+10FFFF inclusive corresponding to the
encoded hexadecimal value.</td>
encoded hexadecimal value, excluding U+D800 to U+DFFF, the
<a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
</td>
</tr>
</tbody>
</table>
@@ -10572,13 +10595,16 @@ <h3>Codepoint Escape Sequences</h3>
&lt;ab\u00E9xy&gt; # Codepoint 00E9 is Latin small e with acute - é
\u03B1:a # Codepoint x03B1 is Greek small alpha - α
a\u003Ab # a:b -- codepoint x3A is colon</pre>
<p>Codepoint escape sequences can appear anywhere in the query string. They are processed
<p>
Codepoint escape sequences can appear anywhere in the query string. They are processed
before parsing based on the grammar rules and so may be replaced by codepoints with
significance in the grammar, such as "<code>:</code>" marking a prefixed name.</p>
significance in the grammar, such as "<code>:</code>" marking a prefixed name.
</p>
<p>These escape sequences are not included in the grammar below. Only escape sequences for
characters that would be legal at that point in the grammar may be given. For example, the
variable "<code>?x\u0020y</code>" is not legal (<code>\u0020</code> is a space and is not
permitted in a variable name).</p>
permitted in a variable name).
</p>
</section>
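
The codepoint escape rules in this section can be sketched as a pre-parse pass over the input. The code below is an illustration under stated assumptions, not the specification's algorithm: the name `decode_codepoint_escapes` is hypothetical, a rejected escape is reported as a `ValueError`, and every `\u` or `\U` followed by the required number of hex digits is treated as an escape regardless of what precedes it.

```python
import re

# \uXXXX (exactly 4 hex digits) or \UXXXXXXXX (exactly 8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace codepoint escapes with the code points they encode.

    Hypothetical helper. Escapes that name a surrogate code point
    (U+D800 to U+DFFF) or a value above U+10FFFF raise ValueError.
    """

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"escape out of range: {match.group(0)}")
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"escape encodes a surrogate: {match.group(0)}")
        return chr(code_point)

    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_codepoint_escapes(r"<ab\u00E9xy>"))
    print(decode_codepoint_escapes(r"a\u003Ab"))
```

Because the pass runs before grammar parsing, `a\u003Ab` reaches the parser as `a:b`, which matches the prefixed name example in the text. Checking that a decoded character is legal at its position, as in the `?x\u0020y` case, is left to the parser and is not part of this sketch.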
<section id="whitespace">
<h3>White Space</h3>
@@ -10631,7 +10657,7 @@ <h3>Blank Nodes and Blank Node Identifiers</h3>
</p>
<p>
<a data-cite="RDF12-CONCEPTS#dfn-blank-node-identifier">Blank node identifiers</a>
are scoped to the <a>SPARQL Request String</a> in which they occur.
are scoped to the <a>SPARQL string</a> in which they occur.
Different uses of the same blank node identifier in a request
string refer to the same blank node. Fresh blank nodes are generated for each request;
blank nodes cannot be referenced by identifier across requests.</p>