Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering by gYear #32

Open
jstcki opened this issue Oct 29, 2019 · 10 comments
Open

Filtering by gYear #32

jstcki opened this issue Oct 29, 2019 · 10 comments

Comments

@jstcki
Copy link
Contributor

jstcki commented Oct 29, 2019

Hi @vhf!

A follow-up to our discussion about filtering by years (and probably other date types). While this works fine for non-literal dimensions:

query.filter(dim.equals("http://some.iri.org/foo") //or
query.filter(dim.in(["http://some.iri.org/foo", "http://some.iri.org/bar"])

… it doesn't for xsd:gYear values. For example:

query.filter(dim.equals("2018"))

generates a SPARQL query with

FILTER(?someField = 2018 )

… which leads to no results (see http://yasgui.org/short/VyGPg76bB).

Expected outcome

The expected (working) SPARQL should be

FILTER(?someField = "2018"^^xsd:gYear )

(see http://yasgui.org/short/7QS7K16Nl)

Solution?

So, I think there needs to be a way to provide a typed value to .filter() – but how? I tried this:

query.filter(dim.equals({
  type: "literal",
  datatype:"http://www.w3.org/2001/XMLSchema#gYear",
  value: "2017"
}))

… but that just generates an error in the library:

Error: into() cannot cast arg {“value”:“2017",“type”:“literal”,“datatype”:“http://www.w3.org/2001/XMLSchema#gYear“}
@jstcki
Copy link
Contributor Author

jstcki commented Oct 29, 2019

I played some more with the SPARQL query and quite oddly, these all return the expected results (notice the use of >=, with = none of these work I think):

(?xField >= "2017"^^xsd:date ) // Returns values for 2017 and 2018
(?xField >= "2017"^^xsd:dateTime ) // Returns values for 2017 and 2018
(?xField >= "2017"^^xsd:foobar ) // <=== WHAT? Also returns values for 2017 and 2018

but I'm really not sure why?

Also, these do not work:

(?xField >= "2017" ) // Only returns 2018 values
(?xField >= "2017"^^xsd:whatever ) // <=== WHAT? Only returns 2018 values

@vhf
Copy link
Contributor

vhf commented Oct 30, 2019

Solution?

So, I think there needs to be a way to provide a typed value to .filter() – but how? I tried this:

Here's how, and I understand it would have been hard to come up with it, it's quite rdfjs specific:

You have to construct a literal to compare to: year.equals(literal("2017", namedNode("http://www.w3.org/2001/XMLSchema#gYear"))

Full example:

  const entryPoint = new DataCubeEntryPoint(
    "https://trifid-lindas.test.cluster.ldbar.ch/query",
    { languages: ["de"] },
  );
  const dataCubes = await entryPoint.dataCubes();

  const cube = dataCubes[0];
  const dimensions = await cube.dimensions();
  const year = dimensions[0];
  const forstzone = dimensions[1];
  const kanton = dimensions[2];

  const results = await cube.query()
    .select({
      year,
      forstzone,
      kanton,
    })
    .filter(year.equals(literal("2017", namedNode("http://www.w3.org/2001/XMLSchema#gYear"))))
    .execute();

You should be able to import literal and namedNode because the lib depends on @rdfjs/data-model, try this:

import { literal, namedNode } from "@rdfjs/data-model";

@jstcki
Copy link
Contributor Author

jstcki commented Oct 30, 2019

Thanks, I'll give it a shot.

jstcki added a commit to visualize-admin/visualization-tool that referenced this issue Oct 30, 2019
@jstcki
Copy link
Contributor Author

jstcki commented Oct 30, 2019

OK, it works. But … It's quite brittle IMO.

  1. We can't just rely on extraMetadata.scaleOfMeasure because even if we know it's a temporal dimension, we don't know which data type is used for dimension values.
  2. In order to construct the right data type, I would need to infer it from one of the dimension's min/max values, which is metadata fetched asynchronously, so I'd have to cache this info myself somewhere which is very tedious.
  3. Ideally, every Dimension would also have a (non-optional) value data type (extraMetadata.scaleOfMeasure doesn't work because it doesn't tell us the exact type we'd have to use for filtering – also it's kinda optional)
  4. Even better, if the value data type was known by the dimension, year.equals("2017") could just work and constructing typed literals would be handled by the library itself. I think this would be a good approach from an API perspective.

P.S. Maybe even this could work: year.equals(new Date(2018, 0, 1)) because if the value type for the year dimension is gYear it could then just use date.getFullYear() internally. But probably this is not really too useful.

@jstcki
Copy link
Contributor Author

jstcki commented Oct 30, 2019

FYI, I'm going to implement 2. from my previous comment because it's actually useful to know dimension values/minmax in a few places, not just for filtering. So no hurry for something like 4. (which I still think is nicer 😉 ).

@ktk
Copy link
Member

ktk commented Oct 30, 2019

We definitely need to invest time into the representation of time in the cube model but that will have to wait for next year I'm afraid.

As a workaround, could we use the datatype for the literal that we have in the shape @vhf ? It is not directly attached to the current qb:MeasureProperty|qb:DimensionProperty but I might add that as I did with the scaleOfMeasure as a workaround. Like this the lib would simply attach it properly to the FILTER and then the generated SPARQL should IMO be correct.

@vhf
Copy link
Contributor

vhf commented Oct 30, 2019

As a workaround, could we use the datatype for the literal that we have in the shape @vhf ?

Unfortunately not. The best we can do at the moment is a workaround like scaleOfMeasure.

@ktk
Copy link
Member

ktk commented Oct 30, 2019

Yes that's what I mean so I could add it as rdfs:range to the Dimension/Measure.

@jstcki
Copy link
Contributor Author

jstcki commented Jan 28, 2020

@ktk I noticed that while filtering by xsd:gYear works fine, filtering with xsd:gYearMonth does not.

E.g. FILTER(?dim0 = "2006-09"^^xsd:gYearMonth). doesn't work.

This works: (?dim0 >= "2006-09"^^xsd:gYearMonth) && (?dim0 <= "2006-09"^^xsd:gYearMonth)

I suspect this is some particularity of SPARQL itself or Stardog's implementation but I wonder if there is a good way to handle these issues.

Example query

@jstcki
Copy link
Contributor Author

jstcki commented Jan 28, 2020

Oddly, this works too: (?dim0 IN ("2006-06"^^xsd:gYearMonth)), so as a workaround I'm switching all filters to use IN instead of using =.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants