How to efficiently find paths between nodes? #1752
Replies: 5 comments 2 replies
-
@SvenPVoigt QLever has a basic path search feature, which is documented at https://github.com/ad-freiburg/qlever/blob/master/docs/path_search.md . However, as of now, it computes all paths between the given sources and targets. Currently, the only way to restrict this is by specifying an upper bound on the number of paths per target. There is no way to compute the shortest path yet. Which, I think, is what you want here. Here is a query that finds one citation path between your given papers (which happens to be a rather long one):
https://qlever.cs.uni-freiburg.de/dblp/yKqluo If you are interested in this, how about contributing to the codebase and implementing the shortest path search? All the infrastructure is there already. All that is needed is an implementation of Dijkstra's algorithm on a given graph. |
Beta Was this translation helpful? Give feedback.
-
PS: There should be a direct predicate |
Beta Was this translation helpful? Give feedback.
-
Thanks for your response @hannahbast! Hard to believe that there is a reference chain of 6000+ papers that connects the two! Citation KG quirksBut also, I did in fact mean the order that https://w3id.org/oc/meta/br/06502647013 is in the reference history of https://w3id.org/oc/meta/br/06503267503 even though it was published 12 years later! The path that connects them is about 4000 papers long! So it turns out we have a cycle in the citation history which seems illogical. However, a review of citations shows there is quite a few papers that are recorded as having referenced papers from years published after they were published. PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?c ?c_year ?r ?r_year WHERE {
?c dblp:omid/^cito:hasCitingEntity/cito:hasCitedEntity/^dblp:omid ?r .
?c dblp:yearOfPublication ?c_year .
?r dblp:yearOfPublication ?r_year .
FILTER ( ?r_year > ?c_year )
}
LIMIT 100 Use Case ObjectivesAs for my personal use case, I was looking at simplifying and accelerating my paper research process 😅. This would involve collecting a tree of citations to find notable papers in the reference history of a paper. I would then hopefully accelerate finding original works that help explain concepts not covered in the current work! This tree doesn't need to be too big (maybe 10 papers deep) so I was ideally hoping to look at a short history and then return paths to top cited works in that history so I could trace the development of important concepts and catch up on changes per paper. QLever Next StepsGiven that there is noise in the dataset, it might be useful to add a way for finding the n shortest paths. However, I still need a way for finding the citation history first. If I were to implement anything, I would prefer to first add some sort of limits to the variable path length search feature in SPARQL. For example, regex style limits for finding paths between 1 and 10 edges long would look like: Do you know of any standards you would want to follow if I were to try and implement this feature? Are you already working on something similar in a branch? Next priority (or maybe even better?) would be to update the service Thanks! |
Beta Was this translation helpful? Give feedback.
-
Regarding the variable path length feature, QLever internally already supports that and in an older version of QLever, this could be specified in a SPARQL query using a syntax very similar to (or even identical with) the syntax you propose. However, for the sake of SPARQL conformity, the current code only use the internal implementation for {0, ∞} ( https://github.com/ad-freiburg/qlever/blob/master/src/engine/QueryPlanner.cpp#L852-L859 It would be relatively straightforward to extend the SPARQL grammar to support this (see https://github.com/ad-freiburg/qlever/blob/master/src/parser/sparqlParser/generated/README.md) and make the corresponding changes in the code. Extending the PathSearch feature would indeed be relatively straightforward. Here is the code for the configuration: https://github.com/ad-freiburg/qlever/blob/master/src/engine/PathSearch.h#L91-L108 . @joka921 Any objections or additions? @SvenPVoigt How familiar are you with C++? |
Beta Was this translation helpful? Give feedback.
-
@SvenPVoigt Great! I would recommend to start with the |
Beta Was this translation helpful? Give feedback.
-
Hi I am using qlever and dblp to search for a paper's citation genealogy (a tree-like dag structure of references). I have reduced the complexity of the task by searching only for the most cited papers in a genealogy, however, I am running into efficiency issues when trying to find the path between the most cited paper and the root node. I further reduced the problem to only find the first steps in each path. However, this is still too costly of an operation to succeed on the dblp instance of qlever.
(query that times out)
The only workaround I have found is to first get all the referenced papers of the root (# references usually << #citations) and then iterate over the references and use ASK queries to find out if the first step is in the path.
This still seems pretty costly. Are there any efficiency improvements I am missing? (dblp does not have FILTER EXISTS enabled for qlever)
Beta Was this translation helpful? Give feedback.
All reactions