-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve speed of triangle counting #2
Labels
Comments
Check out DUOLION paper! Seems to fit very well here
Am 18.09.2017 2:56 nachm. schrieb "Michael Röder" <[email protected]
…:
Task description
Lemming already implements the counting of triangles in a given graph as a
possible metric. At the moment, the implemented algorithm iterates over the
vertices of the graph and tries to find all triangles in which the current
vertex is part of. This is done by iterating over the edges that are
starting or ending at the current vertex (the direction of edges is not
interesting for counting triangles) and checking whether the other two
vertices two edges have are connected via a third edge. The algorithm does
not take triangles into account that involve a vertex that has been seen
before (i.e., that has a lower ID than the current vertex) to make sure
that every triangle is counted only once. The implementation can be found
at https://github.com/dice-group/Lemming/blob/master/src/main/
java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#
L41-L75.
Although the implementation is already parallelizing the problem, its
complexity grows too fast for dense graphs. We would like to have
additional implementations for counting triangles and a logic, that chooses
a triangle counting approached based on the input graph (e.g., based on its
density).
Possible approaches
http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an
overview of different approaches. Our current approach is the edge-count
and forward is an enhancement of this approach.
1. forward from http://i11www.iti.kit.edu/
extra/publications/sw-fclt-05_t.pdf
<http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf>
2. matrix-multiplication is a different approach. It is based on the
adjacency matrix A of the given graph and follows an algorithm like
http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/
<http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/>.
Unfortunately, the matrix might be very big and space limitations might
become a problem for large graphs.
3. ayz might be a very good choice. However, for this approach the
matrix-multiplication has to be implemented at first.
4. In https://www.cs.cmu.edu/~glmiller/Publications/Papers/
Tsourakakis-KMF-09.pdf the authors propose a sampling based on that
the triangles can be estimated faster without loosing too much accuracy.
This would fit to our already existing implementation.
Testing
Apart from testing the correctness (with some delta for approaches that do
not reach 100% correctness, the simple JUnit test class
<https://github.com/dice-group/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetricTest.java>
could be rused), the runtime have been checked as well. When commenting
out all other metrics
<https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java#L53-L59>,
only adding the triangle counting and executing the RefinementTest
<https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java>
class it should be finished in a reasonable amount of time.
Forking / Merging
The solution should be implemented in a branch, forked from the master
branch of the project. The pull request should merge the solution back into
the master branch.
Responsible contact
Michael Röder <http://cs.uni-paderborn.de/ds/team/team/>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQpKf8KFmQ68vVUJQxdrwVAsHG2D0trWks5sjmhpgaJpZM4Pa3U3>
.
|
That is the fourth point in the description 😉 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Task description
Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#L41-L75.
Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density).
The project is mainly based on the Grph library.
Possible approaches
http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the
edge-count
andforward
is an enhancement of this approach.forward
from http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdfmatrix-multiplication
is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs.ayz
might be a very good choice. However, for this approach thematrix-multiplication
has to be implemented at first.Testing
Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class could be rused), the runtime have been checked as well. When commenting out all other metrics, only adding the triangle counting and executing the RefinementTest class it should be finished in a reasonable amount of time. For testing, the semantic web dog food data is necessary. It is provided at http://hobbitdata.informatik.uni-leipzig.de/lemming/SemanticWebDogFood-until-2015.zip
Forking / Merging
The solution should be implemented in a branch, forked from the
master
branch of the project. The pull request should merge the solution back into the master branch.Responsible contact
Michael Röder
The text was updated successfully, but these errors were encountered: