Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of triangle counting #2

Open
MichaelRoeder opened this issue Sep 18, 2017 · 2 comments
Open

Improve speed of triangle counting #2

MichaelRoeder opened this issue Sep 18, 2017 · 2 comments
Labels

Comments

@MichaelRoeder
Copy link
Member

MichaelRoeder commented Sep 18, 2017

Task description

Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#L41-L75.

Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density).

The project is mainly based on the Grph library.

Possible approaches

http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the edge-count and forward is an enhancement of this approach.

  1. forward from http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf
  2. matrix-multiplication is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs.
  3. ayz might be a very good choice. However, for this approach the matrix-multiplication has to be implemented at first.
  4. In https://www.cs.cmu.edu/~glmiller/Publications/Papers/Tsourakakis-KMF-09.pdf the authors propose a sampling based on that the triangles can be estimated faster without loosing too much accuracy. This would fit to our already existing implementation.

Testing

Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class could be rused), the runtime have been checked as well. When commenting out all other metrics, only adding the triangle counting and executing the RefinementTest class it should be finished in a reasonable amount of time. For testing, the semantic web dog food data is necessary. It is provided at http://hobbitdata.informatik.uni-leipzig.de/lemming/SemanticWebDogFood-until-2015.zip

Forking / Merging

The solution should be implemented in a branch, forked from the master branch of the project. The pull request should merge the solution back into the master branch.

Responsible contact

Michael Röder

@MichaHoffmann
Copy link

MichaHoffmann commented Sep 18, 2017 via email

@MichaelRoeder
Copy link
Member Author

That is the fourth point in the description 😉
But thanks for the hint 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants