Computational optimization #10
Hey,
Hi joandre, akaltsikis approached me also regarding the MCL implementation over Spark.
This implementation has some problems:
That means that with many blocks (the default block size is 1024) – in our case 50M/1024 ≈ 50K – the `simulateMultiply` step will effectively never finish, or will generate 50K * 16 GB ≈ 1000 TB of data. On the other hand, if we use a bigger block size, e.g. 100K, we get an OutOfMemoryException in the `toDense` method of the multiply. We have worked around that by implementing both the sparse multiplication and addition ourselves in a very naïve way – but at least it works. Ohad
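The naive sparse multiply/add workaround Ohad describes can be sketched roughly like this (an illustrative Python sketch on dict-of-dicts matrices, not the actual Scala/Spark code from the thread; the function names are hypothetical):

```python
from collections import defaultdict

def sparse_multiply(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}} dicts.

    Mirrors the idea of the workaround above: iterate only over stored
    (non-zero) entries instead of densifying whole blocks first.
    """
    result = defaultdict(dict)
    for i, row in a.items():
        for k, v in row.items():
            b_row = b.get(k)
            if not b_row:
                continue  # column k of `a` hits only zero rows of `b`
            for j, w in b_row.items():
                result[i][j] = result[i].get(j, 0.0) + v * w
    return dict(result)

def sparse_add(a, b):
    """Entry-wise sum of two sparse matrices in the same format."""
    result = {i: dict(row) for i, row in a.items()}
    for i, row in b.items():
        out = result.setdefault(i, {})
        for j, v in row.items():
            out[j] = out.get(j, 0.0) + v
    return result
```

The point of the sketch is that both cost and output size scale with the number of stored entries, not with n², which is what makes the dense-block path blow up on a 50M-node graph.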
Amazing job Ohad, that is exactly what I wanted to do for months without finding the time! No, I confirm I have not tried: I was waiting to do what you have done, a complete inventory on big graphs. Actually, the last issue I found discussing MCL [SPARK-5832] was about which algorithm to choose between AP and MCL. I think it is indeed time to propose an implementation of MCL, since it is widely used (especially in bioinformatics). One last point: I was also thinking about using the GraphX library instead of Spark matrices to implement it, so I could compare the performance of both approaches. That remained an idea in the back of my mind. Joan
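For context on the algorithm being discussed: MCL alternates expansion (a matrix power of the column-stochastic transition matrix) and inflation (an element-wise power followed by column re-normalization) until the matrix converges. A minimal dense sketch (illustrative Python with hypothetical parameter names, not this repo's Spark implementation) might look like:

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=50, tol=1e-6):
    """Minimal dense MCL sketch: expand, inflate, repeat to convergence."""
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    m = m / m.sum(axis=0)                                            # column-stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion step
        m = m ** inflation                        # inflation step
        m = m / m.sum(axis=0)                     # keep columns stochastic
        if np.abs(m - prev).max() < tol:
            break
    # read clusters off the limit matrix: each distinct non-zero row pattern
    clusters = {tuple(int(j) for j in np.nonzero(row > 1e-9)[0]) for row in m}
    return sorted(c for c in clusters if c)
```

This dense version makes clear why the sparse-block issue above matters: expansion is a matrix multiply, so on a 50M-node graph the whole algorithm stands or falls with the efficiency of that one operation.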
Funny that you say that,
Hi Ohad, just to warn you, I have published the repo as a Spark package (MCL). Joan
Great news! On 15 August 2016 at 22:07, joandre [email protected] wrote:
Hi Joan, I changed only the input creation of the program. Is the input already too large for the algorithm to process?
@icklerly I am going to post a version of the MCL algorithm for Apache Spark which runs on big and sparse graphs (much bigger than your dataset) in the following days, so please keep an eye on my profile.
Hi all, nice to hear from you, akaltsikis. Indeed, my implementation suffers from uzadude's remark above (Spark uses dense block matrices by default). I will publish a more workable version soon. @uzadude: have you finally tried to push some commits to Spark on that topic? Joan
Great! I will add a link to your repo in the README file. |
@uzadude would you be OK to propose a pull request so we can introduce your implementation in this version? Otherwise I will do pretty much the same job that you did, twice.
Hi guys! |
Hi @uzadude, |
@akaltsikis |
@icklerly I ran that on my shitty laptop and it took 511 seconds, finished at 33 iterations, and output 68930 clusters. :-)
@akaltsikis |
Hi all, |
Hi @icklerly, sorry I missed your message. As akaltsikis said, he will publish a new version. It is in progress; no release date is scheduled right now.