-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merging code from Dev wordsi into master #19
base: master
Are you sure you want to change the base?
Conversation
…into dev-wordsi
…gorithm and condenses itself with HAC
… Spectral Clustering
…ltered bigram matrix, Updating similarity to compile, Fixing some minor bugs in consensus clustering
…javadoc. Also update some scala code with comments.
* | ||
* @author Keith Stevens | ||
*/ | ||
public abstract class AbstractGraphClustering implements Clustering { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this might be a pain, but since we have a graph package, could we somehow retrofit this to use a Graph instance and be in the graph package?
Alternatively, we could wrap a Matrix as a Graph and implement it that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not change this to use the graph interface for two reasons:
- It currently implements the Clustering interface, which is really nice and portable
- This provides an alternative implementation with slightly different guarantees as far as internal data structure
If we want cross compatibility, we could easily write wrappers between graphs and matrices, but that should be in a separate change.
Overall, the changes look good. I have a few concerns where the graph package and some of the matrix-as-graph classes overlap in functionality. It would be good to present a common theme for those. |
I still need to go through and fix the javadocs and the minor suggestions, but I put in my thoughts on some of the larger suggestions. |
Most of the javadoc and simple cleaning changes fixed. There's still a few more to be handled. |
Dev-Wordsi contains a multitude of small fixes and a bulk of new code centered around consensus clustering. This also makes a significant change to the Assignments interface to make it easier and less cumbersome to use.
The new Consensus Clustering code relies on a new object called a partition, which is an easy to use representation of a clustering solution that includes some addiition helper methods not useful to general clustering Assignments. The Consensus merging algorithms use a comparison function between partitions and a combining algorithm to generate an improved Partition.
In addition to this major update, there are smaller bug fixes and minor refactorings needed within the consensus clustering code.