Merging code from Dev wordsi into master #19
Open
fozziethebeat wants to merge 142 commits into master from dev-wordsi
Commits
560a2b4
Fixing merge issue
fozziethebeat 01d4ec6
Merge branch 'master' into dev-wordsi
fozziethebeat f9f800b
Merge commit
fozziethebeat 7347b4d
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat fe9b5bf
Updating type parameter in SerializableUtil
fozziethebeat 47c8f69
Fixing the RI wordsi generator and adding a new one
fozziethebeat a00ff41
Adding a new RI based context generator to handle dependency parse co…
fozziethebeat 9797949
Fixing a compile bug
fozziethebeat 86a73b9
Adding run scripts used for doing the topic coherence evaluation
fozziethebeat 5a650d6
Adding a new streaming k-means implementation that runs as a batch al…
fozziethebeat 3a368a1
Fixing a major bug in TransformStatistics
fozziethebeat 0541fd5
Fixing bugs and updating run scripts
fozziethebeat 8cf2954
Adding a file existance check to reductionEval
fozziethebeat bf63c1f
Adding a filtered basis mapping and a matrix that represents a single…
fozziethebeat 5b7a993
Merging with master
fozziethebeat d295b5d
Removing Assignment to clean up the Assignments interface
fozziethebeat c3d2334
Merging with remote
fozziethebeat 85b313b
Adding a base class to graph clustering algorithms
fozziethebeat 2289085
Fixing abug in Assignments
fozziethebeat a0ae12a
Fixing abstract graph clustering to only cluster the found clusters f…
fozziethebeat d36fdc8
Adding two graph clusterigns: one wrapping HAC and one for Normalized…
fozziethebeat d813ca5
Making a handful of handy fixes
fozziethebeat 257f7bf
Making some minor bug fixes to k-means and svd
fozziethebeat a623605
Updat VSM
fozziethebeat e04cd7d
Adding a ton of scala code. Most important of which is the AMI code
fozziethebeat 07ed4ce
Adding two scala scripts, adding a sspace for making a chi squared fi…
fozziethebeat 1c7aa9d
Adding a simple page rank based clustering method
fozziethebeat bd2dfb8
Update some matrix transforms to be more composition and have better …
fozziethebeat 3fbdd32
Updating the artifact id so that pushing it does not overwrite the st…
fozziethebeat fe08ec5
Updating PageRankClusteirng and renaming a few files
fozziethebeat cfb6d8f
Fix the pom and build file to correctly deploy to sonatype
fozziethebeat d08c078
Fixing some bugs within similarity and adding handy methods
fozziethebeat c3cd7b4
Merging with the upstream
fozziethebeat 23e029e
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat e4f1b31
Adding in the start of an implementation for G-Means
fozziethebeat 0d284c9
Adding a new style of clustering algorithms and cleaning up some code
fozziethebeat e79bfa6
making some minor changes
fozziethebeat 82043cb
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat c58b5a7
Adding a minor fix to coals
fozziethebeat 16e2cb5
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat d9b2a70
Adding a new morphological analysis based wordsi
fozziethebeat d9809f4
Adding a fast upgma agglomerative clustering implementation. Updated…
fozziethebeat cf6ac67
Rewriting the boem script to do all four consensus methods
fozziethebeat e5a67d9
Fixing an ordering bug in average link HAC
fozziethebeat 438911f
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 78ef48e
Removing some unused scripts, renaming some things, adding documentation
fozziethebeat 0cb3171
updating boem with a faster, better implementation
fozziethebeat 6adecac
Improving performance of the nearest neighbor chain agglomerative clu…
fozziethebeat bf32d26
Adding javadoc for the new agglomerative clustering algorithm
fozziethebeat c68eae9
Added different link methods to the neighbor chain HAC and moving cod…
fozziethebeat 7a13c57
Adding a re-implementation of the supervised SemEval scorer
fozziethebeat 18bbfb1
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 0209906
Removing old hadoop code
fozziethebeat 48d44d8
Working on gmeans still
fozziethebeat cb5d68d
Updating some of the scala files
fozziethebeat 04c784f
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 997ea9d
Merge branch 'master' into dev-wordsi
fozziethebeat 6dcbb0e
Merging with master
fozziethebeat b0cc161
Fixing test cases based on updates to SimpleDependencyTreeNode implem…
fozziethebeat 8bef074
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 92a0f7f
Cleaning and refactoring
fozziethebeat 208f629
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 49133d4
new file: scala/DisambiguateGraphSemEval.scala
fozziethebeat 59ac275
Updating some scala code
fozziethebeat 86ee2fe
Minor updates to bigram extraction
fozziethebeat 996a88a
Merging with upstream
fozziethebeat 9699a10
Adding a new interface for matrix writing.
fozziethebeat 49208d1
modified: scala/ExtractTop10Terms.scala
fozziethebeat 92e49b1
Adding extra tests around Partitions and adding the ability to create…
fozziethebeat 0944f9d
Adding an interface around LAPACK and using this for eigen decomposit…
fozziethebeat 065491c
Adding logging to a few classes
fozziethebeat 5f6f7ee
Merging with upstream
fozziethebeat 4811072
Fixing a small bug in Normalized Spectral Clustering when creating th…
fozziethebeat 53b51ec
Moving the logging comments around
fozziethebeat 4b4892e
Making merge fix
fozziethebeat 1efbd5f
Retrofitting the existing global transform implementations to use the…
2be376e
adjusting the scoring methods to handle imcomplete partitions
fozziethebeat 59f4437
Updating Schisel a little to take in a stop word list and print out t…
fozziethebeat ebb2b79
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat 6d872be
Adding an implementation of the modified bessel function
fozziethebeat 839111f
Adding a few stop words
fozziethebeat 90c9c6a
Adding some useful methods for reading sparse matrices
fozziethebeat 4a114ca
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat a9c4717
Fixing Schisel so that it doesn't write the lda matrics to disk paifully
fozziethebeat d59cd2d
Applying some fixes to Schisel to clean thing sup
fozziethebeat ecafcf2
merging schisel changes with upstream
fozziethebeat 4ba910b
adding a pre-mapped alphabet to Schisel
fozziethebeat 24576ff
Updating Schisel with a pre-allocated alphabet
fozziethebeat 8e40c3e
Separating some functions for a cleaner interface
fozziethebeat 06875ec
Adding some handy vector io methods
fozziethebeat 4fc2928
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat 9a3fabb
Adding more stop words
fozziethebeat 665655b
Merging with upstream
fozziethebeat 3b1febe
Adding a branch for an experiment on Word Similarity tests
fozziethebeat 9771ce1
Fixing the compare prototype class and adding a plotting file
fozziethebeat 3544eb6
Setting a limit on the number of items hac can cluster
fozziethebeat 4e08aba
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat e0808f7
Adding scala code and scripts for running evaluations variousover th…
fozziethebeat 7385bdb
Adding files for running oand code for running jobs in parallel and f…
fozziethebeat 511d3cd
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 64ffb6e
Adding a new file to extract dependency paths from focus words to the…
fozziethebeat f1106b0
Adding scripts to run topic modelling, a scream file for doing the co…
fozziethebeat e2b76d0
Changing the agglomerative method to assume all partitions have seen …
fozziethebeat 12f1b3c
Changing the clustering code so that it only considers the first 2500…
fozziethebeat 7001799
merging with master by accident
fozziethebeat 5db341d
Trying out a new evaluation experiment for SemEval data
fozziethebeat 12b4049
Adding some new evaluation codes
fozziethebeat 626283b
Checking in edits to a file that don't matter
fozziethebeat a01b04f
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat 0dacdb7
Cleaning up a lot of crap code that has been moved to better more sta…
fozziethebeat 647406f
Adding two files to disambiguate a corpus using the many prototypes l…
fozziethebeat a07db6b
Cleaning up the argument passing method, playing around a bit with su…
fozziethebeat f073500
Fixing a minor bug in the word space reading code and adding logging …
fozziethebeat be2cd51
Adding a function to change the weight in similarity functions for pa…
fozziethebeat 4d64fd9
Simplifying output format of the particle filter, updating the weight…
fozziethebeat 1a4aaac
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat b5ba4f8
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat b8859b6
Making minor fixes for code to work in the run pipeline. Fixing the …
fozziethebeat 40139f9
Total re-formulation of the depot. Adding code to merge day splits, …
fozziethebeat 18daf2a
refactoring some of the python code to do some heavier cleaning of th…
fozziethebeat b0e8997
Removing the graph code, fixing object counter's max and min calls, a…
fozziethebeat 8836163
Adding a module for filtering non-english tweets
fozziethebeat 85211b6
merging from the origin. Fixing a bug in the day splitter so that sp…
fozziethebeat 8489b5e
Major refactoring to handle the phase graph summarization method
fozziethebeat 98489a4
Heavily re-writing the phrase graph code to compute a minimal finite …
fozziethebeat b30ed1d
Adding code to print the phrase graph in a simple json format
fozziethebeat de8185a
Adding scripts to query mongodb, Updating the phrase graph code so th…
fozziethebeat e09d13e
Making some minor updates
fozziethebeat 02e2dbe
Adding node.js javascript code and shell code to scrape the event tim…
fozziethebeat 0bf94ab
Merge branch 'dev-wordsi' of github.com:fozziethebeat/S-Space into de…
fozziethebeat ce0cd49
Updating the javascript extractor to handle even more ghastly formatt…
fozziethebeat 18f6799
Printing a json file for each sport, swapping start and end times to …
fozziethebeat 395ad55
A new particle filter for computing change points. It needs to be te…
fozziethebeat ae63a95
All kinds of commits
fozziethebeat 0173f7f
Merge branch 'dev-wordsi' of https://github.com/fozziethebeat/S-Space…
fozziethebeat 914b9bb
Cleaning up the directory structure heavily and adding missing files …
fozziethebeat c735cd5
Moving expiermental directories to another separate repository to kee…
fozziethebeat b32f6eb
Merge branch 'master' into dev-wordsi
fozziethebeat f04fbe9
Adding the graph code back in
fozziethebeat 2037599
Adding class level documentation to each file being added
fozziethebeat 99d7373
Adding license statements to each file and adding some javadoc where …
fozziethebeat ebd0b65
mostly adding javadocs and handling minor fix suggestions made in the…
fozziethebeat
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
target/* | ||
*/target* | ||
.manager | ||
*.jar | ||
*.txt | ||
*.dat | ||
*.png | ||
.*.swp | ||
*.class | ||
target | ||
*~ |
63 changes: 63 additions & 0 deletions
src/main/java/edu/ucla/sspace/basis/FilteredStringBasisMapping.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
/* | ||
* Copyright (c) 2011, Lawrence Livermore National Security, LLC. Produced at | ||
* the Lawrence Livermore National Laboratory. Written by Keith Stevens, | ||
* [email protected] OCEC-10-073 All rights reserved. | ||
* | ||
* This file is part of the S-Space package and is covered under the terms and | ||
* conditions therein. | ||
* | ||
* The S-Space package is free software: you can redistribute it and/or modify | ||
* it under the terms of the GNU General Public License version 2 as published | ||
* by the Free Software Foundation and distributed hereunder to you. | ||
* | ||
* THIS SOFTWARE IS PROVIDED "AS IS" AND NO REPRESENTATIONS OR WARRANTIES, | ||
* EXPRESS OR IMPLIED ARE MADE. BY WAY OF EXAMPLE, BUT NOT LIMITATION, WE MAKE | ||
* NO REPRESENTATIONS OR WARRANTIES OF MERCHANT- ABILITY OR FITNESS FOR ANY | ||
* PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE OR DOCUMENTATION | ||
* WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER | ||
* RIGHTS. | ||
* | ||
* You should have received a copy of the GNU General Public License | ||
* along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
*/ | ||
|
||
package edu.ucla.sspace.basis; | ||
|
||
import java.util.Set; | ||
|
||
|
||
/** | ||
* This {@link FilteredStringBasisMapping} allows a user to specify a set of | ||
* tokens that should be excluded automatically from the basis mapping. Any | ||
* call to {@code getDimension} for a key whose text before the first hyphen | ||
* is in this set will automatically return {@code -1}. | ||
* | ||
* @author Keith Stevens | ||
*/ | ||
public class FilteredStringBasisMapping | ||
extends AbstractBasisMapping<String, String> { | ||
|
||
private static final long serialVersionUID = 1L; | ||
|
||
/** | ||
* The set of excluded words. | ||
*/ | ||
private final Set<String> excludedWords; | ||
|
||
/** | ||
* Creates a new {@link FilteredStringBasisMapping} where the words in | ||
* {@code excludedWords} will never receive a dimension in this mapping. | ||
*/ | ||
public FilteredStringBasisMapping(Set<String> excludedWords) { | ||
this.excludedWords = excludedWords; | ||
} | ||
|
||
/** | ||
* {@inheritDoc} | ||
*/ | ||
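public int getDimension(String key) { | ||
// Filter on the text before the first hyphen rather than the whole key. | ||
String[] parts = key.split("-"); | ||
// split() yields an empty array when the key consists only of hyphens. | ||
String base = (parts.length == 0) ? key : parts[0]; | ||
return excludedWords.contains(base) ? -1 : getDimensionInternal(key); | ||
} | ||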
} |
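The filtering in `getDimension` above can be tried out in isolation with a minimal sketch. `FilteredMappingSketch` is a hypothetical stand-in and not part of S-Space: it does not extend `AbstractBasisMapping`, and the map that hands out a new index on first sight only approximates what `getDimensionInternal` is assumed to do. The prefix check itself is copied from the diff.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for FilteredStringBasisMapping. It does not extend
 * the S-Space AbstractBasisMapping; the index bookkeeping is an assumption.
 */
public class FilteredMappingSketch {

    /** The set of excluded words. */
    private final Set<String> excludedWords;

    /** Assumed replacement for getDimensionInternal: key -> dimension. */
    private final Map<String, Integer> dimensions = new HashMap<>();

    public FilteredMappingSketch(Set<String> excludedWords) {
        this.excludedWords = excludedWords;
    }

    public int getDimension(String key) {
        // Same prefix check as the diff: compare the text before the
        // first hyphen against the excluded set.
        String[] parts = key.split("-");
        // split() drops trailing empty strings, so a key such as "-"
        // produces an empty array; fall back to the whole key then.
        String base = (parts.length == 0) ? key : parts[0];
        if (excludedWords.contains(base))
            return -1;

        // Assumed behaviour: unseen keys receive the next free index.
        Integer index = dimensions.get(key);
        if (index == null) {
            index = dimensions.size();
            dimensions.put(key, index);
        }
        return index;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));
        FilteredMappingSketch mapping = new FilteredMappingSketch(stopWords);

        // Hyphenated keys are filtered on their prefix only.
        System.out.println(mapping.getDimension("cat-NN"));
        System.out.println(mapping.getDimension("the-DT"));
        // A key without a hyphen is compared as a whole.
        System.out.println(mapping.getDimension("theory"));
    }
}
```

Because only the prefix is compared, a key such as `theory` is not filtered by the excluded word `the`, while any key of the form `the-…` is.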
Missing javadoc
done