java.lang.ArrayIndexOutOfBoundsException #70
Hi Deepthi,
Which version of the code are you using? It looks like the stack trace …
Thanks,
Hi David, I am using the jar "sspace-wordsi-2.0-jar-with-dependencies.jar". Is this not supported? Currently I am running on a sample of 200 documents, but the entire corpus is around 9000 documents. Is it scalable? Each document is a PDF with roughly 500 words (before preprocessing). I am doing simple preprocessing to remove stopwords and special characters from the text. Do you think any additional preprocessing, such as lemmatization, would help?
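A minimal sketch of the kind of stopword and special-character cleanup described above; the stopword list, regex, and whitespace tokenization here are illustrative assumptions, not the exact preprocessing actually used:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SimplePreprocessor {
    // Illustrative stopword list; a real run would use a full list.
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("the", "a", "an", "and", "or", "of", "to", "in", "is"));

    /** Lower-cases, strips non-alphabetic characters, and drops stopwords. */
    public static String clean(String rawText) {
        StringBuilder out = new StringBuilder();
        for (String token : rawText.toLowerCase().split("\\s+")) {
            String word = token.replaceAll("[^a-z]", "");
            if (!word.isEmpty() && !STOPWORDS.contains(word)) {
                out.append(word).append(' ');
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // prints: svd x matrix
        System.out.println(clean("The SVD of a 6,000 x 200 matrix!"));
    }
}
```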
Tried using sspace-2.0.1.jar; the problem persists :'(

Feb 18, 2016 12:35:44 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at edu.ucla.sspace.matrix.DiagonalMatrix.checkIndices(DiagonalMatrix.java:78)
at edu.ucla.sspace.matrix.DiagonalMatrix.get(DiagonalMatrix.java:94)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibJ.factorize(SingularValueDecompositionLibJ.java:89)
The number of words in my corpus turns out to be 6000+. Is the code unable to reduce the vectors from 6000+ dimensions down to 300? What is the solution?
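One likely cause is the sample size rather than the vocabulary: a term-document matrix built from ~200 documents has rank at most 200, so the SVD cannot produce 300 singular values, and requesting 300 dimensions would overrun the diagonal matrix of singular values much as the trace above shows. Below is a minimal sketch of capping the requested dimensions at the number of documents; the no-argument constructor, the processDocument/processSpace calls, and the dimensions property name are assumptions about the S-Space API that should be checked against the jar you are running.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;

import edu.ucla.sspace.lsa.LatentSemanticAnalysis;

public class LsaDimensionCap {
    public static void main(String[] args) throws Exception {
        String[] documentPaths = args;        // one plain-text file per document
        int requestedDimensions = 300;

        // An m x n term-document matrix has at most min(m, n) nonzero singular
        // values, so the reduction can never exceed the number of documents.
        int safeDimensions = Math.min(requestedDimensions, documentPaths.length);

        LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();
        for (String path : documentPaths) {
            try (BufferedReader doc = new BufferedReader(new FileReader(path))) {
                lsa.processDocument(doc);
            }
        }

        // ASSUMPTION: property name taken from LatentSemanticAnalysis's
        // LSA_DIMENSIONS_PROPERTY; verify it in the S-Space version in use.
        Properties props = new Properties();
        props.setProperty("edu.ucla.sspace.lsa.LatentSemanticAnalysis.dimensions",
                          String.valueOf(safeDimensions));
        lsa.processSpace(props);
    }
}
```

On the full ~9000-document corpus the rank bound would no longer be the limiting factor, so a 300-dimension reduction should be feasible there.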