NumberFormatException when attempting to run LatentSemanticAnalysis class #59

Open
kelvinAI opened this issue Jan 26, 2015 · 3 comments

Comments

@kelvinAI

Hi, I'm facing an error while calling the LSA class as a library. The error was thrown during the processSpace() call.

void initialize() throws IOException {
    //....
    LatentSemanticAnalysis lsa = new LatentSemanticAnalysis(3);

    File input = new File("data/input2.txt");

    BufferedReader br = new BufferedReader(new FileReader(input));

    lsa.processDocument(br);

    lsa.processSpace(System.getProperties()); // <--- Error happens within this method

    //....
}

System Output:
Initializing MyLSAmain
Saving matrix using edu.ucla.sspace.matrix.SvdlibcSparseBinaryMatrixBuilder@60e53b93
Saw 19 terms, 8 unique
edu.ucla.sspace.lsa.LatentSemanticAnalysis@7adf9f5f processing doc edu.ucla.sspace.util.SparseIntHashArray@85ede7b
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 3 dimensions
Exception in thread "main" java.lang.NumberFormatException: For input string: "nan"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at edu.ucla.sspace.matrix.MatrixIO.readDenseSVDLIBCtext(MatrixIO.java:994)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:809)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:762)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:153)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:463)
at edu.ucla.sspace.mains.MyMain.initialize(MyMain.java:62)
at edu.ucla.sspace.mains.MyMain.<init>(MyMain.java:23)
at edu.ucla.sspace.mains.MyMain.main(MyMain.java:33)

This is a follow-up to #58, where I managed to run LSAMain successfully. Am I missing something?
Thanks

@davidjurgens
Collaborator

Could you please paste in the stack trace so we can see where in the LSA
code is throwing the exception?

@kelvinAI
Author

I updated the issue on GitHub, but apparently the update wasn't sent out by
email. Could you please check it out on GitHub?

davidjurgens pushed a commit that referenced this issue Jan 27, 2015
	modified:   src/main/java/edu/ucla/sspace/matrix/MatrixIO.java

- Added support for reading in NaN values for SVDLIBC's output
@davidjurgens
Collaborator

OK, I've pushed a change that should fix this behavior. However, I just want to point out that you're seeing this error only because you're passing an extremely small matrix to the SVD, which causes it to hit a degenerate case and produce NaN values. If you can, I would really recommend expanding your testing to use a larger corpus with more than three documents and eight terms. (Though things should "just work" regardless ;) )
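
For anyone hitting the same trace: the exception comes from Double.parseDouble, which accepts the exact spelling "NaN" but rejects the lower-case "nan" that a C program such as SVDLIBC writes. Below is a minimal sketch of a tolerant parser, not the actual change made to MatrixIO; the class name NanTolerantParser and its parse method are hypothetical and used only for illustration.

```java
import java.util.Locale;

// Hypothetical helper for illustration; this is NOT the code from the
// MatrixIO commit referenced in this issue.
final class NanTolerantParser {

    private NanTolerantParser() { }

    /**
     * Parses a numeric token as written by a C program. C's printf family
     * emits "nan" and "inf" (optionally signed), which Double.parseDouble
     * rejects because it only recognizes "NaN" and "Infinity".
     */
    static double parse(String token) {
        String trimmed = token.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);

        // Separate an optional leading sign from the rest of the token.
        boolean negative = lower.startsWith("-");
        boolean signed = negative || lower.startsWith("+");
        String body = signed ? lower.substring(1) : lower;

        if (body.equals("nan")) {
            return Double.NaN;
        }
        if (body.equals("inf") || body.equals("infinity")) {
            return negative ? Double.NEGATIVE_INFINITY
                            : Double.POSITIVE_INFINITY;
        }
        // Everything else is an ordinary number; let the JDK handle it.
        return Double.parseDouble(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(parse("0.25"));
        System.out.println(parse("nan"));
        System.out.println(parse("-inf"));
    }
}
```

A parser like this only stops the crash. The NaN values still end up in the reduced matrix, so the advice about testing with a larger corpus still applies.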
