ReleaseNotes
#summary Details of the various releases
#labels Featured
<wiki:toc max_depth="2" />
= Algorithm Release Notes =
== Latent Semantic Analysis ==
=== Version 1.3 ===
Fixed a bug where the singular values computed by the SVD were not used to weight the values in the term and document spaces. This change also makes retaining the document space optional; by default it is not kept in memory.
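As a rough illustration of the weighting described above, the sketch below scales each latent dimension of a reduced space by its singular value. It assumes rows are term (or document) vectors and columns are latent dimensions; the class and method names are hypothetical and are not part of the S-Space API.

```java
import java.util.Arrays;

public class SingularValueWeighting {

    /**
     * Returns a copy of {@code space} in which column j of every row has been
     * multiplied by {@code singularValues[j]}. The input is left unmodified.
     * (Hypothetical helper for illustration only.)
     */
    public static double[][] weight(double[][] space, double[] singularValues) {
        double[][] weighted = new double[space.length][];
        for (int row = 0; row < space.length; row++) {
            if (space[row].length != singularValues.length) {
                throw new IllegalArgumentException(
                    "Row " + row + " does not have one value per singular value");
            }
            weighted[row] = new double[singularValues.length];
            for (int dim = 0; dim < singularValues.length; dim++) {
                weighted[row][dim] = space[row][dim] * singularValues[dim];
            }
        }
        return weighted;
    }

    public static void main(String[] args) {
        // Toy 2x2 "U" matrix and singular values, chosen only for the example.
        double[][] u = { {0.5, 0.5}, {1.0, -1.0} };
        double[] sigma = {4.0, 2.0};
        System.out.println(Arrays.deepToString(weight(u, sigma)));
        // prints [[2.0, 1.0], [4.0, -2.0]]
    }
}
```

Without this step, every latent dimension contributes equally to similarity comparisons regardless of how much variance it captured in the decomposition.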
=== Version 1.2 ===
Fixed a bug where SVDLIBJ wasn't included as an SVD algorithm, which resulted in systems not finding any SVD implementation when one was present.
== Random Indexing ==
=== Version 1.3 ===
Fixed a bug where filtering out tokens would still cause them to be included in the final vector distributions.
== Hadoop Random Indexing ==
=== Version 1.0 ===
Initial release of a Hadoop-based implementation of Random Indexing.
== Incremental Semantic Analysis ==
=== Version 1.1 ===
Fixed a bug where filtering out tokens would still cause them to be included in the final vector distributions.
== COALS ==
=== Version 1.2 ===
Fixed several bugs in COALS where word-to-index mappings were off after the SVD was performed. This release also includes the fix for the SVD bug that affected LSA v. 1.1.
= S-Space Package Release Notes =
== Release 1.5 Snapshot ==
Due to the ever-increasing staleness of the 1.0 source release, we've taken a current snapshot of the SVN repository and arbitrarily labeled it 1.5. This version contains many performance enhancements and bug fixes from 1.0, along with a number of feature extensions, such as support for dependency parsed input and Hadoop jobs. However, we'll be releasing even more features in an official fashion soon, so this release may soon be superseded.
== Release 1.0 ==
This release is the result of several months of testing and debugging. All [ImplementedAlgorithms Tier 1] algorithms are now fully vetted and have been highly optimized. All supported algorithms are now versioned at 1.0 to reflect this release.
Highlights
- Significant optimizations to the core algorithms
- Support for fast streaming of matrices to disk
- Complete overhaul and optimization of the vector and matrix classes
- Support for a variety of [Tokenizing tokenization] behaviors
- Basic support for interacting with [http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download Cluto] clustering facilities
- An expanded set of unit tests for library classes
== Earlier Incremental Releases ==
=== July 24, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.9
Changes:
lsa.jar
Details
- New support for [http://dsd.lbl.gov/~hoschek/colt/ COLT] SVD
- Support for manually selecting the SVD algorithm to use
- Minor performance increase in parsing
=== July 6, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.8
Changes:
lsa.jar
Details
- Fixes a bug which prevented files from being processed.
=== June 18, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.6
[RandomIndexing] version 0.1.3
Changes:
lsa.jar
random-indexing.jar
Details
- Fixed [http://code.google.com/p/airhead-research/issues/detail?id=17 Issue 17] where the TF-IDF transform for LSA did not work properly.
- Reduced memory usage for Random Indexing when sparse semantics are used.
- Fixed a potential bug in sparse semantics where index vector values were incorrectly updated. This resulted in index vectors not being as orthogonal as they were when initialized.
=== June 17, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.5
Changes:
lsa.jar
Details
- Fixed [http://code.google.com/p/airhead-research/issues/detail?id=16 Issue 16] where the resulting LSA vectors were not properly truncated when JAMA was used for the SVD.
=== June 16, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.4
Changes:
lsa.jar
Details
- Fixed [http://code.google.com/p/airhead-research/issues/detail?id=15 Issue 15] where JAMA wasn't supported for lsa.jar due to a classpath issue
=== June 1, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.3
[RandomIndexing] version 0.1.2
Changes:
lsa.jar
random-indexing.jar
Details:
- Added sparse binary and sparse text output options, which make the resulting .sspace file much smaller for Random Indexing. See [FileFormats] for details.
- Performance fix for the LSA Matlab-to-SVDLIBC matrix conversion path. Previously, large word-document matrices were loaded into memory for conversion, which resulted in unnecessary OutOfMemory exceptions. The fix does the conversion iteratively, which eliminates the memory bottleneck.
=== May 11, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.2
[RandomIndexing] version 0.1.1
Changes:
lsa.jar
random-indexing.jar
Details:
- Added a binary output option where the resulting .sspace file is much smaller. See [FileFormats] for details
- LSA matrix operations now clean up the temporary files they generate.
=== May 9, 2009 ===
[Coals] version 0.1.1
Changes:
coals.jar
Details: Revised release of Coals, where all rows are saved, and only the column lengths are shortened.
=== May 8, 2009 ===
[Coals] version 0.1.0
Changes:
coals.jar
Details: Initial release of Coals. Release supports configurable word dimensions, SVD reduction, and size of the SVD reduction. See [Coals Coals] wiki page for full details.
=== May 7, 2009 ===
[RandomIndexing Random Indexing] version 0.1.0
Changes:
random-indexing.jar
Details: Initial release of Random Indexing. Release supports permutations, configurable dimensions, and index vector distributions. See [RandomIndexing Random Indexing] wiki page for full details.
=== May 1, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.1
Changes:
lsa.jar
Details: Bug fix for an internal conversion between matrix types. If the user used SVDLIBC, the program would incorrectly convert the format, which resulted in a malformed S-Space.
=== April 27, 2009 ===
[LatentSemanticAnalysis LSA] version 0.1.0
Changes:
lsa.jar
Details: Initial release of the Latent Semantic Analysis command-line executable .jar file. Release supports configurable word-document preprocessing and number of dimensions. The implementation is also highly scalable and multi-threaded.
= Tool Release Notes =
== SVDLIBJ - svd.jar ==
=== Version 1.1 ===
- Added missing support for Matlab dense text
- Fixed typo in format description
- Updated output behavior to more closely match SVDLIBC's. The output file is now a true output prefix.
=== Version 1.0 ===
- Initial release of wrapper around SVDLIBJ. Verified that it works.
== Token Counter - tc.jar ==
=== Version 1.1 ===
- Fixed typo in output
- Added output for the verbose option to print the current number of unique tokens seen thus far
=== Version 1.0 ===
- Initial release of trie-based token counter for space-efficient token counting
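To illustrate why a trie can be space-efficient for token counting, here is a minimal sketch in which tokens that share a prefix share the nodes for that prefix. It is not the tc.jar implementation; the class name, method names, and the per-node `HashMap` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieTokenCounter {

    /** One node per character; shared prefixes are stored only once. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // occurrences of the token that ends at this node
    }

    private final Node root = new Node();
    private int uniqueTokens;

    /** Records one occurrence of {@code token}. */
    public void add(String token) {
        Node node = root;
        for (int i = 0; i < token.length(); i++) {
            node = node.children.computeIfAbsent(token.charAt(i), c -> new Node());
        }
        if (node.count == 0) {
            uniqueTokens++; // first time this exact token has been seen
        }
        node.count++;
    }

    /** Returns how many times {@code token} has been added, or 0 if never. */
    public int count(String token) {
        Node node = root;
        for (int i = 0; i < token.length(); i++) {
            node = node.children.get(token.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }

    /** Returns the number of distinct tokens seen so far. */
    public int uniqueTokens() {
        return uniqueTokens;
    }

    public static void main(String[] args) {
        TrieTokenCounter counter = new TrieTokenCounter();
        for (String token : "the cat and the catalog".split(" ")) {
            counter.add(token);
        }
        System.out.println("the: " + counter.count("the"));        // the: 2
        System.out.println("unique: " + counter.uniqueTokens());   // unique: 4
    }
}
```

In this sketch "cat" and "catalog" share their first three nodes, and a running unique-token total like the one reported by the verbose option falls out of the insert path at no extra cost.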