-
Notifications
You must be signed in to change notification settings - Fork 106
Vector
The vector
package provides data structures for representing a wide variety of numerical vectors, each with their own unique peformance characteristics. Paired with the vectors are methods for math operations (such as addition and subtraction), serialziation and deserialization, and casting. We provide vectors for double precision values and integer values. We also provide interfaces for both dense vectors and sparse vector, with a large emphasis on sparse vectors as many Semantic Spaces utilize sparse feature spaces. This code can be found here
We center the vector
package around the following interfaces:
-
Vector.java
: This provides basic vector operations such asset
,get
, andmagnitude
for any type of javaNumber
. -
DoubleVector.java
: This provides primite operations on double precision vectors. -
SparseDoubleVector.java
: This provides sparse primitive operations on double precision vectors, most importantly, this provides access to the set of non-zero indices in the vector. -
IntegerVector.java
: This provides primitive operations on integer vectors.
The primitive vector types include two useful methods: add
, which increments/decrements the value of some index, and toArray
, which returns a dense array form of the vector. Note that toArray
should only be used when the length of the vector is known and set to a reasonable number that can fit in memory, some sparse implementations permit an unspecified vector length.
As a simple example, consider adding two DoubleVector
s together. If both vectors are dense, the code might look like this:
DoubleVector a = ...
DoubleVector b = ...
for (int i = 0; i < a.length(); ++i)
b.add(i, a.get(i));
And if one vector, say a
, is sparse, the code may look like this:
SparseDoubleVector a = ...
DoubleVector b = ...
for (int i : a.getNonZeroIndices())
b.add(i, a.get(i));
The getNonZeroIndices
method can save a great deal of computation by only processing values which have a non zero value, which in many use cases can be well under 10% of the values.
Here, we list a few of the most frequently used vector
implementations.
-
CompactSparseVector
: This sparse implementation utilizes two internal data structures: a sorted list of non zero indices and the matching non zero values for each index. All access isO(log n)
. -
SparseHashDoubleVector
: This sparse implementation utilizes an internal hash map for all sparse values. All operations exceptgetNonZeroIndices
have constant time. The non zero indices are cached internally so that if no changes are made in between two calls togetNonZeroIndices
, the second call will perform in constant time. -
TernaryVector
: This sparse integer vector represents all values as either being positive, zero, or negative. The actualy value of all indices is discarded. -
DenseVector
: An array backedvector
. All operations exceptmagnitude
have constant access, but there is no support for sparse values. -
DenseDynamicMagnitudeVector
: An array backedvector
. Much likeDenseVector
, all operations are constant time, evenmagnitude
. Every vector modification updatesmagnitude
so that models needing constant access to the vector's magnitude while also updating the values can have fast performance.
VectorMath
provides optimized operations for addition, subtraction, pointwise multiplication, and the dot product between two vectors. If vectors are sparse, the methods will use reflection to access the sparse values, if needed, and only access the needed values.
VectorIO
provides methods for writing vectors to strings
Vectors
provides methods for casting, copying, equality testing, and computing subviews for vectors.