Skip to content
davidjurgens edited this page Jul 1, 2012 · 2 revisions

Introduction

The vector package provides data structures for representing a wide variety of numerical vectors, each with their own unique peformance characteristics. Paired with the vectors are methods for math operations (such as addition and subtraction), serialziation and deserialization, and casting. We provide vectors for double precision values and integer values. We also provide interfaces for both dense vectors and sparse vector, with a large emphasis on sparse vectors as many Semantic Spaces utilize sparse feature spaces. This code can be found here

The Interfaces

We center the vector package around the following interfaces:

  1. Vector.java: This provides basic vector operations such as set, get, and magnitude for any type of java Number.
  2. DoubleVector.java: This provides primite operations on double precision vectors.
  3. SparseDoubleVector.java: This provides sparse primitive operations on double precision vectors, most importantly, this provides access to the set of non-zero indices in the vector.
  4. IntegerVector.java: This provides primitive operations on integer vectors.

The primitive vector types include two useful methods: add, which increments/decrements the value of some index, and toArray, which returns a dense array form of the vector. Note that toArray should only be used when the length of the vector is known and set to a reasonable number that can fit in memory, some sparse implementations permit an unspecified vector length.

Examples

As a simple example, consider adding two DoubleVectors together. If both vectors are dense, the code might look like this:

DoubleVector a = ...
DoubleVector b = ...
for (int i = 0; i < a.length(); ++i)
    b.add(i, a.get(i));

And if one vector, say a, is sparse, the code may look like this:

SparseDoubleVector a = ...
DoubleVector b = ...
for (int i : a.getNonZeroIndices())
    b.add(i, a.get(i));

The getNonZeroIndices method can save a great deal of computation by only processing values which have a non zero value, which in many use cases can be well under 10% of the values.

Useful Implementations

Here, we list a few of the most frequently used vector implementations.

  • CompactSparseVector: This sparse implementation utilizes two internal data structures: a sorted list of non zero indices and the matching non zero values for each index. All access is O(log n).
  • SparseHashDoubleVector: This sparse implementation utilizes an internal hash map for all sparse values. All operations except getNonZeroIndices have constant time. The non zero indices are cached internally so that if no changes are made in between two calls to getNonZeroIndices, the second call will perform in constant time.
  • TernaryVector: This sparse integer vector represents all values as either being positive, zero, or negative. The actualy value of all indices is discarded.
  • DenseVector: An array backed vector. All operations except magnitude have constant access, but there is no support for sparse values.
  • DenseDynamicMagnitudeVector: An array backed vector. Much like DenseVector, all operations are constant time, even magnitude. Every vector modification updates magnitude so that models needing constant access to the vector's magnitude while also updating the values can have fast performance.

Utilities

VectorMath provides optimized operations for addition, subtraction, pointwise multiplication, and the dot product between two vectors. If vectors are sparse, the methods will use reflection to access the sparse values, if needed, and only access the needed values.

VectorIO provides methods for writing vectors to strings

Vectors provides methods for casting, copying, equality testing, and computing subviews for vectors.