Add content to documentation homepage.

kernelmethod · Jan 17, 2020 · 9b7a765
1 parent 50ddd3e
commit 9b7a765
Showing 1 changed file with 33 additions and 1 deletion.
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,3 +1,35 @@
 # LSH.jl
 
-Documentation for the LSH.jl package.
+LSH.jl is a Julia package for performing [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) with various similarity functions.
+
+## Introduction
+One of the simplest methods for classifying, categorizing, and grouping data is to measure how similar pairs of data points are. For instance, the classical [``k``-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) takes a similarity function
+
+```math
+s:X\times X\to\mathbb{R}
+```
+
+and a query point ``x\in X``, where ``X`` is the input space. It then computes ``s(x,y)`` for every point ``y`` in a database, and keeps the ``k`` points that are most similar to ``x``.
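+
+The brute-force search described above can be sketched in a few lines of Julia. This is only an illustration: `brute_force_knn` and `cosine_similarity` are hypothetical helper names chosen for this example and are not part of LSH.jl, and cosine similarity is assumed as the similarity function ``s``.
+
+```julia
+using LinearAlgebra
+
+# Example similarity function s(x, y); cosine similarity is an assumption here.
+cosine_similarity(x, y) = dot(x, y) / (norm(x) * norm(y))
+
+# Hypothetical helper (not part of LSH.jl): score every database point against
+# the query x and return the indices of the k most similar points.
+function brute_force_knn(s, x, database, k)
+    scores = [s(x, y) for y in database]
+    # partialsortperm avoids fully sorting all of the scores
+    return partialsortperm(scores, 1:k; rev=true)
+end
+
+database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
+query = [1.0, 0.1]
+neighbors = brute_force_knn(cosine_similarity, query, database, 2)
+```
+
+Every query evaluates ``s`` once per database point, so the cost grows linearly with the size of the database.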
+
+Broadly, there are two computational issues with this approach:
+
+- First, the database may be massive, much larger than could possibly fit in memory. This makes the brute-force approach of computing ``s(x,y)`` for every point ``y`` in the database far too expensive to be practical.
+- Second, the data may be so high-dimensional that computing ``s(x,y)`` is itself expensive. In addition, the similarity function may be intrinsically difficult to compute. For instance, calculating Wasserstein distance entails solving a very high-dimensional linear program.
+
+To address these problems, researchers have developed a variety of techniques to accelerate similarity search:
+
+- [``k``-d trees](https://en.wikipedia.org/wiki/K-d_tree)
+- [Ball trees](https://en.wikipedia.org/wiki/Ball_tree)
+- Data reduction techniques
+
+## Locality-sensitive hashing
+*Locality-sensitive hashing* (LSH) is a technique for accelerating similarity search. It applies a hash function to the query point ``x`` and restricts the search to those points in the database whose hash collides with that of ``x``. The hash functions are randomly drawn from a family of *locality-sensitive hash functions*, which have the property that ``\Pr[h(x) = h(y)]`` (i.e., the probability of a hash collision) increases as ``x`` and ``y`` become more similar.
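+
+As a rough illustration of the idea, the sketch below uses random hyperplane hashing, a well-known locality-sensitive family for cosine similarity. It does not use or reproduce LSH.jl's API: `random_hyperplane_hash` is a hypothetical helper, and the dimension, number of bits, and database are made up for the example. Each bit records which side of a random hyperplane a point lies on, and two points at angle ``\theta`` agree on a single bit with probability ``1 - \theta/\pi``.
+
+```julia
+using LinearAlgebra
+using Random
+
+# Hypothetical helper (not LSH.jl's API): draw n_bits random hyperplanes in
+# dimension dim and return a function mapping a vector to a tuple of sign bits.
+function random_hyperplane_hash(rng, dim, n_bits)
+    planes = randn(rng, n_bits, dim)
+    return x -> Tuple(planes * x .>= 0)
+end
+
+rng = MersenneTwister(0)
+h = random_hyperplane_hash(rng, 3, 8)
+
+# Index the database: map each hash value to the indices of the points with that hash.
+database = [randn(rng, 3) for _ in 1:1000]
+table = Dict{NTuple{8,Bool},Vector{Int}}()
+for (i, y) in enumerate(database)
+    push!(get!(table, h(y), Int[]), i)
+end
+
+# At query time, only points whose hash collides with the query's are candidates.
+query = database[1] .+ 0.01 .* randn(rng, 3)
+candidates = get(table, h(query), Int[])
+```
+
+The similarity function then only needs to be evaluated on `candidates` rather than on the whole database. In practice, several independent hash tables are typically used so that a true near neighbor is unlikely to be missed by all of them.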
+
+LSH.jl provides locality-sensitive hash functions for a variety of similarity functions. Currently, LSH.jl supports hash functions for
+
+- Cosine similarity (`cossim`)
+- Jaccard similarity (`jaccard`)
+- ``L^1`` (Manhattan / "taxicab") distance (`ℓ1`)
+- ``L^2`` (Euclidean) distance (`ℓ2`)
+- Inner product (`inner_prod`)
+- Function-space hashes (`L1`, `L2`, and `cossim`)