From 9b7a7656ae1ef0271841475cbe0d7bafb212ff3d Mon Sep 17 00:00:00 2001
From: kernelmethod <17100608+kernelmethod@users.noreply.github.com>
Date: Thu, 16 Jan 2020 20:24:49 -0700
Subject: [PATCH] Add content to documentation homepage.

---
 docs/src/index.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 5214b2a..c7ecbdb 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -1,3 +1,35 @@
 # LSH.jl
 
-Documentation for the LSH.jl package.
+LSH.jl is a Julia package for performing [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) with various similarity functions.
+
+## Introduction
+One of the simplest methods for classifying, categorizing, and grouping data is to measure how similar pairs of data points are. For instance, the classical [``k``-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) takes a similarity function
+
+```math
+s:X\times X\to\mathbb{R}
+```
+
+and a query point ``x\in X``, where ``X`` is the input space. It then computes ``s(x,y)`` for every point ``y`` in a database and keeps the ``k`` points that are most similar to ``x``.
+
+Broadly, there are two computational issues with this approach:
+
+- First, the database may be massive, far larger than could possibly fit in memory. This makes the brute-force approach of computing ``s(x,y)`` for every point ``y`` in the database far too expensive to be practical.
+- Second, the data may be so high-dimensional that computing ``s(x,y)`` is itself expensive. The similarity function may also be intrinsically difficult to compute; for instance, calculating Wasserstein distance entails solving a very high-dimensional linear program.
+
+To address these problems, researchers have developed a variety of techniques to accelerate similarity search:
+
+- [``k``-d trees](https://en.wikipedia.org/wiki/K-d_tree)
+- [Ball trees](https://en.wikipedia.org/wiki/Ball_tree)
+- Data reduction techniques
+
+## Locality-sensitive hashing
+*Locality-sensitive hashing* (LSH) is a technique for accelerating similarity search. It hashes the query point ``x`` and restricts the search to only those points in the database that experience a hash collision with ``x``. The hash functions are drawn at random from a family of *locality-sensitive hash functions*, which have the property that ``Pr[h(x) = h(y)]`` (i.e., the probability of a hash collision) increases the more similar ``x`` and ``y`` are.
+
+LSH.jl provides locality-sensitive hash functions for a variety of similarities. Currently, LSH.jl supports hash functions for
+
+- Cosine similarity (`cossim`)
+- Jaccard similarity (`jaccard`)
+- ``L^1`` (Manhattan / "taxicab") distance (`ℓ1`)
+- ``L^2`` (Euclidean) distance (`ℓ2`)
+- Inner product (`inner_prod`)
+- Function-space hashes (`L1`, `L2`, and `cossim`)
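+
+To make the collision property above concrete, here is a minimal, self-contained sketch of random-hyperplane ("SimHash"-style) hashing for cosine similarity. It is written against plain Julia rather than LSH.jl's own API, and the names `simhash` and `collision_rate` are purely illustrative:
+
+```julia
+using LinearAlgebra
+
+# Hash x with a single random hyperplane r: the hash is the sign of ⟨r, x⟩.
+# For one hyperplane, Pr[h(x) == h(y)] = 1 - θ(x, y)/π, where θ(x, y) is the
+# angle between x and y, so collisions become more likely as the cosine
+# similarity between x and y increases.
+simhash(r::AbstractVector, x::AbstractVector) = dot(r, x) ≥ 0
+
+# Empirically estimate the collision probability of x and y over many
+# independently drawn random hyperplanes.
+function collision_rate(x, y; n_hashes = 10_000)
+    hyperplanes = [randn(length(x)) for _ in 1:n_hashes]
+    count(simhash(r, x) == simhash(r, y) for r in hyperplanes) / n_hashes
+end
+
+x = randn(100)
+y = x .+ 0.1 .* randn(100)   # similar to x, so the collision rate should be near 1
+z = randn(100)               # unrelated to x, so the collision rate should be ≈ 0.5
+
+collision_rate(x, y), collision_rate(x, z)
+```
+
+In a full LSH scheme, the hash functions are generated once, every database point is placed into a bucket keyed by its hashes, and a query is only compared against the points that share one of its buckets.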