#LyX 2.3 created this file. For more info see http://www.lyx.org/
\lyxformat 544
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\usepackage{url} 
\usepackage{slashed}
\end_preamble
\use_default_options false
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding utf8
\fontencoding global
\font_roman "times" "default"
\font_sans "helvet" "default"
\font_typewriter "cmtt" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\use_microtype false
\use_dash_ligatures false
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref true
\pdf_bookmarks true
\pdf_bookmarksnumbered false
\pdf_bookmarksopen false
\pdf_bookmarksopenlevel 1
\pdf_breaklinks true
\pdf_pdfborder true
\pdf_colorlinks true
\pdf_backref false
\pdf_pdfusetitle true
\papersize default
\use_geometry false
\use_package amsmath 2
\use_package amssymb 2
\use_package cancel 1
\use_package esint 0
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 0
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 0
\use_minted 0
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\is_math_indent 0
\math_numbering_side default
\quotes_style english
\dynamic_quotes 0
\papercolumns 1
\papersides 1
\paperpagestyle default
\listings_params "basicstyle={\ttfamily},basewidth={0.45em}"
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Title
Language Learning Diary - Part Eight
\end_layout

\begin_layout Date
Sept 2022
\end_layout

\begin_layout Author
Linas Vepštas
\end_layout

\begin_layout Abstract
The language-learning effort involves research and software development
 to implement the ideas concerning unsupervised learning of grammar, syntax
 and semantics from corpora.
 This document contains supplementary notes and a loosely-organized semi-chronol
ogical diary of results.
 The notes here might not always make sense; they are a short-hand for
 my own benefit, rather than aimed at you, dear reader!
\end_layout

\begin_layout Section*
Introduction
\end_layout

\begin_layout Standard
Part Eight of the diary consists of two parts.
 First, a very short collection of notes on hypervectors and Gaussian orthogonal
 ensembles.
 Collected here as a handy reference, because adequate articles on some
 of these ideas do not yet exist.
 The second part is an exploration of applying the idea of Gaussian orthogonal
 ensembles to a current dataset.
 Overall, a tremendous success! This appears to provide an excellent metric
 of word-similarity! Victory!
\end_layout

\begin_layout Section*
Summary Conclusions
\end_layout

\begin_layout Standard
A summary of what is found in this part of the diary:
\end_layout

\begin_layout Itemize
The first few sections discuss hypervectors.
 These are interesting, but do not appear to be directly useful in the current
 situation.
\end_layout

\begin_layout Itemize
The next section is titled 
\begin_inset Quotes eld
\end_inset

Gaussian Orthogonal Ensemble
\begin_inset Quotes erd
\end_inset

 (but this name is misleading and incorrect.
 I don't have a better name yet).
 In Chapters Three and Five, it was noted that the symmetric-MI of word
 pairs (built up out of disjuncts) is distributed as a Gaussian.
 Whenever one has such a situation, the points can be understood to be vectors
 on a high-dimensional unit sphere.
 In this case, associated to each word 
\begin_inset Formula $w$
\end_inset

 is a unit-length word-vector 
\begin_inset Formula $\hat{w}$
\end_inset

 which is uniformly distributed on the unit sphere 
\begin_inset Formula $S_{N-1}$
\end_inset

 where 
\begin_inset Formula $N$
\end_inset

 is the size of the vocabulary.
\end_layout

\begin_layout Itemize
Such a distribution is perfectly uniform whenever the sampling is taken
 from a perfect Gaussian; this is the standard way of constructing uniform
 distributions on spheres.
\end_layout

\begin_layout Itemize
Given two vectors 
\begin_inset Formula $\hat{w}$
\end_inset

 and 
\begin_inset Formula $\hat{u}$
\end_inset

 drawn from a uniform distribution, we expect the angle between them: 
\begin_inset Formula $\theta=\arccos\left(\hat{w}\cdot\hat{u}\right)$
\end_inset

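 to be distributed on the interval 
\begin_inset Formula $\left[0,\pi\right]$
\end_inset

 with density proportional to 
\begin_inset Formula $\sin^{N-2}\theta$
\end_inset

; this is flat only for 
\begin_inset Formula $N=2$
\end_inset

, and concentrates near 
\begin_inset Formula $\theta=\pi/2$
\end_inset

 for large 
\begin_inset Formula $N$
\end_inset

.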
\end_layout

\begin_layout Itemize
The actual word-data is approximately uniform.
 The current favorite dataset is explored.
 We find that the word-vectors 
\begin_inset Formula $\hat{w}$
\end_inset

 are 
\begin_inset Quotes eld
\end_inset

almost
\begin_inset Quotes erd
\end_inset

 uniformly distributed, except that:
\end_layout

\begin_deeper
\begin_layout Itemize
There are very few word-pairs for which 
\begin_inset Formula $\theta\apprle\pi/6$
\end_inset

, and those that do lie in this region are strongly similar, grammatically.
 By visual inspection, the similarity is excellent.
 This is the primary, most important result of this diary chapter.
\end_layout

\begin_layout Itemize
There are few or no word-pairs for which 
\begin_inset Formula $\theta\gtrsim3\pi/4$
\end_inset

.
 This means that there aren't any words which are 
\begin_inset Quotes eld
\end_inset

truly grammatically dissimilar from one another
\begin_inset Quotes erd
\end_inset

.
 This result does not seem to be important or usable; it is merely a curiosity.
\end_layout

\end_deeper
\begin_layout Itemize
Given the collection of values 
\begin_inset Formula $\theta\left(w,u\right)$
\end_inset

 over pairs 
\begin_inset Formula $\left(u,w\right)$
\end_inset

, the trick of projecting to a sphere can be repeated, giving another set
 of vectors.
 This can again be repeated, 
\emph on
ad infinitum
\emph default
.
 The first repetition is explored.
 It looks OK; it provides similarities that seem to be just about as good
 as the original sphere set, maybe ever so slightly worse.
 This second recomputation of similarities requires considerable additional
 CPU time, and so does not seem worth the effort.
 It's measuring some hard-to-comprehend second-order effect in the distribution
 of grammatical similarity.
 That is to say, its physical interpretation is unclear.
\end_layout

\begin_layout Standard
That's it.
 Moving forward, it is clear that 
\begin_inset Formula $\theta=\arccos\left(\hat{w}\cdot\hat{u}\right)$
\end_inset

 is the superior metric for measuring word grammatical similarity.
 And BTW, it is a true metric, satisfying the triangle inequality.
 It should form the foundation for future clustering work.
\end_layout

\begin_layout Section*
Bipolar Hypervectors
\end_layout

\begin_layout Standard
A bipolar hypervector is a vector in a 
\begin_inset Formula $D$
\end_inset

-dimensional space having values in the set 
\begin_inset Formula $\mathbb{Z}_{2}=\left\{ -1,1\right\} $
\end_inset

; that is, a vector in 
\begin_inset Formula $\left\{ -1,1\right\} ^{D}.$
\end_inset

 Every hypervector corresponds to a vertex of a hypercube centered at the
 origin of a 
\begin_inset Formula $D$
\end_inset

-dimensional space.
 Bipolar hypervectors have the interesting property that, for 
\begin_inset Formula $D$
\end_inset

 even, given any (random) vector 
\begin_inset Formula $v$
\end_inset

, if one flips half the bits to get a vector 
\begin_inset Formula $w$
\end_inset

, then 
\begin_inset Formula $v$
\end_inset

 and 
\begin_inset Formula $w$
\end_inset

 are orthogonal!
\end_layout

\begin_layout Standard
This interesting property can be used to map point sequences (
\begin_inset Quotes eld
\end_inset

curves
\begin_inset Quotes erd
\end_inset

) to sequences of bipolar hypervectors such that the endpoints of the point
 sequence are orthogonal, and intermediate points have increasingly large
 cosine distances from the starting point.
 Geometrically, this maps the point sequence to a sequence of corners on
 the hypercube that are increasingly distant from the starting point.
\end_layout

\begin_layout Standard
This map is an isometry: it preserves the metric on the point sequence.
 This metric property can then be deployed to simplify classification problems,
 by mapping the space to be classified to vector arithmetic and cosine distances.
 These two ideas are developed below.
\end_layout

\begin_layout Subsection*
Metric properties
\end_layout

\begin_layout Standard
Let 
\begin_inset Formula $\left[p_{0},\cdots,p_{N}\right]$
\end_inset

 be a totally ordered sequence of points.
 The total order is just 
\begin_inset Formula $p_{i}<p_{j}$
\end_inset

 for 
\begin_inset Formula $i<j$
\end_inset

 for integer index 
\begin_inset Formula $i,j$
\end_inset

.
 This order can be metricized with a metric 
\begin_inset Formula $g$
\end_inset

 such that 
\begin_inset Formula $g\left(p_{i},p_{j}\right)=\left|i-j\right|/N$
\end_inset

.
 The metric is normalized so that the maximum distance is 1.
 Pick a dimension 
\begin_inset Formula $D>2N$
\end_inset

 and conventionally 
\begin_inset Formula $D\gg2N$
\end_inset

 and an arbitrary initial (random) vector 
\begin_inset Formula $v_{0}$
\end_inset

 that will correspond to 
\begin_inset Formula $p_{0}$
\end_inset

.
 Generate a sequence of bipolar hypervectors 
\begin_inset Formula $v_{k}$
\end_inset

 as follows.
 Given 
\begin_inset Formula $v_{i}$
\end_inset

, select (randomly) 
\begin_inset Formula $D/2N$
\end_inset

 bits that have not been selected before, and flip them, to obtain 
\begin_inset Formula $v_{i+1}$
\end_inset

.
 
\end_layout

\begin_layout Standard
The above generates a sequence of (bipolar hyperdimensional) vectors with
 the following properties.
 The dot product of each vector with itself is 
\begin_inset Formula 
\[
v_{k}\cdot v_{k}=D
\]

\end_inset

For neighboring points, the dot product is 
\begin_inset Formula 
\[
v_{k}\cdot v_{k+1}=D\left(1-\frac{1}{N}\right)
\]

\end_inset

because these differ in 
\begin_inset Formula $D/2N$
\end_inset

 bit locations.
 (The Hamming distance is 
\begin_inset Formula $D/2N$
\end_inset

; so 
\begin_inset Formula $D/2N$
\end_inset

 terms of the dot product that were 
\begin_inset Formula $+1$
\end_inset

 become 
\begin_inset Formula $-1$
\end_inset

, each lowering the sum by 2, and so the total sum decreases by 
\begin_inset Formula $D/N$
\end_inset

.)
\end_layout

\begin_layout Standard
In general, the Hamming distance between 
\begin_inset Formula $v_{k}$
\end_inset

 and 
\begin_inset Formula $v_{k+n}$
\end_inset

 is 
\begin_inset Formula $nD/2N$
\end_inset

 and so the dot product is
\begin_inset Formula 
\[
v_{k}\cdot v_{k+n}=D\left(1-\frac{n}{N}\right)
\]

\end_inset

or equivalently
\begin_inset Formula 
\[
v_{i}\cdot v_{j}=D\left(1-\frac{\left|i-j\right|}{N}\right)
\]

\end_inset

so that the endpoints satisfy 
\begin_inset Formula 
\[
v_{0}\cdot v_{N}=0
\]

\end_inset

and are thus orthogonal.
 The Hamming distance between orthogonal vectors is necessarily 
\begin_inset Formula $D/2$
\end_inset

.
\end_layout

\begin_layout Standard
Normalizing by 
\begin_inset Formula $D$
\end_inset

 and subtracting from 1 reproduces the original metric on the point sequence:
 i.e.
 
\begin_inset Formula 
\[
g\left(p_{i},p_{j}\right)=\frac{\left|i-j\right|}{N}=1-\frac{v_{i}\cdot v_{j}}{D}
\]

\end_inset


\end_layout
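
\begin_layout Standard
As a quick worked check of the formulas above, take the illustrative values
 
\begin_inset Formula $N=4$
\end_inset

 and 
\begin_inset Formula $D=32$
\end_inset

 (chosen here only for the example; they are not taken from any dataset).
 Then 
\begin_inset Formula $D/2N=4$
\end_inset

 fresh bits are flipped at each step, each flip lowers the dot product with
 
\begin_inset Formula $v_{0}$
\end_inset

 by 2, and so 
\begin_inset Formula 
\[
v_{0}\cdot v_{k}=32-8k:\qquad32,\;24,\;16,\;8,\;0\qquad\mbox{for}\qquad k=0,1,2,3,4
\]

\end_inset

Dividing by 
\begin_inset Formula $D$
\end_inset

 and subtracting from 1 gives the distances 
\begin_inset Formula $0,\,1/4,\,1/2,\,3/4,\,1$
\end_inset

, which is just 
\begin_inset Formula $k/N$
\end_inset

, as expected.
 This presumes that 
\begin_inset Formula $D$
\end_inset

 is an integer multiple of 
\begin_inset Formula $2N$
\end_inset

, so that 
\begin_inset Formula $D/2N$
\end_inset

 is a whole number of bits.
\end_layout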

\begin_layout Standard
Of course, all this machination is pointless for one-dimensional point sequences.
 So ...
 on to the more complex case.
\end_layout

\begin_layout Subsection*
Dimensional Oxidation
\end_layout

\begin_layout Standard
In chemistry, oxidation is the opposite of reduction.
 If dimensional reduction is the reduction of the number of dimensions used
 to describe a dataset, then playing on this, dimensional oxidation is the
 act of increasing dimensions.
\end_layout

\begin_layout Standard
One conventional machine learning problem is the classification of regions
 of some 
\begin_inset Formula $M$
\end_inset

-dimensional feature space.
 That is, there is a set of real-valued features 
\begin_inset Formula $f_{m}\in\mathbb{R}$
\end_inset

 for 
\begin_inset Formula $1\le m\le M$
\end_inset

.
 These are presumed to be bounded, so that 
\begin_inset Formula $f_{m}^{\mathrm{min}}\le f_{m}\le f_{m}^{\mathrm{max}}$
\end_inset

, and so can be normalized to the unit cube 
\begin_inset Formula $\left[0,1\right]^{M}$
\end_inset

 by writing 
\begin_inset Formula 
\[
x_{m}=\frac{f_{m}-f_{m}^{\mathrm{min}}}{f_{m}^{\mathrm{max}}-f_{m}^{\mathrm{min}}}
\]

\end_inset

Each unit interval may be digitized (partitioned) into 
\begin_inset Formula $N+1$
\end_inset

 distinct sub-intervals.
 This partitioning 
\begin_inset Formula $\left[p_{0},\cdots,p_{N}\right]$
\end_inset

 then provides a totally ordered point sequence that can be mapped to bipolar
 hypervectors.
 A given point 
\begin_inset Formula $x\in\left[0,1\right]^{M}$
\end_inset

 is thus mapped to 
\begin_inset Formula $M$
\end_inset

 vectors 
\begin_inset Formula $v_{m}$
\end_inset

.
 Summing these provides a mapping of the unit cube to 
\begin_inset Formula $\mathbb{Z}^{D}$
\end_inset

:
\begin_inset Formula 
\[
w=w\left(x\right)=\sum_{m}v_{m}
\]

\end_inset

Since each component of the sum is bounded between 
\begin_inset Formula $-M$
\end_inset

 and 
\begin_inset Formula $M$
\end_inset

, this vector lies in the lattice cube 
\begin_inset Formula $\left\{ -M,\cdots,M\right\} ^{D}$
\end_inset

.
\end_layout
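
\begin_layout Standard
The following is a rough sketch of how the summed vector inherits the metric
 of the previous subsection.
 It rests on an assumption not stated above: that each of the 
\begin_inset Formula $M$
\end_inset

 features is encoded with its own sequence of hypervectors, started from
 an independently chosen random vector.
 Write 
\begin_inset Formula $v_{m,i}$
\end_inset

 (notation introduced only for this sketch) for the 
\begin_inset Formula $i$
\end_inset

'th hypervector in the sequence for feature 
\begin_inset Formula $m$
\end_inset

, and let 
\begin_inset Formula $i_{m}$
\end_inset

 and 
\begin_inset Formula $j_{m}$
\end_inset

 be the bin indices of two points 
\begin_inset Formula $x$
\end_inset

 and 
\begin_inset Formula $y$
\end_inset

.
 Then 
\begin_inset Formula 
\begin{align*}
w\left(x\right)\cdot w\left(y\right)= & \sum_{m}v_{m,i_{m}}\cdot v_{m,j_{m}}+\sum_{m\ne m^{\prime}}v_{m,i_{m}}\cdot v_{m^{\prime},j_{m^{\prime}}}\\
= & D\sum_{m}\left(1-\frac{\left|i_{m}-j_{m}\right|}{N}\right)+\mathcal{O}\left(M\sqrt{D}\right)
\end{align*}

\end_inset

The first sum uses the dot-product formula for a single point sequence.
 Each cross term is a dot product of two independent random bipolar vectors,
 which has mean zero and standard deviation 
\begin_inset Formula $\sqrt{D}$
\end_inset

; heuristically, the cross terms together contribute noise that is small
 compared to the leading term, of order 
\begin_inset Formula $MD$
\end_inset

, when 
\begin_inset Formula $D$
\end_inset

 is large.
\end_layout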

\begin_layout Standard
If these vectors are normalized to unit length, then they live on the surface
 of the hyper-sphere 
\begin_inset Formula $S_{D-1}$
\end_inset

.
 In general, these points are not randomly, evenly distributed on the hyper-sphere.
 In a narrow sense, we can offer a theorem: the points are randomly distributed
 on the hyper-sphere if and only if they are randomly distributed in the
 unit cube 
\begin_inset Formula $\left[0,1\right]^{M}$
\end_inset

.
 However, in a broader sense, when 
\begin_inset Formula $N\ll D$
\end_inset

, points are increasingly scattered on the unit sphere, becoming increasingly
 uniformly distributed.
 This is effectively because the start and end points of the unit interval
 are mapped to random hypervectors.
 (This follows(?) because random hypervectors have a binomial bit distribution,
 and for large 
\begin_inset Formula $D$
\end_inset

, the binomial distribution approaches the Gaussian).
\end_layout

\begin_layout Standard
So again, we seem to approach the case of a Gaussian Orthogonal Ensemble.
 Kind of...
\end_layout

\begin_layout Subsection*
Efficient Dimensional Oxidation
\end_layout

\begin_layout Standard
A way to encode low-dimensional classification problems into hypervectors
 that sharply improves on the naive uniform feature-space digitization is
 described in Basaklar, 
\emph on
et al.

\emph default
 
\begin_inset Quotes eld
\end_inset

Hypervector Design for Efficient Hyperdimensional Computing on Edge Devices
\begin_inset Quotes erd
\end_inset

 
\begin_inset CommandInset href
LatexCommand href
name "https://arxiv.org/pdf/2103.06709.pdf"
target "https://arxiv.org/pdf/2103.06709.pdf"
literal "false"

\end_inset


\end_layout

\begin_layout Standard
The presumption is that there is a preexisting training dataset, the parameter
 space is low dimensional, and the number of clusters is fixed.
\end_layout

\begin_layout Standard
The solution is to divide up the parameter space in a non-uniform kind of
 way, devoting lots of extra hypervector bit-flips to the boundary zones
 between clusters, so that the boundaries of the clusters can be cleanly
 distinguished.
 This is done by specifying an integer optimization problem.
 The trick is to (i) maximize the training accuracy and (ii) minimize the
 similarity between class encoders, subject to (iii) orthogonalization of
 parameter endpoints.
 Because this is an integer optimization problem, a genetic algorithm is
 used to perform the search.
 The paper provides details.
\end_layout

\begin_layout Standard
This is not directly relevant for us, because we don't have a training set,
 nor do we know 
\emph on
a priori
\emph default
 how many clusters there will be.
 
\end_layout

\begin_layout Subsection*
Factoids
\end_layout

\begin_layout Standard
Assorted notes:
\end_layout

\begin_layout Itemize
Almost all random vectors are orthogonal to one another (or nearly so).
 This follows from the binomial distribution being well-approximated by a Gaussian.
 There are 
\begin_inset Formula 
\[
{D \choose D/2}=\frac{D!}{\left(D/2\right)!^{2}}\approx\sqrt{\frac{2}{\pi D}}\,2^{D}
\]

\end_inset

vectors orthogonal to any given vector, which follows from Stirling's approximation 
\begin_inset Formula $n!\approx\sqrt{2\pi n}e^{-n}n^{n}$
\end_inset

.
 There are 
\begin_inset Formula 
\[
{D \choose \frac{D}{2}+1}=\frac{D!}{\left(\frac{D}{2}-1\right)!\left(\frac{D}{2}+1\right)!}\approx\sqrt{\frac{2}{\pi D}}\,2^{D}\left(1-\frac{2}{D}\right)
\]

\end_inset

almost orthogonal vectors, which miss exact orthogonality by one bit.
 More generally, for 
\begin_inset Formula $n^{2}\ll D$
\end_inset

 there are 
\begin_inset Formula 
\[
{D \choose \frac{D}{2}+n}=\frac{D!}{\left(\frac{D}{2}-n\right)!\left(\frac{D}{2}+n\right)!}\approx\sqrt{2/\pi D}\,2^{D}\left(1-\frac{2n^{2}}{D}\right)
\]

\end_inset

vectors that differ by 
\begin_inset Formula $n$
\end_inset

 bits.
 (Need to double check this, might be errors).
\end_layout

\begin_layout Itemize
Given two corners on a hypercube differing by 
\begin_inset Formula $d$
\end_inset

 bits, there are 
\begin_inset Formula $d!$
\end_inset

 shortest paths between them.
\end_layout

\begin_layout Itemize
The midway points on such paths can have larger Jaccard (Hamming) distances
 to each other, than to either endpoint.
 In fact, this will almost always be the case.
\end_layout

\begin_layout Itemize
If the vectors are normalized, they can be seen to live on the surface of
 a sphere (of the same dimension).
\end_layout

\begin_layout Itemize
Ternary hypervectors i.e.
 elements of 
\begin_inset Formula $\left\{ -1,0,1\right\} ^{D}$
\end_inset

 are closed under point-wise multiplication and under sgn applied to arithmetic
 addition (though this is not quite a field: the sgn-addition is not associative).
\end_layout

\begin_layout Itemize
The projection of lattice points in 
\begin_inset Formula $\mathbb{Z}^{D}$
\end_inset

 to the unit sphere 
\begin_inset Formula $S_{D-1}$
\end_inset

 is presumably dense.
 No clue if it's 
\begin_inset Quotes eld
\end_inset

uniformly
\begin_inset Quotes erd
\end_inset

 distributed; presumably it's not, much like the rationals in the unit interval.
\end_layout

\begin_layout Itemize
No clue what the analogs of the modular group or the fundamental domain
 are.
 
\end_layout
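
\begin_layout Standard
The binomial counts in the first item above can be cross-checked against
 the standard Gaussian (de Moivre-Laplace) approximation to the binomial
 coefficient.
 This is a textbook estimate, quoted here as a sanity check rather than derived:
 
\begin_inset Formula 
\[
{D \choose \frac{D}{2}+n}\approx\sqrt{\frac{2}{\pi D}}\,2^{D}\exp\left(-\frac{2n^{2}}{D}\right)
\]

\end_inset

Setting 
\begin_inset Formula $n=0$
\end_inset

 recovers the count of exactly orthogonal vectors, and expanding the exponential
 for 
\begin_inset Formula $n^{2}\ll D$
\end_inset

 gives the factor 
\begin_inset Formula $1-2n^{2}/D$
\end_inset

.
 The 
\begin_inset Formula $n=1$
\end_inset

 case can also be checked exactly, since 
\begin_inset Formula 
\[
\frac{{D \choose \frac{D}{2}+1}}{{D \choose \frac{D}{2}}}=\frac{D/2}{D/2+1}=\frac{1}{1+2/D}\approx1-\frac{2}{D}
\]

\end_inset

In terms of dot products: flipping 
\begin_inset Formula $\frac{D}{2}+n$
\end_inset

 bits of 
\begin_inset Formula $v$
\end_inset

 to get 
\begin_inset Formula $w$
\end_inset

 gives 
\begin_inset Formula $v\cdot w=-2n$
\end_inset

, so the dot product of two random vectors is approximately Gaussian with
 mean zero and standard deviation 
\begin_inset Formula $\sqrt{D}$
\end_inset

, which is small compared to the maximum value 
\begin_inset Formula $D$
\end_inset

.
\end_layout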

\begin_layout Standard

\end_layout

\begin_layout Section*
Spin Glasses
\end_layout

\begin_layout Standard
Cribbed notes from Michel Talagrand 
\begin_inset Quotes eld
\end_inset

Mean Field Models for Spin Glasses Volume I: Basic Examples
\begin_inset Quotes erd
\end_inset

 (2010) Springer-Verlag.
\end_layout

\begin_layout Section*
Gaussian Orthogonal Ensembles
\end_layout

\begin_layout Standard
Well, not really; that's just the working title.
 There's no actual ensemble.
 
\end_layout

\begin_layout Standard
The thing to explore is this: suppose there are indices 
\begin_inset Formula $w,u\in\left\{ 1,\cdots,N\right\} $
\end_inset

 i.e.
 
\begin_inset Formula $N$
\end_inset

-dimensional.
 Suppose there are numbers 
\begin_inset Formula $f\left(w,u\right)=f\left(u,w\right)$
\end_inset

 which are normally distributed with mean 
\begin_inset Formula $\mu$
\end_inset

 and stddev 
\begin_inset Formula $\sigma$
\end_inset

.
 Experimentally, we compute 
\begin_inset Formula 
\[
\mu=\frac{1}{N^{2}}\sum_{u,w}f\left(u,w\right)=\left\langle f\right\rangle 
\]

\end_inset

and
\begin_inset Formula 
\begin{align*}
\sigma^{2}= & \frac{1}{N^{2}}\sum_{u,w}\left(f\left(u,w\right)-\mu\right)^{2}\\
= & \frac{1}{N^{2}}\sum_{u,w}f^{2}\left(u,w\right)-\mu^{2}\\
= & \left\langle f^{2}\right\rangle -\left\langle f\right\rangle ^{2}
\end{align*}

\end_inset

as usual.
 Define 
\begin_inset Formula $g\left(u,w\right)=\left(f\left(u,w\right)-\mu\right)/\sigma$
\end_inset

 and then 
\begin_inset Formula $g\left(u,w\right)$
\end_inset

 is normally distributed about zero with unit stddev.
\end_layout

\begin_layout Standard
Fix 
\begin_inset Formula $w$
\end_inset

 and define an 
\begin_inset Formula $N$
\end_inset

-dimensional vector 
\begin_inset Formula $\vec{w}\in\mathbb{R}^{N}$
\end_inset

 whose vector components are 
\begin_inset Formula $w_{u}=g\left(w,u\right)$
\end_inset

.
 Normalize them to unit length, so that 
\begin_inset Formula $\hat{w}=\vec{w}/\left\Vert \vec{w}\right\Vert $
\end_inset

 where 
\begin_inset Formula $\left\Vert \vec{w}\right\Vert =\left\Vert \vec{w}\right\Vert _{2}$
\end_inset

 is the Euclidean norm.
 This is the Gaussian orthogonal ensemble.
 There are 
\begin_inset Formula $N$
\end_inset

 of these vectors; they are uniformly distributed on the 
\begin_inset Formula $N-1$
\end_inset

 sphere 
\begin_inset Formula $S_{N-1}$
\end_inset

.
 
\end_layout

\begin_layout Standard
The experimental goal is to obtain these vectors, on the actual datasets,
 where 
\begin_inset Formula $f\left(w,u\right)$
\end_inset

 is the MI for two words, the MI being given via the disjunct+shape formulas
 explored in earlier chapters.
 So let's see what we get.
\end_layout

\begin_layout Subsection*
Ranking
\end_layout

\begin_layout Standard
The ranked MI has the form 
\begin_inset Formula $f\left(w,u\right)=s\left(w,u\right)-r\left(w\right)-r\left(u\right)$
\end_inset

.
 How does this change the above?
\begin_inset Formula 
\begin{align*}
\mu= & \frac{1}{N^{2}}\sum_{u,w}f\left(u,w\right)\\
= & \frac{1}{N^{2}}\sum_{u,w}s\left(u,w\right)-\frac{2}{N}\sum_{w}r\left(w\right)\\
= & \left\langle s\right\rangle -2\left\langle r\right\rangle 
\end{align*}

\end_inset

and next
\begin_inset Formula 
\begin{align*}
\sigma^{2}= & \frac{1}{N^{2}}\sum_{u,w}f^{2}\left(u,w\right)-\mu^{2}\\
= & \frac{1}{N^{2}}\sum_{u,w}\left[s\left(u,w\right)-r\left(w\right)-r\left(u\right)\right]^{2}-\left[\left\langle s\right\rangle -2\left\langle r\right\rangle \right]^{2}\\
= & \frac{1}{N^{2}}\sum_{u,w}\left[s^{2}\left(u,w\right)-4s\left(w,u\right)r\left(w\right)+2r^{2}\left(w\right)+2r\left(w\right)r\left(u\right)\right]-\left[\left\langle s\right\rangle ^{2}-4\left\langle s\right\rangle \left\langle r\right\rangle +4\left\langle r\right\rangle ^{2}\right]\\
= & \left\langle s^{2}\right\rangle -4\left\langle sr\right\rangle +2\left\langle r^{2}\right\rangle +2\left\langle r\right\rangle ^{2}-\left\langle s\right\rangle ^{2}+4\left\langle s\right\rangle \left\langle r\right\rangle -4\left\langle r\right\rangle ^{2}\\
= & \left[\left\langle s^{2}\right\rangle -\left\langle s\right\rangle ^{2}\right]-4\left[\left\langle sr\right\rangle -\left\langle s\right\rangle \left\langle r\right\rangle \right]+2\left[\left\langle r^{2}\right\rangle -\left\langle r\right\rangle ^{2}\right]
\end{align*}

\end_inset

and so in general we see that the ranked MI is not equivalent to the plain
 (unranked) quantity 
\begin_inset Formula $s$
\end_inset

, and so we need to keep track of both, separately.
\end_layout

\begin_layout Section*
Experiment-15 (21 Sept 2022)
\end_layout

\begin_layout Standard
The data is in experiment-15.
 MI's are computed for the 
\begin_inset Formula $N$
\end_inset

 top-ranked words.
 Keep in mind that the MI's here are the symmetric MI's obtained from word-disju
nct vectors.
\end_layout

\begin_layout Standard
Datasets containing MI-similarities:
\end_layout

\begin_layout Standard
\align center
\begin_inset VSpace defskip
\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="5" columns="5">
<features tabularvalignment="middle">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dataset
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $N$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $N_{pairs}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
RAM
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
CPU time to load
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r14-sim200.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
200
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20100
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
4.3GB
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
11 min
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim500.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
500
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
125250
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
4.6GB
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
12 min
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim2500.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2500
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3126250
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6.4GB
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
21 min
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim6000.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6000
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
18003000
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
16.3GB
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
42 min
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Each larger dataset is built from the smaller one; this allows experiments
 to be run in parallel; otherwise, it's the same data.
 The 
\begin_inset Formula $N_{pairs}$
\end_inset

 is just the number of SimilarityLinks in the dataset.
 The number of pairs is exactly 
\begin_inset Formula $N\left(N+1\right)/2$
\end_inset

.
 The similarities are computed as described earlier, using the cross-sections
 and shapes as a part of the vectors.
 RAM consumption includes that for shapes/cross-sections as well as similarities.
 The CPU-time-to-load includes the CPU-time to load shapes and cross-sections.
 These are needed for computing MI, but are not needed for the GOE computations
 (however, they are needed to get a list of ranked words.) 
\end_layout

\begin_layout Standard
Here are the means and standard deviations, for various values of 
\begin_inset Formula $N$
\end_inset

:
\end_layout

\begin_layout Standard
\align center
\begin_inset VSpace defskip
\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="5" columns="6">
<features tabularvalignment="middle">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dataset
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $N$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\mu_{\mbox{MI}}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\sigma_{\mbox{MI}}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\mu_{\mbox{ranked-MI}}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\sigma_{\mbox{ranked-MI}}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r14-sim200.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
200
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-1.6966
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3.2703
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.5947
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3.3151
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim500.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
500
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-1.8062
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3.1579
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-0.4925
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3.2165
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim2500.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2500
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-1.4053
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2.8985
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-2.1492
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2.9518
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
r15-sim6000.rdb
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6000
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-0.7240
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2.6398
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-2.7543
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
2.6019
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Below are some graphs showing this dataset.
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Data collected with first 50 lines of 
\family sans
utils/orthogonal-ensemble.scm
\family default
 from data in experiment-15.
 Plotted by 
\family sans
p8-goe/sim-mi-dist.gplot
\family default
.
\end_layout

\end_inset

 It's not as pretty as some of the earlier graphs.
 Why? See, for example, page 14 and page 25 of chapter three (ranked-MI
 here is called common-MI there).
 But it is comparable to those in the second half of the diary chapter five
 (pages 19, 20, 38-40), so I guess that's OK.
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/sim-mi-dist-500.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/sim-mi-dist-2500.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
Above left: the Gaussian shown has the same parameters as the earlier ones,
 in chapter three.
 Here N=500 pairs vs N=1200 in chapter three.
 Also shown is ranked-MI, which shifts slightly to the right.
 Above right: just the MI, with two Gaussians.
 One is an 
\begin_inset Quotes eld
\end_inset

eyeballed fit
\begin_inset Quotes erd
\end_inset

, adjusting the Gaussian to match the tail of the distribution.
 This is the G(-3,3.1) Gaussian.
 The other shows the results of computing the mean and stddev of the data: 
\begin_inset Formula $\mu=-2.1492$
\end_inset

 and 
\begin_inset Formula $\sigma=2.9518$
\end_inset

 as reported in the table above.
 The mean is clearly influenced by the blip in the data just above the mean.
 The tails have very little influence on the calculation of the mean and
 stddev ...
 and yet...
 maybe they should? Maybe we should be fitting to the tails with equal weight
 as the center? 
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/sim-mi-rmi-2500.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/sim-mi-500-2500.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
Above left: compare MI to ranked-MI without the visual distraction of the
 Gaussian.
 The right hand side looks a lot more linear.
 This linear-like right-hand side can also be seen in graphs in chapters
 three and five, but it is not as pronounced there as here.
 Above right: compare distributions for varying N.
 
\end_layout

\begin_layout Standard
Below, do it again, up to 
\begin_inset Formula $N=6000$
\end_inset

.
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/sim-mi-6k.eps
	width 70text%

\end_inset


\end_layout

\begin_layout Standard
The above extends the distribution to 
\begin_inset Formula $N\left(N+1\right)/2=18003000$
\end_inset

 or 18 million pairs.
 The linearity on the right is maintained, and seems to have the particularly
 simple form of 
\begin_inset Formula $e^{-MI}/2$
\end_inset

 while the left side is imperfectly Gaussian (seems to drop off even faster).
 Recall that this dataset has some fair amount of garbage in it (due to
 bad escaping of quotes) and also has been trimmed; so it's not clear whether
 the linear right is 
\begin_inset Quotes eld
\end_inset

real data
\begin_inset Quotes erd
\end_inset

 or 
\begin_inset Quotes eld
\end_inset

garbage
\begin_inset Quotes erd
\end_inset

 or 
\begin_inset Quotes eld
\end_inset

artifact of trimming
\begin_inset Quotes erd
\end_inset

.
 
\end_layout

\begin_layout Subsection*
GOE Distributions
\end_layout

\begin_layout Standard
So, using MI and RMI, define the Gaussian vectors as described at the beginning
 of this section.
 The base space has dimension 
\begin_inset Formula $N=2500$
\end_inset

 – i.e.
 we've computed MI and RMI for the top-ranked 2500 words, for a total of
 
\begin_inset Formula $N\left(N+1\right)/2=3126250$
\end_inset

 word-pairs.
 That is, we have 
\begin_inset Formula $N=2500$
\end_inset

 vectors 
\begin_inset Formula $\vec{w}$
\end_inset

.
 Out of these, take the 
\begin_inset Formula $M$
\end_inset

 top-ranked words, and compute the cosines
\begin_inset Formula 
\[
\cos\left(\theta_{wu}\right)=\hat{w}\cdot\hat{u}=\frac{\vec{w}\cdot\vec{u}}{\left\Vert \vec{w}\right\Vert \left\Vert \vec{u}\right\Vert }
\]

\end_inset

The graph below shows the cosine distribution for the top 
\begin_inset Formula $M=250$
\end_inset

 words, or 
\begin_inset Formula $M\left(M+1\right)/2=31375$
\end_inset

 pairs.
 There are two graphs, actually, one for vectors built from MI and another
 from RMI.
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Graphs built with code in 
\family sans
util/orthogonal-ensemble.scm
\family default
 lines 165-200 and 
\family sans
p8-goe/cos-mi-dist.gplot
\family default
.
 Datasets are 
\family sans
r15-got-2500.rdb
\family default
 and similar.
\end_layout

\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/cos-mi-dist-250.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
The distribution is flat-topped.
 This is exactly what we'd expect, if the initial distribution was perfectly
 Gaussian.
 But it's not, so there's a fall-off.
 The peak at 
\begin_inset Formula $\cos\theta=1$
\end_inset

 is for the self-similarity.
 Very curious is the notch right below 
\begin_inset Formula $\cos\theta=1$
\end_inset

.
 This notch indicates an actual repulsion.
 Words are actually being repelled – there is a minimum cosine distance!
 That is, the word-vectors are uniformly distributed across the sphere 
\begin_inset Formula $S_{N-1}$
\end_inset

 except for two things: there is a repulsive force that prevents vectors
 from getting close to each other, and depopulates the region near 
\begin_inset Formula $\cos\theta=1$
\end_inset

.
 There is also an 
\begin_inset Quotes eld
\end_inset

inverted
\begin_inset Quotes erd
\end_inset

 force, depopulating a much larger region around 
\begin_inset Formula $\cos\theta=-1$
\end_inset

.
\end_layout

\begin_layout Standard
The depopulated region around 
\begin_inset Formula $\cos\theta=-1$
\end_inset

 seems like it should be 
\begin_inset Quotes eld
\end_inset

less interesting
\begin_inset Quotes erd
\end_inset

.
 It seems to be saying that there are no words which are truly 
\begin_inset Quotes eld
\end_inset

different
\begin_inset Quotes erd
\end_inset

 from all other words.
 
\end_layout

\begin_layout Standard
Perhaps the notches can be parameterized by a repulsive force...
 how?
\end_layout

\begin_layout Standard
Again, below left, this time showing the distributions for the top 
\begin_inset Formula $M=250$
\end_inset

 words (as before), the top 
\begin_inset Formula $M=500$
\end_inset

 words and the top 
\begin_inset Formula $M=1000$
\end_inset

 words.
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Graphs built with code in 
\family sans
util/orthogonal-ensemble.scm
\family default
 lines 203-232 and 
\family sans
p8-goe/cos-mi-dist-500.gplot
\family default
.
\end_layout

\end_inset

 Below right are the same three curves, except this time the vector length
 was 
\begin_inset Formula $N=6000$
\end_inset

.
 
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/cos-mi-dist-500.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/cos-mi-d6k.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
Much as before; slightly less flat.
 Notch on the right is slightly larger; the notch on the left is slightly
 smaller.
 The 
\begin_inset Formula $M=1000$
\end_inset

 curve has a spike at precisely cosine=0.
 This is due to the appearance of word-pairs that have nothing in common:
 either one or the other has a zero vector component, thus the dot product
 is zero.
 A zero vector component corresponds to 
\begin_inset Formula $MI=-\infty$
\end_inset

 i.e.
 a pair of words which have no disjuncts in common.
 The appearance of this spike suggests we've reached the limit of the utility
 of the vectors: longer vectors are needed.
 
\end_layout

\begin_layout Standard
The mean and stddev of the 
\begin_inset Formula $N=2500$
\end_inset

 and 
\begin_inset Formula $M=1000$
\end_inset

 graph are 
\begin_inset Formula $\mu_{F}=0.09075$
\end_inset

 and 
\begin_inset Formula $\sigma_{F}=0.3389$
\end_inset

.
 This is used to construct 
\begin_inset Formula $F_{2}$
\end_inset

 described several sections below.
\end_layout

\begin_layout Standard
Again, this time showing what happens when the vector lengths are varied.
 On the left, the distribution of dot products for the top-ranked 
\begin_inset Formula $M=250$
\end_inset

 words (as before), comparing vectors of length 
\begin_inset Formula $N=2500$
\end_inset

 and 
\begin_inset Formula $N=6000$
\end_inset

.
 The plateau is narrower.
 On the right, same, but the distribution of dot products for the top-ranked
 
\begin_inset Formula $M=500$
\end_inset

 words.
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Made with 
\family sans
cos-mi-dN.gplot
\family default
 from scripts as above.
 
\end_layout

\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/cos-mi-dN.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/cos-mi-500-dN.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Subsubsection*
Repulsion estimate
\end_layout

\begin_layout Standard
Consider two vectors that point at the corners of an 
\begin_inset Formula $N$
\end_inset

-cube.
 Nearest-neighbor corners are then a distance 
\begin_inset Formula $\cos\theta=\left(N-2\right)/N$
\end_inset

 apart, or 
\begin_inset Formula $1-\cos\theta=2/N$
\end_inset

.
 For 
\begin_inset Formula $N=2500$
\end_inset

 this is 
\begin_inset Formula $2/2500=0.0008=8\times10^{-4}$
\end_inset

.
 By comparison, the right-hand gap above is at about 0.9, so this is over
 100 nearest neighbors apart.
 For 
\begin_inset Formula $N=6000$
\end_inset

 this is even farther apart.
\end_layout
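
\begin_layout Standard
As a quick check of the nearest-neighbor spacing (a sketch, assuming the
 corners are the points of 
\begin_inset Formula $\left\{ -1,+1\right\} ^{N}$
\end_inset

, and writing 
\begin_inset Formula $\vec{a}$
\end_inset

 and 
\begin_inset Formula $\vec{b}$
\end_inset

 for two illustrative corners): if they differ in exactly one coordinate,
 they agree in the other 
\begin_inset Formula $N-1$
\end_inset

, so that
\begin_inset Formula 
\[
\cos\theta=\frac{\vec{a}\cdot\vec{b}}{\left\Vert \vec{a}\right\Vert \left\Vert \vec{b}\right\Vert }=\frac{\left(N-1\right)-1}{\sqrt{N}\sqrt{N}}=\frac{N-2}{N}
\]

\end_inset

Taking the gap to sit at 
\begin_inset Formula $1-\cos\theta\approx0.1$
\end_inset

, it spans roughly 
\begin_inset Formula $0.1/\left(2/N\right)=N/20$
\end_inset

 single-coordinate steps: 125 for 
\begin_inset Formula $N=2500$
\end_inset

, and 300 for 
\begin_inset Formula $N=6000$
\end_inset

 if the gap stayed at the same cosine.
\end_layout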

\begin_layout Standard
If one selects 
\begin_inset Formula $J$
\end_inset

 random corners of the 
\begin_inset Formula $N$
\end_inset

-dimensional hypercube, what is the average distance between them? Here,
 the hypercube is 
\begin_inset Formula $\left\{ -1,+1\right\} ^{N}$
\end_inset

.
 That is, select 
\begin_inset Formula $J$
\end_inset

 random bit-strings of length 
\begin_inset Formula $N$
\end_inset

 – what is the average Hamming distance between them? In the limit 
\begin_inset Formula $N\to\infty$
\end_inset

, this is famously a Gaussian.
 In the non-limit, what is it? This is a combinatoric question.
 There are a total of 
\begin_inset Formula $2^{N}$
\end_inset

 such vectors.
 For a fixed initial vector, there are 
\begin_inset Formula $N$
\end_inset

 vectors that differ by one position, and 
\begin_inset Formula $N\left(N-1\right)/2$
\end_inset

 that differ by two bits, and the general case is given by the binomial
 coefficient 
\begin_inset Formula ${N \choose k}$
\end_inset

 for 
\begin_inset Formula $k$
\end_inset

 bit differences.
 Note that 
\begin_inset Formula 
\[
\sum_{k=0}^{N}{N \choose k}=2^{N}
\]

\end_inset

so we've counted them all.
 The average Hamming distance is then
\begin_inset Formula 
\[
\left\langle k\right\rangle =\frac{1}{2^{N}}\sum_{k=0}^{N}k{N \choose k}=\frac{N}{2}
\]

\end_inset

and the mean-square distance is
\begin_inset Formula 
\[
\left\langle k^{2}\right\rangle =\frac{1}{2^{N}}\sum_{k=0}^{N}k^{2}{N \choose k}=\frac{N^{2}+N}{4}
\]

\end_inset

and so the standard deviation is 
\begin_inset Formula 
\[
\sigma_{k}=\sqrt{\left\langle k^{2}\right\rangle -\left\langle k\right\rangle ^{2}}=\frac{\sqrt{N}}{2}
\]

\end_inset

Given two bit-vectors of length 
\begin_inset Formula $N$
\end_inset

 differing in 
\begin_inset Formula $k$
\end_inset

 bits, the cosine between them is 
\begin_inset Formula $\cos\theta=1-2k/N$
\end_inset

.
 We thus immediately conclude that 
\begin_inset Formula 
\[
\left\langle \cos\theta\right\rangle =0
\]

\end_inset

and
\begin_inset Formula 
\begin{align*}
\left\langle \cos^{2}\theta\right\rangle = & \left\langle 1-\frac{4k}{N}+\frac{4k^{2}}{N^{2}}\right\rangle \\
= & 1-\frac{4\left\langle k\right\rangle }{N}+\frac{4\left\langle k^{2}\right\rangle }{N^{2}}\\
= & 1-2+\frac{N^{2}+N}{N^{2}}\\
= & \frac{1}{N}
\end{align*}

\end_inset

and so the rms is 
\begin_inset Formula $\sqrt{\left\langle \cos^{2}\theta\right\rangle }=1/\sqrt{N}$
\end_inset

.
 For 
\begin_inset Formula $N=2500$
\end_inset

 and 
\begin_inset Formula $6000$
\end_inset

, we have 
\begin_inset Formula $\sqrt{\left\langle \cos^{2}\theta\right\rangle }=0.02$
\end_inset

 and 
\begin_inset Formula $0.013$
\end_inset

.
 Again, the right-hand gap widens, not narrows, as the vector-length increases.
 The stddev does not provide a natural scale.
\end_layout
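
\begin_layout Standard
For reference, the closed forms above follow from standard binomial identities
 (stated here as a reminder; they are not taken from the scripts):
\begin_inset Formula 
\begin{align*}
\sum_{k=0}^{N}k{N \choose k}= & N2^{N-1}\\
\sum_{k=0}^{N}k\left(k-1\right){N \choose k}= & N\left(N-1\right)2^{N-2}\\
\sum_{k=0}^{N}k^{2}{N \choose k}= & N\left(N-1\right)2^{N-2}+N2^{N-1}=N\left(N+1\right)2^{N-2}
\end{align*}

\end_inset

Dividing by 
\begin_inset Formula $2^{N}$
\end_inset

 gives 
\begin_inset Formula $\left\langle k\right\rangle =N/2$
\end_inset

 and 
\begin_inset Formula $\left\langle k^{2}\right\rangle =\left(N^{2}+N\right)/4$
\end_inset

.
 As a small check, for 
\begin_inset Formula $N=2$
\end_inset

 the Hamming distances 0, 1, 2 occur with weights 1, 2, 1, giving 
\begin_inset Formula $\left\langle k\right\rangle =1$
\end_inset

 and 
\begin_inset Formula $\left\langle k^{2}\right\rangle =6/4=3/2$
\end_inset

.
\end_layout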

\begin_layout Standard
Anyway, this whole approach is just plain wrong.
 Selecting random hypervectors gives us the Gaussian distribution in the
 limit, yet we're seeing a flat-topped distribution.
 That's because the MI was Gaussian; the flat-top is coming from that.
 In other words, for 
\begin_inset Formula $x$
\end_inset

 uniformly distributed in 
\begin_inset Formula $\left[-1,1\right]$
\end_inset

 we should be considering 
\begin_inset Formula $\left\langle x^{2}\right\rangle =\frac{1}{2}\int_{-1}^{1}x^{2}dx=1/3$
\end_inset

.
 The natural scale is 
\begin_inset Formula $\sqrt{\left\langle x^{2}\right\rangle }=\sqrt{1/3}=0.5774$
\end_inset

.
 Hmmm.
 What does this mean?
\end_layout

\begin_layout Standard
None of this is getting us any closer to 
\begin_inset Quotes eld
\end_inset

estimating the repulsion between vectors
\begin_inset Quotes erd
\end_inset

.
 Let's try again.
 Here's the same data, except this time, the x-axis is given by 
\begin_inset Formula $\theta=\arccos\left(\hat{w}\cdot\hat{u}\right)$
\end_inset

.
 For a perfect uniform distribution, we should see 
\begin_inset Formula $\theta$
\end_inset

 uniformly distributed over 
\begin_inset Formula $\left[0,\pi\right]$
\end_inset

.
 Instead, we get the below left:
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/theta-d6k.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/theta-dist-1k.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
So the repulsive region is for 
\begin_inset Formula $\theta\apprle0.5\approx\pi/6$
\end_inset

.
 The above right is the 
\begin_inset Formula $M=1000$
\end_inset

 curve only, after removing the diagonal entries (which all have a cosine
 of exactly 1.) The arccosine is taken before binning, and 200 bins are used
 instead of 100, so that we can get a better view of what is happening near
 
\begin_inset Formula $\theta\approx0$
\end_inset

.
\end_layout

\begin_layout Standard
There are few or no word-pairs for which 
\begin_inset Formula $\theta\gtrsim3\pi/4$
\end_inset

.
 This means that there aren't any words which are 
\begin_inset Quotes eld
\end_inset

truly grammatically dissimilar from one another
\begin_inset Quotes erd
\end_inset

, but it's unclear just how 
\begin_inset Quotes eld
\end_inset

deep
\begin_inset Quotes erd
\end_inset

 this insight is.
 Is this saying that the English grammar is not 
\begin_inset Quotes eld
\end_inset

random
\begin_inset Quotes erd
\end_inset

? That it's impossible to be 
\begin_inset Quotes eld
\end_inset

grammatically dissimilar
\begin_inset Quotes erd
\end_inset

 in general? The meaning of this is unclear.
 It does not appear to be important, though; whatever insight might be offered
 here, it does not seem worth the effort right now.
\end_layout

\begin_layout Subsubsection*
Correlations
\end_layout

\begin_layout Standard
The scatter plot below left shows how the cosine-MI and the MI are correlated.
 The graphs for the other correlations (between cosine-RMI and MI, cosine-MI
 and RMI, and cosine-RMI vs RMI) look similar, with offsets and slightly
 different shapes.
\end_layout

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/scatter-goe-mi.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/scatter-goe-mi-rmi.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
Above right shows how cosine-MI and cosine-RMI are correlated.
 There seem to be striations.
 What are they?
\end_layout

\begin_layout Subsection*
Top most similar
\end_layout

\begin_layout Standard
Of the 250 words that were compared, here are two tables showing the most-simila
r word-vectors, and the cosine.
 The left table shows the top-20 most similar word-vectors.
 The right table shows the entries ranked between 100 and 120 in the same list.
 Things look more marginal about 200 down – there are some good matches,
 e.g.
 
\begin_inset Quotes eld
\end_inset

head
\begin_inset Quotes erd
\end_inset

–
\begin_inset Quotes eld
\end_inset

face
\begin_inset Quotes erd
\end_inset

 and some mediocre ones: 
\begin_inset Quotes eld
\end_inset

people
\begin_inset Quotes erd
\end_inset

–
\begin_inset Quotes eld
\end_inset

life
\begin_inset Quotes erd
\end_inset

.
 Note that not very many words are being compared – 250 is not that big
 a vocabulary (although the word-vectors themselves have dimension 
\begin_inset Formula $N=2500$
\end_inset

.)
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
top-20 word vectors
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9604
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9525
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
And
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
But
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9522
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9466
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9418
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9412
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9384
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
may
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
should
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9351
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9342
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9322
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9313
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
You
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
We
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9308
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
people
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
men
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9307
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
if
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9263
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
is
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
was
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9251
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
shall
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
should
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9250
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
with
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
by
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9245
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9233
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
him
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
them
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9215
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
had
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
would
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9214
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset space \qquad{}
\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word vectors 100-120
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8834
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
by
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
after
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8834
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8833
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
and
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
but
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8832
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
after
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
who
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8826
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
who
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8823
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
what
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
which
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8823
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
night
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
day
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8815
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
And
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Then
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8807
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
had
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
could
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8806
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
would
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
could
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8805
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
had
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
has
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8804
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
What
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
A
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8790
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
eyes
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
hand
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8790
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
in
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
to
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8790
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
himself
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
away
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8789
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
with
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
from
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8788
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
eyes
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
head
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8785
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
though
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
if
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8782
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
This
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
What
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8777
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
eyes
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
face
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Below, as a reminder, is the top-20 list of pairs, on the same dataset,
 ranked by ranked-MI.
 Clearly a very different beast.
 There's lots of punctuation, because punctuation occurs very frequently,
 and thus gets ranked highly (recall, the ranking factor is the log of the
 frequency).
 Some of these are OK, such as 
\begin_inset Quotes eld
\end_inset

is
\begin_inset Quotes erd
\end_inset

–
\begin_inset Quotes eld
\end_inset

was
\begin_inset Quotes erd
\end_inset

 pair, which has both high RMI and high cosine.
 But the abysmal cosine for the pairs (—, +) and ([, +) just says that the
 ranking factor is a better, more accurate measure of similarity.
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="4">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
top-20 by ranked-MI
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
RMI
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.3329
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9.5244
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
....
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
–
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.1105
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9.1864
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
—
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.7866
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9.1428
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
;
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
,
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9263
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9.0413
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
is
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
was
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8833
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9.0177
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
and
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
but
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.7530
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.9218
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
.
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
?
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8053
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.9079
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
!
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
?
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.6956
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.8475
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
It
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
He
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.2569
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.8309
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
[
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.6577
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.2733
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
”
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
"
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.7111
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.1875
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
No
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
A
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.7734
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.1764
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
in
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
of
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9522
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.1732
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.6513
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.1305
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
It
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
There
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8076
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.0921
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
and
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
as
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.7171
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.0558
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
!
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
.
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.4163
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.0537
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"I
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"I
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8941
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.0221
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
‘
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
“
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.4701
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
8.0113
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
–
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
'
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8035
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
7.9720
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
the
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
his
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Comparing these two tables suggests that the GOE similarity provides a much
 healthier notion of similarity.
\end_layout

\begin_layout Standard
However, the results depend strongly on word frequency.
 The tables above considered only the 250 top-ranked words.
 Here are the similarities for the 400 and 700 top-ranked words.
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
top-20 out of M=400
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9773
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"I
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash
"I
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9662
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
7
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9634
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
4
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9604
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9525
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
And
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
But
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9522
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9498
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
5
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9483
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
7
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
10
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9466
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9464
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
father
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
mother
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9418
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9412
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9411
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
10
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9410
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
5
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9384
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
may
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
should
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9367
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
What
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Why
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9351
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9342
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9322
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9313
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
You
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
We
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset space \qquad{}
\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
top-20 out of M=700
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9954
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Ag
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Ap
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9866
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
26
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
13
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9773
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"I
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash
"I
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9756
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"No
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Well
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9749
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Well
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Yes
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9734
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
11
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
29
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9732
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Well
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Oh
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9719
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
29
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9708
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
10
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
13
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9699
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Well
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Yes
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9690
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
9
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
11
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9662
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
7
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9658
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Yes
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Oh
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9640
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"No
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Oh
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9634
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
4
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9627
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"No
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Yes
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9604
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9577
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Yes
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash

\backslash
"Oh
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9575
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Ag
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Jl
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9563
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
10
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
26
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
It continues like this the deeper we go.
 Conclusion: there are many infrequent words that are even more similar to
 one another than the frequent ones are.
 All of these matchups appear to be quite healthy overall.
 No complaints.
\end_layout

\begin_layout Subsection*
Word analogy via vector additivity
\end_layout

\begin_layout Standard
The litmus test would be to compute the word-vector sum 
\begin_inset Quotes eld
\end_inset

king-man+woman
\begin_inset Quotes erd
\end_inset

 to see what happens.
 I suspect the vocabulary is too small for this, and that much larger datasets
 are needed.
 But let's try it anyway.
\end_layout
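
\begin_layout Standard
As a sketch of what is being tested (this is an assumption about the procedure,
 not a record of the exact computation used): write 
\begin_inset Formula $\hat{w}$
\end_inset

 for the unit-length vector of a word, as in the similarity tables above.
 For an analogy of the form a-b+c, the query vector, here called 
\begin_inset Formula $\vec{q}$
\end_inset

 purely for illustration, would be
\begin_inset Formula 
\[
\vec{q}=\hat{a}-\hat{b}+\hat{c}
\]

\end_inset

and the score of a candidate word x would be
\begin_inset Formula 
\[
\vec{q}\cdot\hat{x}=\hat{a}\cdot\hat{x}-\hat{b}\cdot\hat{x}+\hat{c}\cdot\hat{x}.
\]

\end_inset

If 
\begin_inset Formula $\vec{q}$
\end_inset

 is left unnormalized, scores above 1 are possible.
 For x=a the score is 
\begin_inset Formula $1-\hat{b}\cdot\hat{a}+\hat{c}\cdot\hat{a}$
\end_inset

, which exceeds 1 whenever a is more similar to c than to b.
 The analogy can be said to work when the expected word ranks near the top
 by this score.
\end_layout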

\begin_layout Standard
So 
\begin_inset Quotes eld
\end_inset

king-man+woman
\begin_inset Quotes erd
\end_inset

 fails, apparently because 
\begin_inset Quotes eld
\end_inset

queen
\begin_inset Quotes erd
\end_inset

 is not well represented: for example, king dot queen = 0.
\end_layout

\begin_layout Standard
Here are some that work:
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Tabular
<lyxtabular version="3" rows="10" columns="6">
<features tabularvalignment="middle">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<row>
<cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
husband-man+woman
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
brother-man+woman
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
boy-man+woman
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dot
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dot
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dot
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
husband
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0892
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
brother
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0565
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
boy
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0854
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
wife
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0051
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
wife
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9705
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
woman
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0126
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
woman
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9511
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
husband
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9625
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
lady
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9229
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
brother
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9298
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
sister
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9519
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
girl
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9126
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
mother
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9237
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
mother
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9205
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
husband
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8872
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
king
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9220
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
friend
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9091
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
mother
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8805
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
sister
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9192
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
woman
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9088
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
wife
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8727
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
heart
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9070
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
father
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8781
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
boys
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8724
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
So that's not too bad.
 This is a medium-small language model, and it gives plausible results.
 The fact that base vectors (husband, man, woman, ...) show at the top suggests
 that the language model is just not large enough to definitively knock
 these out.
\end_layout

\begin_layout Standard
Vector sums between Paris, France, Spain, Germany, England, Berlin, etc. all
 fail completely; the top matches are to 
\begin_inset Quotes eld
\end_inset

O
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

E
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

W
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

Mr
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

Mrs
\begin_inset Quotes erd
\end_inset

, 
\begin_inset Quotes eld
\end_inset

6d
\begin_inset Quotes erd
\end_inset

 – so garbage.
 Presumably this is because the sample texts are not talking about countries or capitals.
 Wikipedia could solve this!? 
\end_layout

\begin_layout Standard
Here are some more:
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Tabular
<lyxtabular version="3" rows="10" columns="4">
<features tabularvalignment="middle">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<row>
<cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
black-white+up
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
short-light+long
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dot
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
word
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
dot
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
up
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0496
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
long
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.0339
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
off
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9487
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
short
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9931
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
down
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9468
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
strong
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8728
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
away
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9384
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
hard
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8702
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
himself
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9256
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
quick
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8595
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
around
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9240
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
large
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8322
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
once
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9089
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
bright
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8172
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
herself
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8950
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
small
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.8122
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
After some effort, I couldn't find any more.
 The sample size really is too small.
 Using RMI instead of MI does not substantially change results.
\end_layout

\begin_layout Section*
Vector Layers and Hypervectors
\end_layout

\begin_layout Standard
In the prior section, we took an 
\begin_inset Formula $N\times N$
\end_inset

 matrix 
\begin_inset Formula $F\left(u,w\right)$
\end_inset

 whose matrix elements were distributed approximately as a Gaussian, and
 rescaled it to obtain a new matrix 
\begin_inset Formula $G\left(u,w\right)$
\end_inset

, having the same distribution, but now having a mean of zero and a stddev
 of 1.
 The rows of such a matrix, once scaled to unit length, can be interpreted
 as vectors uniformly randomly distributed on an 
\begin_inset Formula $N-1$
\end_inset

 sphere.
\end_layout

\begin_layout Standard
The cosine products of these vectors give a new matrix 
\begin_inset Formula $F_{2}\left(u,w\right)$
\end_inset

.
 The matrix entries are given by the cosines:
\begin_inset Formula 
\[
F_{2}\left(u,w\right)=\frac{\sum_{v}G\left(u,v\right)G\left(w,v\right)}{\sqrt{G^{2}\left(u,*\right)G^{2}\left(w,*\right)}}
\]

\end_inset

where 
\begin_inset Formula 
\[
G^{2}\left(u,*\right)=\sum_{v}G^{2}\left(u,v\right)=\left\Vert G\left(u\right)\right\Vert _{2}^{2}
\]

\end_inset

The matrix elements 
\begin_inset Formula $F_{2}\left(u,w\right)$
\end_inset

 are now uniformly distributed on the unit cube 
\begin_inset Formula $\left[-1,1\right]^{N}$
\end_inset

 centered at the origin.
 They are no longer Gaussian.
 For high-dimensional 
\begin_inset Formula $N$
\end_inset

, this implies that the (row or column) vectors of 
\begin_inset Formula $F_{2}$
\end_inset

 are clustered along the diagonals.
 
\end_layout

\begin_layout Subsection*
Vector Layers
\end_layout

\begin_layout Standard
We can iterate on the construction process, and obtain 
\begin_inset Formula $G_{2}$
\end_inset

 and then 
\begin_inset Formula $F_{3}$
\end_inset

 and 
\begin_inset Formula $G_{3}$
\end_inset

 and then 
\begin_inset Formula $F_{4}$
\end_inset

 and so on.
 So, 
\begin_inset Quotes eld
\end_inset

layers
\begin_inset Quotes erd
\end_inset

.
 What are these layers? I asked this question at 
\begin_inset CommandInset href
LatexCommand href
name "MathOverflow"
target "https://mathoverflow.net/questions/431491/iterated-gaussian-ensembles-hypervectors"
literal "false"

\end_inset

.
\end_layout

\begin_layout Standard
Some algebra might shed light.
 Let 
\begin_inset Formula $B$
\end_inset

 be the matrix all of whose entries are one: i.e.
 
\begin_inset Formula $B\left(u,v\right)=1$
\end_inset

.
 Note that 
\begin_inset Formula $B^{2}=NB$
\end_inset

 so it is idempotent up to a constant.
 Alternately, define 
\begin_inset Formula $C\left(u,v\right)=1/N$
\end_inset

 so that 
\begin_inset Formula $C^{2}=C$
\end_inset

 is idempotent.
 Then, given 
\begin_inset Formula $F$
\end_inset

, we had 
\begin_inset Formula 
\[
G=\frac{1}{\sigma_{F}}\left(F-\mu_{F}B\right)
\]

\end_inset

where 
\begin_inset Formula $\mu_{F}$
\end_inset

 and 
\begin_inset Formula $\sigma_{F}$
\end_inset

 are scalars.
 Define a unitized matrix 
\begin_inset Formula $H$
\end_inset

 by
\begin_inset Formula 
\[
H\left(u,v\right)=\frac{G\left(u,v\right)}{\left\Vert G\left(u\right)\right\Vert _{2}}
\]

\end_inset

so that all of the row-vectors of 
\begin_inset Formula $H$
\end_inset

 are unit-length.
 Then
\begin_inset Formula 
\[
F_{2}=HH^{T}
\]

\end_inset

where 
\begin_inset Formula $H^{T}$
\end_inset

 is the transpose of 
\begin_inset Formula $H$
\end_inset

.
 Hmm.
 Plugging through provides no insight or simplification.
\end_layout
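
\begin_layout Standard
One small worked step, offered as a sketch rather than as a result.
 Because every row of 
\begin_inset Formula $G$
\end_inset

 is divided by its own length, the overall scale 
\begin_inset Formula $\sigma_{F}$
\end_inset

 cancels in 
\begin_inset Formula $H$
\end_inset

, and only the shift by 
\begin_inset Formula $\mu_{F}$
\end_inset

 survives.
 Assuming 
\begin_inset Formula $\sigma_{F}>0$
\end_inset

 and that no row of 
\begin_inset Formula $F-\mu_{F}B$
\end_inset

 vanishes, substituting the definitions of 
\begin_inset Formula $G$
\end_inset

 and 
\begin_inset Formula $H$
\end_inset

 gives
\begin_inset Formula 
\[
F_{2}\left(u,w\right)=\frac{\sum_{v}\left(F\left(u,v\right)-\mu_{F}\right)\left(F\left(w,v\right)-\mu_{F}\right)}{\sqrt{\sum_{v}\left(F\left(u,v\right)-\mu_{F}\right)^{2}\sum_{v}\left(F\left(w,v\right)-\mu_{F}\right)^{2}}}
\]

\end_inset

so that 
\begin_inset Formula $F_{2}$
\end_inset

 depends on 
\begin_inset Formula $F$
\end_inset

 and 
\begin_inset Formula $\mu_{F}$
\end_inset

, but not on 
\begin_inset Formula $\sigma_{F}$
\end_inset

.
\end_layout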

\begin_layout Standard
Note that 
\begin_inset Formula $BF$
\end_inset

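 is a matrix whose rows are all identical: each entry is a column sum of 
\begin_inset Formula $F$
\end_inset

.
 When these column sums are non-negative, the 
\begin_inset Formula $\ell_{1}$
\end_inset

 norm of each row is 
\begin_inset Formula $N^{2}\mu_{F}$
\end_inset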

.
\end_layout

\begin_layout Subsection*
Hypervectors
\end_layout

\begin_layout Standard
As noted above, for high-dimensional 
\begin_inset Formula $N$
\end_inset

, the (row or column) vectors of 
\begin_inset Formula $F_{2}$
\end_inset

 are naturally clustered along the diagonals.
 Because they already have this clustering, we can project onto a hypervector
 
\begin_inset Formula $J$
\end_inset

 in 
\begin_inset Formula $\left\{ -1,1\right\} ^{N}$
\end_inset

.
 Define 
\begin_inset Formula $J$
\end_inset

 as 
\begin_inset Formula 
\[
J\left(u,v\right)=\Theta\left(F_{2}\left(u,v\right)\right)
\]

\end_inset

where 
\begin_inset Formula 
\[
\Theta\left(x\right)=\begin{cases}
+1 & \mbox{for }0<x\\
-1 & \mbox{otherwise}
\end{cases}
\]

\end_inset

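is a two-valued sign function (a variant of the Heaviside step function that
 takes the value -1, rather than 0, for non-positive arguments).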
 This provides an encoding of words as hypervectors.
 Is it useful for anything? How good is it?
\end_layout
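
\begin_layout Standard
One standard identity may help in judging this encoding.
 It is a general fact about vectors in 
\begin_inset Formula $\left\{ -1,1\right\} ^{N}$
\end_inset

, not something measured here, and the symbol 
\begin_inset Formula $d_{H}$
\end_inset

 is introduced only for this illustration.
 Let 
\begin_inset Formula $d_{H}\left(u,w\right)$
\end_inset

 be the Hamming distance between rows 
\begin_inset Formula $u$
\end_inset

 and 
\begin_inset Formula $w$
\end_inset

 of 
\begin_inset Formula $J$
\end_inset

, that is, the number of positions 
\begin_inset Formula $v$
\end_inset

 at which they differ.
 The rows agree in the remaining 
\begin_inset Formula $N-d_{H}$
\end_inset

 positions, so
\begin_inset Formula 
\[
\sum_{v}J\left(u,v\right)J\left(w,v\right)=\left(N-d_{H}\left(u,w\right)\right)-d_{H}\left(u,w\right)=N-2d_{H}\left(u,w\right)
\]

\end_inset

Every row has length 
\begin_inset Formula $\sqrt{N}$
\end_inset

, so the cosine between two hypervectors is 
\begin_inset Formula $1-2d_{H}\left(u,w\right)/N$
\end_inset

.
 Comparing hypervectors by cosine is therefore equivalent to comparing them
 by Hamming distance.
\end_layout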

\begin_layout Subsection*
\begin_inset Formula $F_{2}$
\end_inset

 Data
\end_layout

\begin_layout Standard
The mean and stddev for 
\begin_inset Formula $F_{1}$
\end_inset

 are 
\begin_inset Formula $\mu_{F}=0.09075$
\end_inset

 and 
\begin_inset Formula $\sigma_{F}=0.3389$
\end_inset

 – these are the mean and stddev for the top 1000 pairs 
\begin_inset Formula $\left(w,u\right)$
\end_inset

 in the graphs called 
\begin_inset Quotes eld
\end_inset

ortho cosine
\begin_inset Quotes erd
\end_inset

, up above, for 
\begin_inset Formula $N=2500$
\end_inset

 and 
\begin_inset Formula $M=1000$
\end_inset

.
\end_layout

\begin_layout Standard
Notation: Here, 
\begin_inset Formula $F\left(w,u\right)=F_{1}\left(w,u\right)=\hat{w}\cdot\hat{u}$
\end_inset

 where 
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset

 was called the 
\begin_inset Quotes eld
\end_inset

ortho cosine
\begin_inset Quotes erd
\end_inset

 above, and 
\begin_inset Formula $\hat{w}$
\end_inset

 is the normalized unit vector constructed from MI.
\end_layout

\begin_layout Standard
Below left is a graph of the actual distribution of 
\begin_inset Formula $F_{2}\left(w,u\right)$
\end_inset

 for 
\begin_inset Formula $N\left(N+1\right)/2=250\times251/2=31375$
\end_inset

 word pairs 
\begin_inset Formula $\left(w,u\right)$
\end_inset

.
 It appears to support the claim that vectors cluster along the diagonals,
 as witnessed by the two peaks at +1 and -1.
 The green line is an eyeballed fit, given by 
\begin_inset Formula $1.5x^{6}+0.3$
\end_inset

.
\end_layout
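
\begin_layout Standard
A quick arithmetic check on the eyeballed fit, assuming the histogram is
 normalized as a probability density on 
\begin_inset Formula $\left[-1,1\right]$
\end_inset

 (an assumption; the normalization is not stated here):
\begin_inset Formula 
\[
\int_{-1}^{1}\left(1.5x^{6}+0.3\right)dx=\frac{3}{7}+0.6\approx1.029
\]

\end_inset

Under that assumption, the fit integrates to within about three percent
 of unity.
\end_layout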

\begin_layout Standard
\align center
\begin_inset Graphics
	filename p8-goe/f2-dist-250.eps
	width 48text%

\end_inset


\begin_inset Graphics
	filename p8-goe/scatter-f2.eps
	width 48text%

\end_inset


\end_layout

\begin_layout Standard
Above right is the correlation between 
\begin_inset Formula $F_{1}$
\end_inset

 and 
\begin_inset Formula $F_{2}$
\end_inset

.
 It looks sinusoidal; the sine (shifted and compressed) seems to be a decent
 separatrix between the two lobes.
 I have no idea why the correlation takes that form.
 It could have been anything, but it wasn't.
\end_layout

\begin_layout Standard
How about the top pairs? Two tables: the present 
\begin_inset Formula $F_{2}$
\end_inset

 on the left, and the older 
\begin_inset Formula $F_{1}$
\end_inset

 on the right:
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
\align center
\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Top-20
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $F_{2}\left(u,v\right)$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $u$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $v$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9997
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9993
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9991
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
And
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
But
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9990
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
may
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
has
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9985
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9984
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9982
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9982
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
if
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9981
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
who
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9981
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9980
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
shall
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
may
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9980
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
shall
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
must
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9980
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
who
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9980
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
if
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9979
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9978
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
were
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
are
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9978
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
night
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
day
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9978
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
would
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
must
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9977
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
who
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9977
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
shall
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
has
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset space \qquad{}
\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="22" columns="3">
<features tabularvalignment="middle">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<column alignment="left" valignment="top" width="0pt">
<row>
<cell multicolumn="1" alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Top-20 word vectors
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="left" valignment="top" topline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\hat{w}\cdot\hat{u}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{w}$
\end_inset


\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
\begin_inset Formula $\vec{u}$
\end_inset


\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9604
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9525
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
And
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
But
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9522
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9466
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9418
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
they
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9412
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
he
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9384
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
may
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
should
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9351
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9342
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
3
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9322
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
she
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
we
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9313
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
You
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
We
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9308
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
people
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
men
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9307
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
if
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9263
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
is
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
was
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9251
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
shall
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
should
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9250
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
with
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
by
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9245
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
how
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9233
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
him
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
them
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9215
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
had
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
would
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.9214
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
when
\end_layout

\end_inset
</cell>
<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
where
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Several things to observe.
 First, the two lists of pairings are quite similar.
 Most are quite reasonable, except 
\begin_inset Formula $F_{2}$
\end_inset

 has the bizarre (they, if) and (we, if), whereas 
\begin_inset Formula $F_{1}$
\end_inset

 has the quite reasonable (when, if).
 Things still look reasonable 100 or 200 words down, but mediocre at 1000
 and bad at 5000, even though 
\begin_inset Formula $F_{2}=0.883$
\end_inset

 down at the 5000 level.
 So the good values of 
\begin_inset Formula $F_{2}$
\end_inset

 are squeezed into a very narrow range, at the corners of the hypercube.
 At the other end, near 
\begin_inset Formula $F_{2}\approx-1$
\end_inset

, the pairings are 
\begin_inset Quotes eld
\end_inset

more than bad
\begin_inset Quotes erd
\end_inset

; they look like randomly chosen, unrelated words.
\end_layout

\begin_layout Standard
Conclusion: 
\begin_inset Formula $F_{2}$
\end_inset

 takes considerably more CPU resources to compute, and does not appear to
 offer any benefits.
 It's interesting, and it's curious how it correlates with 
\begin_inset Formula $F_{1}$
\end_inset

 and has the potential of being turned into a hypervector ...
 but ...
 it does not seem to be useful in practice.
\end_layout

\begin_layout Section*
The End
\end_layout

\begin_layout Standard
This is the end of Part Eight of the diary.
 
\end_layout

\end_body
\end_document