diss.tex

% Template for a Computer Science Tripos Part II project dissertation
\documentclass[12pt,a4paper,twoside,openright]{report}
\usepackage[hyphens]{url}
\usepackage[pdfborder={0 0 0}]{hyperref}    % turns references into hyperlinks
\usepackage[nameinlink]{cleveref}
\crefname{algorithm}{Algorithm}{Algorithms}
\crefname{figure}{Figure}{Figures}
\crefname{listing}{Listing}{Listings}
\crefname{section}{Section}{Sections}
\crefname{table}{Table}{Tables}


\usepackage[margin=25mm]{geometry}  % adjusts page layout
\usepackage{graphicx}  % allows inclusion of PDF, PNG and JPG images
\usepackage{verbatim}
\usepackage{docmute}   % only needed to allow inclusion of proposal.tex
\usepackage{todonotes}
\usepackage{algpseudocode}
\usepackage{algorithm}
\usepackage{tabularx}
\usepackage[numbers]{natbib}
\usepackage{tikz}
\usepackage{caption}
\usepackage{subcaption}
\captionsetup[figure]{labelfont={bf},textfont=it}
\usetikzlibrary{trees,calc,matrix,arrows,decorations.pathreplacing}
\usepackage{gnuplot-lua-tikz}

\tikzset{>=stealth'}

%%%%%%% Using Minted Package for Code Listings %%%%%%%
\usepackage{minted}
\usemintedstyle{colorful}

\raggedbottom                           % try to avoid widows and orphans
\sloppy
\clubpenalty1000%
\widowpenalty1000%

\renewcommand{\baselinestretch}{1.1}    % adjust line spacing to make
                                        % more readable

\newcommand{\mytodo}{\todo[inline, color=green!40]}

\begin{document}

\bibliographystyle{plainnat}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title


\pagestyle{empty}

\rightline{\LARGE \textbf{Rupert Horlick}}

\vspace*{60mm}
\begin{center}
\Huge
\textbf{Encrypted Keyword Search Using Path ORAM on MirageOS} \\[5mm]
Computer Science Tripos -- Part II \\[5mm]
Homerton College \\[5mm]
\today  % today's date
\end{center}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Proforma, table of contents and list of figures

\pagestyle{plain}

\chapter*{Proforma}

{\large
\begin{tabular}{ll}
Name:               & \bf Rupert Horlick                       \\
College:            & \bf Homerton College                     \\
Project Title:      & \bf Encrypted Keyword Search Using \\
& \bf Path ORAM on MirageOS \\
Examination:        & \bf Computer Science Tripos -- Part II, July 2016  \\
Word Count:         & \bf 10595\footnotemark[1] \\
Project Originator: & Dr Nik Sultana                    \\
Supervisors:         & Dr Nik Sultana \& Dr Richard Mortier                    \\
\end{tabular}
}
\footnotetext[1]{This word count was computed
by \texttt{detex diss.tex | tr -cd '0-9A-Za-z $\tt\backslash$n' | wc -w}
}
\stepcounter{footnote}

\section*{Original Aims of the Project}

% Give a 100 word summary of what was to be achieved by the project, i.e. secure searchable encrypted documents

The main aim of this project was to implement the recent Path ORAM cryptographic protocol, which hides access patterns to underlying storage devices. This provides a method of implementing secure encrypted keyword search, which is a motivating application of ORAM. The second aim was therefore to build a file system with keyword search on top of ORAM. Another significant aim of this project was to evaluate ORAM and compare my results with theoretical and empirical bounds presented in the literature.

\section*{Work Completed}

I implemented Path ORAM, and added extensions of recursion and statelessness, making it possible to disconnect from and reconnect to ORAM. On top of this, I built an inode-based file system, which included the implementation of a B-Tree library. Finally, I built a search module that uses an inverted index to perform search over the documents in the file system. I evaluated the implementation, testing its functionality, performance, and statistical security.

\section*{Special Difficulties}

The evaluation of ORAM presented difficulty because of the extensive set-up required. Each ORAM block device takes several days to initialise, depending on its size, and performing experiments on the device takes at least another day. The evaluation process thus took longer than expected, since I had to test ORAMs of varying sizes and sets of features in this way.

\newpage
\section*{Declaration}

I, Rupert Horlick of Homerton College, being a candidate for Part II of the Computer
Science Tripos, hereby declare
that this dissertation and the work described in it are my own work,
unaided except as may be specified below, and that the dissertation
does not contain material that has already been used to any substantial
extent for a comparable purpose.

\bigskip
\leftline{Signed}

\medskip
\leftline{Date}

\tableofcontents

\listoffigures

%\todototoc
%\listoftodos

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% now for the chapters

\pagestyle{headings}

\chapter{Introduction}

\section{Motivation}

With cloud storage fast becoming ubiquitous, providers are faced with the challenge of guaranteeing the security of their clients' data. More than an exabyte of data was delivered by cloud storage providers in 2013 \cite{nasuni2013cloud}, and since so much of this data is held by only a handful of providers, trust is becoming a major concern.

Encryption might appear to be the solution to these trust issues; surely if the providers cannot read the plain-text of data then it must be secure. This seems to hold in general, but, in the application of query-based search, there is a problem. \citet{islam2012access} demonstrated that using current methods of homomorphic encryption to search over encrypted documents can leak up to 80\% of queries. Knowledge of the queries made to a data set, along with the number of documents returned by each query, could lead to dangerous inferences. As a motivating example, the discovery that a query to a medical database, such as $\langle name,~disease\rangle$, returned results might allow an adversary to deduce information about a patient's medical status, constituting a breach of patient confidentiality.

\citet{islam2012access} were able to infer search queries using the access pattern, the set of documents returned by each query. Thus, in order to protect against this kind of attack, we need to prevent the server from knowing which documents it returns in response to a query. Oblivious Random-Access Memory (ORAM) provides exactly that. Using ORAM, two accesses to the same piece of data, and, moreover, any two access patterns of the same length, are computationally indistinguishable to the server.

This project aims to demonstrate that, using \citeauthor{stefanov2013path}'s Path ORAM protocol \cite{stefanov2013path}, it is possible to build a system that searches over encrypted documents without leaking the resulting access pattern, protecting the content of the search queries and, therefore, the confidentiality of the documents.

\section{Challenges}

When dealing with security, the first challenge is to precisely define the threat model. The assumed capabilities of all parties must be clearly stated to ensure that security proofs are built on a solid foundation. The threat model for this project was refined a number of times, following the discovery of hidden assumptions, and is defined in \cref{sec:threatmodel}.

Another challenge is taking a complex, abstract protocol, and making the design decisions required to turn it into a working system. The Path ORAM protocol abstracts away many implementation details, which had to be realised in this project.

Adding recursion to ORAM reduces the amount of client-side storage, making stateless ORAM more efficient. This presents us with the challenge of building recursive data structures, which are difficult to reason about and debug. This project takes advantage of OCaml's powerful module system, which separates the challenge of recursion from the underlying implementation.

The final challenge is choosing how to evaluate ORAM. There are many parameters and metrics that we could examine, however, due to the time consuming nature of running experiments on ORAM, this project limits its focus to the time overheads incurred by ORAM and the security of its construction.

\section{Related Work}

ORAM was first introduced by \citet{goldreich1996software} in \citeyear{goldreich1996software}, who were motivated by the need for software protection on disks. The model was then expanded and refined for other settings, such as secure processors and cloud computing \cite{shi2011oblivious}. \citet{stefanov2013path} made a significant contribution to the field, due to the simplicity and elegance of their protocol. Since then, many optimisations and additional features have been developed \cite{yu2014enhancing,ren2015constants,moataz2015resizable}, and used to build a working, cloud-based storage system \cite{stefanov2013oblivistore}. A useful thesis by \citet{teeuwen2015evolution} summarises the entire ORAM field, providing valuable insight into the evolution of ORAM.

\chapter{Preparation}

\section{Threat Model}
\label{sec:threatmodel}

This threat model involves three principals: the client, the server, and the attacker. The network is assumed to be under the control of the attacker. The attacker and the server are defined to be passive, and honest, but curious; they will both gather as much information as possible, without deviating from the protocol. Thus, the attacker will eavesdrop, but will neither prevent transmissions between the client and the server, nor tamper with, or produce their own, messages. The server will not tamper with the underlying storage. In this model, the attacker can be dealt with by the use of encryption, so it is the server that presents a threat, because it can see the access pattern of the underlying storage. The goal of this project is to ensure that this access pattern does not leak information from the search queries nor from the documents in the underlying storage.

%\setlength{\unitlength}{0.67mm}
%\input{threatModel}
%\setlength{\unitlength}{0.5mm}

\begin{figure}
    \centering
    \begin{tikzpicture}[>=stealth']
        \matrix (m) [matrix of nodes,row sep=8mm,column sep=5mm,nodes={minimum width=35mm,minimum height=20mm,inner sep=0}] {
          &
          |[draw]| Attacker &
          \\
          |[draw]| Client &
          |[minimum size=0]| &
          |[draw]| Server \\
        };
        \draw[->] ($(m-2-2) + (0,1.5mm)$) -- (m-1-2);
        \draw[<->] (m-2-1) -- (m-2-3);
        \node at ($(m-1-2) - (0,34mm)$) {Read, Write, Search};
    \end{tikzpicture}
    \caption{The threat model: the attacker and server are passive, and honest, but curious.}
    \label{fig:threatmodel}
\end{figure}

\section{Introduction to Path ORAM}
\label{sec:oramintro}

The Path ORAM protocol is defined in terms of a client and a server, where the client stores data on the server. The data is divided into \emph{blocks}, each of which is tagged with its offset in the data. We call this the block's \emph{address}.

On the server, blocks are stored in a binary tree of height $L$. Each node in the tree is a \emph{bucket} of size $Z$, that may hold up to $Z$ real blocks. Buckets must always be full, so dummy blocks are stored when there are fewer than $Z$ real blocks. Thus, the tree stores $$N = Z \cdot (2^{L+1} - 1)$$ blocks in total. In this project, $N$ and $Z$ are specified by the user, and $L$ is calculated from them.

The client stores two local data structures: the \emph{stash} and the \emph{position map}. The stash is a sort of working memory. As blocks are read from the server, they are written into the stash and may be returned to the server later on. Initially the stash is empty. The \emph{position map} associates each address with a position between $0$ and $2^L-1$, which corresponds to a leaf, and therefore a path, in the tree. The position map is initialised by assigning a random position to each address.

The protocol maintains the invariant that, after the client performs a read or a write, a block with position $x$ is either in the stash, or in some bucket along the path to leaf $x$. This is achieved by using \cref{alg:access} for both read and write operations. Its execution can be divided into four steps:

\begin{enumerate}
    \item \textbf{Remap block}. The current position $x$ of the block with address $\mathsf{a}$ is read from the position map. A new position is chosen uniformly at random from $\{0,\dots,2^L-1\}$ and is added to the position map.
    \item \textbf{Read path}. The path to leaf $x$ is read into the stash. The block with address $\mathsf{a}$ will now be in the stash, if it has ever been written into ORAM.
    \item \textbf{Write new data}. If the operation is a $\mathsf{write}$, the current block with address $\mathsf{a}$ is removed from the stash and is replaced by the new block containing $\mathsf{data^\ast}$.
    \item \textbf{Write path}. The path to leaf $x$ is filled with blocks from the stash that meet the following condition. A block with address $\mathsf{a'}$ can be written into the bucket at level $l$ if the path to leaf $\mathsf{position[a']}$ follows the path to leaf $x$ down to level $l$. If the number of blocks that satisfy this criterion is less than the bucket size, then the remainder of the bucket is filled with dummy blocks.
\end{enumerate}

If a block in the stash is written back into the tree, then it must be in some bucket along the path to its assigned position. If not, then it is still in the stash, so the invariant holds after each execution of the algorithm.

\begin{algorithm}[h]
\caption{Read/write data block with address $\mathsf{a}$}
\label{alg:access}
\footnotesize
\begin{algorithmic}[1]
    \vskip 10pt
    \Function{Access}{$\mathsf{op,a,data^*}$}
    \vskip 10pt
    \State $x \gets \mathsf{position[a]}$
    \State $\mathsf{position[a]} \gets$ \Call{UniformRandom}{$2^L-1$}
    \vskip 10pt
    \For{$l \in \{0,1,\dots,L\}$}
        \State $S \gets S~\cup$ \Call{ReadBucket}{$\mathcal{P}(x,l)$}
    \EndFor
    \vskip 10pt
    \State $\mathsf{data} \gets$ Read block $\mathsf{a}$ from $S$
    \If{$\mathsf{op} = \mathsf{write}$}
        \State $S \gets (S - \{(\mathsf{a,data})\}) \cup \{(\mathsf{a,data^*})\}$
    \EndIf
    \vskip 10pt
    \For{$l \in \{L,L-1,\dots,0\}$}
        \State $S' \gets \{(\mathsf{a',data'}) \in S : \mathcal{P}(x,l) = \mathcal{P}(\mathsf{position[a']},l)\}$
        \State $S' \gets$ Select $\min(|S'|,Z)$ blocks from $S'$
        \State $S \gets S - S'$
        \State \Call{WriteBucket}{$\mathcal{P}(x,l),S'$}
    \EndFor
    \vskip 10pt
    \State \Return $\mathsf{data}$
    \vskip 10pt
    \EndFunction
    \vskip 10pt
\end{algorithmic}
\end{algorithm}

The security of this algorithm is based on the random assignment of positions in step 1. If we consider the sequence of $M$ accesses, $$\mathbf{p} = (\mathsf{position}_M[\mathsf{a}_M], \mathsf{position}_{M-1}[\mathsf{a}_{M-1}], \dots, \mathsf{position}_1[\mathsf{a}_1]),$$ any two accesses to the same address will be statistically independent, as will two accesses to different addresses. Thus, by an application of Bayes' rule, we have $$\Pr(\mathbf{p}) = \prod\limits^{M}_{j=1}\Pr(\mathsf{position}_j[\mathsf{a}_j]) = \left(\frac{1}{2^L}\right)^M$$ i.e. the probability of the whole sequence is equal to the product of the individual probabilities, and therefore the access pattern is indistinguishable from a random sequence of bit strings.

To enable the client to disconnect from ORAM and reconnect later on, perhaps from a different machine, there must be no persistent client-side state. We call this \emph{stateless} ORAM. This can be achieved by flushing the state, i.e. the stash and the position map, to disk after every operation. However, in the current model, the stash occupies $O(\log N)$ space, and the position map, $O(N)$, making this infeasible in practice.

\emph{Recursive} ORAM reduces the space required by the position map to $O(1)$, taking the overall client-side state to $O(\log N)$. This means that statelessness no longer increases the asymptotic bandwidth overhead. Recursion stores the position map of the original ORAM, now referred to as ORAM$_0$, in another ORAM, ORAM$_1$. Thus, the client-side state becomes the position map of ORAM$_1$, along with the stashes of both ORAM$_0$ and ORAM$_1$. If each block in ORAM$_1$ can store $\chi$ positions then it will need $N' = N/\chi$ blocks. Therefore, the position map of ORAM$_1$ occupies $O(N/\chi)$ space. Repeating this recursion $\log N / \log \chi$ times leads to a position map of size $O(1)$. This sacrifices the space occupied by the recursive ORAMs and the time needed to perform recursive accesses, for efficient statelessness.

\section{Introduction to Inverted Indexes}
\label{sec:invertedindexintro}

The \emph{inverted index} is the most important data structure in Information Retrieval. The amount of time required to perform a query is reduced by performing a large amount of work in advance. A simple linear scan of $N$ documents takes $O(N)$ time. Using an inverted index also takes $O(N)$ time in the worst case, but the constant is greatly reduced and empirically the index delivers excellent performance.

The index consists of two parts: the \emph{dictionary} and the \emph{postings}. The dictionary is a list, usually stored as a hash table, of all of the terms that appear in a set of documents. Each term has a postings list, a list of all the documents that contain that term. Collectively these postings lists are referred to as the postings.

An inverted index can be constructed in three steps:
\begin{enumerate}
    \item Split each document into \emph{tokens}. These are units of the document separated by spaces.
    \item Perform linguistic preprocessing on the tokens. An example is \emph{stemming}, which removes suffixes of words, converting them into a normalised form.
    \item Assuming each document has an ID, add the ID to the postings list of each token the document contains.
\end{enumerate}

Look-up of a single keyword in a hash-based inverted index is performed by hashing the keyword and returning the relevant postings list, if it exists. Simple Boolean operations can be added. For instance, disjunction takes the union of two postings lists, and conjunction takes the intersection.

This project will limit its focus to conjunctive queries, as it aims to demonstrate the correctness and efficiency of search using the ORAM implementation, rather than creating an advanced IR system. Queries are space-separated lists of keywords, and the result of a query is the conjunction of the postings lists of all keywords it contains.

\begin{figure}
    \centering
    \begin{tikzpicture}
        \matrix (m) [matrix of nodes,nodes={draw},column sep=8mm,row sep=4mm] {
          Term1 & Doc1 & Doc2 & Doc3 & |[draw=none]| $\cdots$\phantom{T} \\
          Term2 & Doc3 & Doc6 & Doc7 & |[draw=none]| $\cdots$\phantom{T} \\
          Term3 & Doc4 & Doc8 & Doc9 & |[draw=none]| $\cdots$\phantom{T} \\
        };
        \draw[->] (m-1-1) -- (m-1-2);
        \draw[->] (m-1-2) -- (m-1-3);
        \draw[->] (m-1-3) -- (m-1-4);
        \draw[->] (m-1-4) -- (m-1-5);
        \draw[->] (m-2-1) -- (m-2-2);
        \draw[->] (m-2-2) -- (m-2-3);
        \draw[->] (m-2-3) -- (m-2-4);
        \draw[->] (m-2-4) -- (m-2-5);
        \draw[->] (m-3-1) -- (m-3-2);
        \draw[->] (m-3-2) -- (m-3-3);
        \draw[->] (m-3-3) -- (m-3-4);
        \draw[->] (m-3-4) -- (m-3-5);
        \draw[decorate,decoration={brace,mirror,raise=4pt}] (m-3-1.south west) -- node[midway,below,yshift=-7pt] {\footnotesize Dictionary} (m-3-1.south east);
        \draw[decorate,decoration={brace,mirror,raise=4pt}] (m-3-2.south west) -- node[midway,below,yshift=-7pt] {\footnotesize Postings} ($(m-3-5.south east) - (1em,0)$);
    \end{tikzpicture}
    \caption{The structure of an inverted index: the dictionary contains terms and the postings contains lists of documents that each term appears in.}
    \label{fig:invertedIndex}
\end{figure}

\section{Introduction to MirageOS}

Running ORAM in the cloud allows a user to access their data from any location. The ORAM client is a trusted cloud instance, and the server is a cloud storage provider, as illustrated in \cref{fig:cloudInstance}.

\begin{figure}
    \centering
    \tikzstyle{principal}=[draw,node distance=50mm,minimum width=30mm,minimum height=25mm,text width=20mm,text centered]
    \begin{tikzpicture}
        \node[principal] (user) {User};
        \node[principal,right of=user] (ci) {Cloud Instance};
        \node[principal,right of=ci] (csp) {Cloud Storage Provider};
        \draw[<->] (user) -- (ci);
        \draw[<->] (ci) -- (csp);
        \draw[decorate,decoration={brace,mirror,raise=4pt}] (ci.south west) -- (csp.south east) node[midway,below,yshift=-4mm] {Path ORAM Protocol};
    \end{tikzpicture}
    \caption{ORAM can be built as a MirageOS application, which can run on a trusted cloud instance.}
    \label{fig:cloudInstance}
\end{figure}

MirageOS is a unikernel operating system. In other words, a MirageOS application is compiled to an executable, along with only the necessary parts of the OS. It can be compiled for a number of targets, including Unix or Xen. An executable, running directly on the Xen hypervisor in the cloud, is more lightweight than the traditional cloud stack, which runs an application on a full operating system, such as Ubuntu. By building ORAM as a MirageOS application, this project allows an instance running ORAM to be spun up whenever the user needs, and shut down when not in use, minimising its cost.

\section{System Architecture}

This project uses the framework illustrated in \cref{fig:cloudInstance}. The focus is on implementing the code for the cloud instance, which consists of the implementation of Path ORAM, a file system running on top of it, and a search module that presents an API to the user. Writing interfaces for specific cloud storage providers is left as future work, so for the purposes of this project a local block device is used for storage.

MirageOS provides an interface for a block device, \texttt{BLOCK}. The ORAM module is designed to satisfy this interface, so that it can be inserted into existing Mirage applications. It is also designed to run on any underlying storage that satisfies \texttt{BLOCK} and, thus, interfaces for cloud storage providers could be written to be compatible with ORAM. The encryption module is designed in the same way, so it can be inserted between ORAM and the underlying storage. \Cref{fig:miragestack} shows the overall structure of the application.

\setlength{\unitlength}{0.75mm}
\input{mirageStackBody}
\setlength{\unitlength}{0.5mm}

\section{Requirements Analysis}

\begin{description}
    \item [High Priority] Basic ORAM
    \item [Medium Priority] File system, Search module
    \item [Low Priority] Encryption, Statelessness (Extension), Recursion (Extension)
\end{description}

Building ORAM is the core focus of the project and as such is given high priority. The addition of the file system and search module creates a complete system that can be evaluated, so these two modules are of medium priority. Encryption is necessary in a real world system, but not to perform evaluation, so it is given low priority. Recursion and stateless are extensions, and are therefore deemed to have low priority.

\section{Choice of Tools}

\subsection{OCaml}

OCaml is the logical choice of programming language when building for MirageOS, because Mirage applications and libraries all use OCaml. However, OCaml's module and type systems were also considerations in the choice of Mirage for this project.

OCaml has a powerful module system. It allows one to build \emph{functors}, which are modules parametrised over module interfaces. Any module that satisfies the interface can be given to the functor to create a new concrete module. ORAM can therefore be implemented as a functor that is parametrised over the \texttt{BLOCK} interface.

OCaml's static typing system is an indispensable tool for ensuring the correctness of programs and for increasing productivity.

I had not used OCaml before beginning this project, so learning OCaml and many of its important libraries took a considerable amount of work at the start of the project, and as new libraries were required.

\subsection{Libraries}
\label{subsec:libraries}

The libraries used for building ORAM are listed in \cref{tab:libraries}.

\begin{table}[h]
\centering
\begin{tabularx}{\textwidth}{|l|l|X|l|}
\hline
\textit{Library} & \textit{Version} & \textit{Purpose} & \textit{License} \\
\hline \hline
Mirage & 2.6.1 & System Component Interface Definitions, Application Configuration Framework & ISC \\
\hline
Jane Street's Core & 113.00.00 & Data Structures, Algorithms & Apache-2.0 \\
\hline
LWT & 2.5.1 & Threading & LGPL-2.1 \\
\hline
Cstruct & 1.8.0 & Data Structure & ISC \\
\hline
Alcotest & 0.4.6 & Unit Testing & ISC \\
\hline
Mirage Block CCM & 1.0.0 & Encryption & ISC \\
\hline
Stemmer & 0.2 & Linguistic Processing & GNUv2 \\
\hline
\end{tabularx}
\caption{Libraries used by Mirage ORAM}
\label{tab:libraries}
\end{table}

\subsection{Development Environment}

Choosing the right tools is essential to the productivity of large projects. For OCaml, one of the most important tools is OASIS, which automatically generates Makefiles for a project, based on a specification file. Direct use of the OCaml compiler quickly becomes infeasible when linking together a large number of modules. OASIS deals with this automatically.
Emacs proved incredibly useful, because code indentation, syntax highlighting, autocompletion, and type inspection are all provided by third-party plug-ins. Code can be interpreted, compiled, and run from a shell within Emacs, which boosts productivity.

\begin{table}[t]
    \centering
    \begin{tabular}{|l|l|l|l|}
      \hline
      \textit{Tool} & \textit{Version} & \textit{Purpose} & \textit{License} \\
      \hline \hline
      Mac OSX & 10.11.2 & Operating System & Proprietary \\
      \hline
      Emacs & 24.5 & Text Editor & GPL \\
      \hline
      git & 2.8.0 & Version Control & GPLv2 \\
      \hline
      OPAM & 1.2.2 & Package Manager & GPLv3 \\
      \hline
      OASIS & 0.4.5 & Build Tool & LGPL-2.1 \\
      \hline
    \end{tabular}
    \caption{Tools used in the development of Mirage ORAM}
    \label{tab:devtools}
\end{table}

\section{Software Engineering Techniques}

I employed two key techniques to ensure my code was well-designed and well-built, without compromising productivity.

Firstly, I wrote the interface file for each module before writing the implementation. This forced me to make important design decisions up front, clarifying the structure of the module and its relation to the system as a whole.

Secondly, I practised Test Driven Development \cite{hunt2004pragmatic}. I unit tested each new piece of code as it was written, allowing me to fail fast and fix problems at their source. This approach meant that small modules could be integrated into a larger system that worked as expected more often than not.

Combining these techniques with documentation and structuring of both the source code and the source repository led to a manageable and productive development workflow.

\section{Summary}

This chapter has covered the work undertaken prior to development. This included a definition of the threat model, a brief introduction to the major algorithms, data structures and libraries, an overview of the preliminary architectural design, and a discussion of the techniques and tools selected for the development process.

The next chapter demonstrates how this preparatory material was applied to successfully implement the Path ORAM protocol on MirageOS, and how this protocol was used to build a secure encrypted keyword search application.

\chapter{Implementation}

This chapter explains the process of building a functioning system, using the designs and algorithms of the previous chapter. An overview diagram of the system is given in \cref{fig:systemOverview}. Each module will be examined in turn, working upwards through the diagram. Thus, the chapter begins with a discussion of encryption in \cref{sec:encryption}, followed by a longer discussion of ORAM in \cref{sec:pathORAM}, which constitutes the main focus of the project. This includes subsections about the extensions: recursion and statelessness. Finally, the file system and search modules are explored in \cref{sec:fileSystem,sec:searchmodule} respectively.

The main challenges and achievements of the implementation can be summarised with reference to \cref{fig:systemOverview}.

At the inter-modular level, integration of all parts of the system required careful API design and intricate manipulation of the OCaml module system.

For encryption, an appropriate library had to be chosen, a process that involved filing a pull request to fix a critical bug.

The ORAM module presented the challenge of translating the terse pseudocode of the Path ORAM protocol into a functioning program. This included designing a position map capable of operating on machines of any word size, and building functions to marshal data to and from the format required by ORAM. Adding recursion to this implementation warranted a deeper understanding of the module system, including first class and recursive modules. To achieve statelessness I had to serialise recursive data structures, which meant writing custom functions to conform with a binary protocol.

The implementation of a minimal, but complete, inode-based file system required investigation into the choices made by many file system designers before me. I weighed these against the demands of my system and included only the necessary elements. Furthermore, my file system entailed an implementation of B-Trees, which did not previously exist in OCaml. I therefore implemented a B-Tree library myself that can be applied to other Mirage applications, representing a contribution to the community.

Finally, the search module made use of concepts from the field of Information Retrieval, including algorithms and data structures. The decision to include stemming represented a significant trade-off between the space used by the indexing process and its precision.

\begin{figure}
    \centering
    \tikzstyle{function}=[draw=blue!50,thick,fill=blue!20,minimum width=30mm,minimum height=10mm]
    \tikzstyle{function3high}=[function,minimum height=42.3mm]
    \tikzstyle{function2wide}=[function,minimum width=70mm]
    \tikzstyle{nofunction}=[minimum width=30mm,minimum height=10mm]
    \tikzstyle{data}=[function,draw=red!50,fill=red!20]
    \tikzstyle{module}=[draw,inner sep=0,row sep=6mm,column sep=10mm,ampersand replacement=\&]
    \scalebox{0.9}{
    \begin{tikzpicture}[thick]
        \matrix (search) [module] {
          \node[nofunction] (m00) {Read File}; \&
          \node[nofunction] (m01) {Write File}; \&
          \node[function] (m02) {Search}; \\
          \node[nofunction] (m10) {Read File}; \&
          \node[nofunction] (m11) {Index}; \&
          \node[data] (m12) {Index}; \\
          \node[nofunction] (m20) {Read File}; \&
          \node[nofunction] (m21) {Write File}; \&
          \node[function] (m22) {Flush}; \\
        };
        \node[font=\large] at ($(search) - (75mm,0)$) {Search};
        \node[function3high] (readfile) at ($(m10) + (0.15mm,0)$) {Read File};
        \node[function3high] (writefile) at ($(m11) + (0.15mm,0)$) {Write File};
        \matrix (fs) at ($(search) - (0,55mm)$) [module] {
          \node[nofunction] (m30) {Read File}; \&
          \node[nofunction] (m31) {Write File}; \&
          \node[data] (m32) {Free Map}; \\
          \node[nofunction] (m40) {Read File}; \&
          \node[nofunction] (m41) {Write File}; \&
          \node[data] (m42) {Inode Index}; \\
          \node[nofunction] (m50) {Read}; \&
          \node[nofunction] (m51) {Write}; \&
          \node[function] (m52) {Flush}; \\
        };
        \node[font=\large] at ($(fs) - (75mm,0)$) {File System};
        \node[function3high] at ($(m40) + (0.15mm,0)$) {Read File};
        \node[function3high] at ($(m41) + (0.15mm,0)$) {Write File};
        \matrix (oram) at ($(fs) - (0,55mm)$) [module] {
          \node[function] (m60) {Read}; \&
          \node[function] (m61) {Write}; \&
          \node[data] (m62) {Position Map}; \\
          \node[nofunction] (m70) {Access}; \&
          \node[nofunction] (m71) {Access}; \&
          \node[data] (m72) {Stash}; \\
          \node[function] (m80) {Read Bucket}; \&
          \node[function] (m81) {Write Bucket}; \&
          \node[function] (m82) {Flush}; \\
        };
        \node[font=\large] at ($(oram) - (75mm,0)$) {ORAM};
        \node[function2wide] (funcaccess) at ($(m70) + (20.15mm,0)$) {Access};
        \matrix (enc) at ($(oram) - (0,55mm)$) [module] {
          \node[function] (m90) {Read}; \&
          \node[function] (m91) {Write}; \\
          \node[nofunction] (mA0) {Encrypt}; \&
          \node[nofunction] (mA1) {Encrypt}; \\
          \node[function] (mB0) {Read}; \&
          \node[function] (mB1) {Write}; \\
        };
        \node[font=\large] at ($(enc) - (75mm,0)$) {Encryption};
        \node[function2wide] (funcenc) at ($(mA0) + (20.15mm,0)$) {Encrypt};
        \matrix (disk) at ($(enc) - (0,40mm)$) [module] {
          \node[function] (mC0) {Read}; \&
          \node[function] (mC1) {Write}; \\
        };
        \node[font=\large] at ($(disk) - (75mm,0)$) {Disk};
        \draw[->] ($(m01) + (0,15mm)$) -- (m01);
        \draw[->] (m11) -- (m12);
        \draw[->] (m21) -- (m31);
        \draw[->] (m51) -- (m61);
        \draw[->] (m61) -- (funcaccess);
        \draw[->] (funcaccess) -- (m81);
        \draw[->] (m81) -- (m91);
        \draw[->] (m91) -- (funcenc);
        \draw[->] (funcenc) -- (mB1);
        \draw[->] (mB1) -- (mC1);
        \draw[->] (mC0) -- (mB0);
        \draw[->] (mB0) -- (funcenc);
        \draw[->] (funcenc) -- (m90);
        \draw[->] (m90) -- (m80);
        \draw[->] (m80) -- (funcaccess);
        \draw[->] (funcaccess) -- (m60);
        \draw[->] (m60) -- (m50);
        \draw[->] (m30) -- (m20);
        \draw[->] (m00) -- ($(m00) + (0,15mm)$);
        \draw[->] (m02) -- ($(m02) + (0,15mm)$);
        \draw[->] (m12) -- (m02);
        \draw[->] (m12) -- (m22);
        \draw[->] (m22) -- (m31);
        \draw[<->] ($(m41.east) + (0,2mm)$) -- (m32);
        \draw[<->] (m41) -- (m42);
        \draw[->] (m32) to [bend left=90] (m52);
        \draw[->] (m42) -- (m52);
        \draw[->] (m52) -- (m61);
        \draw[<->] (funcaccess) -- (m62);
        \draw[<->] (funcaccess) -- (m72);
        \draw[->] (m62) to [bend left=90] (m82);
        \draw[->] (m72) -- (m82);
        \draw[->] (m82) -- (m91);
    \end{tikzpicture}}
    \caption{An overview of the system, showing the flow of data. Black boxes are modules, blue are functions, and red are data structures.}
    \label{fig:systemOverview}
\end{figure}

\section{Encryption}
\label{sec:encryption}

ORAM's security depends on the security of the underlying encryption layer. Therefore, it is safer to use a trusted cryptographic library for this task, rather than implementing encryption directly.

Conveniently, the OCaml library Mirage Block CCM creates encrypted block devices that satisfy Mirage's \texttt{BLOCK} interface. It provides a functor, which is a module parametrised over a module interface. This functor takes a module that satisfies the \texttt{BLOCK} interface, and returns a new module, which satisfies the same interface but now uses encryption. ORAM is implemented in the same way, allowing the functors to be chained together to make an encrypted ORAM module.

To create the encrypted block device, and later connect to it, a key must be supplied. While key management would need to be dealt with properly in a real-world system, it is left out of the scope of this project. For the purposes of this project, a constant, known key is used throughout the evaluation.

\begin{listing}[t]
\caption{MirageOS's \texttt{BLOCK} module signature}
\label{lst:blocksig}
\vskip 10pt
\begin{minted}[fontsize=\scriptsize,breaklines]{ocaml}
module type BLOCK = sig

  type page_aligned_buffer = Cstruct.t

  type +'a io = 'a Lwt.t

  type t

  type error = [
    | `Unknown of string (** an undiagnosed error *)
    | `Unimplemented     (** operation not yet implemented in the code *)
    | `Is_read_only      (** you cannot write to a read/only instance *)
    | `Disconnected      (** the device has been previously disconnected *)
  ]

  type id

  val disconnect: t -> unit io

  type info = {
    read_write: bool;    (** True if we can write, false if read/only *)
    sector_size: int;    (** Octets per sector *)
    size_sectors: int64; (** Total sectors per device *)
  }

  val get_info: t -> info io

  val read: t -> int64 -> page_aligned_buffer list -> [ `Error of error | `Ok of unit ] io

  val write: t -> int64 -> page_aligned_buffer list -> [ `Error of error | `Ok of unit ] io

end
\end{minted}
\end{listing}

\section{Path ORAM}
\label{sec:pathORAM}

This section discusses how the abstract data structures and algorithms covered in \cref{sec:oramintro} were realised, and the design decisions involved in the implementation of the Path ORAM protocol.

\subsection{Inherent Constraints}
\label{subsec:constraints}

Writing an implementation to satisfy an existing interface places a number of constraints on the design of the system.

The first constraint is the obligatory use of the Cstruct library, introduced in \cref{subsec:libraries}. \texttt{BLOCK}'s \texttt{read} and \texttt{write} methods both require buffers of type \texttt{Cstruct.t}. ORAM therefore inputs buffers of this type and passes them to the underlying block device. To avoid unnecessary work marshalling data, all data is manipulated in this form.

Another constraint limits the type of addresses in \texttt{BLOCK}'s \texttt{read} and \texttt{write} operations to \texttt{int64}. Again, to avoid unnecessary (and potentially unsafe) work converting between types, \texttt{int64}s are used wherever possible.

\subsection{Stash}
\label{subsec:stashImpl}

The stash stores blocks of data temporarily on the client before they are written back into the tree on the server. It needs to support operations of insertion, lookup based on address, and removal. For this task I chose an \texttt{int64}-keyed hash table from Jane Street's Core library, with \texttt{Cstruct.t} values. The hash table takes constant time for the operations of insertion, lookup and removal, making it ideal for this purpose. This was abstracted into its own module, hiding the underlying type. Thus, the implementation of the stash module can be swapped without breaking the core code of the ORAM module.

The hash table implementation takes an initial size as a parameter, and expands when necessary. Expansion would add a large overhead, because the entire contents of the stash would have to be copied. However, as shown in \citet{stefanov2013path}, for a tree of height $L$ and bucket size $Z$, the stash requires exactly $Z \cdot (L +1)$ blocks of transient storage, and a constant amount of space for persistent storage. Therefore, the hash table should not need to perform expansion at run-time, if it is initialised to a size of at least this constant. \Cref{tab:stashsizes} shows the maximum stash size required, depending on the security parameter, $\lambda$, and the bucket size, $Z$. A stash with security parameter $\lambda$ has probability $2^{-\lambda}$ of exceeding this stash size.

To achieve statelessness, the stash must be written to disk. Increasing the security parameter increases the size of the stash, and therefore the time overhead of statelessness. However, the security parameter must be high enough to ensure long term security. Therefore, a trade-off must be made between security and speed. While keeping the security parameter high, the time overhead can be improved by reducing the size of a block. As mentioned above, there is a constant maximum number of blocks in the stash, so this reduction will have a direct effect on the time taken for each access.

\begin{table}
\centering
\begin{tabular}{|l|l|l|l|}
    \hline
    & \multicolumn{3}{c|}{Bucket Size ($Z$)} \\
    \cline{2-4}
    Security Parameter ($\lambda$) & 4 & 5 & 6 \\
    \cline{2-4}
    & \multicolumn{3}{c|}{Max Stash Size} \\
    \hline
    80 & 89 & 63 & 53 \\
    \hline
    128 & 147 & 105 & 89 \\
    \hline
    256 & 303 & 218 & 186 \\
    \hline
\end{tabular}
\caption{Empirical results for maximum persistent stash size from \citet{stefanov2013path}}
\label{tab:stashsizes}
\end{table}

\subsection{Position Map}

The position map associates a leaf position with each block of the data. As mentioned in \cref{subsec:constraints}, the \texttt{BLOCK} interface constrains the type of block addresses to \texttt{int64}. OCaml provides a Bigarray module, but the size of its arrays is specified using the OCaml \texttt{int} type. This type only uses 63 bits on a 64-bit machine and 31 on a 32-bit machine. Both of these types are also signed, reducing the number of available bits by one. Therefore, a type that can range up to $2^{64} - 1$ needs to be represented using a type that can only go up to $2^{30} - 1$ on a 32-bit machine.

To accommodate this, I built the position map using 3-dimensional arrays. The index, an \texttt{int64} value, is split into a 4-bit value and two 30-bit values. The 4-bit value consists of the four most significant bits, and will therefore have a value of 0 unless more than $2^{60}$ blocks are being stored. The 30-bit values are guaranteed to be converted into non-negative \texttt{int}s, which can then be used to index two dimensions of the array.

In the position map's \texttt{create} function, \Cref{alg:posmapdims} is used to translate from the desired \texttt{int64} size to the dimensions of a 3-dimensional array. After splitting the \texttt{int64} as described above, a value of 1 is added to the first two dimensions to ensure that they are at least of size 1. If a higher dimension is greater than 1, then all lower dimensions become their maximum value, in this case $2^{30}-1$. Using these dimensions, the position map is guaranteed to be at least the size that we require on both 32-bit and 64-bit machines.

\begin{algorithm}[t]
\caption{Calculate the dimensions of a 3D array given total desired size}
\label{alg:posmapdims}
\footnotesize
\begin{algorithmic}[1]
\vskip 10pt
\Require{$\mathsf{size} > 0$}
\vskip 10pt
\Function{PosMapDims}{$\mathsf{size}$}
\vskip 10pt
    \State $(x, y, z) \gets$ \Call{SplitIndices}{$\mathsf{size}$}
\vskip 10pt
    \State $x \gets x + 1$
    \State $y \gets y + 1$
\vskip 10pt
    \If{$x > 1$}
        \State $y \gets \mathsf{0x3FFFFFFF}$
        \State $z \gets \mathsf{0x3FFFFFFF}$
    \ElsIf{$y > 1$}
        \State $z \gets \mathsf{0x3FFFFFFF}$
    \EndIf
\vskip 10pt
    \State \Return $(x,y,z)$
\vskip 10pt
\EndFunction
\vskip 10pt
\end{algorithmic}
\end{algorithm}

\subsection{Creating ORAM}

A major goal of this project was to be build ORAM such that it could replace any existing block device in any Mirage program. To do this, the ORAM module must satisfy MirageOS's \texttt{BLOCK} interface, shown in \cref{lst:blocksig}. It also requires access to the methods of the underlying block device, as well as to the block device itself. ORAM is therefore built as a functor in the same way as the encryption module. This functor takes a module that satisfies the \texttt{BLOCK} interface and returns a new module, which satisfies the same interface, but now implements the Path ORAM protocol.

ORAM's \texttt{create} method takes a block device as input and returns an instance of ORAM, which has the type \texttt{Oram.Make(B).t}, shown in \cref{lst:orammaketype}. This type contains the ORAM parameters, such as \texttt{bucketSize} and \texttt{blockSize}, and  structural information, such as the \texttt{height} of the ORAM and the \texttt{numLeaves}, as well as pointers to the stash, position map, and underlying block device.

Along with the block device, the following parameters are passed as input to the \texttt{create} method:

\begin{description}
  \item[\texttt{size}] The desired size of the ORAM in blocks
  \item[\texttt{blockSize}] The desired size of a single block in bytes
  \item[\texttt{bucketSize}] The number of blocks in a bucket
\end{description}

Using these, the \texttt{create} method can calculate new structural information. The \texttt{BLOCK} interface defines the size of the block device in \emph{sectors}, using the variable \texttt{size\_sectors}, and defines the size of a sector using \texttt{sector\_size}. We will continue to refer to data blocks in ORAM as blocks, but to satisfy \texttt{BLOCK}, ORAM exposes values for both \texttt{size\_sectors} and \texttt{sector\_size}. First, the number of sectors required for a block is calculated as $$\mathtt{sectorsPerBlock} = \frac{\mathtt{blockSize} - 1}{\mathtt{sector\_size}} + 1,$$ which rounds up the number of sectors so the desired block size can always fit. The \texttt{sector\_size} that ORAM uses is the size of the part of the block that stores data. Thus, the size of the address, 8 bytes, is subtracted giving $$\mathtt{sector\_size} = \mathtt{sector\_size} \times \mathtt{sectorsPerBlock} - 8.$$

Now the height of the tree can be calculated, but there are two cases to consider. If the desired size of the ORAM is specified, then the height is calculated as $$L = \left\lfloor \log_2\left(\frac{N}{Z} + 1\right)\right\rfloor - 1.$$ This is obtained by rearranging the equation for the size of the binary tree in buckets, $2^{L + 1} - 1$. The floor operator is introduced, resulting in a binary tree of size less than or equal to the desired size that will definitely fit on the block device. If, instead, the size is unspecified, it is assumed that ORAM should fill as much of the device as possible. The desired size becomes $$N = \frac{\mathtt{size\_sectors}}{\mathtt{sectorsPerBlock}}$$ and then the same calculation as above is performed with this new value. Finally, \texttt{numLeaves} and a new value for \texttt{size\_sectors} are calculated from $L$ using the usual equations for a binary tree.

All of the structural information is now known, so the \texttt{create} method can now create instances of the client-side data structures and initialise the ORAM space. To do the former, it calls the creation functions of the associated data structures. For the latter, it loops through the block device, writing dummy blocks to every location. Dummy blocks have address $-1$, and are ignored by the access protocol. Finally, the \texttt{create} method packages everything up as an instance of \texttt{ORAM.Make(B).t}.

\begin{listing}[t]
\caption{The type of an ORAM device \texttt{ORAM.Make(B).t}}
\label{lst:orammaketype}
\inputminted[fontsize=\scriptsize,firstline=87, lastline=118]{ocaml}{../mirage-oram/lib/oram.ml}
\end{listing}

\subsection{Accessing ORAM}

The main logic of ORAM resides in the \texttt{access} function, the implementation of \cref{alg:access}. Before discussing this, it is worth mentioning the plumbing that occurs on either side of it. The \texttt{BLOCK} interface function \texttt{write} inputs data as a list of \texttt{Cstruct.t}s with no defined size. The \texttt{access} function expects a fixed-sized block tagged with an address, so \texttt{write} splits the input into chunks and tags them, before calling \texttt{access} on each one.

The subroutines \texttt{readBucket} and \texttt{writeBucket} are on the other side of \texttt{access}. They are responsible for communicating with the underlying block device and, more importantly, maintaining the structure of the logical binary tree. There are no physical pointers, but instead the structure is built by calculating the appropriate physical address of a bucket. The physical address of the bucket on the path to leaf $x$ at level $l$ is calculated using \cref{alg:bucketaddress}.

\begin{algorithm}[t]
  \footnotesize
  \begin{algorithmic}
  \vskip 10pt
    \Function{BucketAddress}{$x$,$l$}
      \State $address \gets 0$
      \For{$i = 0;~i < l;~i++$}
        \If{$x >> (i + \mathtt{height} - l)~\&\&~1 = 1$}
          \State $address \gets (2 \times address) + (\mathtt{bucketSize} \times \mathtt{sectorsPerBlock} \times 2)$
        \Else
          \State $address \gets (2 \times address) + (\mathtt{bucketSize} \times \mathtt{sectorsPerBlock})$
        \EndIf
      \EndFor
      \State \Return $address$
    \EndFunction
  \vskip 10pt
  \end{algorithmic}
  \caption{Calculating the physical address of the bucket at level $l$ on the path to leaf $x$}
  \label{alg:bucketaddress}
\end{algorithm}

\begin{figure}[t]
    \centering
    \begin{tikzpicture}[level/.style={sibling distance=75mm/#1},
        level 3/.style={sibling distance=18mm},
        every node/.style={minimum size=10mm,shape=circle},
        edge from parent/.style={draw,-latex}]
        \node[draw,grow=down] {0}
        child {
          node[draw] {1}
          child { node[draw] {3}
            child {
              node[draw] (n7) {7}
            }
            child { node[draw] (n8) {8} }
          }
          child { node[draw] {4}
            child { node[draw] (n9) {9} }
            child { node[draw] (n10) {10} }
          }
        }
        child {
          node[draw] {2}
          child { node[draw] {5}
            child { node[draw] (n11) {11} }
            child {
              node[draw] (n12) {12}
              edge from parent node[near start,right] {1}
            }
            edge from parent node[near start,left] {0}
          }
          child { node[draw] {6}
            child { node[draw] (n13) {13} }
            child { node[draw] (n14) {14} }
          }
          edge from parent node[above] {1}
        };
        \node [below of=n7] {0};
        \node [below of=n8] {1};
        \node [below of=n9] {2};
        \node [below of=n10] {3};
        \node [below of=n11] {4};
        \node [below of=n12] {5};
        \node [below of=n13] {6};
        \node [below of=n14] {7};
  \end{tikzpicture}
  \caption{Visualisation of \cref{alg:bucketaddress}}
  \label{fig:bucketaddress}
\end{figure}

This is most easily explained using \cref{fig:bucketaddress}. Here, the nodes are labelled in order of their position in memory. The leaves are also labelled, but the binary representations of these labels are more important. The binary representation of leaf $x$, read from left to right, denotes the set of operations required to calculate the physical address of $x$. A 0 denotes taking the left branch at a node, labelled $n$, and the resulting node has label $2n + 1$. A 1 denotes taking the right branch, and the resulting node has label $2n+2$. For example, the path to leaf 5, with binary representation 101, takes the right branch from the root, then the left branch, and then the right, giving $$2 \cdot (2 \cdot (2 \cdot 0 + 2) + 1) + 2 = 12.$$ Multiplying this node label by the block size and the bucket size gives the physical address of the node. The same procedure can be used for a node at a specific level, $l$, but now only the path denoted by the first $l$ bits will be followed.

The final part of ORAM is the \texttt{access} function. Its parameters are \texttt{op}, which is either \texttt{read} or \texttt{write}, \texttt{a}, which is the address of the block to access, and \texttt{data'}, which contains the data to be written when \texttt{op} is \texttt{write}. \texttt{data'} is implemented using an option type, so when \texttt{op} is \texttt{read}, \texttt{data'} will have value \texttt{None}.

In \cref{sec:oramintro}, \texttt{access} was split into four steps:
\begin{enumerate}
    \item Remap the address, \texttt{a}, in the position map,
    \item Read the path that \texttt{a} was previously mapped to,
    \item If \texttt{op} is \texttt{write}, then write \texttt{data'} into the block with address \texttt{a} in the stash,
    \item Write the same path back, but filled with new blocks from the stash.
\end{enumerate}

Step 1 calls a pseudo-random function to choose a new position for the block uniformly at random. This operation ensures the security of the Path ORAM construction by making subsequent accesses to the same address statistically independent.

Step 2 calculates the physical address for each bucket along the path, using the subroutines described above, then reads the contents of each bucket into the stash.

Step 3 looks up the block with address \texttt{a} in the stash, stores its current data to be returned by the function, and then replaces it with \texttt{data'}.

Step 4 decides which blocks to write back into the path. The na\"ive implementation of this, suggested by \cref{alg:access}, loops through the stash and finds blocks with address \texttt{a'}, such that the bucket at level $l$ on the path to leaf $\mathtt{position[a']}$ is the same as the bucket at level $l$ on the path to leaf $x$. I made two optimisations to this implementation. The first is to perform the position lookup only once, tagging blocks with their positions in a temporary data structure to avoid repeated work at each level. The second is to avoid calculating the bucket addresses entirely; in order for the paths to two leaves to intersect at level $l$, the leaves must have the same first $l$ bits. Thus, checking for intersection can be reduced to performing a right bit shift on both $x$ and $\mathtt{position[a']}$ of $\mathtt{height} - l$ bits, and checking for equality.

This concludes the discussion of the basic ORAM functor, which can be used to augment a block device in any Mirage program. \cref{subsec:recursion,subsec:statelessness} examine the addition of recursion and statelessness to this ORAM construction.

\subsection{Recursion}
\label{subsec:recursion}

The essence of recursive ORAM is that the position map of one ORAM is another ORAM. To make this possible, I extended the original ORAM functor, parametrising it in a new \texttt{PositionMap} interface. This interface is satisfied by the original, in-memory position map module, as well as by the ORAM functor itself. Applying this new ORAM functor once with the in-memory position map module gives the basic ORAM functor discussed above. However, applying the functor again, with the result of the first application, gives a recursive ORAM module with one level of recursion. This can be repeated to an arbitrary depth. This exhibits the power of OCaml's module system.

The recursive ORAM module can be constructed manually as above, applying the functor $n + 1$ times for $n$ levels of recursion. However, it is preferable to build the recursive module automatically, based on the size of the data ORAM, ORAM$_0$. Addresses are 64-bit integers, so they occupy 8 bytes of storage each. If ORAM$_0$ has size $N$ in blocks, and the size of each block is $B$ bytes, then one block in ORAM$_1$, used as the position map for ORAM$_0$, can store $\chi = B / 8$ addresses. The number of blocks required for ORAM$_1$ will therefore be $N / \chi$. After $\log N / \log \chi$ levels of recursion, the in-memory position map will be of size $O(1)$.

So, the recursive ORAM module is created by taking in the size, $N$, and the block size, $B$, and automatically applying the ORAM functor recursively $\log N / \log \chi$ times. Calling the $\mathtt{create}$ function of the resulting module creates ORAM instances with the correct number of levels of recursion.

\subsection{Statelessness}
\label{subsec:statelessness}

In order to achieve statelessness for ORAM, its type information, the stash, and the position map must all be stored on disk. The layout of this information on disk is explained first, followed by the method for flushing it to disk.

Once ORAM has been initialised and is in use, relocating it on disk is very costly, because the whole data structure would need to be copied. However, to reconnect to ORAM, the information necessary to discover ORAM's existence must be stored in a well-known location. The first block of the underlying block device is therefore used as a \emph{superblock}, which is a block containing the most important metadata. This superblock contains a pointer to the location of the rest of the state, along with its length. This way, ORAM can be stored, starting at the second location on disk, and the state can be appended at the end of the ORAM section. Thus, ORAM never has to be moved once it has been initialised.

To store the state, it must be serialised first, which translates it into a form that can be written to disk. For the majority of the information, an existing serialisation library, Jane Street's Bin\_prot, can be used. This is a binary protocol which allows one to annotate a type with \texttt{[@@ deriving bin\_io]} and generates functions to read and write instances of the type into buffers. This is used for all of ORAM's type information, as well as for the stash, but not for the position map. ORAM's type is therefore split into a core, that can use Bin\_prot, and an extended type, which includes the position map and the underlying block device. This structure is shown in \cref{lst:orammaketype} on \cpageref{lst:orammaketype}.

The position map is more difficult to serialise because, under recursive ORAM, it might be another ORAM. To avoid writing the entire position map ORAM onto the disk a second time, a custom serialisation function is used that only stores metadata for ORAM position maps, but stores the entire in-memory position map in the base case. The state is stored at the end of the block device, after all recursive instances, so this function collects the data from all the levels of recursion together into one buffer.

After all the state has been flushed to disk, it is safe to disconnect from ORAM. Reconnecting to ORAM is a case of checking for the presence of the superblock, reading in the location and length of the state, reading the actual state, and then calling the connect function on the position map. The connect function allows each recursive ORAM instance to have its own reference to the underlying block device, and returns once it reaches the in-memory position map.

\section{File System}
\label{sec:fileSystem}

To search over a set of documents, they first need to be stored. A suitable file system for MirageOS did not exist, hence, this section describes the design and implementation of a basic file system that satisfies the requirements of the project.

\subsection{General Design}
The most common way of building a file system on top of a block device is through the use of \emph{inodes}. An inode contains meta-information about a file, along with pointers to the actual data blocks. For the purposes of this project, an inode will simply be one block of the block device, containing the length of the file, followed by the list of pointers. In a system with more complex needs, the inode would contain more information, such as modification/access timestamps, file permissions, etc., but for this project it suffices to be able to read and write documents.

To allow quick access to the inode for a particular file, its location is stored in an index. Rather than indexing based on filenames, which may have variable lengths, it is preferable to index based on the hash of the filename. This allows a more regular layout of the data in memory due to the constant length of indexes. In a real system, collisions in the hash function would need to be handled but, for a small number of files, this possibility can be neglected. \Cref{subsec:inodeindex} describes the implementation of the inode index.

Space needs to be allocated on the block device for inode index blocks, for inode blocks, and for data blocks, which can be achieved using a \emph{free map}. This is a map that tracks whether each block has been allocated and can be updated as new blocks are allocated and freed. \Cref{subsec:freemap} describes the implementation of the free map.

To be able to disconnect from the file system, it needs to be stateless. Therefore, the data structures need to be stored on disk, along with enough information to find them. The root address of the inode index and the length of the free map are enough to locate the data structures on disk when reconnecting to the block device. These two pieces of information are stored at address 0 in another superblock.

\subsection{Inode Index}
\label{subsec:inodeindex}

The inode index associates keys, in the form of filename hashes, with values, in the form of pointers to inodes. It is implemented using B-Trees\footnote{The algorithms for B-Tree operations were adapted from \citet{CLRS09}}, which not only support operations of insertion, lookup, and deletion efficiently, but are also stored directly on disk. There were no B-Tree libraries available for OCaml at the time of writing, so I implemented one myself.

B-Trees are a generalisation of self-balancing binary search trees, where each node can have more than one child. If a node has $n$ children, then it stores $n-1$ keys. It is guaranteed that $$ \forall m \leq n, k \in child_m, j \in child_{m+1} . k < key_m < j,$$ that is, a key is greater than all the keys to its left and less than all the keys to its right.

B-Trees are an efficient on-disk data structure because a whole block can be used for one node. This gives an extremely high branching factor, reducing the depth of the tree and therefore the number of blocks that need to be accessed in any single operation. On creation of the file system, the branching factor of the tree is calculated such that as much of each block as possible is filled with useful information.

I implemented B-Trees as a functor, which is parametrised in three module interfaces: one for the nodes, one for the allocation module, and one for the storage module. This makes the library more generalised, which means it may find wider use in the OCaml community. For the purposes of this project, I wrote a node module that uses 16-bit pointers between nodes, and stores \texttt{int64} inode pointers. For the allocation and the storage modules, this project uses the free map, described in \cref{subsec:freemap}, and the ORAM module, respectively.

\subsection{Free Map}
\label{subsec:freemap}

To allocate space efficiently, an array of bits the size of the block device can be used. For statelessness, the free map is flushed to disk regularly. Therefore, it was beneficial to write my own bit array based on \texttt{Cstruct}s, rather than using a library implementation. This allows the whole structure to be written directly onto the disk using the block device methods, without any costly translation.

The \texttt{Cstruct} library performs data access in bytes. This leads to \cref{alg:bitgetset} for getting and setting individual bits. To get the $n^{th}$ bit, it must be extracted from the $\frac{n}{8}^{th}$ byte. To do this, the index of the bit in the byte is calculated, a 1 is left-shifted to that position, and an and operation is performed, which masks that bit. Setting is a similar operation, but seeks to preserve the surrounding bits. To set a 1, the index of the bit in the byte is calculated, a 1 is shifted to that position, and an or operation is performed, which sets the bit to 1 while preserving all other bits. Setting a 0 is slightly trickier. It would be performed by an and operation with a bit string that is 0 at the desired position and 1 everywhere else, but shifting fills empty bits with 0s. Instead, by the use of De Morgan's Law $$ a~\&\&~b = \neg (\neg a~||~\neg b) $$ this is converted into an operation involving a bit string that has a 1 at the desired position and 0s everywhere else.

\begin{algorithm}[t]
\caption{Getting and setting individual bits in a byte array}
\label{alg:bitgetset}
\begin{algorithmic}
\vskip 10pt
\Function{GetBit}{$\mathsf{index}$}
  \State $byte \gets \mathsf{byteArray[index]}$
  \State $shift \gets 7 - \mathsf{index} \bmod 8$
  \State \Return $byte >> shift~\&\&~1$
\EndFunction
\vskip 10pt
\Function{SetBit}{$\mathsf{index,boolean}$}
  \State $byte \gets \mathsf{byteArray[index]}$
  \State $shift \gets 7 - \mathsf{index} \bmod 8$
  \If{$\mathsf{boolean}$}
    \State $byte \gets byte~||~1 << shift$
  \Else
    \State $byte \gets \neg (\neg byte~||~1 << shift)$
  \EndIf
  \State $\mathsf{byteArray[index]} \gets byte$
\EndFunction
\vskip 10pt
\end{algorithmic}
\end{algorithm}

\section{Search Module}
\label{sec:searchmodule}

The final part of the system performs search over the encrypted documents stored in ORAM. \cref{subsec:invertedindex} discusses building an inverted index, the data structure that enables efficient search. \cref{subsec:searchapi} then explains the front-end of the whole application, the search API.

\subsection{Inverted Index}
\label{subsec:invertedindex}

The basics of inverted indexes are discussed in \cref{sec:invertedindexintro}. As stated, the index consists of two main structures, the dictionary and the postings. For the dictionary, I used the most common implementation, a hash table. This provides $O(1)$ lookup and insertion, which are the main operations required. The postings are more flexible. These are where the filenames are stored, because they are not actually stored in the file system itself. Performing conjunctive queries requires taking the intersection of postings lists. Thus, each postings list is implemented using a hash set, which is a data structure built on top of a hash table. It stores a set of keys, in this case filenames, and has the added benefit of keeping them unique automatically.

As usual in this project, files are stored as \texttt{Cstruct}s. There are a number of steps to be performed in order to index a file. First, the file is converted to a string and immediately stripped of unnecessary characters, including all punctuation. At this point, the file is a sequence of alphanumeric character strings, separated by spaces and newlines, so performing a split on these characters results in a list of words.

Before inserting this list into the index, two techniques are used to improve efficiency. Storing separate words for `run', `ran', and `runs' increases the size of the index. To reduce the impact of this, some linguistic preprocessing is performed. Specifically, stemming is carried out, which uses a set of rules to prune suffixes, mapping words onto a stem. I used a small open-source library implementation of Porter's stemming algorithm \cite{porter1980algorithm}, ocaml-stemmer. This technique not only reduces the size of the index, but also arguably improves search, because now queries for `run' can automatically return documents containing morphological derivations. A second technique removes duplicates from the resulting list of words, reducing insertion overhead. After these techniques have been applied, the remaining entries are inserted into the index.

For the purposes of this project, only simple conjunctive queries are used, meaning search looks for documents containing all of the words in a space-separated query string. To do this, preprocessing is performed as above on the query string. Then, each term is looked up in the dictionary, resulting in a list of hash sets. Finally, the big intersection of these hash sets produces the result. In order to make this intersection operation efficient, the list is first sorted by hash set size. For each pairwise intersection, the smaller hash set is filtered by checking for membership in the larger hash set. This means that one constant time lookup is performed for each member of the small set, rather than the large, resulting in a significant performance boost, as the size of the smaller set is monotonically decreasing.

\subsection{Keyword Search API}
\label{subsec:searchapi}

The final step in an end-to-end system is the API. File system access, indexing and search all need to be wrapped together into one module that provides a single point of entry for the encrypted search system.

There are three main operations which are the most important to support: writing files, reading files, and searching over files. This project is not concerned with deleting files, because this is not necessary for the evaluation of ORAM.

Writing files is the most complicated step because this is where indexing is performed. During a write, the file is first written through to the file system. It is then indexed and, finally, the index is flushed to disk to ensure it persists. Reading files passes through to the file system and search makes calls to the inverted index.

\section{Summary}
\label{sec:implSummary}

In this chapter I have discussed the implementation of all of the components of my system, including the key design decisions and trade-offs that were made. I demonstrated how I implemented recursive Path ORAM in a way that allows it to plug into existing MirageOS programs, and how I built a file system and search operations on top of it.

The next section discusses evaluation, which assesses whether I have implemented ORAM such that it provides functionality, performance, and security and therefore whether the aims of the project have been achieved.

\chapter{Evaluation}

This chapter explains the methodology used to ensure the correctness of the ORAM implementation, and analyses its performance and security properties. \Cref{sec:unitTests} discusses functional testing through the use of unit testing and randomised testing. \Cref{sec:performanceTesting} then goes on to discuss performance, and then finally \cref{sec:statisticalAnalysis} analyses the security of ORAM.

Overall, ORAM performs as expected in terms of functionality, performance, and security. It operates correctly, writing files and reading the same data back out. It continues to do so with the additions of statelessness, recursion, and encryption. All parts of the system are fully functioning, both separately, and as a whole. In terms of performance, my implementation agrees with the theoretical bounds given in \citet{stefanov2013path}, which state $O(\log N)$ time overhead. Finally, statistical analysis shows that ORAM does indeed have a statistically random access pattern, ensuring the security of the implementation.

\section{Unit Tests}
\label{sec:unitTests}

I used unit testing throughout the development process, which allowed me to be sure that individual components were functioning correctly before I combined them into larger, more complicated systems. This section describes the most important test cases that I ran for each module, and discusses the use of randomised testing to cover a larger range of input values. Code coverage testing was also used to make sure that the tests were exercising all parts of the system.

\subsection{Stash}

The three main tests for the stash check that:

\begin{itemize}
  \item Values not expected to be in the stash are not found
  \item Values that have been added to the stash are found
  \item Adding dummy blocks to the stash has no effect
\end{itemize}

During the development process, I hand-coded a small number of example cases. Later, in order to test more extensively, I coded the above three cases as properties for randomised testing. Randomised testing generates a number of test cases and verifies that the properties hold in every case, allowing far more cases to be covered than hand-coded tests alone.

\subsection{Position Map}

Two main aspects of the position map need to be tested. Firstly, the translation of 64-bit addresses into three integer indexes, and, secondly, its operation as a data structure.

The code performing the translation is essentially a mathematical definition in itself, so any property definition that might be used for randomised testing would be equivalent to the original code. Thus, in this instance, a few hand-chosen random cases can be checked, along with the key edge cases. The edge cases to check are the maximum and minimum values, along with values either side of a change in the higher indices. That is, values with output $(x,y,max\_int)$ and $(x,y+1,0)$, and similarly for the higher index.

In terms of operation, it is necessary to check that after adding a value to the position map, the same value is read back, and, furthermore, that trying to add a value at an address outside of the allowed range results in an error. Both of these cases can be coded up as properties for randomised testing.

\subsection{ORAM}

The ORAM implementation is particularly amenable to randomised testing because there are inverse functions for most of its functionality. It is therefore possible to write properties for randomised testing of the form $f(f^{-1}(x)) = x$. This allows each stage of ORAM to be tested independently, from writing individual blocks, through writing whole paths, to writing entire files.

Regular unit testing was performed on other functions, for which properties could not be easily coded. These functions include reconnecting to ORAM to test statelessness, and performing calculations, such as computing the height for ORAM tree.

\subsection{Free Map}

The free map allocates blocks in the block device to be used by different parts of the file system. It is necessary to test that, on creation, the map allocates the correct number of initial blocks, that it always allocates the correct number of blocks if they are available, and that it only ever allocates blocks that are actually free.

During development, I used a selection of test cases which covered important edge cases. These included creating the map, and checking that nothing was allocated to begin with, except the first $n$ blocks, where $n$ is a parameter of the creation function. Then, I tested that allocating and deallocating different sequences of blocks led to the expected results. Finally, I checked that an error was returned if there were not enough free blocks.

After main development was completed, I added some randomised testing to this module to cover a wider range of inputs.

\subsection{B-Trees}

It is not possible to test the B-Tree library directly because it only provides a functor to create a B-Tree. Hence, I tested the B-Tree using the inode index, the canonical use case of the library in this system. In order to test the B-Tree properly, I used randomised testing to create a reasonable access pattern that was guaranteed to cause splitting of the root node.

\subsection{Inodes}

The inodes were tested to make sure they  added and deleted pointers, while maintaining a correct count. Again, this is amenable to randomised testing, although unit tests were also used throughout the development to test specific cases that would be expected to cause exceptions.

\subsection{File System}

To test the file system, I created a large number of random files and used randomised testing to ensure that the correct files were always read back out once they had been written in, under any access pattern. This also included testing to ensure that reasonable exceptions were produced when the file system became full, or a file did not exist in the system.

\subsection{Search Module}

The search module was tested for correctness, i.e. that if a file containing a word had been put into the system, then it would appear in the search results for that word. This was difficult to randomise, so I manually constructed a number of test scenarios.

\section{Performance Testing}
\label{sec:performanceTesting}

\subsection{Parameter Optimisation}
\label{sub:parameterOptimisation}

Before performing the main experiments, I decided to optimise ORAM's parameters so that the experiments could run faster. The main parameter in question is the block size, shown in \citet{ousterhout1985trace} to dramatically affect the speed of IO operations. I discovered that this is indeed the case with ORAM, as can be seen in \cref{fig:blockSizeResults}. It appears that increasing the block size results in an unbounded increase in performance, but I settled on a block size of 1MB to trade off between speed and precision in specifying the size of the block device.

It is important to note that using ORAM in the cloud would give very different results. Increasing the block size would increase the size of the stash proportionally, because the maximum stash size is a constant number of blocks. Thus, a larger block size increases the amount of data that must be written to disk to achieve statelessness. I have shown that increasing block size increases speed on a local disk, but network latency and bandwidth will dominate when running in the cloud, leading to a slow down. I would need to perform further experiments in the cloud to discover an optimal trade-off point for this scenario.

\begin{figure}
    \centering
    %\includegraphics[width=.8\linewidth]{blockSizeResults}
    \input{blockSize.tex}
    \caption{Plot of the time taken to transfer 80MB of data at varying block sizes and sizes of ORAM. Each line represents one ORAM size, $N$, so as block size increases, the time decreases.}
    \label{fig:blockSizeResults}
\end{figure}

\subsection{Comparison with Literature}
\label{subsec:comparisonWithLiterature}

Two steps were taken to effectively test the overheads due to ORAM.

The first was to isolate and remove major sources of uncertainty. When I first ran the experiments, I attempted to run them on my local machine. In this environment, other processes interfered with the ORAM process, making the results unreliable. I secured a remote testing machine in order to run the experiments in complete isolation. Here, at first, they were running on an NFS mounted drive, meaning that network latency and protocol overheads were affecting the results. After moving the experiments to a local disk, it was clear that ORAM overheads were finally dominating the results.

Secondly, I needed to initialise the ORAMs properly. When a fresh ORAM is created, the stash is empty and all of the blocks are dummy blocks that are disregarded. It is necessary to run an initialisation sequence to remove the effects that this has on the results. I used the worst case sequence, which writes every block in turn, then reads every block in turn, repeatedly. This ensures that all blocks are used multiple times, leaving the ORAM in a state that it would likely be in after extended use. The performance can only be reliably tested once this steady state has been reached.

During the actual experiment, I used a random sequence of block accesses rather than the worst case sequence. I performed 10 runs of 1000 iterations each, hence, using 1MB blocks, each run transferred $\approx$1GB data. Accesses oscillated between reads and writes in order to balance the different overheads. To measure how the performance changes as the block size increases, I used ORAMs of a range of sizes, from 12 blocks (a tree of depth 1) to 4092 blocks (a tree of depth 9). The log of the time taken for each ORAM to perform 1000 iterations is plotted against the logarithm of the size of the ORAM in \cref{fig:timeResults}. I expected this to be a straight line, because I expected logarithmic overheads from ORAM, and I increased the block sizes approximately in powers of two. We can see that ORAM does indeed show a logarithmic overhead compared to the control experiment. The control experiment performed the same sequence of accesses to a block device using the \texttt{BLOCK} interface without ORAM. The block device with height 12 and block size 1MB has a total capacity of $\approx$4GB. ORAM took $\approx$10s to transfer $\approx$1GB of data, which is 8 seconds longer than the control device. This would seem to be a reasonable overhead to provide complete privacy of the access pattern.

I performed the same experiment again, this time adding encryption to ORAM, which revealed that the overhead was still logarithmic, but with a larger constant. Encryption added on average 2 seconds to a data transfer of 1GB, an acceptable overhead when it underpins security.

\begin{figure}
    \centering
    %\includegraphics[width=.8\linewidth]{timeResults}
    \input{timeResults.tex}
    \caption{The relationship between size of an ORAM in blocks and the time taken for 1000 operations, plotted for ORAM, encrypted ORAM, and a control block device with no ORAM. We take logs of both axes, because block size was increased in powers of two and we expect a log relationship.}
    \label{fig:timeResults}
\end{figure}

%The initialisation phase allowed the stash size to reach a steady state, which resulted in a constant overhead of $x$ blocks, compared with $y$ predicted in \citet{stefanov2013path} for a security parameter, $\lambda = z$. This shows something...

%In these first experiments, we used stateless, non-recursive ORAM, which means that the entire of the position map was being flushed to disk on every access. We now compare the time taken for stateless, recursive ORAM, to see what effect the additional recursive calls have compared to the reduction in the size of the client side state. We can see from \cref{fig:recursiveTimePlot} that overall this was good/bad to some degree.

Although it has not been discussed here, bandwidth is another important measure when examining the performance of ORAM, especially when it is being used in the cloud. With more time, a detailed study of the bandwidth used with and without recursion would have been carried out, which would have given insight into the trade-off that recursion presents us with.

\section{Statistical Analysis}
\label{sec:statisticalAnalysis}

To demonstrate the effectiveness of ORAM, it must be shown that the access pattern observed by the block device is statistically random. There is structure to this randomness, because on each access a whole path is read and written. For a tree of height $L$, $L+1$ accesses are made, each to a subrange that corresponds to a level of the tree. However, the actual path that is written should be random. Thus, a test for randomness should be performed on a sequence of path indices.

I used two techniques: autocorrelation plotting and runs testing.

Autocorrelation plotting plots the correlation of a sequence with itself at various time lags \cite{nistautocorr}. For a random sequence, the noise cancels itself out, resulting in a plot of values very close to zero. The exception is lag 0, where the correlation will be exactly the power of the signal. I plotted this for two access patterns: one of length 200, for which the results would be easily visible; and one of length 5,000, to show longer term effects. For both of these patterns, the underlying access pattern, which would be seen without ORAM, was simply a succession of reads and writes to the same location. These plots are shown in \cref{fig:shortAutocorr} and \cref{fig:longAutocorr} respectively.

\begin{figure}
    \centering
    \begin{subfigure}{\textwidth}
        \centering
        %\includegraphics[width=\linewidth]{shortAutocorr}
        \input{shortAutocorr.tex}
        \caption{Autocorrelation plot of a 200 iteration access pattern}
        \label{fig:shortAutocorr}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
        \centering
        %\includegraphics[width=\linewidth]{longAutocorr}
        \input{longAutocorr.tex}
        \caption{Autocorrelation plot of a 5,000 iteration access pattern}
        \label{fig:longAutocorr}
    \end{subfigure}
    \caption{Two autocorrelation plots, with the autocorrelation coefficient on the y-axis and time lag on the x-axis. The dashed black lines represent confidence bands of 95\% and 99\%. For a random sequence, most of the points should fall within the 95\% confidence bound, as they do on both of these plots.}
    \label{fig:autocorr}
\end{figure}

The dashed black lines represent confidence bands of 95\% and 99\%. To conclude that a sequence is random, almost all of the autocorrelation coefficients should lie within these bands. This is indeed the case, so ORAM has successfully taken a heavily non-random access pattern and randomised it.

Runs testing attempts to detect non-random behaviour in a signal by counting the number of runs, which are sequences of values that are all above or below the median value. Too few runs in a sequence suggests a trend, whereas too many runs suggests cyclic behaviour. For a large sample, the distribution of runs can be approximated using a normally-distributed random variable, and compared with the distribution that is measured. To generate the distribution, I took 1000 sample access patterns, each of length 180\footnote{This length was chosen due to the availability of the tail cut-offs for this length distribution \cite{masliah2000stationarity}. The tail cut-offs for other lengths could not easily be found.}. I cut each pattern into 18 equally sized segments of size 10, took the mean of each segment, and counted the number of runs in the sequence of means. There can be between 2 and 18 runs in each sequence, but, for a random sequence, at least 90\% of the sequences should have between 7 and 14 runs. 7 and 14 are the 0.05\% tail cut-offs for this distribution \cite{masliah2000stationarity}. \Cref{fig:runsTestPlot} plots this distribution, along with the tail cut-offs, and 92.2\% of samples fall within the bounds. Thus, I can conclude that the samples were generated from a random process.

\begin{figure}
    \centering
    % \includegraphics[width=.8\linewidth]{runsTestPlot}
    \input{runs.tex}
    \caption{The distribution of the number of runs in 1000 access patterns of length 180. The dashed black lines represent 0.05\% tail cut-offs. 92.2\% of values fall within these bounds, implying that the access patterns were created from a random process.}
    \label{fig:runsTestPlot}
\end{figure}

\chapter{Conclusions}

\section{Results}

I successfully implemented the Path ORAM protocol and demonstrated that my implementation agrees with the theoretical overhead of $O(\log N)$, and that it is a statistically secure implementation. Statelessness and recursion were extensions to the project which made it possible to disconnect from and reconnect to ORAM, which proved invaluable during the evaluation process. I built a functioning file system and a search module on top of this, which completed the end-to-end system that was the aim of the project.

Furthermore, the impact of ORAM on a real-world storage scenario was demonstrated. Using a 4GB ORAM block device, the transfer of 1GB of data took 10s, compared to 2s on the block device alone. Although this represents a decrease in transfer rate of a factor of four, it seems to be a reasonable overhead to ensure the privacy of important data.


\section{Future Work}

There is much work to be done in the field of ORAM. One of the main drawbacks of current schemes is that the size of the ORAM must be determined in advance, and the overhead associated with expanding it is enormous. I would like to investigate the possibility of creating a resizeable version of my implementation.

There are also optimisations that could be performed on this implementation. Along with the interesting optimisations mentioned in the literature, I can see room for analysis and optimisation of this specific implementation using benchmarking to identify code hot-spots.

I would like to perform further analysis to gain a deeper understanding of this implementation. I would like to perform more experiments on the initialisation stage of ORAM to understand how the size of the stash changes during this period. Moreover, I would like to spend time adjusting various parameters, including block size and bucket size, in an attempt to find an optimal set of values.

\section{Lessons Learnt}

This project has taught me many important lessons. Perhaps the most significant of these is time management. Throwing myself into the project during the Michaelmas term and vacation meant that I gave myself a large buffer to ensure any hiccoughs later on did not have severe consequences. Related to this, I learnt that predicting the time that each part of a project will take is very difficult and it is therefore important to be both conservative and realistic about the number of things that might go wrong. I learnt the importance and pleasure of sharing one's work with others through contributing to the Mirage community and being invited to talk about the project at Microsoft Research Cambridge. I would like to thank Markulf Kohlweiss from Microsoft Research for his advice and for giving me the opportunity to present my findings. Finally, I have learnt that it is far better to seek advice at the first sign of trouble, rather than waiting until the next scheduled meeting. The advice of experienced supervisors can turn the seemingly catastrophic into a minor disruption.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% the bibliography
\addcontentsline{toc}{chapter}{Bibliography}
\bibliography{refs}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% the appendices
\appendix

\chapter{Project Proposal}

\input{proposal}

\end{document}