cs181final.tex

\title{TorHMC: An onion routing network implementation for testing new features}
\author{
    Chris Beavers \& Sean Laguna \\ CS 181 Final Project \\ Department of Computer Science \\ Harvey Mudd College 
}
\date{December 7th, 2011}

\documentclass[12pt]{article}

\begin{document}
\maketitle

\begin{abstract}
Onion Routing is a method of anonymizing one's web activity by routing all traffic through an encrypted, random path through a network of ``nodes.'' It was developed as an idea in the late 90s\cite{gold} and today has gained popularization through the Tor project, for which there is extensive documentation available online\cite{tor}. In this paper, we explore the difficulties in implementing a minimalistic Onion Routing network, and explore possible security risks with Tor's current model. We conclude by presenting a more secure method of verifying revisitation to hidden servers within the network.
\end{abstract}

\section{Implementing A Minimalistic Onion Routing Network}
Our basic approach to constructing an Onion Routing Network stemmed from a description of Tor put forward by Hooks and Miles\cite{hook}. We divided the work such that we could develop the cryptographic functionality (to perform encryption and decryption with either public/private key pairs or symmetric keys) and the connectivity functionality (to establish communication between nodes in a network) separately. Consequently, we worked with sockets in C to establish paths on the fly through the network and explored a myriad of different encryption algorithms discussed in class (and implemented in the OpenSSL library). Both avenues had significant learning curves, but the final code is available via a git repository at {\tt https://github.com/seanlaguna/TORHMC}. It is written in C++ and requires OpenSSL 1.0.0e which is available for download at {\tt http://www.openssl.org/source/openssl-1.0.0e.tar.gz}. The README contains instructions for compiling the code (linking to a fresh build of OpenSSL 1.0.0e) and finally running the TorHMC network. It should be noted that we did not intend to duplicate the full functionality of Tor; instead, TorHMC should serve as a testbed for new features that could eventually make their way into a fully-fledged Tor implementation. TorHMC provides a simple interface for testing new features while faithfully replicating the cryptographic and communicative functionality of Tor well enough such that the tests will reflect that new feature's feasibility in Tor itself.

\paragraph{Current Functionality}
Our implementation has three different operators: clients, nodes, and designated exit nodes. A client begins by selecting a random set of available nodes and their corresponding RSA public keys from a comprehensive list of nodes on the network. The client then constructs an initialization onion, encrypted with all public keys. This onion is relayed to the first node in the path, who notes a new socket connection, accepts, and then receives a buffer which it decrypts with its private key.

The header of the unencrypted buffer contains an init struct of information, including the 4-byte IP address of the next node in the path (where to send its data) and a 2-byte port for connecting to it (how to send it there). In addition, it contains a client-generated 256-bit AES symmetric key/iv pair (32 bytes each) for this node to use in encrypting/decrypting future messages. The encrypted size of this init struct is 128 bytes as fixed by the 1024-bit modulus of the public/private key pairs. The plaintext/decrypted size is 70 bytes ($4+2+32+32$). This node then sets up a socket connection with the next node and relays the remainder of the encrypted onion.

This process continues until a the decided exit node is reached, which finishes the process by decrypting the final layer of the onion and extracting the final symmetric key. This process removes the final contents from the buffer, rendering it empty. The path is now set up for symmetric relaying, and the client may put in web addresses to quickly and anonymously ping.

When an address is entered from the client, it is encrypted using AES encryption in cipher-block chaining mode with all of the path's symmetric keys in reverse order. Each node then peels off a layer as it passes through, and the exit node decrypts plaintext containing the web address to ping. Currently, it just issues a system command to do this and checks to see if it errors. The ping's success or failure is then relayed back through the path, encrypted at each layer by the same symmetric keys. The client then peels off all layers with the symmetric keys in forward order to reveal the response. The encrypted size of any message is the message's original size, plus 16, rounded up to the nearest multiple of 16: $s_{cipher} = (((s_{plain}+16)/16)*16)$ for plaintext and cipher sizes $s_{plain}$ and $s_{cipher}$. This is such that the ciphertext size is the plaintext size rounded up to the nearest multiple of 16, but adding 16 bytes if the plaintext size is already a multiple of 16 bytes.

Our method of path selection is currently hard-coded, but we have set up the code in such a way that implementing random path creation should be simple. By storing a list of available nodes, ports, and corresponding public keys, the client could just pick some number of them, generate the necessary symmetric keys, and establish a path.

\paragraph{Unimplemented Features}
The primary challenges that this project presented were
\begin{itemize}
\item Developing an expertise with C sockets
\item Understanding and using the OpenSSL library
\item Combining Chris' C socket code with Sean's C++ OpenSSL encrypt/decrypt functions
\end{itemize}
These challenges arose out of the decision to split our original Roadmap goals into two distinct tracks. Chris adopted the challenge of routing traffic and Sean explored security implementations. We programmed together during the merge.

First, some features in the realm of routing traffic could not be implemented. As is apparent from the description above, our implementation does not attempt to acquire all outgoing Internet traffic to route through the onion network. Further research (including conversations with some professors) revealed this to be far beyond the scope of our project, requiring the client node to somehow monitor at a very low level all system activity, construct an effective packeting scheme for this activity to feed it through the onion network, and the ability of the exit node to emulate an Ethernet connection to relay the outgoing message. In essence, we would have to write the code for setting up a VPN, but have all of the traffic for its operation run through our network. A dissertation maybe, but not a final project. Fortunately, routing actual protocols (like the HTTP or HTTPS protocols) would not make TorHMC any less suitable as an initial testbed for proposed features.

Second, some encryption/decryption security measures and features were not implemented. Unfortunately, encryption/decryption changes the size of the buffer as it moves from node to node. Public/private key encryption requires an input size of less than 86 bytes but fixes the output ciphertext to 128 bytes. This makes it difficult to construct an initialization onion without using public/private keys with different modulo sizes. There are ways to avoid this problem, by breaking the newly-encrypted layers of the onion into small enough chunks and encrypting them in sequence. We propose a simpler method of creating initialization onions: instead of re-encrypting the entire onion when each init struct is added, encrypt each init struct with the appropriate public key and concatenate them in the left-to-right order they will be received. Thus, an onion of 4 init structs contains 512 bytes of data exactly. While it is possible that we could have written code to read multiple chunks of the onion as it grows and carefully pad the layers as they are passed from node to node to fix their size, it was much simpler in practice to restructure our onions in this way. Each layer just strips off its init struct, shifts the buffer 128 bits to the left, and fills the final (and now empty) 128 bits with random data. This method has upsides: it easily extends to dynamic path creation and fixes our buffer size in a simple way. Similarly, symmetric encryption always increases the message size by at least 1 byte and at most 16 bytes, and decryption decreases the message size in a similar way. We did not address this discrepancy, which could serve as a security hole in TorHMC that potentially allows malicious adversaries to fully determine paths in the network. If a programmer wants to use TorHMC to develop features that deter this type of attack, they should fix these imperfections before doing so.

Additionally, to fully replicate Tor functionality, reply onions, hidden servers, and a broadcast server should be implemented as well. These features require additional work on both the routing and cryptography aspects of TorHMC. 

\section{Improving Tor}
A major problem challenging Tor today is the ease with which phishing sites may be anonymously established to emulate other hidden Tor sites. Internal Tor URLs consist of large strings of random characters that are nearly impossible to remember or verify by sight. Furthermore, they change regularly to ensure anonymity. Consequently, malicious adversaries can create a lookalike login to a login-based site and post it on the Internet, masquerading as the actual site. Thus, users attempting to login to the site via some new Tor URL for that site are inherently uncertain if they are logging into the correct site or if a malicious adversary is phishing for their login data (username and password both).

While phishing is essentially impossible to avoid on the first visit to a site (as one is unfamiliar with the site and must of course set up an account before using it, etc.), return visitors should have some form of method for verifying the site's authenticity. To this end, we propose a protocol for establishing a verification ID on first visit that can be reused on return visits. Keeping this interaction anonymous is, of course, the trick, as most such identifying features rely on IP addresses or information about the client machine.

On the first visit to a new login-based site, the user establishes a shared ID with the site that both parties save. The user also sends a reply onion to the site, which associates it with the ID. For security, the user can update the reply onion at each login if desired. Finally, they also negotiate a secret key using a Diffie-Hellman key exchange that can also be updated at each login if desired. Upon future visits, the user sends their ID encrypted with the secret key. Then, the site receives it and responds via reply onion with its official URL encrypted with the secret key. The user decrypts it to become assured that they have made contact with the correct site. Two precautions prevent a man-in-the-middle attack: first, such a malicious adversary would need the secret key to modify any of the message discreetly; second, because the server uses the reply onion to send its message back, the adversary would have to secretly monitor both the original and reply paths to see the entire communication. Even if an adversary does secretly monitor the original path, we see no way for him/her to determine the reply path (specified in the reply onion) and also secretly monitor it.

Our approach does indeed provide the possibility of a man-in-the-middle attack in some senses, but in an onion routing network this seems inescapable. This malicious adversary can only relay encrypted traffic, so he/she cannot garner anything private from the exchange pending his/her ability to break the encryption. Consider a user $U$ wants to visit and log into a site with link $L$, and such an adversary spreads a false link $L_{phish}$ for the site that phishes for entered login data (username and password). Suppose that the regular site $S$ and the phishing site $S_{phish}$ both supposedly support this verification scheme: $S$ actually does, and $S_{phish}$ tries to play man-in-the-middle to pretend like it can perform the same verifications. $S_{phish}$ sends the verification request sent to it to $S$, hoping to intercept the response, modify it, and send it back to $U$ as though it processed the request itself. But, it cannot intercept the request; as soon as $S$ gets the message, it uses the reply onion to relay the information to $U$. If $S_{phish}$ happens to be a node in this reply onion, it will not realize as such because of the properties of an onion routing network. Even if it could intercept the response, it does not have the secret key with which to change the URL embedded in the response, $L_{response} = L$, to its own link $L_{phish}$. It needs to do this so it falsely appears to the user that $L = L_{phish}$, suggesting that the user has visited the right site and can faithfully login. Fortunately, $U$ will still note that the link it visited, $L_{phish}$, is not equivalent to $L_{response} = L$. It can then extract $L$ from the response (even if $S_{phish}$ somehow manages to secretly monitor the communication) and access $L$, repeating the verification process from $L$ if desired. Note that $S_{phish}$ attempts to falsify the verification, but instead actually reveals $S$ to $U$. It seems that $S_{phish}$ is actually better off failing the verification, because at least then $U$ will still not know what the correct link is; $U$ will just know that $L_{phish}$, one of many possible false versions of $L$ phishing for login data, is not the correct link. Even if $S_{phish}$ manages to secretly monitor the response and decides to garble the response instead of just relay it back, the verification process will still fail with no further repercussions. Using modern cryptographic standards such as those specified in OpenSSL1.0.0e, this process should securely verify the revisitation to hidden servers in an onion routing network.

\section{Conclusions}
We have created a robust framework for a minimalistic onion routing network implementation intended to serve as a testbed for new, experimental features for Tor or other functional onion routing networks. We have also proposed strategy for the secure verification of revisitation to hidden servers in an onion routing network. In order to fully test this anti-phishing strategy, some aforementioned extentions to TorHMC must be implemented, including an implementation of our strategy itself. Further work could easily implement these extensions and perform the appropriate verifications.

We intend that this testbed implementation eases the actual process of testing new features such as these. After inspecting the open-source implementation of Tor, we determined that providing the actual implementation of our feature within Tor in order to test it requires complicated additions to the actual Tor network. The motivation for writing TorHMC is that it eliminates some of these complications and allows for reasonable testing of this feature and other possible features that could be proposed for Tor or other functional onion routing networks.

\bibliographystyle{plain}

\begin{thebibliography}{1}

  \bibitem{gold} Goldschlag, David, Michael Reed, and Paul Syverson. ``Onion Routing: for Anonymous and Private Internet Connections." {\it Communications of the ACM} 42.2 (1999). PDF.
  
  \bibitem{tor} {\it Tor Project}: Anonymity Online. Web. 03 Dec. 2011. $<${\tt http://www.torproject.org}$>$.

  \bibitem{hook}  Hooks, Matt, and Jadrian Miles. ``Onion Routing and Online Anonymity." (2006). {\it Duke University}. Web.

  \end{thebibliography}


\end{document}