-
Notifications
You must be signed in to change notification settings - Fork 4
/
hermes.tex
36 lines (20 loc) · 8.83 KB
/
hermes.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
\subsection{Relevant Reading}
\begin{itemize}
\item \textit{Implementing Location Independent Invocation}, Black, Andrew P and Artsy, Yeshayahu, IEEE Transactions on Parallel and Distributed Systems, 1990~\cite{black1990implementing}.
\item \textit{Customizable and extensible deployment for mobile/cloud applications}, Zhang, Irene and Szekeres, Adriana and Van Aken, Dana and Ackerman, Isaac and Gribble, Steven D and Krishnamurthy, Arvind and Levy, Henry M, 2014~\cite{zhang2014customizable}.
\end{itemize}
\subsection{Commentary}
The general idea behind the Remote Procedure Call (RPC) paradigm is that it supports the transfer of control between address spaces. This paradigm allows programmers to write distributed applications without having to have knowledge of data representations or specific network protocols. Even though we know that there is quite a bit semantically different between remote and local calls~\cite{kendall1994note, black1990implementing}, the authors posit that the most fundamental difference is that of \textit{binding}, or, how to figure out which address space to direct the call to.
Traditionally, this has been done one of two ways: \textit{default or automatic} binding where the RPC system makes the choice for the programmer; or \textit{clerks}, an application specific module used for determining where the place the call. Default binding is fairly straightforward when there is only one server (or a group of semantically equivalent servers) to service the request. Clerks are fairly expensive, as one must be written for each type of request that needs to be serviced. If the service the RPC call is being made to is \textit{pure}, for instance providing as fast Fourier transform as the authors put it, it is easy to choose automatic binding to select a server based on latency or availability. However, it is more challenging if services host application data. In their example, they consider an employee directory at Digital where application data is partitioned by company, and further by other groupings. If this mapping changes infrequently, a static mapping can be distributed to all of the clients; but, what happens if objects are mobile and this changes more frequently?
One of the fantastic things about this paper is how forward thinking the design is for an actual industrial problem at Digital Equipment Corporation. I consider this one of the early versions of what we now call an ``industry'' research report, even though the system never was productized and the work was mainly performed by researchers in a lab. The application deals with expense vouchers for employees: each form needs to be filled in by an employee, approved by various managers, filed, and eventually results in a payout of actual cash. Each of the managers that are involved in approving the form may be located in different buildings in different continents. The application design assumes Digital's global network of 36,000 machines and assumes that centralizing the records for each form in a centralized database is infeasible. Instead, the design is based on mobile objects for both data and code; forms should be able to move around the network as required by the application.
The Hermes system is broken into three components: a naming service, a persistent store known as a collection of \textit{storesites}, and routing layer that sits above the RPC system. Each object in the system is given a globally unique identifier, a source \textit{storesite} and a \textit{temporal address descriptor} or \textit{tad}. The \textit{temporal address descriptor} is a pair composed of a Hermes node identifier and a monotonically advancing timestamp: this pair represents where an object is located at a given time. This information is also persisted in the objects \textit{storesite}. As objects move around the network, the \textit{tad} is updated at the source node and 2PC used to coordinate a change with the record at the objects's \textit{storesite}.
When remote procedure calls are issued, the callee attempts to issue the call locally if the objects is local. If not, and a forwarding pointer, or \textit{tad} exists, the message is routed to that node. Forwarding pointers are followed a number of times until a maximum hop count is reached; at this point the call is returned to the callee who begins the process again with the last known forwarding pointer. Along the path of forwarding, the \textit{tad} is updated as each hop occurs, reducing the number of hops needed for the next request through that node. This is possible because of the monotonicity of the temporal addresses.
If a node has no local knowledge of where that object is, either because it is not running locally or because there exists no temporal address, a request is made to the naming service to request the \textit{storesite} for the object, and the address of the current location retrieved from the \textit{storesite}.
However, in this model failures may occur. If the RPC arrives at the destination of the object and the call invoked and completed, but the response packets dropped, what happens? In this case, an invocation sequencer is required to ensure that the operation only performed if it has not previously completed. The authors suggest developers write operations that are idempotent, to ensure they can be replayed without issue or additional overhead.
\subsection{Impact and Implementations}
Both the Eden~\cite{Black:1985:SDA:323647.323646} and Emerald~\cite{black2007development} programming languages both had notions of distributed objects. Eden used hints to identify where to route messages for objects, but timed them out quickly. Once timed out, a durable storage location called a \textit{checksite} would be checked, and if that yielded no results, broadcast messages would be used. Emerald, a predecessor to Hermes, used forwarding addresses, but used a broadcast mechanism to find objects when forwarding addresses were not available. In the event the broadcast yielded no results, an exhaustive search of every node in the cluster was performed. All of these decisions were fine for a language and operating system designed mainly for research.
Emerald was more advanced in several ways. Emerald's type system allowed for the introduction of new types of objects, whereas the Hermes system assumed at system start all possible object types were known to the system. Emerald could also migrate processes during invocation, something that the Hermes system could not.
While the system could tolerate some notion of failures while following forwarding addresses, by resorting to usage of the information located at the \textit{storesite}, the system had no way to prevent issues with partitions: where an invocation may fail because the object is inaccessible. However, given the relative independence of objects in the system, this would only affect objects (or users) located on the partitioned machine.
The design of Hermes was completed in a year and a half, written in Modula-2+, and was demonstrated functional in the laboratory with a LAN composed of a small number of nodes. According to one of the authors of the paper, the system never was turned into a product, mainly because Digital did not have a team at the time responsible for turning advanced research projects into actual distributed systems products\footnote{Andrew P. Black, personal communication.}.
The Sapphire~\cite{zhang2014customizable} system presented at OSDI '14 bears a similar resemblance to the Hermes system and its Emerald roots. While Sapphire focuses on the separation of application logic from deployment logic through the use of interfaces and interface inheritance in object-oriented programming languages, Sapphire uses many of the techniques presented in both Emerald and Hermes: transparent relocation based on annotations or for load balancing; location independent method invocation through the use of forwarding pointers, and fallback to a persistent data store to find the canonical location of a particular object.
Today, idempotence~\cite{Helland:2012:IMC:2181796.2187821} has been a topic of study in distributed systems, as it assists in designing deterministic computations that must happen on unreliable, asynchronous networks; a place where it is impossible to reliably detect failures~\cite{fischer1985impossibility}. Shapiro \textit{et al.}~\cite{shapiro2011comprehensive} propose the use of data structures that are associative, commutative, and idempotent as the basis for shared state in distributed databases. Meiklejohn and Van Roy~\cite{meiklejohn2015lasp} propose similar for large-scale distributed computations; whereas Conway \textit{et al.}~\cite{conway2012logic} propose similar for protocol development. Lee \textit{et al.}~ propose a system called RIFL for ensuring exactly-once semantics for remote procedure calls by uniquely identifying each call and fault-tolerant storage of the results~\cite{lee2015implementing}.