<h2 id="relevant-reading">Relevant Reading</h2>
<ul>
<li><p><em>Implementing Location Independent Invocation</em>, Black, Andrew P and Artsy, Yeshayahu, IEEE Transactions on Parallel and Distributed Systems, 1990 <span class="citation">(Andrew P Black and Artsy 1990)</span>.</p></li>
<li><p><em>Customizable and extensible deployment for mobile/cloud applications</em>, Zhang, Irene and Szekeres, Adriana and Van Aken, Dana and Ackerman, Isaac and Gribble, Steven D and Krishnamurthy, Arvind and Levy, Henry M, 2014 <span class="citation">(Zhang et al. 2014)</span>.</p></li>
</ul>
<h2 id="commentary">Commentary</h2>
<p>The general idea behind the Remote Procedure Call (RPC) paradigm is that it supports the transfer of control between address spaces. This paradigm allows programmers to write distributed applications without needing knowledge of data representations or specific network protocols. Even though we know that there are significant semantic differences between remote and local calls <span class="citation">(Kendall et al. 1994; Andrew P Black and Artsy 1990)</span>, the authors posit that the most fundamental difference is that of <em>binding</em>: how to figure out which address space to direct the call to.</p>
<p>Traditionally, this has been done in one of two ways: <em>default or automatic</em> binding, where the RPC system makes the choice for the programmer; or <em>clerks</em>, application-specific modules used to determine where to place the call. Default binding is fairly straightforward when there is only one server (or a group of semantically equivalent servers) to service the request. Clerks are fairly expensive, as one must be written for each type of request that needs to be serviced. If the service the RPC call is being made to is <em>pure</em>, for instance providing a fast Fourier transform as the authors put it, it is easy to choose automatic binding to select a server based on latency or availability. However, it is more challenging if services host application data. In their example, they consider an employee directory at Digital where application data is partitioned by company, and further by other groupings. If this mapping changes infrequently, a static mapping can be distributed to all of the clients; but what happens if objects are mobile and this mapping changes more frequently?</p>
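<p>To make the distinction concrete, here is a minimal, hypothetical sketch (in Python, not the paper’s Modula-2+) of the two binding styles; the <code>DefaultBinder</code> and <code>DirectoryClerk</code> names and the company-keyed partition map are illustrative, not taken from the paper.</p>
<pre><code>import random

class DefaultBinder:
    """Automatic binding: the RPC system picks any semantically
    equivalent server, e.g. for a pure service such as an FFT."""
    def __init__(self, servers):
        self.servers = servers

    def bind(self, _request):
        return random.choice(self.servers)

class DirectoryClerk:
    """A clerk: application-specific logic that knows how the
    directory data is partitioned and routes the call accordingly."""
    def __init__(self, partition_map):
        # e.g. {"DEC-Europe": "server-3", "DEC-US": "server-1"}
        self.partition_map = partition_map

    def bind(self, request):
        return self.partition_map[request["company"]]</code></pre>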
<p>One of the fantastic things about this paper is how forward-thinking the design is for an actual industrial problem at Digital Equipment Corporation. I consider this one of the early versions of what we now call an “industry” research report, even though the system was never productized and the work was performed mainly by researchers in a lab. The application deals with expense vouchers for employees: each form needs to be filled in by an employee, approved by various managers, filed, and eventually results in a payout of actual cash. The managers involved in approving a form may be located in different buildings on different continents. The application design assumes Digital’s global network of 36,000 machines and assumes that keeping the records for each form in a centralized database is infeasible. Instead, the design is based on mobile objects for both data and code; forms should be able to move around the network as required by the application.</p>
<p>The Hermes system is broken into three components: a naming service, a persistent store known as a collection of <em>storesites</em>, and a routing layer that sits above the RPC system. Each object in the system is given a globally unique identifier, a source <em>storesite</em>, and a <em>temporal address descriptor</em>, or <em>tad</em>. The <em>temporal address descriptor</em> is a pair composed of a Hermes node identifier and a monotonically advancing timestamp: this pair represents where an object is located at a given time. This information is also persisted at the object’s <em>storesite</em>. As objects move around the network, the <em>tad</em> is updated at the source node and two-phase commit (2PC) is used to coordinate the change with the record at the object’s <em>storesite</em>.</p>
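<p>A rough sketch of these pieces, assuming a <em>tad</em> is simply a (node, timestamp) pair of which the storesite keeps a copy; the names and the simplified <code>move_object</code> helper (which elides the actual 2PC exchange) are illustrative rather than the paper’s interfaces:</p>
<pre><code>from dataclasses import dataclass

@dataclass(frozen=True)
class Tad:
    """Temporal address descriptor: where an object was at a given time."""
    node_id: str
    timestamp: int   # monotonically advancing

@dataclass
class ObjectRecord:
    guid: str        # globally unique identifier
    storesite: str   # node holding the object's persistent record
    tad: Tad         # last known location

def move_object(record, new_node, new_timestamp, storesite_db):
    """Relocate an object: advance its tad and (conceptually) agree on the
    new location with the storesite via two-phase commit."""
    new_tad = Tad(node_id=new_node, timestamp=new_timestamp)
    # The prepare/commit exchange is elided here; both the source node and
    # the storesite must accept the new tad, or the move is aborted.
    storesite_db[record.guid] = new_tad
    record.tad = new_tad
    return record</code></pre>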
<p>When a remote procedure call is issued, the caller attempts to invoke the call locally if the object is local. If not, and a forwarding pointer, or <em>tad</em>, exists, the message is routed to that node. Forwarding pointers are followed until a maximum hop count is reached; at that point the call is returned to the caller, who begins the process again with the last known forwarding pointer. Along the forwarding path, the <em>tad</em> is updated at each hop, reducing the number of hops needed for the next request through that node. This is possible because of the monotonicity of the temporal addresses.</p>
<p>If a node has no local knowledge of where the object is, either because the object is not running locally or because no temporal address exists, a request is made to the naming service for the object’s <em>storesite</em>, and the address of the current location is then retrieved from the <em>storesite</em>.</p>
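<p>Putting the previous two paragraphs together, a hedged sketch of the routing logic might look as follows; the per-node <code>objects</code> and <code>tads</code> tables, the hop limit of 8, and the exception-based retry are assumptions made for illustration, not details from the paper:</p>
<pre><code>class TooManyHops(Exception):
    """Raised so the caller can retry from the last known forwarding pointer."""

MAX_HOPS = 8   # illustrative bound; the paper's actual limit is not stated here

def route_call(guid, call, start, nodes, naming_service, storesites):
    """Follow forwarding pointers (tads) toward the object, falling back to
    the naming service and storesite when nothing is known locally."""
    node, path = start, []
    for _ in range(MAX_HOPS + 1):
        here = nodes[node]
        if guid in here.objects:
            result = here.objects[guid](call)
            # Back-propagate the freshest location so the next request through
            # these nodes takes fewer hops; monotone timestamps make
            # "freshest" unambiguous.
            final_tad = here.tads.get(guid)
            if final_tad is not None:
                for visited in path:
                    nodes[visited].tads[guid] = final_tad
            return result

        tad = here.tads.get(guid)
        if tad is None:
            # No local knowledge: ask the naming service for the storesite,
            # then the storesite for the current location.
            storesite = naming_service[guid]
            tad = storesites[storesite][guid]
            here.tads[guid] = tad

        path.append(node)
        node = tad.node_id

    raise TooManyHops(tad)   # the caller restarts from this pointer</code></pre>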
<p>However, in this model failures may occur. What happens if the RPC arrives at the object’s destination and the call is invoked and completed, but the response packets are dropped? In this case, an invocation sequencer is required to ensure that the operation is only performed if it has not previously completed. The authors suggest developers write operations that are idempotent, to ensure they can be replayed without issue or additional overhead.</p>
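<p>A minimal sketch of such an invocation sequencer, assuming each call carries a unique identifier; the in-memory dictionary stands in for whatever durable record a real system would need:</p>
<pre><code>class InvocationSequencer:
    """At-most-once filter: remember which call ids have completed and
    replay the stored result instead of re-executing the operation."""
    def __init__(self):
        self._completed = {}            # call_id -> result

    def invoke(self, call_id, operation, *args):
        if call_id in self._completed:
            return self._completed[call_id]   # duplicate request: replay
        result = operation(*args)
        self._completed[call_id] = result     # record before acknowledging
        return result</code></pre>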
<h2 id="impact-and-implementations">Impact and Implementations</h2>
<p>Both the Eden <span class="citation">(Andrew P. Black 1985)</span> and Emerald <span class="citation">(Andrew P Black et al. 2007)</span> programming languages had notions of distributed objects. Eden used hints to identify where to route messages for objects, but timed them out quickly. Once a hint timed out, a durable storage location called a <em>checksite</em> would be checked, and if that yielded no results, broadcast messages would be used. Emerald, a predecessor to Hermes, used forwarding addresses, but fell back to a broadcast mechanism to find objects when forwarding addresses were not available. In the event the broadcast yielded no results, an exhaustive search of every node in the cluster was performed. All of these decisions were fine for a language and operating system designed mainly for research.</p>
<p>Emerald was more advanced in several ways. Emerald’s type system allowed for the introduction of new types of objects, whereas the Hermes system assumed that all possible object types were known at system start. Emerald could also migrate processes during invocation, something the Hermes system could not do.</p>
<p>While the system could tolerate some failures while following forwarding addresses, by falling back to the information located at the <em>storesite</em>, it had no way to prevent problems caused by partitions: an invocation may fail because the object is inaccessible. However, given the relative independence of objects in the system, this would only affect objects (or users) located on the partitioned machine.</p>
<p>The design of Hermes was completed in a year and a half, implemented in Modula-2+, and demonstrated functional in the laboratory on a LAN composed of a small number of nodes. According to one of the authors of the paper, the system was never turned into a product, mainly because Digital did not have a team at the time responsible for turning advanced research projects into actual distributed systems products<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>.</p>
<p>The Sapphire <span class="citation">(Zhang et al. 2014)</span> system presented at OSDI ’14 bears a strong resemblance to the Hermes system and its Emerald roots. While Sapphire focuses on the separation of application logic from deployment logic through the use of interfaces and interface inheritance in object-oriented programming languages, it uses many of the techniques presented in both Emerald and Hermes: transparent relocation based on annotations or for load balancing; location-independent method invocation through the use of forwarding pointers; and fallback to a persistent data store to find the canonical location of a particular object.</p>
<p>Today, idempotence <span class="citation">(Helland 2012)</span> has become a topic of study in distributed systems, as it assists in designing deterministic computations that must happen on unreliable, asynchronous networks: a setting where it is impossible to reliably detect failures <span class="citation">(Fischer, Lynch, and Paterson 1985)</span>. Shapiro <em>et al.</em> <span class="citation">(Shapiro et al. 2011)</span> propose the use of data structures that are associative, commutative, and idempotent as the basis for shared state in distributed databases. Meiklejohn and Van Roy <span class="citation">(Meiklejohn and Van Roy 2015)</span> propose a similar approach for large-scale distributed computations, whereas Conway <em>et al.</em> <span class="citation">(Conway et al. 2012)</span> propose a similar approach for protocol development. Lee <em>et al.</em> propose a system called RIFL that ensures exactly-once semantics for remote procedure calls by uniquely identifying each call and storing the results in fault-tolerant storage <span class="citation">(Lee et al. 2015)</span>.</p>
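<p>As a small illustration of the property Shapiro <em>et al.</em> rely on, here is a grow-only set sketch whose merge is associative, commutative, and idempotent, so duplicated or reordered messages cannot change the converged state; the class is illustrative and not taken from any of the cited systems:</p>
<pre><code>class GSet:
    """Grow-only set: merge is associative, commutative, and idempotent."""
    def __init__(self, elements=()):
        self.elements = frozenset(elements)

    def add(self, element):
        return GSet(self.elements | {element})

    def merge(self, other):
        return GSet(self.elements | other.elements)

# Duplicated or reordered merges do not change the outcome:
a, b = GSet({"x"}), GSet({"y"})
assert a.merge(b).elements == b.merge(a).merge(b).elements == {"x", "y"}</code></pre>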
<div id="refs" class="references">
<div id="ref-black1990implementing">
<p>Black, Andrew P, and Yeshayahu Artsy. 1990. “Implementing Location Independent Invocation.” <em>IEEE Transactions on Parallel and Distributed Systems</em> 1 (1). IEEE: 107–19.</p>
</div>
<div id="ref-black2007development">
<p>Black, Andrew P, Norman C Hutchinson, Eric Jul, and Henry M Levy. 2007. “The Development of the Emerald Programming Language.” In <em>Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages</em>, 11–11. ACM.</p>
</div>
<div id="ref-Black:1985:SDA:323647.323646">
<p>Black, Andrew P. 1985. “Supporting Distributed Applications: Experience with Eden.” In <em>Proceedings of the Tenth ACM Symposium on Operating Systems Principles</em>, 181–93. SOSP ’85. New York, NY, USA: ACM. doi:<a href="https://doi.org/10.1145/323647.323646">10.1145/323647.323646</a>.</p>
</div>
<div id="ref-conway2012logic">
<p>Conway, Neil, William R Marczak, Peter Alvaro, Joseph M Hellerstein, and David Maier. 2012. “Logic and Lattices for Distributed Programming.” In <em>Proceedings of the Third ACM Symposium on Cloud Computing</em>, 1. ACM.</p>
</div>
<div id="ref-fischer1985impossibility">
<p>Fischer, Michael J, Nancy A Lynch, and Michael S Paterson. 1985. “Impossibility of Distributed Consensus with One Faulty Process.” <em>Journal of the ACM (JACM)</em> 32 (2). ACM: 374–82.</p>
</div>
<div id="ref-Helland:2012:IMC:2181796.2187821">
<p>Helland, Pat. 2012. “Idempotence Is Not a Medical Condition.” <em>Queue</em> 10 (4). New York, NY, USA: ACM: 30:30–30:46. doi:<a href="https://doi.org/10.1145/2181796.2187821">10.1145/2181796.2187821</a>.</p>
</div>
<div id="ref-kendall1994note">
<p>Kendall, Samuel C, Jim Waldo, Ann Wollrath, and Geoff Wyant. 1994. “A Note on Distributed Computing.” Sun Microsystems, Inc.</p>
</div>
<div id="ref-lee2015implementing">
<p>Lee, Collin, Seo Jin Park, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout. 2015. “Implementing Linearizability at Large Scale and Low Latency.” In <em>Proceedings of the 25th Symposium on Operating Systems Principles</em>, 71–86. ACM.</p>
</div>
<div id="ref-meiklejohn2015lasp">
<p>Meiklejohn, Christopher, and Peter Van Roy. 2015. “Lasp: A Language for Distributed, Eventually Consistent Computations with CRDTs.” In <em>Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data</em>, 7. ACM.</p>
</div>
<div id="ref-shapiro2011comprehensive">
<p>Shapiro, Marc, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. “A Comprehensive Study of Convergent and Commutative Replicated Data Types.” PhD thesis, Inria–Centre Paris-Rocquencourt.</p>
</div>
<div id="ref-zhang2014customizable">
<p>Zhang, Irene, Adriana Szekeres, Dana Van Aken, Isaac Ackerman, Steven D Gribble, Arvind Krishnamurthy, and Henry M Levy. 2014. “Customizable and Extensible Deployment for Mobile/Cloud Applications.” In <em>11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)</em>, 97–112.</p>
</div>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Andrew P. Black, personal communication.<a href="#fnref1">↩</a></p></li>
</ol>
</div>