
Replica Gateways

Jude Nelson edited this page Dec 24, 2013 · 3 revisions

Overview

Replica Gateways (RGs) are proxies that interface with the cloud storage provider of your choice. They take data written by User Gateways (UGs) and store it in cloud storage, so that other UGs can still fetch it if the writing UG goes offline or becomes unreachable.

Closures

Replica Gateways are programmable: you define how an RG handles data written by User Gateways in the same Volume by giving it a closure. As the name implies, a closure contains not only the code to run, but also a set of global variables (state) that determine how that code behaves.

Closures are implemented as specially crafted Python modules. With a closure, you can deploy a new driver for a cloud storage provider, specify a (driver-agnostic) replication policy, alter an RG's configuration, and securely install sensitive data (such as API keys).

Creating a (stub) Closure

To create a stub closure that does nothing, first run the following commands:

$ mkdir foo
$ touch foo/__init__.py
$ touch foo/replica.py
$ touch foo/driver.py
$ touch foo/secrets.py
$ touch foo/config.py

foo is a Python package, and it must contain all of the above files, with exactly those names. Each file must also follow a specific structure, outlined below. You will not be able to upload the closure to the RG unless it meets this structure.

The contents of foo/config.py should be set to:

#!/usr/bin/env python

CONFIG = {
   # Put your global, public configuration data here
}

The contents of foo/secrets.py should be set to:

#!/usr/bin/env python

SECRETS = {
   # Put your secret, RG-specific configuration data here.
   # Definitely put any API keys and accounting information here.
}

The contents of foo/driver.py must be set to:

#!/usr/bin/env python

# The following methods, with the names and signatures as given,
# must all be defined.

def read_file( filename, outfile, **kw ):
   """
      Read a file named by `filename` from storage.
      `filename` is guaranteed to be globally unique.
      Write the contents to `outfile` (a file-like object).
   """
   pass

def write_file( filename, infile, **kw ):
   """
      Write a file named by `filename` to storage.
      `filename` is guaranteed to be globally unique.
      Read the contents from `infile` (a file-like object).
   """
   pass

def delete_file( filename, **kw ):
   """
      Delete a file named by `filename`.
      `filename` is guaranteed to be globally unique.
   """
   pass

The contents of foo/replica.py should be set to:

#!/usr/bin/env python

# The following methods, with the names and signatures as given,
# must all be defined.

def replica_read( context, request_info, filename, outfile ):
   """
      Execute storage policy logic for reading data here.
      Before returning, pick a driver from context.drivers (a dict)
      and call its read_file() method, passing it `filename` and `outfile`.

      Return an HTTP status code indicating the result of the operation.
   """
   pass

def replica_write( context, request_info, filename, infile ):
   """
      Execute storage policy logic for writing data here.
      Before returning, pick a driver from context.drivers (a dict)
      and call its write_file() method, passing it `filename` and `infile`.
      
      Return an HTTP status code indicating the result of the operation.
   """
   pass

def replica_delete( context, request_info, filename ):
   """
      Execute storage policy logic for deleting data here.
      Before returning, pick a driver from context.drivers (a dict)
      and call its delete_file() method, passing it `filename`.

      Return an HTTP status code indicating the result of the operation.
   """
   pass

Semantics of replica.py

The point of replica.py is to define, in a cloud-agnostic fashion, the policies the RG should enforce. The replica_read, replica_write, and replica_delete methods are callbacks that the RG invokes when a UG attempts to read, write, or delete data, respectively.

Your replica.py file SHOULD NOT contain cloud-specific code. Instead, it should delegate all cloud access to the appropriate storage driver.

Methods in replica.py MUST be idempotent. The RG guarantees that the filename argument is globally unique. Both the filename and the contents of infile are opaque, and SHOULD NOT be accessed by replica.py.

Any imports must be performed within the method bodies.
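As an illustration of a driver-agnostic policy, the following hypothetical replica.py mirrors every write and delete to all loaded drivers and serves reads from the first driver that has the file. It is a sketch under stated assumptions: `infile` and `outfile` are assumed to be seekable, and drivers are assumed to return 200 on success. It never inspects `filename` or the data and needs no imports, so it follows the rules above; as long as each driver is idempotent, repeating any call leaves storage in the same state.

```python
def replica_write(context, request_info, filename, infile):
    """Hypothetical policy: mirror the data to every loaded driver."""
    status = 200
    for name in sorted(context.drivers):
        infile.seek(0)  # assumes `infile` is seekable; rewind for each driver
        rc = context.drivers[name].write_file(filename, infile, context=context)
        if rc != 200:
            status = rc  # remember the failure, but keep writing to the others
    return status


def replica_read(context, request_info, filename, outfile):
    """Hypothetical policy: return the data from the first driver that has it."""
    status = 404
    for name in sorted(context.drivers):
        outfile.seek(0)
        outfile.truncate()  # discard any partial output from a failed attempt
        status = context.drivers[name].read_file(filename, outfile, context=context)
        if status == 200:
            break
    return status


def replica_delete(context, request_info, filename):
    """Hypothetical policy: delete from every loaded driver."""
    status = 200
    for name in sorted(context.drivers):
        rc = context.drivers[name].delete_file(filename, context=context)
        if rc != 200:
            status = rc
    return status
```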

context and request_info

Each method in replica.py takes context and request_info named tuples as arguments. The context tuple bundles state describing the RG, and request_info describes the nature of the given request.

The context object contains the following information:

context.config    # This is the CONFIG dictionary from config.py.
context.secrets   # This is the SECRETS dictionary from secrets.py.
context.log       # This is the RG's logger instance.  You can use it for logging information.
context.drivers   # This is a dictionary of modules that meet the driver.py interface.

The context.drivers dictionary maps module names to drivers. The driver.py file is always keyed with "builtin". A driver imported locally from module foo.bar.baz when the RG starts up will be keyed with "foo.bar.baz".
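When testing replica.py off-line, a stand-in context with the four fields above is enough. The `Context` named tuple, `make_context()`, and `pick_driver()` below are hypothetical test helpers, not the RG's own classes, and the `"DRIVER"` CONFIG key is an assumption. `pick_driver()` shows how a policy might select a driver by its context.drivers key and fall back to the "builtin" driver.py.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in with the fields listed above; the real RG builds its own.
Context = namedtuple("Context", ["config", "secrets", "log", "drivers"])


def make_context(config, secrets, drivers):
    """Build a test context around the given CONFIG, SECRETS and drivers."""
    return Context(config=config, secrets=secrets,
                   log=logging.getLogger("rg-closure-test"), drivers=drivers)


def pick_driver(context):
    """Choose a driver by the hypothetical CONFIG key "DRIVER"."""
    name = context.config.get("DRIVER", "builtin")
    if name not in context.drivers:
        context.log.error("no driver keyed %r; falling back to 'builtin'", name)
        name = "builtin"
    return context.drivers[name]
```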

The request_info object is a named tuple that describes the request. Depending on the method requested (replica_read, replica_write, or replica_delete) and type of data requested (block or file manifest), it will contain different information.

A request_info structure always contains the following information:

request_info.type      # (int) MANIFEST or BLOCK (defined in syndicate.rg.request.RequestInfo)
request_info.user_id   # (int) ID of the user that is running the UG.
request_info.file_id   # (int) ID of the file
request_info.version   # (int) version of the file

If a block was requested, request_info contains the following additional information:

request_info.block_id  # (int) ID of the block requested
request_info.block_version  # (int) version of the block requested

If a manifest was requested, request_info contains the following additional information:

request_info.mtime_sec # (int) modification time of the file (seconds)
request_info.mtime_nsec # (int) modification time of the file (nanoseconds)
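A policy usually needs to branch on request_info.type. The sketch below builds a log-friendly label from the fields listed above. The MANIFEST and BLOCK values here are placeholders: the real constants are defined in syndicate.rg.request.RequestInfo, and a real closure would obtain them from there, importing inside the method body.

```python
# Placeholder values; the real MANIFEST and BLOCK constants are defined in
# syndicate.rg.request.RequestInfo.
MANIFEST, BLOCK = "manifest", "block"


def object_label(request_info):
    """Describe a request for logging, e.g. context.log.info(object_label(ri))."""
    label = "file %d v%d" % (request_info.file_id, request_info.version)
    if request_info.type == BLOCK:
        return "%s block %d v%d" % (label, request_info.block_id,
                                    request_info.block_version)
    if request_info.type == MANIFEST:
        # Join seconds and zero-padded nanoseconds into one timestamp string.
        return "%s manifest (mtime %d.%09d)" % (label, request_info.mtime_sec,
                                                request_info.mtime_nsec)
    raise ValueError("unknown request type: %r" % (request_info.type,))
```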

Regardless of whether a block or a manifest was requested, the following additional information is available on a call to replica_write:

request_info.user_id    # (int) the ID of the user that made the request
request_info.gateway_id # (int) the ID of the gateway that made the request
request_info.data_hash  # (str) printable SHA256 of the data uploaded
request_info.size       # (int) number of bytes uploaded
request_info.kwargs     # (dict) extended attributes on the file
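Because request_info carries the upload size, replica.py can enforce limits without reading the opaque data. The sketch below rejects oversized writes with HTTP 413; the "MAX_OBJECT_SIZE" CONFIG key is hypothetical, and the policy is an illustration rather than built-in RG behavior.

```python
def replica_write(context, request_info, filename, infile):
    """Hypothetical policy: reject oversized uploads before touching storage."""
    limit = context.config.get("MAX_OBJECT_SIZE")  # hypothetical key, in bytes
    if limit is not None and request_info.size > limit:
        context.log.warning("gateway %d tried to write %d bytes (limit %d)",
                            request_info.gateway_id, request_info.size, limit)
        return 413  # HTTP 413: request body too large
    return context.drivers["builtin"].write_file(filename, infile, context=context)
```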

On a call to replica_delete, the following additional information is available in request_info:

request_info.user_id    # (int) the ID of the user that made the request
request_info.gateway_id # (int) the ID of the gateway that made the request
request_info.kwargs     # (dict) extended attributes on the file
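Similarly, replica_delete can gate deletions on who is asking. In this hypothetical sketch, the "DELETE_USER_IDS" CONFIG key lists the user IDs allowed to delete replicas; other users get HTTP 403, and if the key is absent every delete is allowed.

```python
def replica_delete(context, request_info, filename):
    """Hypothetical policy: only listed users may delete replicas."""
    allowed = context.config.get("DELETE_USER_IDS")  # hypothetical CONFIG key
    if allowed is not None and request_info.user_id not in allowed:
        context.log.warning("user %d (gateway %d) may not delete %s",
                            request_info.user_id, request_info.gateway_id, filename)
        return 403  # HTTP 403: forbidden
    return context.drivers["builtin"].delete_file(filename, context=context)
```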

Of particular note is the request_info.kwargs dictionary. You can set arbitrary extended attributes on a file in the UG, and they will be sent to the RG when the file's data is replicated. This lets you pass arbitrary information from your application to the RG (NOTE: this bit is not yet implemented --Jude).

Semantics of driver.py

The driver.py file defines how the closure interacts with the cloud. The methods in this file are meant to be invoked by the methods in the replica.py file, and SHOULD NOT enforce higher-level policies.

Each method in driver.py takes an arbitrary set of keyword arguments. The replica_* methods SHOULD pass whatever information is necessary for the driver to run via keyword arguments.

Driver methods MUST be idempotent.

Any imports must be performed within the method bodies.

Semantics of config.py

The config.py file MUST have a CONFIG dictionary. Its contents MAY be publicly readable, so developers SHOULD NOT store sensitive information in it. The CONFIG dictionary is read-only, and replica.py and driver.py cannot access it directly; they see it only through context.config.

Semantics of secrets.py

The secrets.py file MUST have a SECRETS dictionary. You may keep its contents in plaintext on your local machine: when the syntool.py program uploads the closure to the RG, it first encrypts the contents with the RG's public key, and the RG decrypts the SECRETS dictionary when it receives the closure.

Example Closure

Here is a complete sample closure that stores everything to a local directory:

config.py:

#!/usr/bin/python

CONFIG = {
   "STORAGE_DIR": "~/.syndicate-storage"
}

secrets.py:

#!/usr/bin/python

SECRETS = {}

replica.py:

#!/usr/bin/python

def replica_read( context, request_info, filename, outfile ):
   return context.drivers['builtin'].read_file( filename, outfile, context=context )

def replica_write( context, request_info, filename, infile ):
   return context.drivers['builtin'].write_file( filename, infile, context=context )

def replica_delete( context, request_info, filename ):
   return context.drivers['builtin'].delete_file( filename, context=context )

driver.py:

#!/usr/bin/python

def read_file( filename, outfile, **kw ):
   import os

   context = kw['context']
   # Expand "~" so the configured path resolves to the home directory.
   STORAGE_DIR = os.path.expanduser( context.config['STORAGE_DIR'] )

   try:
      # Replica data is opaque bytes, so open the file in binary mode.
      with open( os.path.join(STORAGE_DIR, filename), "rb" ) as fd:
         outfile.write( fd.read() )
   except Exception as e:
      context.log.exception(e)
      return 500

   return 200


def write_file( filename, infile, **kw ):
   import os

   context = kw['context']
   STORAGE_DIR = os.path.expanduser( context.config['STORAGE_DIR'] )

   try:
      buf = infile.read()
      with open( os.path.join(STORAGE_DIR, filename), "wb" ) as fd:
         fd.write( buf )
   except Exception as e:
      context.log.exception(e)
      return 500

   return 200


def delete_file( filename, **kw ):
   import errno
   import os

   context = kw['context']
   STORAGE_DIR = os.path.expanduser( context.config['STORAGE_DIR'] )

   try:
      os.unlink( os.path.join(STORAGE_DIR, filename) )
   except OSError as e:
      # A file that is already gone counts as deleted, keeping this idempotent.
      if e.errno != errno.ENOENT:
         context.log.exception(e)
         return 500

   return 200
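One limitation of the example write_file() is that a crash mid-write can leave a truncated replica, and the write fails if STORAGE_DIR does not exist yet. A common remedy, sketched below as an alternative rather than as Syndicate's own code, is to write to a temporary file in the same directory and rename it into place; on POSIX systems `os.replace()` swaps in the new file atomically. This sketch is Python 3 only, since it relies on `os.replace()` and `os.makedirs(..., exist_ok=True)`.

```python
def write_file(filename, infile, **kw):
    import os
    import tempfile

    context = kw['context']
    storage_dir = os.path.expanduser(context.config['STORAGE_DIR'])

    try:
        # Create the storage directory on first use.
        os.makedirs(storage_dir, exist_ok=True)
        # Write into a temporary file in the same directory...
        fd, tmp_path = tempfile.mkstemp(dir=storage_dir)
        try:
            with os.fdopen(fd, "wb") as out:
                out.write(infile.read())
            # ...then atomically move it over the final name.
            os.replace(tmp_path, os.path.join(storage_dir, filename))
        except Exception:
            os.unlink(tmp_path)  # don't leave a stray temporary file behind
            raise
    except Exception as e:
        context.log.exception(e)
        return 500

    return 200
```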