This project was created to document the creation of a PHP client and server communicating via the Thrift framework. To better learn and illustrate the use of Thrift, I wanted to build a simple, but non-trivial, mini-application that performed tasks that might actually be useful.

This project defines a client that passes raw text to a server. The server uses the Zemanta contextual intelligence engine to analyze the text for key topics and categorizations of the content. The article and its topics of interest are categorized using the DMOZ and Freebase taxonomies. The communication between client and server, of course, happens via a Thrift interface.

Thrift

Directly from its project site in the Apache Incubator:

Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.

Simply stated, Thrift is an RPC framework analogous to SOAP. It offers cross-language serialization capabilities, but with significantly lower overhead because Thrift uses a binary format.
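To make the overhead claim a little more concrete, here is a rough sketch comparing how a single 32-bit integer might travel in each style. The binary side mirrors how Thrift's binary protocol writes an i32 value (four bytes, big-endian). The XML element is a made-up, SOAP-style encoding used only for comparison; it is not output captured from either framework.

```php
<?php
// Illustrative only: compare the size of one 32-bit integer in a binary
// encoding versus a hand-written, SOAP-style XML element.
$value = 1000;

// Thrift's binary protocol writes an i32 value as 4 big-endian bytes.
$binary = pack('N', $value);

// Hypothetical XML element of the kind a SOAP body might carry.
$xml = '<count xsi:type="xsd:int">' . $value . '</count>';

printf("binary: %d bytes, xml: %d bytes\n", strlen($binary), strlen($xml));
```

A real message on either side carries more than the bare value (Thrift adds a small field header, SOAP adds an envelope), so treat the numbers as a feel for the difference rather than a benchmark.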

While not exactly new, Thrift is still new enough that documentation is scarce and difficult to piece together. You might be okay if you're interested in creating a Java or Python Thrift server. Similarly, there are a number of PHP clients documented, but if you're looking to stand up a PHP-based Thrift server, you're mostly shit out of luck. Until now.

Installing Thrift

This has been documented elsewhere so I won't belabor things here. For OS X, I used Matt Mueller's Guide to install Thrift 0.4.0 and had no trouble at all.

Project Contents

This repository contains all of the resources required to stand up and test the project's functionality. In true IKEA fashion, though, some configuration is required. More on that in a bit.

  • client/

    A PHP Thrift client specifically built to test the server implementation. This directory also includes a number of sample text files that will be analyzed randomly when the client app is loaded into a browser.

  • gen-php/

    The PHP server and client bindings generated by Thrift (i.e. thrift -r --gen php:server zemanta.thrift). Additional clients for other supported languages can be generated as needed.

  • lib/

    The Thrift internals required to support the client and server bindings. These are created during the Thrift installation and have been copied into the project for convenience. In an enterprise solution, these would likely be linked directly from the Thrift installation directory, i.e. <thrift_install_path>/lib. Only the PHP bindings are actually required, but the others were included in case I ever decide to play with alternate language support.

  • server/

    The Thrift server created atop the internals and exposed via HTTP for client access.

  • zemanta.thrift

    The Thrift interface definition file.

Project Installation and Configuration

As mentioned above, all of the bits are in place already, but some environmental configuration is required.

Services

Before any of this can be wired up and operational, you'll need a Zemanta API key. The key gives you access to 1,000 API calls per day by default. No key is required to use the Freebase API. This project only reads from Freebase; reads are throttled at 100,000 per day.

Web Server

Providing detailed instructions for installing and/or configuring a web server is beyond the scope of this project. If you're here and still reading, I'm just going to assume that doing this doesn't scare the bejesus out of you and stick to the highlights of my own configuration. If your bejesus is firmly intact, you'll understand how to tailor these instructions to match your own environment.

I run a local Apache server on OS X with named virtual hosts turned on. I have one host pointing to the client/ directory and another pointing to the server/ directory. For convenience, I added DirectoryIndex directives for client.php and server.php, respectively.

# 
# SERVER
# 
<VirtualHost *:80>
  ServerName     thrift.php.server.local
  ServerAlias    thrift.php.server thrift.php.server.dev thrift.php.server.dev.local
  DocumentRoot   /var/www/thrift-example/server
  DirectoryIndex server.php
  
  ErrorLog     /var/www/.logs/thrift.php.server/error_log
  CustomLog    /var/www/.logs/thrift.php.server/access_log common

  <Directory /var/www/thrift-example/server>
    Options FollowSymLinks Indexes
    AllowOverride None
    Order deny,allow
    Allow from all
  </Directory>
</VirtualHost>

# 
# CLIENT
# 
<VirtualHost *:80>
  ServerName     thrift.php.client.local
  ServerAlias    thrift.php.client thrift.php.client.dev thrift.php.client.dev.local
  DocumentRoot   /var/www/thrift-example/client
  DirectoryIndex client.php

  ErrorLog     /var/www/.logs/thrift.php.client/error_log
  CustomLog    /var/www/.logs/thrift.php.client/access_log common

  <Directory /var/www/thrift-example/client>
    Options FollowSymLinks Indexes
    AllowOverride None
    Order deny,allow
    Allow from all
  </Directory>
</VirtualHost>

Server & Client

Configuring the server and client is as simple as copying the config.sample.php file in each to config.php and updating the values appropriately.
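If you'd rather script the copy step than do it by hand, something along these lines should work. The function name `install_sample_configs` is hypothetical and not part of this repository; the only names taken from the project are `config.sample.php` and `config.php`. It deliberately leaves an existing `config.php` alone.

```php
<?php
// Hypothetical helper (not part of the repository): copy config.sample.php
// to config.php inside each given directory, skipping any directory that
// already has a config.php so local edits are never overwritten.
function install_sample_configs(array $dirs)
{
    $created = array();
    foreach ($dirs as $dir) {
        $sample = $dir . '/config.sample.php';
        $target = $dir . '/config.php';
        if (is_file($sample) && !file_exists($target) && copy($sample, $target)) {
            $created[] = $target;
        }
    }
    return $created;
}
```

Called from the repository root as `install_sample_configs(array('client', 'server'))`, it returns the list of files it created. You still need to open each `config.php` and update the values for your environment.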

The server requires one additional change. You'll have to copy zemanta_service.sample.php to zemanta_service.php and update the API_KEY class constant to the value of your Zemanta key:

const API_KEY = 'YOUR_ZEMANTA_API_KEY_HERE';
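It is easy to forget this step and then wonder why every request fails. One cheap safeguard is a check that the placeholder has actually been replaced. The sketch below is an assumption about how that could look: the class name `ZemantaServiceSketch` and the `hasApiKey()` method are invented for illustration, and only the `API_KEY` constant and its placeholder value come from the project.

```php
<?php
// Hypothetical stand-in for the class in zemanta_service.php; only the
// API_KEY constant comes from the README, everything else is illustrative.
class ZemantaServiceSketch
{
    const API_KEY = 'YOUR_ZEMANTA_API_KEY_HERE';

    // True once the placeholder has been replaced with a non-empty key.
    public static function hasApiKey()
    {
        return self::API_KEY !== ''
            && self::API_KEY !== 'YOUR_ZEMANTA_API_KEY_HERE';
    }
}

if (!ZemantaServiceSketch::hasApiKey()) {
    // Fail early with a readable message instead of a rejected API call.
    error_log('Zemanta API key is not set; edit API_KEY in zemanta_service.php.');
}
```

With the placeholder still in place, the check logs a warning; once a real key is pasted in, it stays quiet.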