This project was created to document the creation of a PHP client and server communicating via the Thrift framework. To better learn and illustrate the use of Thrift, I wanted to build a simple, but non-trivial, mini-application that performed tasks that might actually be useful.
This project defines a client that passes raw text to a server. The server uses the Zemanta contextual intelligence engine to analyze the text for key topics and categorizations of the content. The article and its topics of interest are categorized using the DMOZ and Freebase taxonomies. The communication between client and server, of course, happens via a Thrift interface.
To quote its project site in the Apache Incubator:
Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
Simply stated, Thrift is an RPC framework analogous to SOAP. It offers cross-language serialization capabilities, but with significantly lower overhead because Thrift uses a binary format.
While not exactly new, Thrift is still new enough that documentation is scarce and difficult to piece together. You might be okay if you're interested in creating a Java or Python Thrift server. Similarly, there are a number of PHP clients documented, but if you're looking to stand up a PHP-based Thrift server, you're mostly shit out of luck. Until now.
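To give a rough feel for the overhead difference, here is a minimal sketch that hand-rolls the byte layout of a single string field as Thrift's binary protocol writes it inside a struct (type byte, field id, length, raw bytes, then a stop byte) and compares it with a SOAP body carrying the same text. The function name, the constants, and the SOAP element names are made up for illustration, and a real RPC call adds a small message header on top of this.

```php
<?php
// Illustration only: struct-field layout of Thrift's binary protocol,
// written by hand. Names below are hypothetical, not part of the Thrift library.

const FIELD_TYPE_STOP   = 0;   // marks the end of a struct
const FIELD_TYPE_STRING = 11;  // type id Thrift uses for string fields

function encodeStringField(int $fieldId, string $value): string
{
    // 1-byte type, 2-byte big-endian field id, 4-byte big-endian length, raw bytes.
    return pack('CnN', FIELD_TYPE_STRING, $fieldId, strlen($value)) . $value;
}

$text = 'Hello, Thrift';

// One struct containing a single string field, terminated by the stop byte.
$binary = encodeStringField(1, $text) . chr(FIELD_TYPE_STOP);

// Hypothetical SOAP body for the same payload; element names are invented.
$soap = '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
      . '<soap:Body><analyze><text>' . htmlspecialchars($text) . '</text></analyze></soap:Body>'
      . '</soap:Envelope>';

printf("binary: %d bytes, SOAP: %d bytes\n", strlen($binary), strlen($soap));
```

The binary form spends only 8 bytes on framing around the 13-byte payload, while the XML envelope spends far more on tags and namespaces.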
Installing Thrift has been documented elsewhere, so I won't belabor things here. For OS X, I used Matt Mueller's Guide to install Thrift 0.4.0 and had no trouble at all.
This repository contains all of the resources required to stand up and test the project's functionality. In true IKEA fashion, though, some configuration is required. More on that in a bit.
- client/
  A PHP Thrift client specifically built to test the server implementation. This directory also includes a number of sample text files that will be analyzed randomly when the client app is loaded into a browser.
- gen-php/
  The PHP server and client bindings generated by Thrift (i.e. thrift -r --gen php:server zemanta.thrift). Additional clients for other supported languages can be generated as needed.
- lib/
  The Thrift internals required to support the client and server bindings. These are created during the Thrift installation and have been copied into the project for convenience. In an enterprise solution, these would likely be linked directly from the Thrift installation directory, i.e. <thrift_install_path>/lib. Only the PHP bindings are actually required, but the others were included in case I ever decide to play with alternate language support.
- server/
  The Thrift server created atop the internals and exposed via HTTP for client access.
- zemanta.thrift
  The Thrift interface definition file.
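The client picks one of its bundled sample text files at random on each page load. A minimal sketch of that selection step might look like the following; the helper name, the .txt extension, and the directory layout are assumptions, not the project's actual code.

```php
<?php
// Hypothetical helper: returns the contents of one randomly chosen .txt file
// from $dir. The extension and directory layout are assumptions.
function pickRandomSample(string $dir): string
{
    $files = glob(rtrim($dir, '/') . '/*.txt');
    if ($files === false || count($files) === 0) {
        throw new RuntimeException("No sample files found in {$dir}");
    }

    // array_rand() returns a random key of the array.
    $path = $files[array_rand($files)];

    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Unable to read {$path}");
    }
    return $contents;
}
```

The returned string is the raw text that would then be handed to the Thrift client call.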
As mentioned above, all of the bits are in place already, but some environmental configuration is required.
Before any of this can be wired up and operational, you'll need a Zemanta API key. The key gives you access to 1,000 API calls per day by default. No key is required to use the Freebase API. This project only reads from Freebase; reads are throttled at 100,000 per day.
Providing detailed instructions for installing and/or configuring a web server is beyond the scope of this project. If you're here and still reading, I'm just going to assume that doing this doesn't scare the bejesus out of you and stick to the highlights of my own configuration. If your bejesus is firmly intact, you'll understand how to tailor these instructions to match your own environment.
I run a local Apache server on OS X with named virtual hosts turned on. I have one host pointing to the client/ directory and another pointing to the server/ directory. For convenience, I added DirectoryIndex directives for client.php and server.php, respectively.
#
# SERVER
#
<VirtualHost *:80>
    ServerName thrift.php.server.local
    ServerAlias thrift.php.server thrift.php.server.dev thrift.php.server.dev.local
    DocumentRoot /var/www/thrift-example/server
    DirectoryIndex server.php
    ErrorLog /var/www/.logs/thrift.php.server/error_log
    CustomLog /var/www/.logs/thrift.php.server/access_log common
    <Directory /var/www/thrift-example/server>
        Options FollowSymLinks Indexes
        AllowOverride None
        Order deny,allow
        Allow from all
    </Directory>
</VirtualHost>
#
# CLIENT
#
<VirtualHost *:80>
    ServerName thrift.php.client.local
    ServerAlias thrift.php.client thrift.php.client.dev thrift.php.client.dev.local
    DocumentRoot /var/www/thrift-example/client
    DirectoryIndex client.php
    ErrorLog /var/www/.logs/thrift.php.client/error_log
    CustomLog /var/www/.logs/thrift.php.client/access_log common
    <Directory /var/www/thrift-example/client>
        Options FollowSymLinks Indexes
        AllowOverride None
        Order deny,allow
        Allow from all
    </Directory>
</VirtualHost>
Configuring the server and client is as simple as copying the config.sample.php file in each to config.php and updating the values appropriately.
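If you'd rather script the copy step, something along these lines would do it. This is only a sketch: the function name is hypothetical, and it deliberately refuses to overwrite an existing config.php so edited values are never lost.

```php
<?php
// Copies config.sample.php to config.php inside $dir unless config.php
// already exists. Returns true when a new config.php was created.
// The function name is hypothetical; only the two file names come from the project.
function ensureConfig(string $dir): bool
{
    $sample = $dir . '/config.sample.php';
    $target = $dir . '/config.php';

    if (file_exists($target)) {
        return false; // never overwrite an edited config
    }
    if (!is_file($sample)) {
        throw new RuntimeException("Missing sample config: {$sample}");
    }
    if (!copy($sample, $target)) {
        throw new RuntimeException("Could not create {$target}");
    }
    return true;
}
```

You would call it once for the client/ directory and once for the server/ directory, then edit the values in each new config.php by hand.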
The server requires one additional change. You'll have to copy zemanta_service.sample.php to zemanta_service.php and update the API_KEY class constant to the value of your Zemanta key:
const API_KEY = 'YOUR_ZEMANTA_API_KEY_HERE';
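It's easy to forget this step, so a small guard can help catch an unchanged placeholder early. The class below is a hypothetical stand-in, not the project's actual service class; only the API_KEY constant and its placeholder value come from the sample file.

```php
<?php
// Hypothetical stand-in for the class in zemanta_service.php; only the
// API_KEY constant mirrors the project, the rest is illustrative.
class ZemantaServiceSketch
{
    const API_KEY = 'YOUR_ZEMANTA_API_KEY_HERE';

    // Returns true once the placeholder has been replaced with a non-empty key.
    public static function hasApiKey(): bool
    {
        return static::API_KEY !== ''
            && static::API_KEY !== 'YOUR_ZEMANTA_API_KEY_HERE';
    }
}

// Log a hint instead of sending a request that is bound to be rejected.
if (!ZemantaServiceSketch::hasApiKey()) {
    error_log('Zemanta API key is not configured; edit API_KEY in zemanta_service.php');
}
```

Because the check uses static::API_KEY, it also works for a subclass that overrides the constant.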