This repository has been archived by the owner on Apr 22, 2019. It is now read-only.

Code4HR/foodcode

Food Code Parser

This project parses the FDA Food Code and converts it into plain text and JSON for easier use.

Currently, it only pulls the text out of each page and saves it to flat files named by page number. Better parsing may come in the future; contributions welcome!
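As a rough illustration of "flat files by page number", here is a minimal sketch that writes one text file and one JSON file per page. The helper name write_pages, the page-0001 naming scheme, and the JSON keys are assumptions made for this example; the project's actual output layout may differ.

```ruby
require 'json'
require 'fileutils'

# Hypothetical helper (not taken from the project): writes each page's text
# to <out_dir>/page-NNNN.txt and a small JSON record to page-NNNN.json.
# `pages` is a Hash of page number => page text.
def write_pages(pages, out_dir)
  FileUtils.mkdir_p(out_dir)
  pages.map do |number, text|
    # Zero-padded names keep the files in page order when sorted by name.
    base = File.join(out_dir, format('page-%04d', number))
    File.write("#{base}.txt", text)
    File.write("#{base}.json", JSON.generate('page' => number, 'text' => text))
    base
  end
end
```

Calling write_pages({ 1 => "Chapter 1" }, 'output') would create output/page-0001.txt and output/page-0001.json under these assumed names.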

Requirements

  1. Ruby (tested against 2.1.2, but 1.9+ should work)
  2. Bundler (gem install bundler)
  3. Rake (gem install rake)
  4. pdftohtml

On OS X, install pdftohtml through Homebrew. The standalone pdftohtml package doesn't work with the 2013 Food Code document, so install Poppler, which ships its own pdftohtml, instead: brew install poppler.

On Debian-based Linux distros, the pdftohtml package works fine: apt-get install pdftohtml.
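To confirm that pdftohtml ended up on your PATH before running anything else, a small check along these lines can help. This is only a sketch: find_executable is a hypothetical helper written for this example and is not part of the project.

```ruby
# Hypothetical helper (not part of the project): returns the full path of
# the first executable called `name` found on the given PATH string, or nil.
def find_executable(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Example use:
#   abort 'pdftohtml not found; see Requirements' unless find_executable('pdftohtml')
```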

Usage

Before your first use, you'll need to make sure you have the necessary gems installed. From the directory the code is in, run bundle install. (If you aren't using RVM or something similar, you may need to run bundler as root/through sudo.)

Once the gems are installed, simply run rake to download the 2013 PDF, convert it into an XML document, and generate the flat files separated by page.
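The last step, splitting the XML into per-page text, can be sketched roughly as follows. This assumes the XML has the shape that Poppler's pdftohtml -xml produces (page elements carrying a number attribute, each containing text elements) and uses REXML to stay self-contained; the project itself may use a different XML library. The names page_texts and flatten_text are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: concatenates all text inside an element, flattening
# inline markup such as <b> or <i> that pdftohtml may emit inside <text>.
def flatten_text(element)
  element.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then flatten_text(child)
    else ''
    end
  end.join
end

# Hypothetical helper: returns a Hash of page number => plain text,
# with one line per <text> element on that page.
def page_texts(xml)
  doc = REXML::Document.new(xml)
  pages = {}
  doc.elements.each('//page') do |page|
    lines = []
    page.elements.each('text') { |node| lines << flatten_text(node).strip }
    pages[page.attributes['number'].to_i] = lines.reject(&:empty?).join("\n")
  end
  pages
end
```

Given the XML as a string (for example, the contents of the file produced by the conversion step), page_texts returns a hash that can then be written out one page at a time.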
