commit 2ad452e (0 parents)
Showing 10 changed files with 3,321 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Copyright (c) 2015 The New York Times Company | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this library except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# archieml | ||
|
||
Parse Archie Markup Language (ArchieML) documents into JavaScript objects. | ||
|
||
Read about the ArchieML specification at [archieml.org](http://archieml.org). | ||
|
||
The current version is `v0.1.0`. | ||
|
||
## Installation | ||
|
||
`npm install archieml` | ||
|
||
## Usage | ||
|
||
``` | ||
<script src="archieml.js"></script> | ||
<script type="text/javascript"> | ||
var parsed = archieml.load("key: value"); | ||
// parsed is now {"key": "value"} | ||
</script> | ||
``` | ||
|
||
``` | ||
var archieml = require('archieml'); | ||
var parsed = archieml.load("key: value"); | ||
// parsed is now {"key": "value"} | ||
``` | ||
|
||
### Using with Google Documents | ||
|
||
We use `archieml` at The New York Times to parse Google Documents containing AML. This requires a little upfront work to download the document and convert it into text that `archieml` can load. | ||
|
||
The first step is authenticating with the Google Drive API, and accessing the document. For this, you will need a user account that is authorized to view the document you wish to download. | ||
|
||
For this example, I'm going to use a simple Node app built on Google's official `googleapis` npm package, but you can use another library or authentication method if you like. Whatever mechanism you choose, you'll need to be able to export the document as either text or HTML, and then run some of the post-processing shown in the example file at [`examples/google_drive.js`](https://github.com/newsdev/archieml-js/blob/master/examples/google_drive.js). | ||
|
||
You will need to set up a Google API application in order to authenticate yourself. Full instructions are available [in Google's OpenID Connect setup guide](https://developers.google.com/accounts/docs/OpenIDConnect#appsetup). When you create your Client ID, list `http://127.0.0.1:3000` as an authorized origin and `http://127.0.0.1:3000/oauth2callback` as the callback URL. | ||
|
||
Then open `examples/google_drive.js`, enter the `CLIENT_ID` and `CLIENT_SECRET` from the API account you created, and run the server: | ||
|
||
``` | ||
$ npm install archieml | ||
$ npm install express | ||
$ npm install googleapis | ||
$ npm install htmlparser2 | ||
$ npm install html-entities | ||
$ node examples/google_drive.js | ||
``` | ||
|
||
You should then be able to go to `http://127.0.0.1:3000/KEY`, where `KEY` is the file id of the Google Drive document you want to parse. Make sure that the account you created has access to that document. | ||
|
||
To start, you can use a test document that is public to everyone. The server will ask you to authenticate your current session and then return a JSON representation of the document. View the source of [`examples/google_drive.js`](https://github.com/newsdev/archieml-js/blob/master/examples/google_drive.js) for step-by-step instructions on what's being done. | ||
|
||
[`http://127.0.0.1:3000/1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s`](http://127.0.0.1:3000/1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s) | ||
|
||
## Changelog | ||
|
||
* `0.1.0` - Initial release supporting the first version of the ArchieML spec, published [2015-03-06](http://archieml.org/spec/1.0/CR-20150306.html). |
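The `archieml.load("key: value")` call in the README's Usage section can be illustrated without installing the package. The following is a minimal sketch, not the library itself: `parseKeyValues` is a hypothetical helper that handles only single-line `key: value` pairs. It reuses the key pattern from `startKey` in `archieml.js`, written here as a regex literal, and assumes the input has no arrays, scopes, commands, or multi-line values.

```javascript
// Hypothetical helper for illustration only; not part of the archieml API.
function parseKeyValues(text) {
  // Same key pattern as `startKey` in archieml.js, as a regex literal.
  var startKey = /^\s*([A-Za-z0-9\-_.]+)[ \t\r]*:[ \t\r]*(.*)/;
  var result = {};

  text.split(/\r?\n/).forEach(function (line) {
    var match = startKey.exec(line);
    // Lines that do not look like `key: value` are ignored in this sketch.
    if (match) result[match[1]] = match[2].trim();
  });

  return result;
}

console.log(JSON.stringify(parseKeyValues('key: value\nheadline: Hello world')));
// {"key":"value","headline":"Hello world"}
```

The real parser consumes the input incrementally rather than splitting on newlines, which is what lets it append later lines to a multi-line value.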
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,252 @@ | ||
'use strict'; | ||
|
||
// Structure inspired by John Resig's HTML parser | ||
// http://ejohn.org/blog/pure-javascript-html-parser/ | ||
|
||
(function() { | ||
|
||
// The load function takes a string of text as its only argument. | ||
// It then proceeds to match the text to one of several regular expressions | ||
// which match patterns for different types of commands in AML. | ||
function load(input) { | ||
var nextLine = new RegExp('.*((\r|\n)+)'); | ||
var startKey = new RegExp('^\\s*([A-Za-z0-9-_\.]+)[ \t\r]*:[ \t\r]*(.*)'); | ||
var commandKey = new RegExp('^\\s*:[ \t\r]*(endskip|ignore|skip|end)', 'i'); | ||
var arrayElement = new RegExp('^\\s*\\*[ \t\r]*(.*)'); | ||
var scopePattern = new RegExp('^\\s*(\\[|\\{)[ \t\r]*([A-Za-z0-9-_\.]*)[ \t\r]*(?:\\]|\\})[ \t\r]*.*?(\n|\r|$)'); | ||
|
||
var data = {}, | ||
scope = data, | ||
|
||
bufferScope = null, | ||
bufferKey = null, | ||
bufferString = '', | ||
|
||
isSkipping = false, | ||
|
||
array = null, | ||
arrayType = null, | ||
arrayFirstKey = null; | ||
|
||
while (input) { | ||
// Inside the input stream loop, the `input` string is trimmed down as matches | ||
// are found, and each match fires a call to the corresponding parse*() function. | ||
var match; | ||
|
||
if (commandKey.exec(input)) { | ||
match = commandKey.exec(input); | ||
|
||
parseCommandKey(match[1].toLowerCase()); | ||
|
||
} else if (!isSkipping && startKey.exec(input) && (!array || arrayType !== 'simple')) { | ||
match = startKey.exec(input); | ||
|
||
parseStartKey(match[1], match[2] || ''); | ||
|
||
} else if (!isSkipping && arrayElement.exec(input) && array && arrayType !== 'complex') { | ||
match = arrayElement.exec(input); | ||
|
||
parseArrayElement(match[1]); | ||
|
||
} else if (!isSkipping && scopePattern.exec(input)) { | ||
match = scopePattern.exec(input); | ||
|
||
parseScope(match[1], match[2]); | ||
|
||
} else if (nextLine.exec(input)) { | ||
match = nextLine.exec(input); | ||
|
||
bufferString += input.substring(0, match[0].length); | ||
|
||
} else { | ||
// End of document reached | ||
input = ''; | ||
} | ||
|
||
if (match) input = input.substring(match[0].length); | ||
} | ||
|
||
// The following parse functions add to the global `data` object and update | ||
// scoping variables to keep track of what we're parsing. | ||
|
||
function parseStartKey(key, restOfLine) { | ||
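// When a new key is encountered, any text still in the buffer is discarded by | ||
// calling `flushBuffer`, and the rest of the line is then immediately added as | ||
// the key's value by calling `flushBufferInto` below. | ||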
flushBuffer(); | ||
|
||
// Special handling for arrays. If this is the start of the array, remember | ||
// which key was encountered first. If this is a duplicate encounter of | ||
// that key, start a new object. | ||
if (array) { | ||
// If we're within a simple array, ignore | ||
arrayType = arrayType || 'complex'; | ||
if (arrayType === 'simple') return; | ||
|
||
// arrayFirstKey may be either another key, or null | ||
if (arrayFirstKey === null || arrayFirstKey === key) array.push(scope = {}); | ||
arrayFirstKey = arrayFirstKey || key; | ||
} | ||
|
||
bufferKey = key; | ||
bufferString = restOfLine; | ||
|
||
flushBufferInto(key, {replace: true}); | ||
} | ||
|
||
function parseArrayElement(value) { | ||
flushBuffer(); | ||
|
||
arrayType = arrayType || 'simple'; | ||
|
||
array.push(''); | ||
bufferKey = array; | ||
bufferString = value; | ||
flushBufferInto(array, {replace: true}); | ||
} | ||
|
||
function parseCommandKey(command) { | ||
// If isSkipping, don't parse any command other than :endskip or :ignore | ||
|
||
if (isSkipping && !(command === "endskip" || command === "ignore")) return flushBuffer(); | ||
|
||
switch (command) { | ||
case "end": | ||
// When we get to an end key, save whatever was in the buffer to the last | ||
// active key. | ||
if (bufferKey) flushBufferInto(bufferKey, {replace: false}); | ||
return; | ||
|
||
case "ignore": | ||
// When ":ignore" is reached, stop parsing immediately | ||
input = ''; | ||
break; | ||
|
||
case "skip": | ||
isSkipping = true; | ||
break; | ||
|
||
case "endskip": | ||
isSkipping = false; | ||
break; | ||
} | ||
|
||
flushBuffer(); | ||
} | ||
|
||
function parseScope(scopeType, scopeKey) { | ||
// Throughout the parsing, `scope` refers to one of the following: | ||
// * `data` | ||
// * an object - one level within `data` - when we're within a {scope} block | ||
// * an object at the end of an array - which is one level within `data` - | ||
// when we're within an [array] block. | ||
// | ||
// `scope` changes whenever a scope key is encountered. It also changes | ||
// within parseStartKey when we start a new object within an array. | ||
flushBuffer(); | ||
flushScope(); | ||
|
||
if (scopeKey == '') { | ||
// Reset scope to global data object | ||
scope = data; | ||
|
||
} else if (scopeType === '[' || scopeType === '{') { | ||
// Drill down into the appropriate scope, in case the key uses | ||
// dot.notation. | ||
var keyScope = data; | ||
var keyBits = scopeKey.split('.'); | ||
for (var i=0; i<keyBits.length - 1; i++) { | ||
keyScope = keyScope[keyBits[i]] = keyScope[keyBits[i]] || {}; | ||
} | ||
|
||
if (scopeType == '[') { | ||
array = keyScope[keyBits[keyBits.length - 1]] = keyScope[keyBits[keyBits.length - 1]] || []; | ||
// If we're reopening this array, set the arrayType | ||
if (array.length > 0) arrayType = typeof array[0] === 'string' ? 'simple' : 'complex'; | ||
|
||
} else if (scopeType == '{') { | ||
scope = keyScope[keyBits[keyBits.length - 1]] = keyScope[keyBits[keyBits.length - 1]] || {}; | ||
} | ||
} | ||
} | ||
|
||
function formatValue(value, type) { | ||
value = value.replace(/(?:^\\)?\[[^\[\]\n\r]*\](?!\])/mg, ""); // remove comments | ||
value = value.replace(/\[\[([^\[\]\n\r]*)\]\]/g, "[$1]"); // [[]] => [] | ||
|
||
if (type == 'append') { | ||
// If we're appending to a multi-line string, escape special punctuation | ||
// by using a backslash at the beginning of any line. | ||
// Note we do not do this processing for the first line of any value. | ||
value = value.replace(new RegExp('^(\\s*)\\\\'), "$1"); | ||
} | ||
|
||
return value; | ||
} | ||
|
||
function flushBuffer() { | ||
var result = bufferString + ''; | ||
bufferString = ''; | ||
return result; | ||
} | ||
|
||
function flushBufferInto(key, options) { | ||
options = options || {}; | ||
var value = flushBuffer(); | ||
|
||
if (options.replace) { | ||
value = formatValue(value, 'replace').replace(new RegExp('^\\s*'), ''); | ||
bufferString = (new RegExp('\\s*$')).exec(value)[0]; | ||
} else { | ||
value = formatValue(value, 'append'); | ||
} | ||
|
||
if (typeof key === 'object') { | ||
// key is an array | ||
if (options.replace) key[key.length - 1] = ''; | ||
|
||
key[key.length - 1] += value.replace(new RegExp('\\s*$'), ''); | ||
|
||
} else { | ||
var keyBits = key.split('.'); | ||
bufferScope = scope; | ||
|
||
for (var i=0; i<keyBits.length - 1; i++) { | ||
if (typeof bufferScope[keyBits[i]] === 'string') bufferScope[keyBits[i]] = {}; | ||
bufferScope = bufferScope[keyBits[i]] = bufferScope[keyBits[i]] || {}; | ||
} | ||
|
||
if (options.replace) bufferScope[keyBits[keyBits.length - 1]] = ''; | ||
|
||
bufferScope[keyBits[keyBits.length - 1]] += value.replace(new RegExp('\\s*$'), ''); | ||
} | ||
} | ||
|
||
function flushScope() { | ||
array = null; | ||
arrayType = null; | ||
arrayFirstKey = null; | ||
} | ||
|
||
flushBuffer(); | ||
return data; | ||
} | ||
|
||
var root = this; | ||
var archieml = {load: load}; | ||
|
||
if (typeof exports !== 'undefined') { | ||
if (typeof module !== 'undefined' && module.exports) { | ||
exports = module.exports = archieml; | ||
} | ||
exports.archieml = archieml; | ||
} else { | ||
this.archieml = archieml; | ||
} | ||
|
||
if (typeof define === 'function' && define.amd) { | ||
define('archieml', [], function() { | ||
return archieml; | ||
}); | ||
} | ||
}.call(this)) | ||
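The dot-notation handling in `parseScope` and `flushBufferInto` above can be isolated into a small sketch. `setDeep` is a hypothetical helper, not something this commit defines: it mirrors the loop that walks every key segment except the last, creates intermediate objects as needed, and replaces a string found at an intermediate key with an object. Unlike `flushBufferInto`, it assigns the value directly instead of appending to a buffer.

```javascript
// Hypothetical helper mirroring the drill-down loop in flushBufferInto;
// not part of the archieml API.
function setDeep(target, dottedKey, value) {
  var keyBits = dottedKey.split('.');
  var scope = target;

  for (var i = 0; i < keyBits.length - 1; i++) {
    // A string already stored at an intermediate key is replaced by an object.
    if (typeof scope[keyBits[i]] === 'string') scope[keyBits[i]] = {};
    scope = scope[keyBits[i]] = scope[keyBits[i]] || {};
  }

  scope[keyBits[keyBits.length - 1]] = value;
  return target;
}

var data = {};
setDeep(data, 'colors.red', '#f00');
setDeep(data, 'colors.blue', '#00f');
setDeep(data, 'title', 'Palette');
console.log(JSON.stringify(data));
// {"colors":{"red":"#f00","blue":"#00f"},"title":"Palette"}
```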
|
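The two replacements at the top of `formatValue` are easier to follow with a concrete input. The sketch below copies both regular expressions from the diff into a hypothetical standalone function, `stripComments`; the function name and the sample string are assumptions made for illustration.

```javascript
// Hypothetical standalone copy of the two replacements in formatValue.
function stripComments(value) {
  // Remove [single-bracket] comments, plus an optional backslash at line start.
  // The negative lookahead keeps [[double-bracket]] text out of this pass.
  value = value.replace(/(?:^\\)?\[[^\[\]\n\r]*\](?!\])/mg, '');
  // Unescape [[double brackets]] into literal [single brackets].
  return value.replace(/\[\[([^\[\]\n\r]*)\]\]/g, '[$1]');
}

console.log(stripComments('Price in [[USD]][check this] only'));
// Price in [USD] only
```

Running the comment pass first matters here: if `[[USD]]` were unescaped to `[USD]` before comments were stripped, the first replacement would then delete it as a comment.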