Home
Enlive is an extraction and transformation library for HTML and XML documents written in Clojure. It uses CSS-like selectors.
Typical Enlive applications include templating and screen scraping.
The Enlive approach to templating is functional and decouples design and presentation logic.
Each template or template part (snippet) is a plain function, so you can easily compose templates. There is a kind of inversion of control here: in most mainstream templating systems, templates drive the presentation logic, whereas here the presentation logic drives templates.
Templates are backed by source files which are plain HTML (no special tags or attributes, no code). This allows for easy round-tripping with designers or easy theming of your app.
Namespace declaration, import and dependencies:
(ns screenscraping (:use net.cgrand.enlive-html) (:import java.net.URL))
Retrieve the URL of the latest Penny Arcade comic:
(-> "http://www.penny-arcade.com/comic/" URL. html-resource (select [:body :img]) first :attrs :src)
For support, use the Google group, or mail me if you can’t publicly discuss your issues.
If you use Leiningen, add [enlive "1.1.5"] to your project.clj dependencies. (This won’t work with Clojure 1.0.)
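For orientation, a minimal project.clj might look like the sketch below. The project name, its version and the Clojure version are placeholders chosen for illustration; only the enlive coordinate comes from the text above.

```clojure
;; project.clj (sketch; names and versions other than enlive are assumptions)
(defproject screenscraping "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"] ; assumed Clojure version
                 [enlive "1.1.5"]])             ; coordinate given above
```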
If you use Clojure 1.0 (or 1.1 without Leiningen), git clone this repository (or use GitHub’s download feature) and add the src directory and lib/tagsoup-1.2.jar to your classpath. Enlive does not need to be compiled.
Selectors are at the core of Enlive. The file syntax.html at the root of the repository is a comprehensive syntax reference that can also be browsed online.
Enlive selectors are simply CSS selectors written in Clojure. A selector is always surrounded by square brackets: CSS div is written [:div], and span.bar a#foo becomes [:span.bar :a#foo].
A CSS selector that is trickier to translate is a[href], which is [[:a (attr? :href)]] with Enlive. No, it’s not a typo: there are two pairs of square brackets. The outer pair is the mandatory one (see the paragraph above) and the inner pair denotes intersection (aka and).
At this point, you should understand that [:.foo [:a (attr? :href)] :em] is CSS’s .foo a[href] em.
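To make the translation concrete, here is a small sketch that runs the selector above against an inline HTML fragment. It assumes Enlive is on the classpath; the namespace name, the html alias and the sample markup are invented for illustration.

```clojure
(ns selector-sketch
  (:require [net.cgrand.enlive-html :as html])
  (:import java.io.StringReader))

;; Sample markup (made up): one link with an href, one without.
(def markup
  (str "<div class=\"foo\">"
       "<a href=\"/x\"><em>linked</em></a>"
       "<a><em>plain</em></a>"
       "</div>"))

;; html-resource accepts a java.io.Reader, so a StringReader works here.
(def nodes (html/html-resource (StringReader. markup)))

;; Equivalent of the CSS selector .foo a[href] em
(def selector [:.foo [:a (html/attr? :href)] :em])

;; Text content of every matching em element.
(def matches (map html/text (html/select nodes selector)))
```

Only the em inside the link that carries an href should match, because the inner vector requires the element to be an a and to have an href attribute at the same time.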
html-resource and xml-resource are helper functions to build a tree suitable for processing with Enlive. They take a single argument, which can be: a string (denoting a resource on the classpath), a java.io.File, a java.io.Reader, a java.io.InputStream, a java.net.URI, a java.net.URL, a collection of nodes, or a single node. (Nodes are maps.)
html-resource uses Tagsoup to parse the resource; xml-resource uses the default SAX parser.
Note that both html-resource and xml-resource return their argument when it is already a collection of nodes.
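A small sketch of the behaviour described above, assuming Enlive is on the classpath. The namespace name, the html alias and the sample markup are made up for illustration.

```clojure
(ns resource-sketch
  (:require [net.cgrand.enlive-html :as html])
  (:import java.io.StringReader))

;; Parse a fragment from a java.io.Reader (one of the accepted argument types).
(def nodes (html/html-resource (StringReader. "<p>hello</p>")))

;; Passing the parsed nodes back in does not parse them again.
(def same-nodes (html/html-resource nodes))

;; Already-parsed nodes can be fed straight to select.
(def paragraphs (html/select same-nodes [:p]))
```

This is what lets a function accept either a raw resource or already-parsed nodes and call html-resource on its argument unconditionally.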
deftemplate, defsnippet, template and snippet implicitly wrap their source argument in an html-resource call. This means that if you specify your resources as strings they will be searched on the classpath.
Known limitation: no support for namespaces.