A Clojure library designed to crawl web sites.
- Tested and running on openjdk 16.0.2 2021-07-20
- Using Clojure 1.11.1
- You will need Leiningen
You can invoke it directly with Leiningen, once lein is installed:
brew install leiningen
lein run <URL> <DEPTH>
I would, however, recommend piping the output to jq:
lein run https://lucasob.github.io 1 | jq
Given I don't want to make anyone's day terrible, I've also wrapped this up to be invoked from Docker:
docker build -t crawler .
(I'd still recommend piping to jq. The depth param is optional and defaults to 0.)
docker run crawler https://lucasob.github.io 1 | jq
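If you run the image often, a small shell function can handle the optional depth argument and print a usage line when called incorrectly. This is a minimal sketch, not something shipped in this repo: the function name `crawl` is hypothetical, and it assumes the image was tagged `crawler` as in the build command above.

```shell
# Hypothetical convenience wrapper (not part of this repo) around the
# Docker image tagged "crawler". Usage: crawl <URL> [DEPTH]
crawl() {
  # Require a URL, allow at most one extra argument (the depth)
  if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "usage: crawl <URL> [DEPTH]" >&2
    return 1
  fi
  url=$1
  # DEPTH is optional; mirror the documented default of 0
  depth=${2:-0}
  # --rm removes the container once the crawl finishes
  docker run --rm crawler "$url" "$depth"
}
```

Then `crawl https://lucasob.github.io 1 | jq` behaves like the `docker run` command above, and `crawl` with no arguments prints the usage line instead of starting a container.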
At this point, I've run out of other ways to help 😉
The logs are a bit messy and you won't get nicely formatted jq output, but it works regardless.
You have to specify the URL you want manually inside the compose file.
docker compose up crawler
- We rely on Testcontainers to abstract away spinning up dependencies
- We use WireMock to stub HTTP calls
lein test
lein deps
(This will download additional deps on first run)
lein cloverage
Yeah, the JVM has to start up every time you use lein 😭
This is optional, but enabling Testcontainers reuse means a container isn't spun up for each test.
Within ~/.testcontainers.properties, set:
testcontainers.reuse.enable=true
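To set that property once without adding duplicate lines on repeated runs, a small shell helper can append it only when it is missing. This is a minimal sketch; the function name `enable_tc_reuse` is hypothetical and not part of this repo. Note that, as far as I know, Testcontainers only reuses containers that also opt in (e.g. via `withReuse(true)` on the Java side), so the property alone may not be enough.

```shell
# Hypothetical helper: enable Testcontainers reuse in a properties file
# (defaults to ~/.testcontainers.properties) without duplicating the line.
enable_tc_reuse() {
  props="${1:-$HOME/.testcontainers.properties}"
  touch "$props"
  # -q quiet, -x whole-line match, -F fixed string (no regex dots)
  if ! grep -qxF 'testcontainers.reuse.enable=true' "$props"; then
    # If the file doesn't end in a newline, add one so we don't
    # glue our entry onto the last existing property
    if [ -s "$props" ] && [ "$(tail -c1 "$props")" != "" ]; then
      echo >> "$props"
    fi
    echo 'testcontainers.reuse.enable=true' >> "$props"
  fi
}
```

Running `enable_tc_reuse` once is enough; calling it again leaves the file unchanged.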
If I feel generous, I'd like to:
- Wrap up CLI argument handling properly & show help
- Port this to Babashka (bb), where it would go so much faster; the issue is testing and reproducibility, which are hard in bb.