web-crawler

A Clojure library designed to crawl web sites.

Replication

Tested and running on OpenJDK 16.0.2 (2021-07-20)
Using Clojure 1.11.1
You will need Leiningen

Running Directly

You can invoke the crawler directly with Leiningen once it is installed (for example, via Homebrew):

brew install leiningen

lein run <URL> <DEPTH>

I would, however, recommend piping the output to jq:

lein run https://lucasob.github.io 1 | jq
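Presumably `<DEPTH>` bounds how many hops of links the crawl follows from the starting URL. The sketch below shows one common interpretation of that, a breadth-first traversal where depth 0 visits only the starting page and each extra level follows links one hop further. This is an assumption about the semantics, not the project's actual implementation; `crawl` and the `site` map are hypothetical, and `site` stands in for HTTP fetching and link extraction so the traversal runs without network access.

```clojure
;; Hypothetical in-memory "site": each URL maps to the links found on that page.
;; In a real crawler this lookup would be an HTTP fetch plus link extraction.
(def site
  {"https://example.com/"  ["https://example.com/a" "https://example.com/b"]
   "https://example.com/a" ["https://example.com/c" "https://example.com/"]
   "https://example.com/b" []
   "https://example.com/c" ["https://example.com/d"]
   "https://example.com/d" []})

(defn crawl
  "Breadth-first crawl from start-url, following links at most max-depth hops.
  links-of maps a URL to the URLs it links to. Returns a map of each visited
  URL to the depth at which it was first reached."
  [links-of start-url max-depth]
  (loop [frontier [start-url]    ; URLs discovered at the current depth
         depth    0
         seen     {start-url 0}] ; never revisit a URL, which also avoids cycles
    (if (or (empty? frontier) (>= depth max-depth))
      seen
      (let [next-urls (->> frontier
                           (mapcat links-of)            ; links from every page at this depth
                           (remove #(contains? seen %)) ; drop already-visited URLs
                           distinct
                           vec)]
        (recur next-urls
               (inc depth)
               (into seen (map (fn [u] [u (inc depth)]) next-urls)))))))
```

For example, `(crawl site "https://example.com/" 2)` reaches `/a` and `/b` at depth 1 and `/c` at depth 2, but stops before `/d`. Keeping a `seen` set is what stops the link from `/a` back to the root from looping forever.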

Docker

Given I don't want to make anyone's day terrible, I've also wrapped this up so it can be invoked from Docker.

Build

docker build -t crawler .

Run

(I'd still recommend piping to jq. The depth parameter is optional and defaults to 0.)

docker run crawler https://lucasob.github.io 1 | jq
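Since the depth argument is optional and defaults to 0, the entry point has to cope with one or two arguments. A minimal sketch of that handling, assuming the entry point receives the raw command-line strings; `parse-args` is a hypothetical name and not necessarily how the project parses its arguments. It uses `parse-long`, which is available from Clojure 1.11.

```clojure
(defn parse-args
  "Hypothetical helper: turns the raw command-line strings [URL] or
  [URL DEPTH] into {:url ... :depth ...}, defaulting DEPTH to 0.
  Returns {:error ...} when the arguments can't be used."
  [args]
  (let [[url depth-str & extra] args]
    (cond
      (nil? url)       {:error "usage: <URL> [DEPTH]"}
      (seq extra)      {:error "too many arguments; usage: <URL> [DEPTH]"}
      (nil? depth-str) {:url url :depth 0}
      :else
      ;; parse-long (Clojure 1.11+) returns nil for non-numeric input
      (let [depth (parse-long depth-str)]
        (if (and depth (not (neg? depth)))
          {:url url :depth depth}
          {:error (str "DEPTH must be a non-negative integer, got: " depth-str)})))))
```

A `-main` built on this could print `:error` to stderr and exit non-zero, and otherwise hand `:url` and `:depth` to the crawler. Returning a map instead of throwing keeps the parsing easy to unit test.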

Compose

At this point, I've run out of other ways to help 😉

The logs are a bit messy and you won't get jq's pretty-printing, but it works regardless.

You have to specify the URL you want to crawl manually inside the compose file (docker-compose.yaml).

docker compose up crawler

Testing

We rely on Testcontainers to abstract away spinning up dependencies.
We use WireMock to stub HTTP calls.

Commands

Run Tests

lein test

Pull Dependencies

lein deps

View coverage

(This will download additional deps on first run)

lein cloverage

Useful Bits

Slow?

Yeah, the JVM has to start up every time you use lein 😭

Enable test container reuse

This is optional, but enabling Testcontainers reuse means containers are kept and reused between test runs rather than spun up fresh each time.

To turn it on, set testcontainers.reuse.enable=true in ~/.testcontainers.properties.

Extension(s)

If I feel generous, I'd like to:

Improve CLI argument handling and show help output
This would go so much faster in Babashka (bb), but testing and reproducibility are the issue; bb makes those hard.

lucasob/web-crawler

Folders and files

Latest commit

History

Repository files navigation

web-crawler

Replication

Running Directly

Docker

Build

Run

Compose

Testing

Commands

Run Tests

Pull Dependencies

View coverage

Useful Bits

Slow?

Enable test container reuse

Extension(s)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages