v4.7.0
Breaking Changes
Robots
- Replaces homebrew robotstxt code with crawler-commons
Normalization
- Replaces homebrew URL normalization with crawler-commons
You now need to pass a BasicURLNormalizer into the PageFetcher and the CrawlController, e.g.
BasicURLNormalizer normalizer = BasicURLNormalizer.newBuilder().idnNormalization(BasicURLNormalizer.IdnNormalization.NONE).build();
Please note that this BasicURLNormalizer can support IdnNormalization.
Dependency Upgrades
- Updates Tika to 2.1.0 (check and update your excludes if you are importing crawler4j into your own code base)
- Updates Jackson to 2.13.0 (test scope only)
- Updates PostgreSQL driver to 42.3.0 (examples only)
- Updates Flyway to 8.0.1 (examples only)
- Updates Guava to 31.0.1-jre
- Updates Groovy to 3.0.9 (test scope only)
Additional Notes
Full Changelog: v4.6.0...v4.7.0