We installed the latest OpenWayback version, reindexed all crawled content using CDX, and started searching.
When reviewing the results table after querying for a URL, some of the results have more than one entry for a date, even though only one crawl was done with Heritrix. Why?
Sometimes more than one date has an *. I looked for the meaning of the * but couldn't find any information.
One possible reason is that multiple URLs with slight variations (e.g. www vs. no-www, http vs. https, or uppercase vs. lowercase) are grouped together by URL canonicalization. It is also possible that Heritrix really did collect the same URL multiple times (check the crawl log).
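To illustrate how canonicalization can fold several captures into one result row, here is a minimal Python sketch. It is not OpenWayback's actual canonicalizer: `canonical_key` and the example URLs are hypothetical, and the rules (drop the scheme, drop a leading `www.`, lowercase everything) are a simplified assumption covering only the variants mentioned above.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Hypothetical, simplified lookup key; NOT OpenWayback's real rules."""
    parts = urlsplit(url.strip())
    # urlsplit().hostname is already lowercased; the scheme is simply ignored.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower() or "/"
    query = f"?{parts.query.lower()}" if parts.query else ""
    return f"{host}{path}{query}"


# Three distinct URLs that a single crawl could legitimately capture.
variants = [
    "http://www.example.com/Page",
    "https://example.com/page",
    "http://EXAMPLE.com/page",
]

# All of them collapse to the same key, so they would share one result list.
keys = {canonical_key(u) for u in variants}
print(keys)
```

If the crawl captured each of these variants once, a query for any of them would show three entries for the same date, even though Heritrix fetched each distinct URL only once.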
The * means the content of the page changed on this date as determined by comparing its sha1 digest with the previous snapshot.
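The comparison described above can be sketched roughly as follows. This is only an illustration: `mark_changes`, the timestamps, and the page bodies are made up, the digests are hex SHA-1 values computed on the spot rather than values read from a real CDX file, and leaving the first capture unmarked (it has no previous snapshot) is an assumption.

```python
import hashlib


def mark_changes(captures):
    """Flag each capture whose digest differs from the previous capture.

    `captures` is a list of (timestamp, digest) tuples sorted by timestamp.
    Returns a list of (timestamp, changed) tuples.
    """
    marked = []
    previous_digest = None
    for timestamp, digest in captures:
        # Assumption: the first capture has nothing to compare against.
        changed = previous_digest is not None and digest != previous_digest
        marked.append((timestamp, changed))
        previous_digest = digest
    return marked


def sha1_of(body: bytes) -> str:
    return hashlib.sha1(body).hexdigest()


# Hypothetical captures of one URL: the content changes at the third capture.
captures = [
    ("20150101120000", sha1_of(b"<html>version 1</html>")),
    ("20150101120500", sha1_of(b"<html>version 1</html>")),
    ("20150102090000", sha1_of(b"<html>version 2</html>")),
]

for timestamp, changed in mark_changes(captures):
    print(timestamp + (" *" if changed else ""))
```

Under these assumptions only the third capture is printed with a trailing *, because its digest differs from the one before it.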