I believe there's a problem when crawling massive documentation portals: if the crawler hits a condition it cannot handle, it exits suddenly, and the continuation task then starts from the beginning rather than resuming intelligently, so every previously downloaded page is re-checked before any new page is fetched. This is my constant experience with Stripe's documentation portal: every time, after around an hour, the crawler fails to move forward, and there is no option to resume exactly where it left off.
My suggestion is to switch the default behavior to continue where you left off, without the prior re-check. I assume what the current implementation does is also detect whether any existing files have changed and update them, instead of simply sticking to the saved progress state and attempting to finish the crawl job, which takes a massive amount of time.
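
To make the suggestion concrete, here is a rough sketch of the resume behavior I have in mind. Everything in it is illustrative, not the project's actual internals: the checkpoint file name, the state layout, and the `fetch`/`extract_links` helpers are all hypothetical. The key point is that on resume, URLs already in the visited set are skipped outright rather than re-fetched and re-verified:

```python
# Illustrative sketch only -- file name, state structure, and the
# fetch/extract_links callables are hypothetical, not the real internals.
import json
from pathlib import Path

STATE_FILE = Path("crawl_state.json")  # hypothetical checkpoint file

def load_state() -> dict:
    """Restore the visited set and pending queue from the last run, if any."""
    if STATE_FILE.exists():
        data = json.loads(STATE_FILE.read_text())
        return {"visited": set(data["visited"]), "queue": data["queue"]}
    return {"visited": set(), "queue": []}

def save_state(visited: set, queue: list) -> None:
    """Persist progress so a crash can resume exactly where it left off."""
    STATE_FILE.write_text(
        json.dumps({"visited": sorted(visited), "queue": queue})
    )

def crawl(start_url: str, fetch, extract_links) -> None:
    """Breadth-first crawl that checkpoints after every page.

    fetch(url) -> page content (downloads and saves the page);
    extract_links(page) -> list of URLs found on the page.
    Both are passed in here because the real implementations are internal.
    """
    state = load_state()
    visited, queue = state["visited"], state["queue"] or [start_url]
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue  # already downloaded in a previous run: skip, no re-check
        page = fetch(url)
        visited.add(url)
        queue.extend(u for u in extract_links(page) if u not in visited)
        save_state(visited, queue)  # checkpoint so a crash loses at most one page
```

With something like this, a change-detection pass over already-downloaded files could still exist, but as an explicit opt-in rather than the default on every resume.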