I believe there's a problem when crawling massive documentation portals: if the crawler hits a condition it cannot handle, it exits suddenly, and the continuation task then starts from the beginning rather than resuming intelligently, so every previously downloaded page is re-checked before any new page is fetched. This is my constant experience with Stripe's documentation portal: every time, after around an hour, the crawler fails to move forward, and there is no option to resume exactly where it left off.
My suggestion is to switch the default behavior to continue where you left off, without the prior re-check. I assume what the current implementation does is also detect whether any existing files have changed and update them, instead of simply sticking to the saved progress state and attempting to finish the crawl job, which takes a massive amount of time.
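
To make the suggestion concrete, here is a rough sketch of the resume behavior I have in mind. Everything in it is illustrative, not the project's actual internals: the checkpoint file name, the state layout, and the `fetch`/`extract_links` helpers are all hypothetical. The key point is that on resume, URLs already in the visited set are skipped outright rather than re-fetched and re-verified:

```python
# Illustrative sketch only -- file name, state structure, and the
# fetch/extract_links callables are hypothetical, not the real internals.
import json
from pathlib import Path

STATE_FILE = Path("crawl_state.json")  # hypothetical checkpoint file

def load_state() -> dict:
    """Restore the visited set and pending queue from the last run, if any."""
    if STATE_FILE.exists():
        data = json.loads(STATE_FILE.read_text())
        return {"visited": set(data["visited"]), "queue": data["queue"]}
    return {"visited": set(), "queue": []}

def save_state(visited: set, queue: list) -> None:
    """Persist progress so a crash can resume exactly where it left off."""
    STATE_FILE.write_text(
        json.dumps({"visited": sorted(visited), "queue": queue})
    )

def crawl(start_url: str, fetch, extract_links) -> None:
    """Breadth-first crawl that checkpoints after every page.

    fetch(url) -> page content (downloads and saves the page);
    extract_links(page) -> list of URLs found on the page.
    Both are passed in here because the real implementations are internal.
    """
    state = load_state()
    visited, queue = state["visited"], state["queue"] or [start_url]
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue  # already downloaded in a previous run: skip, no re-check
        page = fetch(url)
        visited.add(url)
        queue.extend(u for u in extract_links(page) if u not in visited)
        save_state(visited, queue)  # checkpoint so a crash loses at most one page
```

With something like this, a change-detection pass over already-downloaded files could still exist, but as an explicit opt-in rather than the default on every resume.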