-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: [WIP] initial add of apify-extra package with dataset functions #81
Conversation
…d not working yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packages/apify-extra/package.json
Outdated
"@types/bluebird": "^3.5.37", | ||
"bluebird": "^3.7.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to use bluebird 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will eventually get rid of it but it will require a bit of work
packages/apify-extra/src/dataset.ts
Outdated
} | ||
|
||
// Now we execute all the requests in parallel (with defined concurrency) | ||
await bluebird.map(requestInfoArr, async (requestInfoObj) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be the same as:
await bluebird.map(requestInfoArr, async (requestInfoObj) => { | |
await Promise.all(requestInfoArr.map(async (requestInfoObj) => { |
and we can let bluebird rest in its graveyard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it is there for the concurrency option. I can replace it later with AutoscaledPool, BasicCrawler or maybe I will find something more lightweight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but pls test :) btw: this is also a good to export as we used this bluebird thing in more places
Ok, after partially revamping this yesterday, I have a few points/questions (@metalwarrior665 ):
If I can add my two cents, I like the "parallel" forEach (map?) approach - it reminds me of Hadoop MapReduce and similar Big Data processing systems - and imo works well with an environment with migrations; for the rest, I am ready to have my mind changed :) |
@barjin keep in mind the main issue you should be looking into is the build failure, we can discuss code improvements once we can actually build it. |
We could get rid of a) for simplicity and make |
My plan going forward is to have one more look at the features and the logic (suggestions welcome) and then we would release it in beta so we can start using it internally. Then add proper tests (it is quite battle-tested in production :D but for sure needs tests) before we release it in latest. Sounds good? |
Maybe the way to do this better would be to extend the SDK classes so it is more in line with SDK interfaces rather than having standalone functions with tons of options. But I probably don't have capacity for a major rewrite now :) |
closed in favor of #116, all further development there |
Build not working yet