feat: Actor.charge() #346

janbuchar · 2024-12-23T20:17:26Z

Depends on fix: Add ActorRunPricingInfo type apify-client-js#623
Draws inspiration from https://github.com/metalwarrior665/actor-charge-manager-poc

TODO

write e2e tests

metalwarrior665

Thanks! I don't have any major complaints, just a few notes.

packages/apify/src/actor.ts

packages/apify/src/internals/charging.ts

metalwarrior665 · 2025-01-09T21:03:32Z

packages/apify/src/internals/charging.ts

+                eventName,
+                eventTitle: pricingInfo.title,
+                eventPriceUsd: pricingInfo.price,
+                timestamp,


I think allowing the dev to pass in custom data connected to the event (like url) is useful for both debugging and user validation. Do you see any issue with that?
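To make the proposal concrete, here is a minimal TypeScript sketch of a charging log item that carries optional developer-provided data. The first four field names come from the diff above. The `metadata` field, the `EventPricing` type and the `buildChargingLogItem` helper are hypothetical and are not part of the SDK or the whitepaper.

```typescript
// Shape of one charging log entry. The first four fields mirror the diff;
// `metadata` is a hypothetical extension for developer-supplied context.
interface ChargingLogItem {
    eventName: string;
    eventTitle: string;
    eventPriceUsd: number;
    timestamp: string;
    metadata?: Record<string, unknown>;
}

// Hypothetical pricing entry for a single event.
interface EventPricing {
    title: string;
    price: number;
}

// Hypothetical helper: builds a log item and attaches metadata only when provided.
function buildChargingLogItem(
    eventName: string,
    pricingInfo: EventPricing,
    metadata?: Record<string, unknown>,
): ChargingLogItem {
    const item: ChargingLogItem = {
        eventName,
        eventTitle: pricingInfo.title,
        eventPriceUsd: pricingInfo.price,
        timestamp: new Date().toISOString(),
    };
    if (metadata !== undefined) {
        item.metadata = metadata;
    }
    return item;
}

const item = buildChargingLogItem(
    'page-scraped',
    { title: 'Page scraped', price: 0.01 },
    { url: 'https://example.com' },
);
console.log(item.metadata); // { url: 'https://example.com' }
```

A URL in `metadata` is the kind of context that would let both the developer and the user tie a charge back to a concrete unit of work.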

It's not part of the whitepaper

cc @mhamas This is not blocking this PR but something to think about.

As written in another comment, I have concerns about creating a special dataset just for logs (because of size, charging, retention, permissions). We actually discussed this with @mtrunkat and @fnesveda and decided against it; instead, we store the system log elsewhere (although I was actually in the pro-dataset camp :D, I think it was mostly @fnesveda against it, and Mara made the final call). But if you've learned in practice that a log dataset is always necessary, maybe it should be an integral part of the platform itself? Let's discuss offline @metalwarrior665?

@janbuchar @B4nan @netmilk Ok, we had a chat with @mhamas and there are several serious concerns with the debug dataset that didn't cross my mind previously.

Consider that there can be use cases where a developer charges per LLM token, so a single charge could be in the thousands, like Actor.charge('llm-token', 4562). This would put huge pressure on the dataset, namely:
a) pushing thousands of items takes some time
b) pushing items costs the developer money
c) storing them over time costs the user money

I didn't fully realize that after stripping the developer-provided metadata for each event, the only field in chargingLogDataset that gives any extra information over the plain chargingState (which is visible live on the platform) is timestamp. If the developer cannot put things like a URL there, it is not very useful for either the user or the dev (except for local testing).

So we agreed (feel free to put your arguments in) that we should remove the current chargingLogDataset functionality until we figure out how to do it in a more bulletproof way. Adding it later shouldn't be a breaking change, and it is not really required to make charging work; devs can always do it themselves.

a) There are some ideas in the platform team about making this a platform feature that would not require storing it on the user's account. Moving this forward depends on feedback from both creators and users.
b) We might want to keep this functionality (ideally with the dev-provided metadata) for devs only, to help with building the Actor. The question is how to do that in a way that doesn't leak into production.

Either way, I think we can ship this without the dataset for now so we have it out, and think about the solution together. But I'm happy to hear other arguments.

After some consideration, I'm in favor of making this opt-in on the code level, probably behind some experimental flag. There are hundreds of other ways a Creator may waste their users' money, so I wouldn't stress over this to the point of removing this functionality entirely.

If the platform team is interested in supporting this on a higher level, all they would have to do would be to provide a charging dataset ID via environment variables.

Also, did you consider not creating a separate dataset row for each occurrence of an event when somebody calls Actor.charge('asdf', 33333)? It doesn't fix everything, but it's something.

If the platform team is interested in supporting this on a higher level, all they would have to do would be to provide a charging dataset ID via environment variables.

If we were to do this in the future, we would do it automatically in the API; it'd be on for all PPE runs, and hence no special dataset ID would be passed to the Actor. The Actor wouldn't even be aware that this is happening behind the scenes.

Authoritative as it sounds, I don't think that's the only viable solution. Bear in mind that this would not help at all with local development, and we'd probably have to keep the current implementation of the charging log dataset anyway...

packages/apify/src/internals/charging.ts

packages/apify/src/actor.ts

test/e2e/runSdkTests.mjs

B4nan · 2025-01-23T14:50:55Z

Huh, what else can opt-in mean?

I mean a flag in code; the Actor developer would have to enable it. I can see it being useful for local development.

Surely they can make it part of the input schema too.

janbuchar · 2025-01-23T15:20:49Z

It can mean "End user can toggle it" and "Actor developer can toggle it", and those have different implications. But on the SDK level, you can't really distinguish those that well - the developer can always enable it for all end users (and get them billed for a big dataset, not sure how much money that actually means).

B4nan · 2025-01-23T17:10:37Z

Oh I see. Since we are talking about SDK support, I wasn't thinking about anyone other than the Actor developer. In that case, we could print a warning when this is enabled, saying this shouldn't be used in production since it can cause additional price bumps and/or performance issues.

mhamas · 2025-01-23T17:16:53Z

In that case, we could print a warning when this is enabled, saying this shouldn't be used in production since it can cause additional price bumps and/or performance issues.

We can print a warning, or we can just remove it to be super sure somebody doesn't check it in by mistake :-). There is currently really no reason why this should ever be enabled in production for end users.

If you really want to keep it as a general opt-in, is it possible to know in the SDK "I'm running in production and I'm a public Actor" or "I'm running in production and I'm not being run by the owner of the Actor", and then throw from the SDK at that point to stop the Actor?

B4nan · 2025-01-23T17:35:02Z

We can print a warning, or we can just remove it to be super sure somebody doesn't check it in by mistake :-).

You keep talking about a checkbox; I am talking about an option in the Actor config. This is the SDK, we are not creating any input schema here. So an explicit line in code doing something like Configuration.set('enableChargingDataset', true).

There is currently really no reason why this should ever be enabled in production for end users.

That's quite a weird reason to remove some functionality. You don't just run code in production; you need to develop it first. Wasn't this always meant for development?

If you don't see a value during development, that would be another story.
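A minimal sketch of the code-level opt-in discussed in this thread, assuming a boolean option that defaults to off and a warning printed whenever it is turned on. The option name `enableChargingLogDataset` and the `resolveChargingLogEnabled` helper are hypothetical; the SDK may expose this differently, if at all.

```typescript
// Hypothetical options object; `enableChargingLogDataset` is an illustrative
// name, not a confirmed SDK configuration key.
interface ChargingLogOptions {
    enableChargingLogDataset: boolean;
}

// Decides whether the charging log dataset should be used and warns
// whenever the developer has explicitly opted in.
function resolveChargingLogEnabled(
    options: ChargingLogOptions,
    warn: (message: string) => void = console.warn,
): boolean {
    if (!options.enableChargingLogDataset) {
        return false; // default: the log dataset stays off
    }
    warn(
        'The charging log dataset is enabled. It is intended for local development only; ' +
            'in production it can increase costs and slow the Actor down.',
    );
    return true;
}

console.log(resolveChargingLogEnabled({ enableChargingLogDataset: false })); // false
```

Injecting `warn` keeps the helper testable; in a real Actor it would presumably go through the SDK's logger instead of `console.warn`.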

metalwarrior665 · 2025-01-23T19:18:40Z

Btw, we recently added a periodic dump of chargingState to the KV store locally. That feels like a more concise solution than a bunch of individual dataset item files, if it is only for local dev and we are not adding any other context data to each event.
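A rough sketch of such a periodic dump, assuming a store object with a `setValue(key, value)` method. Apify's key-value store has a method by that name, but the `KeyValueStoreLike` interface, the `ChargingState` shape and the `CHARGING_STATE` key used here are all assumptions. Each tick overwrites a single record, so a local run ends up with one snapshot instead of one dataset item per event.

```typescript
// Minimal stand-in for a key-value store (assumed interface).
interface KeyValueStoreLike {
    setValue(key: string, value: unknown): Promise<void>;
}

// Hypothetical per-event charging state.
type ChargingState = Record<string, { chargeCount: number; totalChargedAmount: number }>;

// Periodically writes a snapshot of the charging state under a single key.
// Returns a function that stops the timer and writes one final snapshot.
function startChargingStateDump(
    store: KeyValueStoreLike,
    getState: () => ChargingState,
    intervalMs: number,
    key = 'CHARGING_STATE', // hypothetical key name
): () => Promise<void> {
    const dump = async (): Promise<void> => {
        // Copy the state so later mutations don't alter a pending write.
        const snapshot = JSON.parse(JSON.stringify(getState())) as ChargingState;
        await store.setValue(key, snapshot);
    };
    const timer = setInterval(() => {
        dump().catch((error) => console.error('Failed to dump charging state', error));
    }, intervalMs);
    return async () => {
        clearInterval(timer);
        await dump();
    };
}

// Usage (hypothetical): const stop = startChargingStateDump(store, () => state, 5_000);
// and call `await stop()` before the Actor exits.
```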


metalwarrior665 · 2025-01-30T09:29:02Z

packages/apify/src/internals/charging.ts

@@ -253,15 +265,17 @@ export class ChargingManager {
            throw new Error('ChargingManager is not initialized');
        }

-        if (!this.pricingInfo[eventName]) {
+        const price = this.isAtHome ? this.pricingInfo[eventName].price : 1; // Use a nonzero price for local development so that the maximum budget can be reached


This feels weird and a step down from our reference implementation. We allowed passing in a run ID so you could use the real pricing for that project (I guess that would still work if we pass both ACTOR_RUN_ID and IS_AT_HOME); here we hardcode it at $1. I guess you would be OK with passing in the pricing somehow, but we just didn't figure out how?

I really didn't like the idea that you first need to publish the Actor to figure things out; we need to do better. What about having default pricing info in the configuration too? It would be used locally only, and we could figure out a way to set it up on the platform if it's not there (and override it with whatever is set up on the platform otherwise).

It's a hard pass on copying the pricing from a platform run from me as well - it's just too involved and convoluted. But I can imagine that any alternative I'd implement would end up in endless bikeshedding, so I'd prefer to resolve this in a separate PR.

OK, let's resolve it later; it's not a huge deal and there is a workaround. A config file is good, but I'm afraid of support questions about why it didn't propagate to the platform.

We could have an info log when we use the platform pricing and config has one. It would be right at the top, so quite easy to see.
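One possible resolution order for this pricing discussion, sketched under assumptions. Platform pricing wins when it exists, with an info log when the configuration also defines the event. A locally configured default is used otherwise, and the nonzero fallback of 1 from the diff is kept as a last resort. The `chargeableCount` helper illustrates how a per-run budget such as ACTOR_MAX_TOTAL_CHARGE_USD could cap the number of charged events. All function and type names here are hypothetical.

```typescript
interface EventPricing {
    title: string;
    price: number;
}

type PricingMap = Record<string, EventPricing>;

// Resolves the price of one event.
// - On the platform (platformPricing defined), the platform price wins and an
//   info message is logged if the local configuration also defines the event.
// - Locally, a configured default price is used, falling back to 1 so that the
//   maximum budget can still be reached (mirroring the diff above).
function resolvePrice(
    eventName: string,
    platformPricing: PricingMap | undefined,
    configPricing: PricingMap | undefined,
    log: (message: string) => void = console.info,
): number {
    if (platformPricing !== undefined) {
        const platformEvent = platformPricing[eventName];
        if (platformEvent === undefined) {
            throw new Error(`Unknown event "${eventName}"`);
        }
        if (configPricing?.[eventName] !== undefined) {
            log(`Using platform pricing for "${eventName}"; the configured price is ignored.`);
        }
        return platformEvent.price;
    }
    return configPricing?.[eventName]?.price ?? 1;
}

// Caps how many events can be charged so that the total stays within budget.
// The epsilon absorbs floating-point error (0.3 / 0.1 is 2.9999999999999996 in JS).
function chargeableCount(
    requested: number,
    priceUsd: number,
    alreadyChargedUsd: number,
    maxTotalChargeUsd: number,
): number {
    if (priceUsd <= 0) {
        return requested; // free events never exhaust the budget
    }
    const remainingUsd = Math.max(0, maxTotalChargeUsd - alreadyChargedUsd);
    return Math.min(requested, Math.floor(remainingUsd / priceUsd + 1e-9));
}

console.log(resolvePrice('page-scraped', undefined, undefined)); // 1
console.log(chargeableCount(100, 1, 3, 10)); // 7
```

Whether an unknown event should throw on the platform or be skipped is a policy choice; this sketch throws.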

mhamas

The PPE related matters look good to me, left some final thoughts on the log dataset.

mhamas · 2025-01-30T10:01:17Z

packages/apify/src/internals/charging.ts

+            if (this.purgeChargingLogDataset) {
+                const dataset = await Dataset.open(this.LOCAL_CHARGING_LOG_DATASET_NAME);
+                await dataset.drop();
+            }
+
+            this.chargingLogDataset = await Dataset.open(this.LOCAL_CHARGING_LOG_DATASET_NAME);


I'll just state one more time that I think we really want to make sure the log dataset will never be used in production for Actors deployed to the Store, only for testing by the developer. There is currently zero value in using it for end users; it'll only bring them harm, confusion, and potentially huge costs.

I'd be more conservative here and not even allow the creation of the dataset in the cloud, only locally. I won't block this PR on it, it's ultimately up to you and it was already discussed a lot, but please let's make sure developers are really aware of this, and let's minimize the risk of them getting this wrong.

Also, if at any time in the future we have a reason to start using this dataset for end users, let's discuss it internally, as that functionality would then probably belong to the platform itself, not the SDK. Thanks!

I think this is the METADATA_DATASET in @metalwarrior665's repository. If so, that specific implementation did not provide any value to our tests except for confusion. It doesn't get purged properly between (local) runs, and we do not use it. End users will see thousands of these datasets and will ask customer support why.

However, I can understand the main intention behind this specific implementation. I believe users would like to see a detailed cost report showing who paid for what, and how much. If this can be shown differently on the platform, we can use this only during development, as long as it gets purged.

Sure, I can disable it in the cloud. This implementation should support purging locally.
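A minimal sketch of the local-only behavior agreed on here, assuming an injected `open(name)` function that returns an object with `drop()` and `pushData()`, modeled loosely on the `Dataset.open(...)` and `dataset.drop()` calls in the diff above. The `openChargingLogDataset` helper and its options are hypothetical, `isAtHome` is assumed to come from the platform environment, and the default name `charging_log` is taken from `LOCAL_CHARGING_LOG_DATASET_NAME` in the diff. The sketch throws on the platform; silently skipping the dataset would be an equally valid choice.

```typescript
interface DatasetLike {
    drop(): Promise<void>;
    pushData(data: object | object[]): Promise<void>;
}

type OpenDataset = (name: string) => Promise<DatasetLike>;

interface ChargingLogDatasetOptions {
    enabled: boolean; // the developer's explicit opt-in
    isAtHome: boolean; // true when running on the Apify platform (assumed input)
    purge: boolean; // drop the log left over from previous local runs first
    open: OpenDataset;
    name?: string;
}

// Opens the charging log dataset for local development only.
async function openChargingLogDataset(options: ChargingLogDatasetOptions): Promise<DatasetLike | undefined> {
    const { enabled, isAtHome, purge, open, name = 'charging_log' } = options;
    if (!enabled) {
        return undefined;
    }
    if (isAtHome) {
        // Refuse loudly rather than silently creating a dataset on the user's account.
        throw new Error('The charging log dataset is only available for local development.');
    }
    if (purge) {
        // Drop the previous run's log, then reopen a fresh dataset under the same name.
        const previous = await open(name);
        await previous.drop();
    }
    return open(name);
}
```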

mhamas · 2025-01-31T07:45:35Z

packages/apify/src/internals/charging.ts

+        const timestamp = new Date().toISOString();
+
+        if (this.chargingLogDataset !== undefined) {
+            for (let i = 0; i < chargedCount; i++) {


Suggestion: why not push one item for the whole chargedCount? If the developer wants to simultaneously charge N things, I would expect a single item would suffice for debugging. Charging line by line will potentially create a lot of data with no real value added. Also, doing it like this rapidly increases the potential cost incurred by using this (by mistake) in production.

This is taken from the PoC by @metalwarrior665, but I agree with your suggestion. It's a small enough change to add here.

The idea was that there would be a mapping between items and events, and if you push a list, each event might have different metadata. But we diverged from that idea anyway, so feel free to change it.
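The difference between the two approaches is easy to quantify. This sketch compares per-unit logging, which the diff above does with its `for` loop over `chargedCount`, against a single entry per `Actor.charge` call. The `ChargeLogEntry` type and its `count` field are hypothetical. For the `Actor.charge('llm-token', 4562)` example from earlier in the thread, that is 4562 dataset items versus one.

```typescript
interface ChargeLogEntry {
    eventName: string;
    count: number; // hypothetical field: how many events this entry represents
    timestamp: string;
}

// Per-unit logging: one entry for every charged event.
function perUnitEntries(eventName: string, chargedCount: number, timestamp: string): ChargeLogEntry[] {
    return Array.from({ length: chargedCount }, () => ({ eventName, count: 1, timestamp }));
}

// Per-call logging: a single entry carrying the total count (nothing if zero).
function perCallEntries(eventName: string, chargedCount: number, timestamp: string): ChargeLogEntry[] {
    return chargedCount > 0 ? [{ eventName, count: chargedCount, timestamp }] : [];
}

const timestamp = new Date().toISOString();
console.log(perUnitEntries('llm-token', 4562, timestamp).length); // 4562
console.log(perCallEntries('llm-token', 4562, timestamp).length); // 1
```

Both variants record the same total count; per-call logging just loses the ability to attach different metadata to each unit, which is the trade-off described above.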

packages/apify/src/actor.ts

vladfrangu · 2025-01-31T10:45:45Z

package.json

+        "test:e2e": "npm run test:e2e:scrapers && npm run test:e2e:sdk",
+        "test:e2e:scrapers": "node test/e2e/runScraperTests.mjs",
+        "test:e2e:sdk": "npm run test:e2e:sdk:tarball && node test/e2e/runSdkTests.mjs",
+        "test:e2e:sdk:tarball": "npm run build && cd packages/apify && mv $(npm pack | tail -n 1) ../../test/e2e/apify.tgz",


not very portable but c'est la vie

If it works in CI and on my laptop, it's good enough for me.
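If the tarball step ever needs to work outside a POSIX shell, a Node-based replacement could look roughly like this. It assumes, as the existing `tail -n 1` does, that `npm pack` prints the tarball file name on the last line of its standard output. The `packSdkTarball` and `lastNonEmptyLine` names are hypothetical.

```typescript
import { execSync } from 'node:child_process';
import { renameSync } from 'node:fs';
import { resolve } from 'node:path';

// Returns the last non-empty line of a command's output.
function lastNonEmptyLine(output: string): string {
    const lines = output
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
    if (lines.length === 0) {
        throw new Error('npm pack produced no output');
    }
    return lines[lines.length - 1];
}

// Cross-platform equivalent of `cd packages/apify && mv $(npm pack | tail -n 1) ../../test/e2e/apify.tgz`.
function packSdkTarball(packageDir: string, destination: string): void {
    const output = execSync('npm pack', { cwd: packageDir, encoding: 'utf8' });
    const tarball = lastNonEmptyLine(output);
    renameSync(resolve(packageDir, tarball), resolve(destination));
}

// Usage from the repository root, after `npm run build`:
// packSdkTarball('packages/apify', 'test/e2e/apify.tgz');
```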

vladfrangu · 2025-01-31T10:48:32Z

packages/apify/src/internals/charging.ts

+ */
+export class ChargingManager {
+    readonly LOCAL_CHARGING_LOG_DATASET_NAME = 'charging_log';
+    readonly PLATFORM_CHARGING_LOG_DATASET_ID_KEY = 'CHARGING_LOG_DATASET_ID';


This feels like it should come from apify/consts

Yeah, I admit I have a deep disdain for that package. But I honestly believe it is justified not to use it here, since this is all very ad hoc and may get removed. Might be a reason to make these private, huh...

janbuchar added 7 commits December 23, 2024 18:48

Add signatures of new methods

7529857

Add ACTOR_MAX_TOTAL_CHARGE_USD configuration option

7f786f8

Fix type error

32982bb

Update method signatures to be more in line with Actor whitepaper

1897d08

Update apify-client

6461f01

Partially implement ChargingManager

0d558b6

Use ChargingManager in Actor

044ae6c

janbuchar added the adhoc and t-tooling labels Dec 23, 2024

github-actions bot assigned janbuchar Dec 23, 2024

github-actions bot added this to the 105th sprint - Tooling team milestone Dec 23, 2024

janbuchar added 7 commits January 6, 2025 19:09

Read pricing info and use it

5d2a678

Dataset set up

38ff1b6

Load more charging information on platform

582bb0d

Make sure that we stay within the budget when charging

fe38c87

Reorder stuff

b3dafeb

Fill in docblocks

c58aae6

Reorder operations to prevent race conditions

8dc3422

metalwarrior665 reviewed Jan 9, 2025

janbuchar added 3 commits January 10, 2025 12:58

Update apify-client

02f1953

Update e2e test directory structure

35a378f

WIP: Add e2e sdk test setup

e9482fc

github-actions bot added the tested label Jan 10, 2025

janbuchar added 2 commits January 14, 2025 16:04

Finalize sdk e2e testing environment

9925108

Initial e2e test of Actor.charge

973b19f

metalwarrior665 reviewed Jan 15, 2025

packages/apify/src/actor.ts (outdated)

B4nan mentioned this pull request Jan 15, 2025

Support for actor charge in the CLI apify/apify-cli#728

Open

janbuchar added 2 commits January 15, 2025 17:17

Improve test runner

72ff36a

Make Actor.charge test fail

b5ebe79

B4nan reviewed Jan 15, 2025

test/e2e/runSdkTests.mjs (outdated)

Use an existing type for pricingModel

ec72e6f

janbuchar added 8 commits January 27, 2025 16:28

Make charging log dataset opt-in

346cc98

Address code review comments

31ad976

Address more review comments

01832c8

Hide global Actor methods other than charge and add getChargingManager instead

eee72cb

Repeat ourselves to please our documentation tool

c16d554

Remove unused import

04c4790

Update e2e tests

915cb31

Fix local charging

a3e78f2

janbuchar requested review from mhamas, B4nan and metalwarrior665 January 30, 2025 09:13

metalwarrior665 reviewed Jan 30, 2025

Lint

b080ae6

mhamas approved these changes Jan 31, 2025

metalwarrior665 approved these changes Jan 31, 2025

B4nan approved these changes Jan 31, 2025

packages/apify/src/actor.ts

janbuchar added 3 commits January 31, 2025 10:44

Do not allow using the charging log dataset on platform

e2ff873

Ignore a docblock

4839a74

Log a single line for each Actor.charge

292ab3e

vladfrangu approved these changes Jan 31, 2025

Make consts private

30ac39a

janbuchar merged commit e26e496 into master Jan 31, 2025
8 checks passed

janbuchar deleted the actor-charge branch January 31, 2025 11:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Actor.charge() #346

feat: Actor.charge() #346
