Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pdr-analytics] Lite data module for serving pdr metrics #284

Closed
2 tasks done
idiom-bytes opened this issue Oct 25, 2023 · 9 comments
Closed
2 tasks done

[pdr-analytics] Lite data module for serving pdr metrics #284

idiom-bytes opened this issue Oct 25, 2023 · 9 comments
Assignees
Labels
Type: Enhancement New feature or request

Comments

@idiom-bytes
Copy link
Member

idiom-bytes commented Oct 25, 2023

Problem

We need a lite service that integrate across the stack in a variety of ways. As defined in pdr-web oceanprotocol/pdr-analytics#1. This is meant to be a first cut at issue #(6) as outlined by trent here and further summarized here

DoD:

@trentmc
Copy link
Member

trentmc commented Oct 26, 2023

I like the sentiment here.

But the scope is too large. "For all metrics".

The reason that I gave five steps before this, is that each of those is small and evolutionary, in line with what we are organically doing. They let us explore metrics quickly, locally, without rest APIs and other heavier architectures to slow us down.

Whereas this issue reads like "build a big pipeline". Before we know any metrics.

So, please do NOT do this in its current description.

My recommendation:

Build a small thing that ONLY handles the "calculate accuracy over 2000 samples". That's all. No more.

Then we can see what it looks like, and get used to it. Meanwhile we can keep iterating on earlier steps 1/2/.. in a fast organic way.

@trentmc
Copy link
Member

trentmc commented Oct 26, 2023

I agree that we stop adding more to pdr-ws and instead focus on simpler python-centric data flows.

@idiom-bytes
Copy link
Member Author

idiom-bytes commented Oct 26, 2023

@trentmc I agree

Let's talk about last_2000 accuracy and organic.
Please refer to this PR as for "this existing inside pdr-backend" and "small"
https://github.com/oceanprotocol/pdr-backend/pull/290/files
=== Item A
(1) This script could just be extended to provide a more flexible interface. Or a similar one that, that may share common, structure, and output.
=== Item B
(2) Another module inside pdr-backend/pdr_backend/analytics could exist that serves this through a lite flask app
(3) Or. another repo could exists that imports pdr-backend and serves this through a lite flask app
=== Item C
(4) I don't see the benefits for (3) being a different repo. Analytics (as described in ticket oceanprotocol/pdr-analytics#1) is a backend item. Further, pdr-backend is a monorepo of backend services.
(5) This might help backend coordinate better, as @trizin just finished doing a bunch of python work that would help @kdetry

@trentmc
Copy link
Member

trentmc commented Oct 30, 2023

Let's talk about last_2000 accuracy and organic.
Please refer to this PR as for "this existing inside pdr-backend" and "small"
https://github.com/oceanprotocol/pdr-backend/pull/290/files
=== Item A
(1) This script could just be extended to provide a more flexible interface. Or a similar one that, that may share common, structure, and output.

Agreed. It needs refactoring though. I just created an issue for that: pdr-backend#296 "get_predictoor_info.py needs to be more modular, and tested".

=== Item B
(2) Another module inside pdr-backend/pdr_backend/analytics could exist that serves this through a lite flask app
(3) Or. another repo could exists that imports pdr-backend and serves this through a lite flask app
=== Item C
(4) I don't see the benefits for (3) being a different repo. Analytics (as described in ticket https://github.com/oceanprotocol/pdr-analytics/issues/1) is a backend item. Further, pdr-backend is a monorepo of backend services.
(5) This might help backend coordinate better, as @trizin just finished doing a bunch of python work that would help @kdetry

Let's keep it all in pdr-backend repo. Simpler, easier to manage, easier to test.

@trentmc
Copy link
Member

trentmc commented Oct 30, 2023

Agreed. It needs refactoring though. I just created an issue for that: pdr-backend#296 "get_predictoor_info.py needs to be more modular, and tested".

Therefore you have a choice:

  • flask for predictoor stats first. How: do issue 296, then this issue (wrap it with flask)
  • flask for 2000-samples first. How: do this issue (flask app) including 2000 samples

I recommend that latter. Fewer dependencies, and it'd be good to have this 2000-samples thing in. (It's embarrassing to see accuracies <50%)

@kdetry
Copy link
Contributor

kdetry commented Oct 30, 2023

Hello @trentmc , it is almost done, I will share the code asap. Thank you for the description

@kdetry
Copy link
Contributor

kdetry commented Oct 30, 2023

We may need different things in future, eg. currently the get_predictor_info.py script queries based on prediction senders, but I changed the query and the flask app works with contract addresses. I copied the script and worked on it, instead of importing it, I hope it won't be an issue

@trentmc
Copy link
Member

trentmc commented Oct 30, 2023

I copied the script and worked on it, instead of importing it, I hope it won't be an issue

Yes, it's an issue. You should never repeat code, it greatly adds complexity. "DRY = Don't Repeat Yourself" is the # 1 rule of managing complexity.

Given your proposed direction: I recommend doing #296 first, to refactor the script into reusable tested components. Otherwise we end up with unmanageable code.

@kdetry
Copy link
Contributor

kdetry commented Nov 13, 2023

It is merged and deployed, so we can close the issue

@kdetry kdetry closed this as completed Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants