Detect (and ignore) duplicate prices #422

Open
raphodn opened this issue Sep 4, 2024 · 6 comments

@raphodn
Member

raphodn commented Sep 4, 2024

Story

I sometimes see "identical" prices. Possible reasons: a mistake (scanning or adding the product twice), adding a price from a receipt and then again from another source such as GDPR, an API error...

Identical price = same product + same location + same date
(not necessarily the same proof; if there are 2 different proofs, which one should we keep?)

And should we allow 2 identical products on the same proof? Perhaps only if the price is different (but in which case would that happen?)
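
To make the definition above concrete, here is a minimal Python sketch that groups prices by (product, location, date) and reports the groups with more than one entry. The `Price` dataclass and its field names are hypothetical and only for illustration; they are not the actual Open Prices model.

```python
import datetime
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    # Hypothetical fields, for illustration only
    id: int
    product_code: str
    location_id: int
    date: datetime.date
    price: float
    proof_id: Optional[int] = None


def find_identical_prices(prices):
    """Group prices that share the same product + location + date."""
    groups = defaultdict(list)
    for p in prices:
        groups[(p.product_code, p.location_id, p.date)].append(p)
    # Keep only the keys that occur more than once
    return {key: group for key, group in groups.items() if len(group) > 1}
```

The proof is deliberately left out of the key, so two prices coming from different proofs still land in the same group. Deciding which one to keep remains the open question.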

Linked issues & comments

@raphodn
Member Author

raphodn commented Oct 11, 2024

What should happen when a duplicate price is sent to the API?

  • Should the server respond with an error, for instance a 409 "Conflict"?
  • Or respond with a 200 but ignore the payload?

Should the user be informed that the price was ignored by the server?
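
The two options can be compared with a small framework-free sketch. Everything here is an assumption made for illustration: the in-memory `store`, the payload field names, the `on_duplicate` switch and the response bodies are hypothetical and do not describe the current API.

```python
from typing import Dict, Tuple

# Hypothetical duplicate key: (product, location, date)
PriceKey = Tuple[str, int, str]


def duplicate_key(payload: dict) -> PriceKey:
    return (payload["product_code"], payload["location_id"], payload["date"])


def create_price(store: Dict[PriceKey, dict], payload: dict,
                 on_duplicate: str = "conflict"):
    """Return (status_code, body) for a price creation request."""
    key = duplicate_key(payload)
    existing = store.get(key)
    if existing is None:
        store[key] = payload
        return 201, payload
    if on_duplicate == "conflict":
        # Option 1: tell the client explicitly that the price already exists
        return 409, {"detail": "duplicate price", "existing": existing}
    # Option 2: accept the request, ignore the payload, return the stored price
    return 200, existing
```

With option 2 a client can only tell that its price was ignored if it distinguishes a 200 from a 201, which is one way the user could still be informed.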

@raphodn changed the title from "Detect duplicate prices" to "Detect (and ignore) duplicate prices" on Oct 11, 2024
@raphodn
Member Author

raphodn commented Oct 11, 2024

What about duplicate proofs?
They are much harder to detect: we would need to store a hash of each proof, and check whether a Proof with the same hash and the same user already exists.

Related issue: openfoodfacts/open-prices-frontend#534

Edit: created a dedicated issue: #514
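
A minimal sketch of the hash idea, assuming the hash is a SHA-256 digest of the raw image bytes and that uniqueness is checked per user. The function names and the in-memory `seen` set are hypothetical; a real implementation would presumably store the digest on the Proof itself.

```python
import hashlib


def proof_fingerprint(image_bytes: bytes) -> str:
    """SHA-256 hex digest of the uploaded file's raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def is_duplicate_proof(seen: set, user_id: str, image_bytes: bytes) -> bool:
    """Return True if this user already uploaded a byte-identical file."""
    key = (user_id, proof_fingerprint(image_bytes))
    if key in seen:
        return True
    seen.add(key)  # remember the upload for later checks
    return False
```

A byte-level hash only catches exact re-uploads of the same file. A second photo of the same receipt, or a re-compressed copy, produces a different digest, which is part of why duplicate proofs are harder to detect than duplicate prices.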

@monsieurtanuki

For the moment in off-dart's createPrice we expect a status code of 201 ("Created").

We may return the "old" price and let the developer deal with it (checking field by field). If needed, the developer may then decide to delete the "old" price and add the new one.

Alternatively, you could return a 409 ("Conflict") status code.

Whatever you decide should probably take into account our low reactivity in Smoothie: coding can be fast, but the app rollout can take months.
Perhaps do it in 2 steps: "return the old price" now, and the 409 in one year?
Or offer both behaviors under different API version numbers.
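
From the client side, handling the three outcomes could look roughly like the sketch below. It is written in Python rather than Dart purely for illustration. It assumes that "returning the old price" means a 200 response whose body is the stored price, which is not decided anywhere in this thread, and the field names are hypothetical.

```python
# Hypothetical list of fields a client might compare
COMPARED_FIELDS = ("product_code", "location_id", "date", "price", "currency")


def diff_fields(sent: dict, returned: dict, fields=COMPARED_FIELDS):
    """List the fields whose values differ between what was sent and what came back."""
    return [f for f in fields if sent.get(f) != returned.get(f)]


def interpret_response(status_code: int, sent: dict, body: dict):
    """Map a create-price response to an (outcome, differing_fields) pair."""
    if status_code == 201:
        return "created", []
    if status_code == 200:
        # Assumed "return the old price" behavior: compare field by field
        return "existing", diff_fields(sent, body)
    if status_code == 409:
        return "conflict", []
    raise ValueError(f"unexpected status code: {status_code}")
```

A client that only accepts 201 would treat both the 200 and the 409 as failures, which is why the rollout timing matters.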

@raphodn
Member Author

raphodn commented Jan 6, 2025

With #611 we also have the case where proof images contain the same price tag twice.
For instance this proof: https://prices.openfoodfacts.org/proofs/14137

We could have a script run (weekly?) to detect duplicate price_tags, and maybe change their status and delete the corresponding prices.

Or just leave it as is 🤷 and accept that we'll have some duplicate prices (for good or bad reasons)

Examples :

@monsieurtanuki

I don't think there's any added value in keeping 100% duplicates (user, location, proof, barcode, currency, date, value).
For your "Italian milk" example, a human being would probably have said "oh it's the same product, we need it only once". And so should your AI, btw.
A weekly script could indeed remove the duplicates.

Duplicate errors, such as the same location, barcode, currency and date but a different value, are a different matter.
Probably to be listed, and potentially fixed by hand.
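
The distinction between the two cases could be sketched as a two-pass script: first drop 100% duplicates, then list the groups whose values disagree. Prices are represented as plain dicts with hypothetical keys taken from the field list above; this is an illustration, not the actual data model or an existing script.

```python
from collections import defaultdict

# Hypothetical field names, following the lists in this comment
EXACT_FIELDS = ("user", "location", "proof", "barcode", "currency", "date", "value")
CONFLICT_FIELDS = ("location", "barcode", "currency", "date")


def split_duplicates(prices):
    """Return (removable, to_review) for a list of price dicts."""
    # Pass 1: 100% duplicates. Keep the first occurrence, mark the rest removable.
    seen = set()
    removable, kept = [], []
    for p in prices:
        key = tuple(p[f] for f in EXACT_FIELDS)
        if key in seen:
            removable.append(p)
        else:
            seen.add(key)
            kept.append(p)
    # Pass 2: same location/barcode/currency/date but different values.
    groups = defaultdict(list)
    for p in kept:
        groups[tuple(p[f] for f in CONFLICT_FIELDS)].append(p)
    to_review = [g for g in groups.values() if len({p["value"] for p in g}) > 1]
    return removable, to_review
```

Only the first list is safe to delete automatically. The second list is the one to show to a human.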

@monsieurtanuki

"Probably to be listed, and potentially fixed by hand."

...or Robotoff'ed ("Which is the correct price?")

Projects
Status: Ready
Development

No branches or pull requests

2 participants