-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wikidata <> authors integration: first steps #8236
Conversation
It would be really neat if this was added to the autocomplete, which currently pulls in the author's name, date-of-birth/death, genres and top work. On the Wikidata side this is mostly designed to be used to differentiate between two identically named items, so it's an ideal use case. It might also be worth looking at the Wikidata-powered infoboxes on Wikipedia to see what kind of info they surface. https://en.wikipedia.org/wiki/Template:Infobox_writer/Wikidata |
Is there an issue associated with this PR? Making better use of Wikidata data is definitely a good idea, but I'm not sure description is the best place to start. In addition to the I18N issues, these are also mostly machine generated from templates so you're going to end up with lots of " author (birth-death)" which doesn't really add a lot of value. Also, given the volumes of data involved, it's probably more appropriate to use the data dumps than be hitting their API. |
What is the usecase behind this? Is there to be a bulk QID fetching UI or something? |
@hornc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! A few small things, but I think the main things are:
The split between the two files is a little confusing, especially since we have WikidataEntities
, WikidataEntity
, WikidataRow
and the distinction between these is a little confusing. I think we want:
@dataclass
class WikidataEntity:
id: str
data: dict
updated: datetime
# In general we prefer function to have verb names
def get_description(self, locale: str) -> str | None
def get_entity(id: str, cache_only = True) -> WikidataEntity | None
def _get_from_web(id: str) -> WikidataEntity
def _get_from_cache(id: str) -> WikidataEntity | None
def _add_to_cache(id: str) -> None
That will I think keep all the code easily in one place, and since I doubt our wikidata caching will ever really grow to be more complicated than these, I think it'll keep things a touch tidier!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the css for this in a new less in static/css/wikidatabox.less
.
Then import that CSS from page-user.less
.
(I'm sorry, this flow is not great 😅 )
openlibrary/plugins/wikidata/code.py
Outdated
ttl (time to live) inspired by the cachetools api https://cachetools.readthedocs.io/en/latest/#cachetools.TTLCache | ||
""" | ||
entity = WikidataEntities.get_by_id(QID) | ||
if entity and seconds_since(entity.updated) < ttl: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably inline this function, and the timedelta
helpers might be a little clearer here!
if entity and seconds_since(entity.updated) < ttl: | |
if entity and (datetime.now() - entity.updated) < timedelta(days=30): |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8236 +/- ##
=======================================
Coverage 15.96% 15.96%
=======================================
Files 89 89
Lines 4710 4710
Branches 821 821
=======================================
Hits 752 752
Misses 3449 3449
Partials 509 509 ☔ View full report in Codecov by Sentry. |
@cdrini I think I've addressed all your concerns and the code is looking a lot cleaner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Would be great if we can merge those two classes + one other comment.
Next step for me is to create this table on prod so we can deploy this.
Next step for the project is to begin working on bulk import. We can make tweaks to how/where the data is displayed at any point, but getting the bulk data import sorted is likely the next most impactful step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Logic looks good; a few style things. I created the wikidata
table and will toss this up on testing now since the logic is already 👍
<div class="section"> | ||
<h6 class="collapse black uppercase">TESTING ONLY WIKIDATA SECTION</h6> | ||
<p style="text-align: center;"> | ||
<!-- This is only below subject temporarily to avoid merge conflicts on testing --> | ||
$ wikidata = page.wikidata(fetch_missing=show_librarian_extras) | ||
$if wikidata: | ||
$wikidata.get_description(i18n.get_locale()) | ||
</p> | ||
</div> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am leaving this code here now for testing.openlibrary.org to be easier to verify this PR on.
I can either remove it before we merge this or there is already a commit to remove it in #9130
Closing in favor of #9130 |
CLOSED IN FAVOR OF #9130
Document with the goal of this project
v0 goals:
Next steps:
Technical
Trying to keep it as simple as possible.
The wikidata method is on the authors model because works/editions store wikidata IDs differently so we'll have to handle that when we get there. My current coals is just authors.
Testing
Add a wikidata ID to an author and then you'll start seeing this "short description" field showing up on the side.
Screenshot
Aug 30 demo video (slightly outdated), see below
wd_demo.mp4
September 2 screenshot (most recent)
Stakeholders
@cdrini