-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Publisher Coverage Script and Badge #324
Conversation
@dobbersc Thanks for adding this, that's exactly what we needed. |
for publisher in sorted( | ||
publisher_region, key=lambda p: cast(PublisherEnum, p).name # type: ignore[no-any-return] | ||
): | ||
publisher_name: str = publisher.name # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole construct seems unnecessary to me, also my local mypy installation runs with no error when removing publisher_name
and use publisher.name
again, as you did before 6f4f114.
I was running mypy 1.1.1|1.6.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with you. On my local machine, this part is also not necessary. Although, for some reason, the CI fails without this change (CI Log). Just to be sure I'll try it again without. The enum typing is unfortunately a bit messy :/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very weird. I'm running the same version of mypy on a complete fresh conda installation of fundus (cleared cache etc.) and get no complaints locally from mypy while the CI's mypy fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing this. That is indeed very weird.
The enum typing is unfortunately a bit messy :/
I agree. I will add something about this to #325
Do you have any clue what's causing this discrepancy between CI and local runs? My next suggestion would be to remove the mypy
version constraints from pyproject.toml
and see if this is a version-specific problem or some problem with the CI in general. What do you think?
This PR adds a script to execute a baseline real-time crawler validation. A problem with the current crawler unit tests is that the crawlers are only validated on pre-fetched articles. This script aims to check a publisher's availability and basic crawl results.
The tests include a real-time crawl for each publisher's news map and RSS Feed checking the received articles for attribute completeness. Note that this script does not check the attributes' correctness, only their presence. The resulting publisher coverage is then summarised in the output report.
To automate the publisher coverage script this PR includes an action that runs every day at 01:00 generating a badge containing the publisher coverage displayed in the README. See here for a current action run.
To enhance the functionality we could also run the tests for each source independently, e.g. news map and RSS feed if specified, such that a crawler only passes the tests if all specified sources return appropriate articles.