
4773 Allow iquery probes to trigger sweep tasks for existing Dockets #4953

Merged: 5 commits into main on Jan 24, 2025

Conversation

albertisfu (Contributor)

This PR resolves the issue described in #4773, where the last probe in a probe cycle encountered an existing docket, which prevented the sweep task from being executed. Previously, one of the conditions for triggering a sweep was that the docket had to be newly created. That condition is still useful for triggering sweep tasks once we catch up with court content and enable IQUERY_SWEEP_UPLOADS_SIGNAL_ENABLED, but for now we need the probe daemon to trigger sweep tasks even when the last probe hits an existing docket.
To address this, I updated the conditions for triggering update_latest_case_id_and_schedule_iquery_sweep in the handle_update_latest_case_id_and_schedule_iquery_sweep signal handler (a rough sketch of the new condition follows below).

  • I also updated a related test to confirm that this scenario is now working as expected.
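
For context, the shape of the change is roughly the following. This is only a sketch written from the PR description, not the actual handler: the handler and task names come from the text above, while the import paths, the last-probe flag, and the task invocation are assumptions.

from django.db.models.signals import post_save
from django.dispatch import receiver

from cl.search.models import Docket  # assumed import path
from cl.corpus_importer.tasks import (  # assumed import path
    update_latest_case_id_and_schedule_iquery_sweep,
)


@receiver(post_save, sender=Docket)
def handle_update_latest_case_id_and_schedule_iquery_sweep(
    sender, instance, created, **kwargs
):
    # Previously (per the description): only newly created dockets qualified.
    # Now: an existing docket hit by the last probe in a cycle also qualifies.
    hit_by_last_probe = getattr(instance, "from_last_iquery_probe", False)  # hypothetical flag
    if created or hit_by_last_probe:
        update_latest_case_id_and_schedule_iquery_sweep.delay(instance.pk)  # simplified invocation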

Additionally, after tweaking the logic to allow sweep tasks for existing dockets, some related tests began failing, specifically tests asserting that the sweep task was called the expected number of times. I found the issue was caused by DocketFactory, which was triggering the save method twice: once when the instance was first saved, and again within the post_generation hook.

After discussing this with Eduardo, he explained that the hook is necessary to avoid calling S3 when generating DocketFactory instances with the filepath_local field, so removing the hook was not an option. To resolve this, I replaced the save call in the hook with a queryset.update call, which persists the filepath_local field without triggering signals.
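
A minimal sketch of that factory pattern is below. It is not the real DocketFactory (which declares many more fields); the import path and hook shape are assumptions, but it shows save() being replaced by a queryset update so no post_save signals fire.

import factory
from factory.django import DjangoModelFactory

from cl.search.models import Docket  # assumed import path


class DocketFactory(DjangoModelFactory):
    class Meta:
        model = Docket

    @factory.post_generation
    def filepath_local(self, create, extracted, **kwargs):
        # Runs after the factory has already saved the instance once.
        if not create or not extracted:
            return
        # Before: self.filepath_local = extracted; self.save()
        # That second save() re-fired post_save signals, so tests counting
        # sweep-task calls saw one extra invocation per docket.
        # After: persist the field with a queryset update, which writes the
        # column directly and does not trigger signals.
        Docket.objects.filter(pk=self.pk).update(filepath_local=extracted)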

albertisfu requested a review from mlissner on January 22, 2025 at 22:38.
mlissner (Member) left a comment:

Cool, thank you. Looks about right to me!

ERosendo (Contributor) left a comment:

LGTM :shipit:

ERosendo (Contributor) commented on Jan 23, 2025:

Note: I'm holding off on merging this PR until @albertisfu provides instructions for enabling the daemon and we create a separate issue to prioritize those tasks.

albertisfu (Contributor, Author) commented:

Thanks @ERosendo for the review!

@mlissner I think the PR can be safely merged since I remember you disabled the daemon. However, if you want to confirm, the setting should be IQUERY_CASE_PROBE_DAEMON_ENABLED=False.

Before getting the daemon running again, and since you mentioned in #4773 that we should start the sweep from 2020, we should run the following command:

docker exec -it cl-django python /opt/courtlistener/manage.py ready_mix_cases_project --task set-case-ids --court-type all --date-filed 2020-01-01

This command will set the Redis keys for highest_known_pacer_case_id and pacer_case_id_current to the latest pacer_case_id found in the DB before 2020-01-01.
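
As a quick sanity check before dumping everything (see the script below), a single court can be spot-checked. The court id here is only a hypothetical example:

from cl.lib.redis_utils import get_redis_interface

r = get_redis_interface("CACHE")
# Both hashes are keyed by court_id; after the command runs, each court's
# entry should hold the latest pacer_case_id filed before 2020-01-01.
print(r.hget("iquery:highest_known_pacer_case_id", "cand"))  # example court id
print(r.hget("iquery:pacer_case_id_current", "cand"))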

We can confirm the values set with this script:

from cl.lib.redis_utils import get_redis_interface

r = get_redis_interface("CACHE")

# Both hashes map court_id -> pacer_case_id.
highest_known_pacer_case_id = r.hgetall("iquery:highest_known_pacer_case_id")
pacer_case_id_current = r.hgetall("iquery:pacer_case_id_current")

# Build one row per court so both values can be compared side by side.
data = [
    {
        "court_id": court_id,
        "highest_known_pacer_case_id": highest_known_pacer_case_id[court_id],
        "pacer_case_id_current": pacer_case_id_current[court_id],
    }
    for court_id in highest_known_pacer_case_id
]
print(data)

After this is set, we should confirm that the probe-iquery-pages-daemon deployment is running and has been updated with the latest code. Finally, we can set IQUERY_CASE_PROBE_DAEMON_ENABLED to True.

mlissner merged commit c38ccdb into main on Jan 24, 2025. 15 checks passed.
mlissner deleted the 4773-update-pacer-case-ids-from-probe-daemon-progress branch on January 24, 2025 at 01:22.
mlissner (Member) commented:

I confirmed the setting is correct. It was disabled, but the variable name was wrong, so I fixed that. We have an issue for launching the probe here: https://github.com/freelawproject/infrastructure/issues/208

I just set it so that it's assigned to me, Alberto, and Diego. @chaco-fl, if you can take point on this one when you're ready, that would be great. I think Alberto should be able to help you understand anything you need, and I'm happy to help too, since I set up the daemon. Maybe a three-way call would be a good way to do this, when you're ready, Diego. :)

Successfully merging this pull request may close the issue: iquery probe daemon was looping because existing content was found.