-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2u/course optimizer #35887
base: master
Are you sure you want to change the base?
2u/course optimizer #35887
Conversation
json_content = json.loads(content) | ||
broken_links_dto = _create_dto(json_content, request.user) | ||
elif task_status.state in (UserTaskStatus.FAILED, UserTaskStatus.CANCELED): | ||
status = max(-(task_status.completed_steps + 1), -2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a max() function to compute status is highly suspicious! (Ditto the min function below). It seems like you're combining the number of completed steps with information as to whether the task failed or was canceled. Have you considered using independent variables or fields to capture these rather disparate concepts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is code I copied over from import_export.py
. I believe the api will return a negative number on a fail related to the step in the process.
I agree this looks bad... but I would want to update code in both places at the same time. Maybe in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate PR but part of the same task? General stewardship rule is "leave the code a little better than you found it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
below the line. Not important.
You can just go with whatever works for you.
I see people often just copying over code from older places since it just works, and thus creating new not-great code. I don't generally think that's a good idea. In my opinion we can extract code that's reused into helper functions if the code is good, but if not, I'd prefer writing new code that is better. It also means the code author needs to think and understand the code a little bit more in-depth than when they copy it.
If you want to change the code but it's too much trouble to extract the function from the other place, I'd say just change it in the new place and then just extract the parts that are still the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
Part of improving task statuses
|
||
log = logging.getLogger(__name__) | ||
|
||
STATUS_FILTERS = user_tasks_settings.USER_TASKS_STATUS_FILTERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A one line comment as to how these will be used would be helpful. The module filters user tasks on the basis of these, but I'm unfamiliar with this package's use. The one line saves me the trouble of having to go read up on the package if all I want is some notion of what it's used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above "# Tuple containing zero or more filters for UserTaskStatus listing REST API calls." does not help my understanding at all. I understand we apply filters. What do we apply them for?
I went through code quite a bit to figure that out, which was not straightforward.
The result:
This is an adjustable django setting. It specifies general filters that are applied to all tasks before they are returned to the user. Currently this setting is not specified in edx-app settings. So it defaults to a default filter. Which is quite hard to find to figure out where this is defined (in some other repo called django_user_tasks
.
This is the code place:
https://github.com/openedx/django-user-tasks/blob/20e4e6eb81f5e981d5bbdba390ffcf8e6c3e0d0a/user_tasks/filters.py#L9
So it will be quite hard for a user to understand what the filter actually does or what it's purpose is.
The explaining comments for the default filter are:
Default filter for UserTaskArtifact listings in the REST API.
Ensures that superusers can see all artifacts, but other users
can only see artifacts for tasks they personally triggered.
And then more specific:
Filter out any artifacts which the requesting user does not have permission to view.
On top of that, there are two sets of these filters - one is for task artifacts, one for statuses. They function the exact same way.
This is important to capture on edx-platform in some way. Maybe something like: "These filters can be overridden in django settings of edx-platform. If they are not, the default behavior is the following: ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
Above the line - it's just a comment to add and it's super helpful I feel
dcf94db
to
e9fb603
Compare
@bszabo I updated a lot of the organization in tasks.py. I agree with you on the iffy code practices (using max / min / integer for status), but this is currently how |
Thanks for the editorial changes, Ray. I'm stepping away from this review with the expectation that Jesper will give it a lookover from a functional perspective. If it's possible to attend to the funky status definition before moving on to new things, I would strongly recommend that, even if it ends up being in a different PR. |
If you take a step back, and look at this search for broken links as a first installment towards course optimization, you can see that course optimization will entail a sequence of activities being carried out, with each intended to potentially improvew a course. Viewed that way, the natural questions to ask will be "which activity is currently being worked on?" and "what is the status for activity X?". For the latter question the natural answers will be not started, in progress, succeeded, or failed with error message Y. It seems to me that it would make sense to organize even this first installment somewhat in those lines. The solution you borrowed from import/export is conflating concepts in a way I don't think is good. |
requested_format = request.GET.get('_accept', request.META.get('HTTP_ACCEPT', 'text/html')) | ||
|
||
check_broken_links.delay(request.user.id, course_key_string, request.LANGUAGE_CODE) | ||
return JsonResponse({'LinkCheckStatus': 1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts for readability:
I have the impression that 1 here seems like a magic number. If I look at the code later, it would be hard to figure out what it means, and it would be easy for me when editing the code to return something that is somehow wrong.
I'll need to look in another function to figure out what link check statuses there are according to a comment. And if someone changes the logic but forgets to update that comment, that can lead to bugs.
A pattern I like to use to avoid this problem is to define something similar to an enum.
It could be:
LINK_CHECK_STATUSES = {
"IN_PROGRESS": 1,
"SUCCESS": 3,
...
}
And then here and in other places you can just use it like
return JsonResponse({'LinkCheckStatus': LINK_CHECK_STATUSES["IN_PROGRESS"]})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
Above the line, important
You're already overhauling link check statuses so when you've implemented that you can just resolve this discussion
json_content = json.loads(content) | ||
broken_links_dto = _create_dto(json_content, request.user) | ||
elif task_status.state in (UserTaskStatus.FAILED, UserTaskStatus.CANCELED): | ||
status = max(-(task_status.completed_steps + 1), -2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
below the line. Not important.
You can just go with whatever works for you.
I see people often just copying over code from older places since it just works, and thus creating new not-great code. I don't generally think that's a good idea. In my opinion we can extract code that's reused into helper functions if the code is good, but if not, I'd prefer writing new code that is better. It also means the code author needs to think and understand the code a little bit more in-depth than when they copy it.
If you want to change the code but it's too much trouble to extract the function from the other place, I'd say just change it in the new place and then just extract the parts that are still the same
course_key = CourseKey.from_string(course_key_string) | ||
if not has_course_author_access(request.user, course_key): | ||
raise PermissionDenied() | ||
courselike_block = modulestore().get_course(course_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a courselike_block
? Naming isn't that clear to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
nit. Not important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer the term "learning context" over "courselike" now. But in this case, I doubt this works with libraries at all, so why not just call it course_root_block
and be super clear?
courselike_block = modulestore().get_course(course_key) | ||
if courselike_block is None: | ||
raise Http404 | ||
context = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this context
variable used anywhere? I may just be blind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
Below the line. Nice to have but not important.
|
||
log = logging.getLogger(__name__) | ||
|
||
STATUS_FILTERS = user_tasks_settings.USER_TASKS_STATUS_FILTERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above "# Tuple containing zero or more filters for UserTaskStatus listing REST API calls." does not help my understanding at all. I understand we apply filters. What do we apply them for?
I went through code quite a bit to figure that out, which was not straightforward.
The result:
This is an adjustable django setting. It specifies general filters that are applied to all tasks before they are returned to the user. Currently this setting is not specified in edx-app settings. So it defaults to a default filter. Which is quite hard to find to figure out where this is defined (in some other repo called django_user_tasks
.
This is the code place:
https://github.com/openedx/django-user-tasks/blob/20e4e6eb81f5e981d5bbdba390ffcf8e6c3e0d0a/user_tasks/filters.py#L9
So it will be quite hard for a user to understand what the filter actually does or what it's purpose is.
The explaining comments for the default filter are:
Default filter for UserTaskArtifact listings in the REST API.
Ensures that superusers can see all artifacts, but other users
can only see artifacts for tasks they personally triggered.
And then more specific:
Filter out any artifacts which the requesting user does not have permission to view.
On top of that, there are two sets of these filters - one is for task artifacts, one for statuses. They function the exact same way.
This is important to capture on edx-platform in some way. Maybe something like: "These filters can be overridden in django settings of edx-platform. If they are not, the default behavior is the following: ..."
cms/djangoapps/contentstore/tasks.py
Outdated
|
||
# catch all exceptions so we can record useful error messages | ||
except Exception as exception: # pylint: disable=broad-except | ||
LOGGER.exception('Error checking links for course %s', courselike_key, exc_info=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... I see this is done this way in other tasks.
Is this how celery works? No actual error should be raised in the task? Because this swallows all errors.
I understand it does a logger exception. But I'm a bit confused because I don't know if this actually the correct way it should be handled with celery, or whether the error should not be caught so it can cancel the running celery task? Need to look into it.
I don't exactly know how this logger is configured but this way I don't see datadog showing an exception, for example. Maybe if we do swallow this error, we should add some logic to make it show in datadog celery errors? Or do you think the logger exception somehow does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus:
Out of scope. Create issue to look into this.
Quite important but better done separately.
Course Optimizer is a feature approved by the Openedx community that adds a "Course Optimizer" page to studio where users can run a scan of a course for broken links - links that point to pages that have a 404. Depends on backend: openedx/edx-platform#35887 - test together. This also requires adding a nav menu item to edx-platform legacy studio. That should be implemented before enabling the waffle flag on prod. Links: - [Internal JIRA ticket](https://2u-internal.atlassian.net/browse/TNL-11809) - [Course Optimizer Discovery](https://2u-internal.atlassian.net/wiki/spaces/TNL/pages/1426587703/TNL-11744+Course+Optimizer+Discovery) - [Openedx community proposal](openedx/platform-roadmap#388)
Description
This PR creates the backend for Course Optimizer Link Checker, which will scan through a course and check for broken links. This functionality imitates what is currently in place for export. 2 apis are created here:
link_check
POST[block_id, broken_link]
link_check_status
GETlink_check
if process is successful.Technical considerations:
Supporting information
https://2u-internal.atlassian.net/browse/TNL-11782
Testing instructions
The following example is for demo course
course-v1:edX+DemoX+Demo_Course
.export
withlink_check
.http://localhost:18010/link_check_status/course-v1:edX+DemoX+Demo_Course
in your browser. You should see the results of the link check scan.Testing with frontend
frontend-app-authoring
: Feat course optimizer page frontend-app-authoring#1533devstack
, runmake dev.up.large-and-slow
.frontend-app-authoring
, runnpm start
.Other information
frontend-app-authoring
: Feat course optimizer page frontend-app-authoring#1533