[FC-0049] fix: Issues when importing a large taxonomy #189
Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green.
* Refactor execute() of CreateTag() to avoid keeping values in memory
* Refactor execute() of TagImportPlan() to avoid saving logs on every action, which caused slow execution and memory overflow
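To illustrate why writing a log entry on every action gets expensive for a large taxonomy, here is a minimal sketch. It is not the code from this PR: `ImportTask`, `run_logging_every_action`, and `run_logging_once` are hypothetical names, and `save()` only counts calls where a real model would hit the database.

```python
class ImportTask:
    """Hypothetical task record; each save() stands in for one database write."""

    def __init__(self):
        self.log = ""
        self.save_count = 0

    def save(self):
        self.save_count += 1


def run_logging_every_action(task, actions):
    """One log append and one save per action: N writes for N actions."""
    for index, action in enumerate(actions):
        action()
        task.log += f"executed action {index}\n"
        task.save()


def run_logging_once(task, actions):
    """Run every action, then record a single summary entry and save once."""
    count = 0
    for action in actions:
        action()
        count += 1
    task.log += f"executed {count} actions\n"
    task.save()


actions = [lambda: None] * 3

per_action = ImportTask()
run_logging_every_action(per_action, actions)

summary = ImportTask()
run_logging_once(summary, actions)
```

With thousands of tags, the first variant performs one write per tag and keeps growing the log text, while the second performs a single write regardless of taxonomy size.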
👍 @ChrisChV Great work on this! It's working well. Here are the timings I got when importing the different taxonomies:
- WGU Instructional Design: K-12 Collection: 1.44 seconds.
- FlatTaxonomy: 14.98 seconds.
- Lightcast Open Skills Taxonomy: 18.47 seconds.
I just had a few nit questions below.
- I tested this: I followed the testing instructions in the PR
- I read through the code
- I checked for accessibility issues
- Includes documentation
-        parent = None
-        if self.tag.parent_id:
-            parent = self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
-        taxonomy_tag = Tag(
+        Tag.objects.create(
             taxonomy=self.taxonomy,
-            parent=parent,
+            parent=self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
+            if self.tag.parent_id is not None else None,
             value=self.tag.value,
             external_id=self.tag.id,
         )
-        taxonomy_tag.save()
nit: Just curious whether this improved performance, or is it mainly to make the code cleaner?
It improves memory usage a little because it does not create an instance of Tag, but uses the create function. It was the first thing I tried, but it wasn't the root of the problem. I left it like this because it looks cleaner.
Tag.objects.create(...) still creates a Tag instance and calls .save() on it for you, and returns it. It may be garbage-collected slightly better if you don't keep the returned Tag instance here, but I wouldn't expect any major difference. I do like the create() version more though - it is cleaner.
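The point above can be sketched in plain Python. This is a simplified stand-in and not Django's real implementation: the `Manager` and `Tag` classes below are hypothetical, and the `saved` list stands in for the database table. It shows that `create()` builds an instance, saves it, and returns it, so both styles end up doing the same work.

```python
class Manager:
    """Simplified stand-in for a model manager (not Django's real code)."""

    def __init__(self, model):
        self.model = model

    def create(self, **kwargs):
        obj = self.model(**kwargs)  # an instance is still constructed
        obj.save()                  # and saved on the caller's behalf
        return obj                  # then handed back


class Tag:
    saved = []  # stands in for the database table

    def __init__(self, value, external_id=None, parent=None):
        self.value = value
        self.external_id = external_id
        self.parent = parent

    def save(self):
        Tag.saved.append(self)


Tag.objects = Manager(Tag)

# Old style: construct the instance, then save it explicitly.
tag_a = Tag(value="a", external_id="a1")
tag_a.save()

# New style: create() performs the same two steps and returns the instance.
tag_b = Tag.objects.create(value="b", external_id="b1", parent=tag_a)
```

The only practical difference is whether the caller keeps a reference to the returned instance, which matches the comment that no major memory difference should be expected.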
-        taxonomy_tag = self._get_tag()
-        taxonomy_tag.delete()
+        self._get_tag().delete()
nit: Similar to the above: did this improve performance, or is it just a code style change?
Same as above
@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.
Description
Supporting information
It has been tested with taxonomies that previously took too long to import or could not finish because they ran out of memory. These are their average import times after the fix:
- WGU Instructional Design: K-12 Collection: 5.95 seconds.
- FlatTaxonomy: 65.55 seconds.
- Lightcast Open Skills Taxonomy: 76.5 seconds.
Testing instructions
- edx-platform: [FC-0049] chore: Bump version of openedx-learning for taxonomy import improvements edx-platform#34796
- make requirements on cms-shell
- WGU Instructional Design
- FlatTaxonomy
- Lightcast Open Skills Taxonomy