Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FC-0049] fix: Issues when importing a large taxonomy #189

Conversation

ChrisChV
Copy link
Contributor

@ChrisChV ChrisChV commented May 9, 2024

Description

  • Adds time elapsed to import log.
  • Fix issues with memory and time when importing a large taxonomy.

Supporting information

It has been tested with taxonomies that take too long or could not finish due to a memory backup. These are their average times after fixing the issue:

  • WGU Instructional Design: K-12 Collection: 5.95 seconds.
  • FlatTaxonomy: 65.55 seconds.
  • Lightcast Open Skills Taxonomy: 76.5 seconds.

Testing instructions

from openedx_tagging.core.tagging.models.import_export import TagImportTask
task = list(TagImportTask.objects.all())[-1]
task.log

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label May 9, 2024
@openedx-webhooks
Copy link

openedx-webhooks commented May 9, 2024

Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@ChrisChV ChrisChV marked this pull request as draft May 10, 2024 16:47
@ChrisChV ChrisChV changed the title fix: Issues when importing a large taxonomy [FC-0049] fix: Issues when importing a large taxonomy May 10, 2024
ChrisChV added 4 commits May 14, 2024 11:00
* Refactor execute() of CreateTag() to avoid save values on memory
* Refactor execute() of TagImportPlan() to avoid save logs in every action. This causes slow execution and memory overflow.
@ChrisChV ChrisChV force-pushed the chris/FAL-3721-issues-importing-large-taxonomy branch from 744e5d2 to afa0977 Compare May 14, 2024 16:01
@ChrisChV ChrisChV marked this pull request as ready for review May 14, 2024 17:10
Copy link
Member

@yusuf-musleh yusuf-musleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @ChrisChV Great work on this! Its working well, here are the timings I got when importing the different taxonomies:

  • WGU Instructional Design: K-12 Collection: 1.44 seconds.
  • FlatTaxonomy: 14.98 seconds.
  • Lightcast Open Skills Taxonomy: 18.47 seconds.

I just had a few nit questions below.

  • I tested this: I followed the testing instructions in the PR
  • I read through the code
  • I checked for accessibility issues
  • Includes documentation

Comment on lines -247 to 254
parent = None
if self.tag.parent_id:
parent = self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
taxonomy_tag = Tag(
Tag.objects.create(
taxonomy=self.taxonomy,
parent=parent,
parent=self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
if self.tag.parent_id is not None else None,
value=self.tag.value,
external_id=self.tag.id,
)
taxonomy_tag.save()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just curious if this had an impact on improving the performance or is it mainly making the code cleaner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It improves memory a little because it does not create an instance of Tag, but uses the create function. It was the first thing I tried, but it wasn't the root of the problem. I left it like this because it looks cleaner.

Copy link
Contributor

@bradenmacdonald bradenmacdonald May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag.objects.create(...) still creates a Tag instance and calls .save() on it for you, and returns it. It may be garbage-collected slightly better if you don't keep the returned Tag instance here, but I wouldn't expect any major difference. I do like the create() version more though - it is cleaner.

Comment on lines -412 to +409
taxonomy_tag = self._get_tag()
taxonomy_tag.delete()
self._get_tag().delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: similar to the above, if this had any performance improvement or just code style changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

openedx_tagging/core/tagging/import_export/import_plan.py Outdated Show resolved Hide resolved
@bradenmacdonald bradenmacdonald merged commit d229afb into openedx:main May 17, 2024
12 checks passed
@openedx-webhooks
Copy link

@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@bradenmacdonald bradenmacdonald deleted the chris/FAL-3721-issues-importing-large-taxonomy branch May 17, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants