#42 import documents #57

nickmoreton · 2021-10-22T17:03:17Z

No description provided.

nimasmi

This MR has a lot in it which seems unrelated to the "import documents" ticket. Can we do something to make it smaller, so it can be fully reviewed?

nimasmi · 2021-10-25T11:00:47Z

wagtail_wordpress_import/block_builder_defaults.py

+    cache = image_linker(cache)
+    cache = document_linker(cache)


Comment: It's interesting that we call this variable "cache", but that the functions in question call it html, and take and return a string. The name "cache" confused me because it suggests either a dict or an API to a cache backend.

Nonetheless, why don't we call it html here?

Admittedly, I don't yet have the full picture.

nimasmi · 2021-10-25T11:02:00Z

wagtail_wordpress_import/block_builder_defaults.py

@@ -288,3 +328,61 @@ def get_alignment_class(image):
            alignment = "right"

    return alignment
+
+
+def document_linker(html):


Comment: These functions, image_linker and document_linker, both take and return a string, but parse it with BeautifulSoup. I wonder whether my suggestion the other day, that a function should both take and return a soup object, was misguided, because it leads to inconsistency within the app. Alternatively, these functions should change for the same reason as before (better performance if we do less parsing and exporting).

Ref. #48 (comment)

Are you OK with me adding a ticket for this? The first time we get some soup is with the lxml parser, and that's the soup we have just before it's passed to this function, so we pass a string of the elements.

The first time we get some soup is with the lxml parser.

Are you saying that the soups are different because one uses lxml and another uses html.parser? If so, then that's a strong argument for passing around string HTML objects instead of soup instances.

I favour readability and consistency at this stage, especially as we aren't yet aware of any performance concern.

#67 is where we should look at this in more detail.

nimasmi · 2021-10-25T11:07:41Z

wagtail_wordpress_import/block_builder_defaults.py

+        string: the html with img tags modified
+
+    BS4 performs a find and replace on all img tags found in the HTML.
+    If the image can be retrived from the remote site and saved into a Wagtail ImageModel


Suggested change

If the image can be retrived from the remote site and saved into a Wagtail ImageModel

If the image can be retrieved from the remote site and saved into a Wagtail ImageModel

nimasmi · 2021-10-25T12:12:04Z

wagtail_wordpress_import/test/tests/test_block_builder.py

+        self.assertEqual(get_image_file_name("folder/fakeimage.jpg"), "fakeimage.jpg")
+        self.assertEqual(
+            get_image_file_name(
+                "http://www.example.com/folder1/folder2//fakeimage.jpg"


Question: Is the double forward slash intentional?
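A minimal sketch of why the double slash is a distinct input worth naming in the test. `file_name_from_src` is a hypothetical stand-in written only for illustration; it is not the MR's `get_image_file_name`, whose implementation is not shown here.

```python
import posixpath
from urllib.parse import urlparse


def file_name_from_src(src):
    # Hypothetical stand-in: take the path part of the URL and keep
    # only its final segment.
    return posixpath.basename(urlparse(src).path)


# A relative path and an absolute URL containing "//" in the path
# both reduce to the same file name.
relative = file_name_from_src("folder/fakeimage.jpg")
doubled = file_name_from_src(
    "http://www.example.com/folder1/folder2//fakeimage.jpg"
)
```

If the double slash is intentional, a comment or a dedicated test name would make that clear to the next reader.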

nimasmi · 2021-10-25T12:29:45Z

wagtail_wordpress_import/block_builder_defaults.py

+    return r, status, content_type
+
+
+def get_abolute_src(src, domain_prefix=None):


Suggestion: Typo, used throughout this MR.

Suggested change

def get_abolute_src(src, domain_prefix=None):

def get_absolute_src(src, domain_prefix=None):

nimasmi · 2021-10-25T12:32:49Z

wagtail_wordpress_import/block_builder_defaults.py

+
+
+def get_abolute_src(src, domain_prefix=None):
+    src = re.sub("^\/+", "", src)


Question: Is this just the same as src.lstrip("/")? That would be more readable.
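A quick check of the equivalence asked about here, as a minimal sketch. The sample inputs and the two helper names are illustrative assumptions, not taken from the MR. The pattern is written as a raw string (`r"^/+"`), which matches the same text as the `"^\/+"` in the diff.

```python
import re

# Illustrative inputs (assumptions, not from the MR's test suite).
samples = [
    "/folder/fakeimage.jpg",
    "//folder/fakeimage.jpg",
    "folder/fakeimage.jpg",
    "folder//fakeimage.jpg/",
    "",
    "///",
]


def strip_with_regex(src):
    # Same behaviour as the pattern in the diff: remove one or more
    # slashes at the start of the string only.
    return re.sub(r"^/+", "", src)


def strip_with_lstrip(src):
    # The suggested alternative: remove all leading "/" characters.
    return src.lstrip("/")


# Both approaches agree on every sample; inner and trailing slashes
# are left untouched by either one.
for src in samples:
    assert strip_with_regex(src) == strip_with_lstrip(src)
```

Under these inputs the two forms behave identically, so the choice comes down to readability.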

nimasmi · 2021-10-25T12:34:27Z

wagtail_wordpress_import/test/tests/test_block_builder.py

+        self.assertEqual(
+            get_abolute_src("folder/fakeimage.jpg"),
+            "folder/fakeimage.jpg",
+        )  # the test settings has no BASE_URL setting so try having no domain prefix


Suggestion: These seem like distinct test cases. Can we configure them thus?

nimasmi · 2021-10-25T12:36:32Z

wagtail_wordpress_import/test/tests/test_block_builder.py

+        input = get_soup(
+            '<img src="fakeimage.jpg" alt="image alt" class="align-left" />',
+            "html.parser",
+        ).find("img")
+        self.assertEqual(get_alignment_class(input), "left")
+        input = get_soup(
+            '<img src="fakeimage.jpg" alt="image alt" class="align-right" />',
+            "html.parser",
+        ).find("img")
+        self.assertEqual(get_alignment_class(input), "right")
+        input = get_soup(
+            '<img src="fakeimage.jpg" alt="image alt" />',
+            "html.parser",
+        ).find("img")
+        self.assertEqual(get_alignment_class(input), "fullwidth")


Suggestion: These are three different tests.

Suggestion: Try to avoid shadowing Python built-ins such as input with variable names. Maybe call it "tag".
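One way the split could look, as a sketch only. The `get_alignment_class` below is a simplified stand-in (an assumption) that reads a `class` list from a dict, so the example runs without BeautifulSoup; the real function in the MR takes a tag produced by `get_soup(...).find("img")`.

```python
import unittest


def get_alignment_class(tag):
    # Simplified stand-in for illustration: the real function receives
    # a BeautifulSoup tag, which also supports .get("class").
    classes = tag.get("class") or []
    if "align-left" in classes:
        return "left"
    if "align-right" in classes:
        return "right"
    return "fullwidth"


class TestGetAlignmentClass(unittest.TestCase):
    # One behaviour per test method, so a failure names the exact case.

    def test_align_left(self):
        tag = {"class": ["align-left"]}
        self.assertEqual(get_alignment_class(tag), "left")

    def test_align_right(self):
        tag = {"class": ["align-right"]}
        self.assertEqual(get_alignment_class(tag), "right")

    def test_no_alignment_class_is_fullwidth(self):
        tag = {}
        self.assertEqual(get_alignment_class(tag), "fullwidth")
```

Naming the variable `tag` also leaves the built-in `input` function untouched inside the test module.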

nimasmi · 2021-10-25T12:40:27Z

wagtail_wordpress_import/test/tests/test_block_builder.py

+    # work in progress
+    def test_with_real_image(self):
+        # but we need to test with mocked images if we can.
+        raw_html_file = """
+        <p>Lorem <img src="https://dummyimage.com/600x400/000/fff" alt=""></p>
+        """
+        self.builder = BlockBuilder(raw_html_file, None, None)
+        self.builder.promote_child_tags()
+        self.blocks = self.builder.build()
+        self.assertTrue("<embed" in self.blocks[0]["value"])
+
+    # work in progress
+    def test_with_real_pdf(self):
+        # but we need to test with mocked files if we can.
+        raw_html_file = """
+        <p>
+        <a href="https://file-examples-com.github.io/uploads/2017/10/file-sample_150kB.pdf">A pdf</a>
+        </p>
+        """
+        self.builder = BlockBuilder(raw_html_file, None, None)
+        self.builder.promote_child_tags()
+        self.blocks = self.builder.build()
+        self.assertTrue('linktype="document"' in self.blocks[0]["value"])


Question: What's the plan with these work-in-progress test cases? Can we remove them if they're not ready yet?

It's because we should really be testing without needing to use external files. I opened a ticket to look at this at some point rather than hold the PRs up. https://projects.torchbox.com/projects/wordpress-to-wagtail-importer-package/tickets/76
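A rough sketch of how such a test could avoid the network, using only `unittest.mock` from the standard library. `fetch_status_and_type` is a hypothetical helper built on `urllib.request`, and the URL is a placeholder; the MR's own fetching code and HTTP library are not shown in this thread, so the patch target would differ in practice.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status_and_type(url):
    # Hypothetical helper (assumption): return the HTTP status and
    # content type for a remote file.
    with urllib.request.urlopen(url) as response:
        return response.status, response.headers.get("Content-Type")


class TestFetchStatusAndType(unittest.TestCase):
    def test_pdf_response_without_network(self):
        fake_response = mock.MagicMock()
        fake_response.status = 200
        fake_response.headers = {"Content-Type": "application/pdf"}
        # urlopen() is used as a context manager, so entering it must
        # hand back the fake response.
        fake_response.__enter__.return_value = fake_response

        with mock.patch(
            "urllib.request.urlopen", return_value=fake_response
        ) as fake_urlopen:
            status, content_type = fetch_status_and_type(
                "https://example.com/sample.pdf"
            )

        self.assertEqual(status, 200)
        self.assertEqual(content_type, "application/pdf")
        fake_urlopen.assert_called_once_with("https://example.com/sample.pdf")
```

With the request patched out, the test no longer depends on a third-party site staying online.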

nimasmi · 2021-10-25T12:43:36Z

docs/yoast.md

@@ -0,0 +1,37 @@
+- [Yoast plugin compatibility](#yoast-plugin-compatibility)


Question: Why is there a Yoast documentation file in this MR? It's not related.

nickmoreton changed the base branch from main to integration/sprint-6 October 25, 2021 09:51

nickmoreton marked this pull request as ready for review October 25, 2021 09:51

nimasmi reviewed Oct 25, 2021

nickmoreton force-pushed the #42-import-documents branch from d3b2574 to d882e43 Compare October 28, 2021 20:22

nickmoreton requested a review from kaedroho October 28, 2021 21:00

This was referenced Oct 29, 2021

Refactor blockbuilder #63

Closed

Refactor/block builder #64

Merged

nickmoreton force-pushed the #42-import-documents branch from 07c740a to 6657bf7 Compare November 1, 2021 10:53

nickmoreton self-assigned this Nov 1, 2021

nickmoreton removed the request for review from kaedroho November 1, 2021 11:04

nickmoreton added 3 commits November 1, 2021 15:18

add document fetcher methods

50aeefb

add test for document linking

4f470c0

testing using live url rather than local files. will need more work here

add override_settings import back in

abe9135

code review changes

nickmoreton force-pushed the #42-import-documents branch from 6657bf7 to abe9135 Compare November 1, 2021 15:22

nickmoreton merged commit 79152ac into integration/sprint-6 Nov 2, 2021

nickmoreton mentioned this pull request Nov 2, 2021

Should/can we pass around strings rather than soup objects? #67

Open

nickmoreton deleted the #42-import-documents branch November 19, 2021 11:32
