Firefox <HR> disrupting parsing #10

simone-cominato · 2024-07-28T14:51:38Z

Firefox and its forks use the <HR> tag to insert a separator line between bookmarks, and this prevents the parser from working correctly. Chrome and its forks do not emit such a tag, as far as I can tell.
A possible solution is to remove every <HR> tag before parsing, e.g. by adding the following function to parser.py:

import re

def __remove_hr_tags(html_lines):
    """Return a copy of html_lines with every <HR> tag stripped from each line."""
    # Match <HR> tags case-insensitively, including any attributes
    hr_pattern = re.compile(r'<hr[^>]*>', re.IGNORECASE)

    # Remove the tags line by line, keeping the number of lines unchanged
    return [hr_pattern.sub('', line) for line in html_lines]

Then call it in the parse() function:

def parse(netscape_bookmarks_file: NetscapeBookmarksFile):
    """
    Responsible to start parsing, getting metadata information
    and start the folder recursion
    :param netscape_bookmarks_file: a NetscapeBookMarkFile
    :return: the NetscapeBookMarkFile, but parsed
    """
    line_num = 0
    file = netscape_bookmarks_file
    lines = netscape_bookmarks_file.html.splitlines()

    # Strip the <HR> separator tags that Firefox adds between bookmarks
    lines = __remove_hr_tags(lines)
    
    # rest of the code...
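
To check the idea in isolation, here is a minimal standalone sketch of the same regex applied to a small snippet. The sample markup and the names SAMPLE, HR_PATTERN and strip_hr_tags are made up for illustration and are not part of the library; the snippet only imitates what a Firefox export looks like.

```python
import re

# Hypothetical sample imitating a Firefox bookmarks export with a separator
SAMPLE = """<DL><p>
    <DT><A HREF="https://example.com/a">First</A>
    <HR>
    <DT><A HREF="https://example.com/b">Second</A>
</DL><p>"""

# Same pattern as proposed above: any <hr ...> tag, case-insensitive
HR_PATTERN = re.compile(r"<hr[^>]*>", re.IGNORECASE)


def strip_hr_tags(html_lines):
    """Return the lines with every <HR> tag removed (line count unchanged)."""
    return [HR_PATTERN.sub("", line) for line in html_lines]


lines = SAMPLE.splitlines()
cleaned = strip_hr_tags(lines)

# Show which lines were changed by the substitution
for before, after in zip(lines, cleaned):
    if before != after:
        print(f"changed: {before!r} -> {after!r}")
```

Note that the separator line is not dropped, it is left as a whitespace-only line. This keeps the line numbering intact, but it assumes the rest of the parser tolerates blank lines, which I have not verified.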
