Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use the LWN weekly publication date #2589

Merged
merged 1 commit into from
Dec 30, 2024
Merged

Conversation

smipi1
Copy link

@smipi1 smipi1 commented Dec 29, 2024

Borrows a trick from foreignaffairs.recipe to set the publication date in the title as opposed to the conversion date. Bulk conversion (unfortunately) is a frequent thing when catching up after a holiday or crontab breakage. It is really annoying figuring out the LWN weekly edition reading order if they all have the same title based on the conversion date.

The correct way to implement this would have been to override publication_date(), but this is not really set-up to be able to derive the publication date from the content. The workaround in foreignaffairs.recipe overrides timefmt with the publication date string to get around this. I opted for the same approach.

@kovidgoyal
Copy link
Owner

Why not just store the datetime object on self in parse_index and return
it from publication_date() ?

@smipi1
Copy link
Author

smipi1 commented Dec 29, 2024

I have not been able to prove to myself that parse_index() will always be called prior to publication_date(). I have also not found any such guarantees in the documentation. If it is indeed something I can rely on, can you please point me to the right documentation and code?

@kovidgoyal
Copy link
Owner

publication_date() is called only by create_opf() OPF creation happens
after downloading, downloading starts with parse_index. See the
build_index() function.

@smipi1
Copy link
Author

smipi1 commented Dec 30, 2024

Hi @kovidgoyal ,

Implemented in accordance with your suggestion. Because the date now gets parsed, I had to temporarily switch the locale to en_US to prevent breakage in some countries.

@kovidgoyal
Copy link
Owner

Changing global variables in a multithreaded context is pretty dangerous, instead just parse the date manually, it should be trivial to do in 3 lines of python for a known date format. Something like:

def parse_date_with_custom_mapping(date_string):
    month_map = {
        "January": 1, "February": 2, "March": 3, "April": 4,
        "May": 5, "June": 6, "July": 7, "August": 8,
        "September": 9, "October": 10, "November": 11, "December": 12
    }

    month_name, day_year = date_string.split()
    day, year = map(int, day_year.split(','))
    
    return datetime(year, month_map[month_name], day)

@smipi1
Copy link
Author

smipi1 commented Dec 30, 2024

Thanks for pointing this out! I rarely build multi-threaded Python applications. I mostly opt for asyncio workqueues, which tend to provide demonstrably better throughput (the only exception being workloads that require little to no synchronization between threads).

However, building a custom date parser to avoid a thread safety issue is bad for maintainability? The software landscape is littered with the corpses of those that implement bespoke date / time infrastructure. I will rather solve the thread safety issue and use standard infrastructure for the parsing.

@kovidgoyal
Copy link
Owner

This is a pretty trivial string to parse but if you dont want to write
your own use dateutil.parser which IIRC is locale independent.

@smipi1
Copy link
Author

smipi1 commented Dec 30, 2024

That is exactly what I did. I just switched to dateutil.parser.parse() which is both location independent, thread-safe, and already used by calibre.utils.date. I opted not to switch to use calibre.utils.date.parse_date because there is no need for TZ shenanigans.

Bulk conversion (unfortunately) is a frequent thing when catching up
after a holiday or crontab breakage. It is really annoying figuring out
the LWN weekly edition reading order if they all have the same title
based on the conversion date.

Falls back to the current date if the date cannot be parsed.

Uses dateutil.parser.parse() for thread-safety.
@kovidgoyal kovidgoyal merged commit d1572df into kovidgoyal:master Dec 30, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants