
Fix minor issues with Immoscout24 image extraction #517

Merged: 3 commits, merged Jan 22, 2024


ngdio commented Jan 19, 2024

This pull request fixes two minor issues I've encountered in the log while scraping Immoscout24 results.

777ad0c: Telegram rejects some images from scraped listings because they are WebP files: they slip past the filter that trims image URLs down to the plain .jpg link. These are the images that link to virtual viewings, and they appear to be simple placeholders. They can be recognised by their @xsi.type property being set to common:VirtualTour, so the simplest fix is to exclude everything except common:Picture.
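
To illustrate the type-based filter, here is a minimal sketch in plain Python rather than the project's jsonpath expression. The flat `href` field, the `picture_urls` helper and the sample URLs are all hypothetical; the real Immoscout24 payload nests image URLs more deeply.

```python
from typing import Any

# Hypothetical, flattened attachment shape; the real API response nests
# the image URL further down inside each attachment.
Attachment = dict[str, Any]


def picture_urls(attachments: list[Attachment]) -> list[str]:
    """Keep only attachments typed common:Picture, dropping e.g. common:VirtualTour."""
    return [
        a["href"]  # hypothetical field holding the image link
        for a in attachments
        if a.get("@xsi.type") == "common:Picture"
    ]


sample = [
    {"@xsi.type": "common:Picture", "href": "https://example.com/a.jpg/ORIG/resize/800x600"},
    {"@xsi.type": "common:VirtualTour", "href": "https://example.com/tour-placeholder.webp"},
]
print(picture_urls(sample))  # ['https://example.com/a.jpg/ORIG/resize/800x600']
```

Allow-listing common:Picture, rather than block-listing common:VirtualTour, also drops any other non-picture attachment types the API might return.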

c1c6c4f: In some cases, the same filter trimmed image links down to just a few characters, and Telegram then rejected them. I found that this happens because some images on Immoscout use the .jpeg extension (with an extra "e") rather than .jpg. The filter needs to account for that, so I built it into the jsonpath filter as a regex.
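
The sketch below shows how a fixed ".jpg" cut can collapse a link to a few characters and how an optional "e" in a regex avoids it. It is written in plain Python rather than as a jsonpath filter. `naive_shorten` is a hypothetical find()-based cut that is not necessarily what flathunter did before this PR, and the URL is made up.

```python
import re

# Lazily match everything up to and including the first ".jpg" or ".jpeg";
# "e?" makes the extra "e" optional.
JPEG_PREFIX = re.compile(r"^.*?\.jpe?g")


def naive_shorten(url: str) -> str:
    """Hypothetical naive cut after the first '.jpg'.

    If '.jpg' is absent (e.g. a '.jpeg' URL), find() returns -1, so the
    slice becomes url[:3] and only a few characters survive.
    """
    return url[: url.find(".jpg") + len(".jpg")]


def shorten(url: str) -> str:
    """Trim a resized image URL back to its .jpg/.jpeg base; leave others unchanged."""
    match = JPEG_PREFIX.match(url)
    return match.group(0) if match else url


url = "https://pictures.example.com/listings/abc.jpeg/ORIG/resize/800x600/format/webp"
print(naive_shorten(url))  # htt
print(shorten(url))        # https://pictures.example.com/listings/abc.jpeg
```

Returning the URL unchanged when no extension matches is a design choice in this sketch. It avoids emitting a truncated link, at the cost of possibly passing through a non-JPEG URL.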

Neither issue was fatal, but these fixes should result in more images being scraped and sent successfully, as well as fewer error messages and failed API calls to notification platforms.


codders commented Jan 22, 2024

Thanks so much - this looks great!

codders left a comment


+1

codders merged commit 77a1b57 into flathunters:main on Jan 22, 2024
4 checks passed
ngdio deleted the fix_immoscout_image_extraction branch on February 5, 2024, 11:31