Currently, all the profile images are assumed to be JPEGs. However, there are a lot of PNGs and a few GIFs. I thought I had a fix for this, but unfortunately it assumes the images have already been downloaded when extract_all.py runs, and that script is run before the images are downloaded. As the chapter pages don’t give any details about image type, I think we will be unable to figure out the internal file name until after the images are downloaded.
There was previously some discussion here about using the MIME type the server provides or determining it ourselves. At the time, I thought the drawback to using the server-provided MIME type was adding another stage, but since that looks inevitable now, I think that is actually the best approach. Both add an extra dependency, but the one for using the MIME type is smaller; additionally, reading the server-provided MIME type is probably faster than inspecting each file.
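To make the two options concrete, here is a rough sketch of both: mapping a server-provided MIME type to a file extension, and sniffing the type from the first bytes of a downloaded file. The names `MIME_TO_EXTENSION`, `extension_from_mime`, and `sniff_mime` are made up for illustration and are not part of the project; the sketch uses only the standard library, so it says nothing about which dependency the project would actually pick.

```python
from typing import Optional

# Hypothetical mapping covering the three image types mentioned above.
MIME_TO_EXTENSION = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}


def extension_from_mime(mime_type: str) -> Optional[str]:
    """Return a file extension for a MIME type, or None if it is unknown."""
    # Drop any parameters such as "; charset=binary" and normalise case.
    base = mime_type.split(";", 1)[0].strip().lower()
    return MIME_TO_EXTENSION.get(base)


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type from the file's leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    return None
```

The first function only needs the response headers, while the second needs the image bytes on disk, which is why the sniffing route cannot run before the download stage.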
No, the version I mentioned does not. However, if we create another folder, web_headers, we can have wget use the HEAD method and save headers there. The problem with that approach is it would require an extra request for each image, which might be less nice. Since HEAD requests are so small, though, I think that’s probably the best option.
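As a sketch of what a later stage could do with the saved headers: assuming each file in web_headers holds the raw response header lines for one image (that layout is an assumption, not something settled here), the MIME type could be pulled out like this. `content_type_from_headers` is a hypothetical helper name.

```python
from typing import Optional


def content_type_from_headers(raw_headers: str) -> Optional[str]:
    """Extract the MIME type from raw HTTP response header text.

    If the text contains several responses (for example after a redirect),
    the last Content-Type header wins, since it belongs to the final response.
    """
    content_type = None
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Status lines such as "HTTP/1.1 200 OK" contain no colon and are skipped.
        if sep and name.strip().lower() == "content-type":
            # Strip parameters such as "; charset=..." and normalise case.
            content_type = value.split(";", 1)[0].strip().lower()
    return content_type
```

For example, header text containing `Content-Type: image/png` would yield `"image/png"`, and text with no such header would yield `None`, which the caller would have to handle (perhaps by falling back to JPEG, as is assumed today).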
I don’t know about the HTML mirror, but because epubs store MIME type separately from just the file extension, it would be possible to just use an image file name without an extension. If that works for the HTML mirror too, it would save time by avoiding the image_parse stage I said might become necessary.
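To illustrate the epub side of this: each manifest entry in the package document carries a `media-type` attribute next to its `href`, so the type does not have to be encoded in the file name. The sketch below builds such an entry with an extensionless `href`. The function name, the id, and the path are invented for illustration, the OPF namespace is left out for brevity, and whether every reader copes with extensionless image names would still need testing.

```python
import xml.etree.ElementTree as ET


def manifest_item(item_id: str, href: str, media_type: str) -> str:
    """Build one manifest <item> element and return it as a string."""
    item = ET.Element(
        "item",
        {"id": item_id, "href": href, "media-type": media_type},
    )
    return ET.tostring(item, encoding="unicode")


# Hypothetical example: a PNG profile image stored without an extension.
print(manifest_item("profile-1", "images/profile-1", "image/png"))
```

The MIME type would still have to be known when the manifest is written, but the internal file name could be fixed earlier, in extract_all.py.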