Requesting one specific website through Splash produces an error I can't make sense of:
QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.
After this message the Splash Docker container hangs and becomes unresponsive to all future requests. Running with more verbose logging didn't reveal any additional information.
The logs
    (.venv) C:\Users\me\path\to\project>docker run -p 8050:8050 scrapinghub/splash:latest
    2024-06-25 20:39:41+0000 [-] Log opened.
    2024-06-25 20:39:41.947216 [-] Xvfb is started: ['Xvfb', ':769163157', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
    QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
    2024-06-25 20:39:42.012362 [-] Splash version: 3.5
    2024-06-25 20:39:42.045852 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
    2024-06-25 20:39:42.046036 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
    2024-06-25 20:39:42.046099 [-] Open files limit: 1048576
    2024-06-25 20:39:42.046140 [-] Can't bump open files limit
    2024-06-25 20:39:42.061355 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
    2024-06-25 20:39:42.061513 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
    2024-06-25 20:39:42.170427 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
    2024-06-25 20:39:42.170695 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
    2024-06-25 20:39:42.171427 [-] Site starting on 8050
    2024-06-25 20:39:42.171615 [-] Starting factory <twisted.web.server.Site object at 0x7f96c40ae5c0>
    2024-06-25 20:39:42.172103 [-] Server listening on http://0.0.0.0:8050
    QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.
Minimum replication
    import scrapy
    from scrapy.crawler import CrawlerProcess
    from scrapy_splash import SplashRequest


    class ResearchSpider(scrapy.Spider):
        name = "research_spider"
        custom_settings = {
            'SPLASH_URL': 'http://localhost:8050',
            'ROBOTSTXT_OBEY': True,
            'DOWNLOAD_DELAY': 2,
            "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            'DOWNLOADER_MIDDLEWARES': {
                'scrapy_splash.SplashCookiesMiddleware': 723,
                'scrapy_splash.SplashMiddleware': 725,
                'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
            },
            'SPIDER_MIDDLEWARES': {
                'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
            },
            'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        }

        def start_requests(self):
            for url in self.start_urls:
                yield SplashRequest(
                    url,
                    self.parse
                )

        def parse(self, response):
            print(f"parsing {response.url=}")


    def crawl_process(websites: list[str]):
        print(f"Initializing crawler process - {websites=}")
        process = CrawlerProcess()
        process.crawl(ResearchSpider, start_urls=websites)
        process.start()
        print(f"Completed crawl")


    if __name__ == "__main__":
        crawl_process([
            "http://www.crazyplumbers.com/",
        ])
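To check whether the hang comes from Splash itself rather than from Scrapy or scrapy-splash, the same URL could be requested straight from Splash's HTTP API, followed by a liveness probe. The following is only a rough sketch under assumptions: it relies on Splash's `render.html` and `_ping` endpoints, the helper names `build_render_url` and `responds` are made up for illustration, and I have not confirmed that it triggers the same error.

```python
import http.client
from urllib.parse import urlencode
from urllib.request import urlopen

# Same address as SPLASH_URL in the spider settings above.
SPLASH_URL = "http://localhost:8050"


def build_render_url(target, splash_url=SPLASH_URL, timeout=30):
    """Build a Splash render.html URL for `target` (hypothetical helper).

    `timeout` is Splash's own render timeout in seconds; the log above
    shows the container caps it at max-timeout=90.
    """
    query = urlencode({"url": target, "timeout": timeout})
    return f"{splash_url}/render.html?{query}"


def responds(url, client_timeout):
    """Return True if `url` answers with HTTP 200 within `client_timeout` seconds.

    Any connection failure, client-side timeout, or HTTP error status is
    reported as False instead of raising (hypothetical helper).
    """
    try:
        with urlopen(url, timeout=client_timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

With the container running, calling `responds(build_render_url("http://www.crazyplumbers.com/"), 45)` and then `responds(SPLASH_URL + "/_ping", 5)` would show whether Splash still answers after rendering the page. If the second call returns `False`, the hang can be reproduced without Scrapy in the loop.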