I have recently been learning Python's Scrapy crawling framework and practicing by crawling a website. Every crawl failed with the following error:
2017-03-09 13:58:34 [scrapy] INFO: Enabled item pipelines: []
2017-03-09 13:58:34 [scrapy] INFO: Spider opened
2017-03-09 13:58:34 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-09 13:58:34 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/robots.txt>: 'float' object is not iterable
Traceback (most recent call last):
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1631, in request
    parsedURI.originForm)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/class/1_1.html>
TypeError: 'float' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1631, in request
    parsedURI.originForm)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/class/2_1.html>
TypeError: 'float' object is not iterable

During handling of the above exception, another exception occurred:

After some searching, I found that the cause is the version of the locally installed Twisted library (see this for details).
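The last frame of the traceback fails at timeoutDelay = sum(timeout), which suggests that the timeout reaching Twisted is a single float instead of a sequence of numbers. The sketch below only reproduces that one line in isolation; it is not the actual Twisted or Scrapy code. The helper name total_timeout and the sample values are made up for illustration.

```python
def total_timeout(timeout):
    """Mimic the failing line from the traceback: timeoutDelay = sum(timeout)."""
    return sum(timeout)


# A sequence of per-attempt timeouts (illustrative values) adds up fine.
print(total_timeout((1, 3, 11, 45)))  # 60

# A bare float is not iterable, so sum() raises the same TypeError as in the log.
try:
    total_timeout(60.0)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable

# Wrapping the float in a one-element tuple makes it a valid argument again.
print(total_timeout((60.0,)))  # 60.0
```

This matches the symptom: the request never leaves the machine, because the error is raised while resolving the host name, before any connection is made.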
I use the Anaconda Python distribution locally, and installing Scrapy pulled in Twisted 17.1.0 by default. Downgrading Twisted to 16.6.0 fixes the problem (install it with conda install Twisted==16.6.0).
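Before downgrading, it can help to confirm which Twisted version is actually installed. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata). The names KNOWN_BAD, KNOWN_GOOD, installed_twisted_version, and advice are hypothetical helpers, and the check covers only the single version reported in this post.

```python
from importlib import metadata

KNOWN_BAD = "17.1.0"   # version that failed in this post
KNOWN_GOOD = "16.6.0"  # version that worked after downgrading


def installed_twisted_version():
    """Return the installed Twisted version string, or None if it is missing."""
    try:
        return metadata.version("Twisted")
    except metadata.PackageNotFoundError:
        return None


def advice(version):
    """Turn a version string (or None) into a short human-readable hint."""
    if version is None:
        return "Twisted is not installed"
    if version == KNOWN_BAD:
        return ("Twisted %s detected; try: conda install Twisted==%s"
                % (KNOWN_BAD, KNOWN_GOOD))
    return "Twisted %s installed; not the version reported in this post" % version


if __name__ == "__main__":
    print(advice(installed_twisted_version()))
```

Run it with the same interpreter that runs Scrapy, since Anaconda environments can each have their own copy of Twisted.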