Crawl-me is a lightweight, fast, plugin-based web picture crawler. You can download your favorite pictures through a plugin if the website is supported. For now, the plugins include gamersky and pixiv. If you want to contribute, please feel free to contact me.
Fork me on GitHub :) https://github.com/nyankosama/crawl-me
- The Crawl-me core supports multi-threaded downloading using HTTP Range headers, so it's very fast.
- It's plugin-based, so you can freely add any plugin you want.
- pixiv: This plugin allows you to download any author's paintings from the pixiv site.
- gamersky: This plugin supports downloading all pictures in a special topic from the gamersky site.
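To illustrate the multi-threaded Range-header idea mentioned above, here is a minimal sketch of how a file of known size could be split into byte ranges, one per worker thread. The names `split_ranges` and `range_header` are hypothetical and are not taken from the crawl-me source; the real core may divide the work differently.

```python
def split_ranges(total_size, parts):
    """Split total_size bytes into at most `parts` contiguous,
    inclusive (start, end) byte ranges of nearly equal length."""
    if total_size <= 0 or parts <= 0:
        return []
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP request header for one inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}


if __name__ == "__main__":
    # Hypothetical example: a 1000-byte picture fetched by 4 workers.
    for start, end in split_ranges(1000, 4):
        print(range_header(start, end))
```

Each worker would send its own header with the request and write the response body at offset `start` in the output file. This only helps when the server honors Range requests.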
Make sure you have already installed Python 2.7 and pip.
Because the package relies on lxml, on Linux please make sure you have installed the libxslt-devel and libxml2-devel libraries. On Windows, please select a suitable lxml installer and install it.
And then:
$ pip install crawl-me
On Windows, please add {$python-home}/Scripts/ to the system PATH.
Install the prerequisite libraries first:
sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev
Then install setuptools, which is needed to run the setup.py file:
sudo apt-get install python-setuptools
Finally, git clone the source, and install:
$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ sudo python setup.py install
Make sure you have already installed Python 2.7 and pip.
You can install Python 2.7 with the Windows installer. You can install pip by downloading get-pip.py and running it with Python:
python get-pip.py
Then install the prerequisite library lxml. Please select a suitable lxml installer and install it.
Finally, git clone the source and install:
$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ python setup.py install
On Windows, please add {$python-home}/Scripts/ to the system PATH.
-
Download 10 pages of pictures from the URL http://www.gamersky.com/ent/201404/352055.shtml on the gamersky site, and store the pictures in a local directory.
crawl-me gamersky http://www.gamersky.com/ent/201404/352055.shtml ./gamersky-crawl 1 10
crawl-me ffft http://www.5442.com/meinv/20160324/30633.html ./ffft 1 10
crawl-me mm131 http://www.mm131.com/xinggan/ ./mm131 1 10
-
Download all the paintings of 藤原 (Fujiwara, Pixiv ID=27517), and store them in a local directory.
crawl-me pixiv 27517 ./pixiv-crawl <your pixiv loginid> <your password>
-
general help
$ crawl-me -h
usage: crawl-me [-h] plugin

positional arguments:
  plugin      plugin the crawler uses

optional arguments:
  -h, --help  show this help message and exit

available plugins:
----gamersky
----pixiv
-
gamersky
$ crawl-me gamersky -h
usage: crawl-me [-h] plugin url savePath beginPage endPage

positional arguments:
  plugin      plugin the crawler uses
  url         your url to crawl
  savePath    the path where the imgs ars saved
  beginPage   the page where we start crawling
  endPage     the page where we end crawling

optional arguments:
  -h, --help  show this help message and exit
-
pixiv
$ crawl-me pixiv -h
usage: crawl-me [-h] plugin authorId savePath pixivId password

positional arguments:
  plugin      plugin the crawler uses
  authorId    the author id you want to crawl
  savePath    the path where the imgs ars saved
  pixivId     your pixiv login id
  password    your pixiv login password

optional arguments:
  -h, --help  show this help message and exit
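The help output above suggests that each plugin declares its own positional arguments after the plugin name. The following is a rough sketch of how such a command line could be parsed with Python's standard argparse module. The table `PLUGIN_ARGS` and the functions `build_parser` and `parse` are hypothetical, and treating beginPage and endPage as integers is an assumption; the actual crawl-me code may be organized differently.

```python
import argparse

# Hypothetical table: positional arguments (name, type) per plugin,
# with the names copied from the help output.
PLUGIN_ARGS = {
    "gamersky": [("url", str), ("savePath", str),
                 ("beginPage", int), ("endPage", int)],
    "pixiv": [("authorId", str), ("savePath", str),
              ("pixivId", str), ("password", str)],
}


def build_parser(plugin):
    """Create a parser whose positionals depend on the chosen plugin."""
    parser = argparse.ArgumentParser(prog="crawl-me")
    parser.add_argument("plugin", help="plugin the crawler uses")
    for name, arg_type in PLUGIN_ARGS[plugin]:
        parser.add_argument(name, type=arg_type)
    return parser


def parse(argv):
    """Pick the plugin from the first argument, then parse the rest."""
    if not argv or argv[0] not in PLUGIN_ARGS:
        raise SystemExit("unknown plugin; available: "
                         + ", ".join(sorted(PLUGIN_ARGS)))
    return build_parser(argv[0]).parse_args(argv)


if __name__ == "__main__":
    args = parse(["gamersky",
                  "http://www.gamersky.com/ent/201404/352055.shtml",
                  "./gamersky-crawl", "1", "10"])
    print(args.plugin, args.savePath, args.beginPage, args.endPage)
```

Under this layout, adding a plugin would mean adding one entry to the table, which fits the plugin-based design described earlier.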
-
Functions:
- support breakpoint resume
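Breakpoint resume is not implemented yet. As one possible approach, the sketch below reuses the Range header: it checks how many bytes of a partial file already exist locally and asks the server only for the remainder. The function name `resume_header` is hypothetical and is not part of crawl-me.

```python
import os


def resume_header(path):
    """Return a Range header that continues after the bytes already
    stored at `path`, or an empty dict if nothing was downloaded."""
    done = os.path.getsize(path) if os.path.exists(path) else 0
    if done > 0:
        # Open-ended range: from byte `done` to the end of the resource.
        return {"Range": "bytes=%d-" % done}
    return {}
```

A caller would open the partial file in append mode and send this header with the request. If the server ignores the header and answers with the whole file, the download should start over instead of appending.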
-
Plugins:
- qq zone