- ThreadBaseScheduler added to improve the performance of the scheduler
- robots.txt supported!
- elasticsearch database backend supported!
- new script callback `on_finished`, see http://docs.pyspider.org/en/latest/About-Projects/#on_finished-callback
- you can now set the delay time between retries: `retry_delay` is a dict that specifies retry intervals. Its items are {retried: seconds}, and the special key '' (empty string) sets the default delay for any retry count not listed.
- dict parameters (e.g. headers) set in crawl_config and @config are now merged, thanks to @ihipop
- add parameter `max_redirects` in `self.crawl` to control the maximum number of redirects when fetching, thanks to @AtaLuZiK
- add parameter `validate_cert` in `self.crawl` to ignore errors in the server’s certificate.
- new property `etree` for Response; `etree` is a cached lxml.html.HtmlElement object, thanks to @waveyeung
- you can now pass arguments to phantomjs from command line or config file.
- support for pymongo 3.0
- local.projectdb now accepts a glob path (e.g. script/*.py) to load multiple projects from the local filesystem.
- fixed: queue size in the dashboard was not working on OS X, thanks to @xyb
- counters in the dashboard are now shown for stopped projects
- other bug fixes
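
The `retry_delay` lookup rule described above can be illustrated with a small standalone sketch. It does not import pyspider and is not the scheduler's actual code: the helper name `delay_for_retry`, its `fallback` argument, and the sample intervals are all assumptions made for illustration. The only behavior taken from the notes is that keys are retry counts, values are seconds, and the `''` key supplies the default.

```python
# Hypothetical retry schedule in the documented {retried: seconds} shape.
# The numbers are made up for this example.
retry_delay = {
    0: 30,        # no retries made yet: wait 30 seconds
    1: 5 * 60,    # one retry already made: wait 5 minutes
    '': 60 * 60,  # any other retry count: wait 1 hour (the default)
}


def delay_for_retry(retry_delay, retried, fallback=None):
    """Return the delay in seconds for a task that was retried `retried` times.

    Hypothetical helper: an exact match on the retry count wins, otherwise
    the '' key is used, otherwise `fallback` (an assumption of this sketch).
    """
    if retried in retry_delay:
        return retry_delay[retried]
    return retry_delay.get('', fallback)


# Show which delay each retry count resolves to.
for retried in range(4):
    print(retried, delay_for_retry(retry_delay, retried))
```

With this sample dict, retry counts 0 and 1 use their own entries and every higher count falls through to the `''` default.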