- Many bugs fixed.
- Make pyspider a single top-level package (thanks to zbb, iamtew and fmueller from HN).
- Python 3 support!
- Use click to create a better command line interface.
- PostgreSQL supported via SQLAlchemy (with the power of SQLAlchemy, pyspider also supports Oracle, SQL Server, etc.).
- Benchmark test.
- Documentation & tutorial: http://docs.pyspider.org/
- Flake8 cleanup (thanks to @jtwaleson)
Base
- Use messagepack instead of pickle in message queue.
- JSON data will be encoded as a base64 string when the content is binary.
- RabbitMQ lazy limit for better performance.
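
The base64 handling of binary content can be illustrated with a minimal sketch using only the standard library. The function names and the `binary`/`content` field names here are hypothetical and chosen for illustration; they are not pyspider's actual wire format.

```python
import base64
import json


def dump_result(content):
    """Serialize content to JSON, base64-encoding it when it is binary.

    The "binary"/"content" keys are illustrative assumptions, not
    pyspider's real field names.
    """
    if isinstance(content, bytes):
        # Raw bytes cannot be placed in JSON directly, so encode them first.
        encoded = base64.b64encode(content).decode("ascii")
        payload = {"binary": True, "content": encoded}
    else:
        payload = {"binary": False, "content": content}
    return json.dumps(payload)


def load_result(text):
    """Reverse dump_result, decoding base64 only for binary payloads."""
    payload = json.loads(text)
    if payload["binary"]:
        return base64.b64decode(payload["content"])
    return payload["content"]
```

Round-tripping through `dump_result` and `load_result` returns the original value for both text and bytes, which is the property a JSON-based message or result store needs.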
Scheduler
- Never re-crawl a task with a negative age.
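
As a rough sketch of the rule above, assuming `age` is a validity period in seconds and a task becomes eligible for re-crawl once that period has passed, the check might look like this. The function name `should_recrawl` and its parameters are hypothetical, not the scheduler's real interface.

```python
import time


def should_recrawl(age, last_crawl_time, now=None):
    """Decide whether a previously crawled task may be crawled again.

    age: validity period in seconds; a negative value means "never re-crawl".
    last_crawl_time: Unix timestamp of the previous crawl.
    now: current Unix timestamp (defaults to the system clock).
    """
    if now is None:
        now = time.time()
    if age < 0:
        # Negative age: the result never expires.
        return False
    # Otherwise the task is stale once its validity period has elapsed.
    return last_crawl_time + age < now
```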
Fetcher
- `proxy` parameter supports the `ip:port` format.
- Increase the default fetcher poolsize to 100.
- PhantomJS will return the JS script result in `Response.js_script_result`.
Processor
- Put multiple new tasks in one package for better RabbitMQ performance.
- Do not store all of the headers on success.
Script
- Add an interface, `get_taskid`, to generate the taskid from the task object.
- Tasks are de-duplicated by project and taskid.
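
To illustrate why a customizable taskid is useful, here is a standalone sketch that derives the taskid from both the URL and the POST data, so that two requests to the same URL with different form data are not treated as duplicates. It is written as plain functions rather than a handler method, and the task layout (`project`, `url`, `fetch.data`) and the `dedup_key` helper are assumptions made for the example.

```python
import hashlib
import json


def get_taskid(task):
    """Build a taskid from the URL plus any POST data (illustrative only)."""
    data = task.get("fetch", {}).get("data", "")
    # sort_keys keeps the id stable regardless of dict ordering.
    raw = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def dedup_key(task):
    """Hypothetical helper: tasks are unique per (project, taskid) pair."""
    return (task["project"], get_taskid(task))
```

With this scheme, the same URL posted with different form data yields different taskids, while an identical task in another project still gets its own de-duplication key.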
Webui
- Project list sortable.
- Return a 404 page when dumping a project that does not exist.
- Web preview supports images.