github binux/pyspider v0.3.0
First PyPI Release

9 years ago
  • Many bugs fixed.
  • Made pyspider a single top-level package. (thanks to zbb, iamtew and fmueller from HN)
  • Python 3 support!
  • Use click to create a better command line interface.
  • PostgreSQL is supported via SQLAlchemy (through SQLAlchemy, pyspider also supports Oracle, SQL Server, etc.).
  • Benchmark test.
  • Documentation & tutorial: http://docs.pyspider.org/
  • Flake8 cleanup (thanks to @jtwaleson)

Base

  • Use MessagePack instead of pickle in the message queue.
  • JSON data is encoded as a base64 string when the content is binary.
  • RabbitMQ lazy limit for better performance.
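
The base64 handling of binary content can be illustrated with a minimal sketch. This is not pyspider's actual code: the function names and the "type"/"data" envelope are assumptions made up for the example, and only the Python standard library is used.

```python
import base64
import json


def encode_result(content):
    """Return a JSON string; binary content is wrapped as base64 text."""
    if isinstance(content, bytes):
        # Hypothetical envelope: the "type"/"data" keys are illustrative only.
        payload = {"type": "base64",
                   "data": base64.b64encode(content).decode("ascii")}
    else:
        payload = {"type": "text", "data": content}
    return json.dumps(payload)


def decode_result(serialized):
    """Reverse encode_result, returning bytes for binary content."""
    payload = json.loads(serialized)
    if payload["type"] == "base64":
        return base64.b64decode(payload["data"])
    return payload["data"]
```

JSON has no byte-string type, so raw bytes have to be mapped to text before serialization; base64 is a common choice because the result is plain ASCII.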

Scheduler

  • Never re-crawl a task with a negative age.
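
The negative-age rule might look roughly like the following check. This is a sketch under assumptions, not the scheduler's real logic: the function name, its arguments, and the exact expiry comparison are hypothetical; only the "negative age means never re-crawl" behavior comes from the note above.

```python
def should_recrawl(age, last_crawl_time, now):
    """Decide whether a previously crawled task may be fetched again.

    age is the validity period in seconds; times are Unix timestamps.
    """
    if age < 0:
        # A negative age marks the result as valid forever.
        return False
    # Assumed rule: re-crawl once the validity period has elapsed.
    return now - last_crawl_time > age
```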

Fetcher

  • The proxy parameter supports the ip:port format.
  • Increased the default fetcher poolsize to 100.
  • PhantomJS returns the JS script result in Response.js_script_result.
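
As an illustration of the ip:port proxy format, a string in that form can be split into host and port as below. The helper name parse_proxy is hypothetical and is not part of pyspider; how the fetcher parses the value internally is not stated in these notes.

```python
def parse_proxy(proxy):
    """Split an 'ip:port' proxy string into a (host, port) tuple."""
    # rpartition splits on the last colon, so the port is always the tail.
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected ip:port, got %r" % proxy)
    return host, int(port)
```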

Processor

  • Put multiple new tasks in one package, which improves performance with RabbitMQ.
  • Do not store all of the headers on success.

Script

  • Add an interface, get_taskid, to generate the taskid from the task object.
  • Tasks are de-duplicated by project and taskid.
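
A rough sketch of how a custom taskid and de-duplication by (project, taskid) fit together. It does not use pyspider itself: get_taskid is written here as a standalone function rather than a handler method, the task layout (a "url" key plus optional "fetch"/"data") and the MD5 scheme are assumptions, and TaskDeduper is a hypothetical helper.

```python
import hashlib
import json


def get_taskid(task):
    """Derive a task id from the URL plus any request data (assumed scheme)."""
    data = task.get("fetch", {}).get("data", "")
    raw = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


class TaskDeduper:
    """Remember (project, taskid) pairs that have already been queued."""

    def __init__(self):
        self._seen = set()

    def is_new(self, project, task):
        """Return True the first time a (project, taskid) pair is seen."""
        key = (project, get_taskid(task))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Including the request data in the id lets two requests to the same URL with different form data count as distinct tasks, while the same request in another project is tracked separately.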

Webui

  • The project list is sortable.
  • Return a 404 page when dumping a project that does not exist.
  • Web preview supports images.
