New Features
- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting an encoding other than UTF-8 for XML or CSV output (#2034).
- The startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610).
- The JSON encoder now supports serialization of set instances (#2058).
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).
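The \uXXXX escapes mentioned above come from JSON's ASCII-only output mode. The sketch below uses Python's standard json module, not Scrapy's exporters, to show the difference. The assumption is that setting FEED_EXPORT_ENCODING (for example FEED_EXPORT_ENCODING = "utf-8" in a project's settings.py) makes JSON output keep non-ASCII characters as-is, which corresponds to ensure_ascii=False here.

```python
import json

# Hypothetical item whose value contains a non-ASCII character.
item = {"name": "Café"}

# Default json.dumps behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(item)

# With ensure_ascii=False the characters are kept, and the text can then be
# encoded with whatever encoding the output file uses (assumed here: UTF-8).
unescaped = json.dumps(item, ensure_ascii=False)

print(escaped)    # {"name": "Caf\u00e9"}
print(unescaped)  # {"name": "Café"}
```

Both strings decode to the same item; only the on-disk representation differs.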
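Request serialization failures typically appear when the scheduler writes requests to disk queues (for example when JOBDIR is set) and a request cannot be pickled. The sketch below is not Scrapy code: it only reproduces that kind of failure with the standard pickle module, using a lambda as a stand-in for an unserializable callback, and a hypothetical try_serialize helper that logs the failure instead of crashing. This is roughly the kind of problem that SCHEDULER_DEBUG = True is meant to surface.

```python
import logging
import pickle

logger = logging.getLogger("scheduler-sketch")

# Hypothetical request payloads: the second carries a lambda, which pickle
# cannot serialize because it has no importable name.
good_request = {"url": "https://example.com", "callback": "parse"}
bad_request = {"url": "https://example.com", "callback": lambda response: None}


def try_serialize(request):
    """Return pickled bytes, or None after logging why serialization failed."""
    try:
        return pickle.dumps(request)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        logger.debug("Unable to serialize request %r: %s", request["url"], exc)
        return None


print(try_serialize(good_request) is not None)  # True
print(try_serialize(bad_request) is None)       # True
```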
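The standard json module rejects set objects with a TypeError. A common way to support them is a JSONEncoder subclass whose default() method turns sets into lists; the minimal sketch below does that. SetAwareEncoder is a hypothetical name; Scrapy's own encoder is ScrapyJSONEncoder in scrapy.utils.serialize, and this sketch does not reproduce its other conversions.

```python
import json


class SetAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes set/frozenset as JSON arrays."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Sets are unordered, so the resulting list order is arbitrary.
            return list(o)
        # Defer to the base class, which raises TypeError for other types.
        return super().default(o)


item = {"tags": {"python", "scraping"}}

try:
    json.dumps(item)
except TypeError as exc:
    print("plain json:", exc)

encoded = json.dumps(item, cls=SetAwareEncoder)
print(sorted(json.loads(encoded)["tags"]))  # ['python', 'scraping']
```

Because set order is arbitrary, consumers that need stable output should sort the resulting list.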
Bug fixes
- DefaultRequestHeaders middleware now runs before UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- The HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- Selector no longer allows passing both response and text (#2153).
- Fixed logging of the wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when the spider yields no items (#872).
- Implement flush() for StreamLogger, avoiding a warning in logs (#2125).
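Why the middleware order matters: assuming both middlewares only fill in a header that is not already set, whichever runs first decides the User-Agent when DEFAULT_REQUEST_HEADERS also defines one. The simulation below is a simplified stand-in for the two middlewares, not Scrapy's classes; headers are modeled as a plain dict, and the header values are hypothetical.

```python
# Hypothetical values for the two settings involved.
DEFAULT_REQUEST_HEADERS = {"User-Agent": "custom-agent/1.0"}
USER_AGENT = "default-agent"


def default_headers_middleware(headers):
    # Fill in every DEFAULT_REQUEST_HEADERS entry that is not already set.
    for name, value in DEFAULT_REQUEST_HEADERS.items():
        headers.setdefault(name, value)


def user_agent_middleware(headers):
    # Fill in User-Agent only if nothing set it earlier.
    headers.setdefault("User-Agent", USER_AGENT)


def process_request(middlewares):
    headers = {}
    for middleware in middlewares:
        middleware(headers)
    return headers


old = process_request([user_agent_middleware, default_headers_middleware])
new = process_request([default_headers_middleware, user_agent_middleware])
print(old["User-Agent"])  # default-agent
print(new["User-Agent"])  # custom-agent/1.0
```

With the new order, a User-Agent given in DEFAULT_REQUEST_HEADERS takes precedence, which is why the change is flagged as technically backwards incompatible.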
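Outside a project there is no project data directory to resolve relative paths, such as the HTTP cache directory, against. A minimal sketch of the kind of resolution involved, assuming relative paths fall back to a .scrapy directory under the current working directory when no project is found. resolve_data_path and its parameters are hypothetical, not Scrapy's API.

```python
import os


def resolve_data_path(path, inside_project, project_data_dir=None):
    """Hypothetical resolver for data paths such as the HTTP cache directory."""
    if os.path.isabs(path):
        # Absolute paths are used unchanged.
        return path
    if inside_project:
        # Inside a project, relative paths live under the project's data dir.
        return os.path.join(project_data_dir, path)
    # Outside a project, fall back to a .scrapy directory in the current one.
    return os.path.join(".scrapy", path)


print(resolve_data_path("httpcache", inside_project=False))
```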
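One way to keep a JSON feed valid when a spider yields no items is to write the enclosing brackets unconditionally when the export starts and finishes, so an empty export is []. The hypothetical ArrayExporter below is a simplified sketch of that idea, not Scrapy's JsonItemExporter, and it does not cover the XML case.

```python
import io
import json


class ArrayExporter:
    """Hypothetical exporter writing items as one JSON array."""

    def __init__(self, stream):
        self.stream = stream
        self.first_item = True

    def start(self):
        # Always open the array, even if no items follow.
        self.stream.write("[")

    def export_item(self, item):
        if not self.first_item:
            self.stream.write(",")
        self.first_item = False
        self.stream.write(json.dumps(item))

    def finish(self):
        # Always close the array so the output stays valid JSON.
        self.stream.write("]")


def export(items):
    buf = io.StringIO()
    exporter = ArrayExporter(buf)
    exporter.start()
    for item in items:
        exporter.export_item(item)
    exporter.finish()
    return buf.getvalue()


print(export([]))                    # []
print(export([{"a": 1}, {"b": 2}]))  # [{"a": 1},{"b": 2}]
```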
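Code that swaps sys.stdout for a file-like object expects it to provide flush() as well as write(); an object lacking flush() breaks any caller that flushes the stream. A simplified sketch of a stream-to-logger adapter with both methods, modeled loosely on the idea behind StreamLogger; the class name LoggerStream and its details are assumptions.

```python
import contextlib
import io
import logging


class LoggerStream:
    """Hypothetical file-like object that forwards writes to a logger."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level

    def write(self, buf):
        # Log each non-empty line separately.
        for line in buf.rstrip().splitlines():
            self.logger.log(self.level, line.rstrip())
        return len(buf)

    def flush(self):
        # Nothing is buffered here; flush the logger's handlers instead.
        for handler in self.logger.handlers:
            handler.flush()


captured = io.StringIO()
logger = logging.getLogger("stdout-sketch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(captured))
logger.propagate = False

stream = LoggerStream(logger)
with contextlib.redirect_stdout(stream):
    print("hello from print", flush=True)  # calls write() and then flush()

print(captured.getvalue())
```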
Refactoring
Tests & Requirements
Scrapy's new requirements baseline is Debian 8 "Jessie". It was previously Ubuntu 12.04 "Precise".
In practice, this means that we run continuous integration tests with at least these versions of the main packages: Twisted 14.0, pyOpenSSL 0.14, lxml 3.4.
Scrapy may very well work with older versions of these packages (the code base still has switches for older Twisted versions, for example), but this is not guaranteed because those versions are no longer tested.
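To check whether an environment falls below the tested baseline, installed versions can be compared against the minimums listed above. A minimal sketch with a deliberately simple version parser (numeric components only, so pre-release tags are ignored); below_baseline and parse_version are hypothetical helpers, not part of Scrapy.

```python
import re

# Minimum tested versions from the requirements baseline above.
BASELINE = {"Twisted": (14, 0), "pyOpenSSL": (0, 14), "lxml": (3, 4)}


def parse_version(text):
    """Keep only leading numeric components, e.g. "14.0.2" -> (14, 0, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def below_baseline(installed):
    """Return the names of packages whose installed version is under the baseline."""
    return sorted(
        name
        for name, minimum in BASELINE.items()
        if name in installed and parse_version(installed[name]) < minimum
    )


print(below_baseline({"Twisted": "13.2.0", "pyOpenSSL": "0.14", "lxml": "3.4.4"}))  # ['Twisted']
```

Installed versions can be obtained with importlib.metadata.version() (Python 3.8+) before passing them in; real-world version strings may need a full parser such as the one in the packaging library.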