Improvements
- Support reading and writing datetimes with timezones (#253).
- Support writing dataframes without geometry column (#267).
- Calculate feature count by iterating over features if GDAL returns an
unknown count for a data layer (e.g., OSM driver); this may have significant
performance impacts for some data sources that would otherwise return an
unknown count (count is used in `read_info`, `read`, `read_dataframe`) (#271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` and reduce memory
usage with `use_arrow=True` (#273).
- In `read_info`, the result now also contains the `total_bounds` of the layer
as well as some extra `capabilities` of the data source driver (#281).
- Raise error if `read` or `read_dataframe` is called with parameters to read no
columns, geometry, or fids (#280).
- Automatically detect supported driver by extension for all available
write drivers and add `detect_write_driver` (#270).
- Add `mask` parameter to the `open_arrow`, `read`, `read_dataframe`,
and `read_bounds` functions to select only the features in the dataset that
intersect the mask geometry (#285). Note: GDAL < 3.8.0 returns features that
intersect the bounding box of the mask when using the Arrow interface for
some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (#299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe`
(#300).
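
As a rough illustration of the `mask` parameter and the GDAL < 3.8.0 note above, the sketch below reads only the features that intersect a mask and then re-applies the exact intersection test to the result. The helper name `read_intersecting` is hypothetical, not part of the library; it assumes `mask` is a Shapely geometry and that geopandas is installed so that `read_dataframe` returns a GeoDataFrame.

```python
def read_intersecting(path, mask, use_arrow=False):
    """Read features intersecting `mask`, then re-check the intersection.

    Hypothetical helper: the second filter is only a precaution for
    GDAL < 3.8.0, where the Arrow interface may return features that
    intersect just the bounding box of the mask for some drivers.
    """
    # Imported here so the sketch can be loaded without pyogrio installed.
    from pyogrio import read_dataframe

    df = read_dataframe(path, mask=mask, use_arrow=use_arrow)

    # GeoDataFrame.intersects returns a boolean Series; keep exact matches only.
    return df[df.intersects(mask)]
```

With GDAL 3.8.0 or later the extra filter should be redundant, so it can be dropped if the additional geometry test is too costly.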
Other changes
- test suite requires Shapely >= 2.0
- using `skip_features` greater than the number of features available in a data
layer now returns empty arrays for `read` and an empty DataFrame for
`read_dataframe` instead of raising a `ValueError` (#282).
- enabled `skip_features` and `max_features` for `read_arrow` and
`read_dataframe(path, use_arrow=True)`. Note that this incurs overhead
because all features up to the next batch size above `max_features` (or size
of data layer) will be read prior to slicing out the requested range of
features (#282).
- The `use_arrow=True` option can be enabled globally for testing using the
`PYOGRIO_USE_ARROW=1` environment variable (#296).
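
To make the overhead of `skip_features` / `max_features` with Arrow more concrete, the following sketch estimates how many features are read before the requested range is sliced out. It is a simplified model, not the library's code: the function name is hypothetical, the default batch size of 65,536 is an assumption, and it assumes whole batches are read until `skip_features + max_features` is covered.

```python
import math


def features_read_before_slicing(skip_features, max_features, layer_size,
                                 batch_size=65_536):
    """Estimate how many features are read before slicing (simplified model)."""
    # Last feature index that must be covered, capped at the layer size.
    requested_end = min(skip_features + max_features, layer_size)

    # Whole batches are read, so round up to the next batch boundary.
    batches = math.ceil(requested_end / batch_size)

    # The final batch may be short if the layer ends first.
    return min(batches * batch_size, layer_size)


# Asking for 100 features from a large layer still reads one full batch.
print(features_read_before_slicing(10, 100, 1_000_000))
```

Under these assumptions, a small request near the start of a large layer costs about one batch, while a request that starts deep into the layer costs every batch before it.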
Bug fixes
- Fix int32 overflow when reading int64 columns (#260)
- Fix `fid_as_index=True` not setting the fid as index when using
`read_dataframe` with `use_arrow=True` (#265)
- Fix errors reading OSM data due to invalid feature count and incorrect
reading of OSM layers beyond the first layer (#271)
- Always raise an exception if there is an error when writing a data source
(#284)
Potentially breaking changes
- In `read_info` (#281):
  - the `features` property in the result will now be -1 if calculating the
    feature count is an expensive operation for this driver. You can force it to be
    calculated using the `force_feature_count` parameter.
  - for boolean values in the `capabilities` property, the values will now be
    booleans instead of 1 or 0.
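
Code that relied on `read_info` always returning a feature count may need a fallback. A minimal sketch, assuming the caller accepts the cost of a full count when the cheap one is unavailable; the helper name `feature_count` is hypothetical:

```python
def feature_count(path, layer=None):
    """Return the feature count, forcing the calculation only if needed."""
    # Imported here so the sketch can be loaded without pyogrio installed.
    from pyogrio import read_info

    info = read_info(path, layer=layer)
    if info["features"] == -1:
        # The driver reported the count as expensive; request it explicitly.
        info = read_info(path, layer=layer, force_feature_count=True)
    return info["features"]
```

Forcing the count can be slow for drivers such as OSM, so this is best done once and the result reused.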
Packaging
- The GDAL library included in the wheels is updated from 3.6.4 to 3.7.2.