Fix memory issue: don't copy recordbatches in memory during a table deepcopy #2291 (@lhoestq)
This affected methods like concatenate_datasets, multiprocessed map, and load_from_disk.
Breaking change:
- When using Dataset.map with the input_columns parameter, the resulting dataset will only have the columns from input_columns and the columns added by the map functions. The other columns are discarded.
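
To make the column behavior described above concrete, here is a minimal pure-Python sketch. It is a toy model operating on a list of dicts, not the library's implementation: the helper name map_with_input_columns and the sample rows are hypothetical and exist only for illustration. It assumes the mapped function receives the selected columns as positional arguments and returns a dict of new columns.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def map_with_input_columns(
    rows: List[Row],
    function: Callable[..., Row],
    input_columns: List[str],
) -> List[Row]:
    """Toy model of the behavior described in this note (hypothetical helper)."""
    result = []
    for row in rows:
        # Pass only the selected columns to the function, positionally.
        new_values = function(*(row[name] for name in input_columns))
        # Keep the input columns, then add whatever the function returned.
        kept = {name: row[name] for name in input_columns}
        kept.update(new_values)
        result.append(kept)
    return result


# Hypothetical sample data with three columns.
rows = [
    {"text": "hello", "label": 0, "id": 10},
    {"text": "hi", "label": 1, "id": 11},
]

mapped = map_with_input_columns(
    rows,
    lambda text: {"length": len(text)},
    input_columns=["text"],
)

# Only "text" (an input column) and "length" (added by the function) remain;
# "label" and "id" are discarded.
print(sorted(mapped[0]))  # ['length', 'text']
```

In this model, a column such as label survives only if it is listed in input_columns, in which case the function also receives it as an argument.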