This release is heavily focused on improvements to Microsoft Excel support.
The underlying Excel library has been changed from tealeg/xlsx
to qax-os/excelize
, largely because tealeg/xlsx
is no longer actively maintained.
Thus, both the XLSX output writer and the XLSX driver have been rewritten. There should be some performance improvements, but it's also possible that the rewrite introduced bugs. If you discover anything strange, please open an issue.
Added
-
#99: The CSV and XLSX
drivers can now handle duplicate header column names in the ingest data.
For example, given a CSV file:actor_id,first_name,actor_id 1,PENELOPE,1 2,NICK,2
The columns will be renamed to:
actor_id,first_name,actor_id_1
The renaming behavior is controlled by a new option
ingest.column.rename
This new option is effectively the ingest counterpart of the existing output option
result.column.rename
. -
#191: The XLSX driver now detects header rows, like
the CSV driver already does. Thus, you now typically don't need to specify
the--ingest.header
flag for Excel files. However, the option remains available
in casesq
can't figure it out for a particular file. -
The Excel writer has three new config options for controlling date/time output.
Note that these format strings are distinct fromformat.datetime
and friends, because Excel has its own format string mechanism.format.excel.datetime
: Controls datetime format, e.g.2023-08-03 16:07:01
.format.excel.date
: Controls date-only format, e.g.2023-08-03
.format.excel.time
: Controls time-only format, e.g.4:07 pm
.
-
The ingest kind detectors (e.g. for
CSV
orXLSX
)
now detect more date & time formats askind.Datetime
,kind.Date
, andkind.Time
. -
If an error occurs when the output format is
text
, a stack trace is printed
tostderr
when the command is executed with--verbose
(-v
). -
There's a new option
error.format
that controls error output format independent
of the mainformat
option . Theerror.format
value must be one oftext
orjson
.
Changed
-
☢️ The default Excel date format has changed. Previously
the format was11/9/89
, and now it is1989-11-09
. The same applies
to datetimes, e.g.11/9/1989 00:00:00
becomes1989-11-09 00:00
.This change is made to reduce ambiguity and confusion.
sq
uses a library
to interact with Excel files, and it seems that the library chooses a particular format
by default (11/9/89
). There are several paths we could take here:- Interrogate the OS, and use the OS locale date format.
- Stick with the library default
11/9/89
. - Pick a default other than
11/9/89
.
We pick the third option. The first option (locale-dependent)
is excluded because, as a general rule, we wantsq
to produce the same
output regardless of locale/system settings. We exclude the second option
because month/day confuses most of the world. Thus, we're left with picking a
default, and1989-11-09
is the format used in
RFC3339 and friends.Whether this is the correct (standard?) approach is still unclear, and
feedback is welcome. However, the user can make use of the new config options
(format.excel.datetime
etc.)
to customize the format as they see fit. -
The XLSX writer now outputs header rows in bold text.
-
☢️ The XLSX writer now outputs blob (
bytes
) cell data as a base64-encoded string,
instead of raw bytes.
Fixed
- Fixed bug where source-specific config wasn't being propagated.