github neilotoole/sq v0.41.0

latest releases: v0.48.3, v0.48.2, v0.48.1...
15 months ago

This release is heavily focused on improvements to Microsoft Excel support.

The underlying Excel library has been changed from tealeg/xlsx to qax-os/excelize, largely because tealeg/xlsx is no longer actively maintained.

Thus, both the XLSX output writer and the XLSX driver have been rewritten. There should be some performance improvements, but it's also possible that the rewrite introduced bugs. If you discover anything strange, please open an issue.

Added

  • #99: The CSV and XLSX
    drivers can now handle duplicate header column names in the ingest data.
    For example, given a CSV file:

    actor_id,first_name,actor_id
    1,PENELOPE,1
    2,NICK,2
    

    The columns will be renamed to:

    actor_id,first_name,actor_id_1
    

    The renaming behavior is controlled by a new option ingest.column.rename
    This new option is effectively the ingest counterpart of the existing output option
    result.column.rename.

  • #191: The XLSX driver now detects header rows, like
    the CSV driver already does. Thus, you now typically don't need to specify
    the --ingest.header flag for Excel files. However, the option remains available
    in case sq can't figure it out for a particular file.

  • The Excel writer has three new config options for controlling date/time output.
    Note that these format strings are distinct from format.datetime
    and friends, because Excel has its own format string mechanism.

  • The ingest kind detectors (e.g. for CSV or XLSX)
    now detect more date & time formats as kind.Datetime, kind.Date, and kind.Time.

  • If an error occurs when the output format is text, a stack trace is printed
    to stderr when the command is executed with --verbose (-v).

  • There's a new option error.format that controls error output format independent
    of the main format option . The error.format value must be one of text or json.

Changed

  • ☢️ The default Excel date format has changed. Previously
    the format was 11/9/89, and now it is 1989-11-09. The same applies
    to datetimes, e.g. 11/9/1989 00:00:00 becomes 1989-11-09 00:00.

    This change is made to reduce ambiguity and confusion.
    sq uses a library
    to interact with Excel files, and it seems that the library chooses a particular format
    by default (11/9/89). There are several paths we could take here:

    1. Interrogate the OS, and use the OS locale date format.
    2. Stick with the library default 11/9/89.
    3. Pick a default other than 11/9/89.

    We pick the third option. The first option (locale-dependent)
    is excluded because, as a general rule, we want sq to produce the same
    output regardless of locale/system settings. We exclude the second option
    because month/day confuses most of the world. Thus, we're left with picking a
    default, and 1989-11-09 is the format used in
    RFC3339 and friends.

    Whether this is the correct (standard?) approach is still unclear, and
    feedback is welcome. However, the user can make use of the new config options
    (format.excel.datetime etc.)
    to customize the format as they see fit.

  • The XLSX writer now outputs header rows in bold text.

  • ☢️ The XLSX writer now outputs blob (bytes) cell data as a base64-encoded string,
    instead of raw bytes.

Fixed

  • Fixed bug where source-specific config wasn't being propagated.

Don't miss a new sq release

NewReleases is sending notifications on new releases.