pypi DataProfiler 0.4.2
v0.4.2

4 years ago

Runtime Changes

Notes

This update reduces runtime by 50% on average.

Profiler

  • Added support for HistogramOptions
  • Added multiprocessing support
  • Reduced runtime for shuffling indices
  • Vectorized the precision function
  • Improved unique set & vocab merging
  • By default, the histogram now runs only 'auto' bin edge detection

Data

  • Added a length attribute to the data class, available as data.length() or len(data)
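
As a rough illustration of the two equivalent ways to read a dataset's length, the sketch below uses a hypothetical stand-in class. `TabularData` is not the DataProfiler data class, and the real implementation may differ; the sketch only shows how a length accessor and `len()` can be kept in agreement.

```python
class TabularData:
    """Hypothetical stand-in for a loaded dataset (not the DataProfiler class)."""

    def __init__(self, rows):
        # Materialize the rows so the length is known up front.
        self._rows = list(rows)

    def length(self):
        # Explicit accessor, mirroring the data.length() spelling in the notes.
        return len(self._rows)

    def __len__(self):
        # Delegating to length() keeps len(data) and data.length() in sync.
        return self.length()


data = TabularData([("a", 1), ("b", 2), ("c", 3)])
print(data.length(), len(data))
```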

Report

  • Added optional omit_keys to the report options function, which removes the listed keys from the final report
  • Added row_has_null_count (global), the number of rows with one or more nulls
  • Added row_is_null_count (global), the number of rows in which the entire row is null
  • Renamed total_samples (global) -> row_count
  • Renamed label BACKGROUND -> UNKNOWN (column)
  • Removed covariance (global)
  • Removed data_classification (global)
  • Removed data_label_probability (column)
  • Removed median (column)
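
The effect of omit_keys can be pictured as pruning entries from a nested report dictionary. The sketch below is only an illustration of that idea: the helper `drop_keys`, the dotted-path syntax, and the report layout are assumptions made for the example, and the key syntax the library actually accepts may differ.

```python
import copy


def drop_keys(report, omit_keys):
    """Return a copy of `report` without the dotted key paths in `omit_keys`.

    Hypothetical helper: the dotted-path syntax is an assumption for this sketch.
    """
    pruned = copy.deepcopy(report)  # leave the caller's report untouched
    for path in omit_keys:
        *parents, leaf = path.split(".")
        node = pruned
        # Walk down to the dictionary that holds the key to remove.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)  # silently ignore paths that do not exist
    return pruned


# Illustrative report layout using the key names from this release.
report = {
    "global_stats": {"row_count": 3, "row_has_null_count": 2, "row_is_null_count": 1},
    "data_stats": {"age": {"min": 1, "max": 9}},
}
trimmed = drop_keys(report, ["global_stats.row_is_null_count", "data_stats"])
print(trimmed)
```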

Bug fixes

  • Accurate null count and total_samples on profile updates
  • Each column now receives the same sampled indices, enabling row_is_null_count
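
To see why shared sampled indices matter for the row-level null counts, consider the minimal sketch below. It is not the profiler's implementation: the function `row_null_counts`, its parameters, and the use of `None` as the null marker are assumptions for illustration. Because every column is inspected at the same row indices, the per-column null sets can be combined row by row: a union gives rows with at least one null, and an intersection gives rows that are entirely null.

```python
import random


def row_null_counts(columns, sample_size, seed=0):
    """Hypothetical sketch: row-level null counts from one shared index sample."""
    n_rows = len(next(iter(columns.values())))
    rng = random.Random(seed)
    # One sample of row indices, reused for every column.
    indices = sorted(rng.sample(range(n_rows), sample_size))

    # For each column, the sampled row indices whose value is null (None here).
    null_rows = {
        name: {i for i in indices if values[i] is None}
        for name, values in columns.items()
    }
    has_null = set.union(*null_rows.values())        # null in any column
    is_null = set.intersection(*null_rows.values())  # null in every column
    return {
        "row_count": len(indices),
        "row_has_null_count": len(has_null),
        "row_is_null_count": len(is_null),
    }


columns = {
    "a": [1, None, None, 4],
    "b": ["x", None, "z", None],
}
# Sampling all four rows makes the result independent of the random seed.
counts = row_null_counts(columns, sample_size=4)
print(counts)
```

If each column drew its own indices instead, the sets would refer to different rows and neither the union nor the intersection would describe whole rows.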
